What is bottom up parsing

Compiler-designConceptBeginner · 4 min read

Bottom Up Parsing: Definition, How It Works, and Examples

Bottom up parsing is a method used by compilers to analyze a string of code by starting from the input symbols and gradually building up to the start symbol of the grammar. It works by reducing the input step-by-step until the entire input is recognized as a valid sentence using shift and reduce actions.

⚙️

How It Works

Bottom up parsing works like solving a puzzle from the small pieces to the big picture. Imagine you have a sentence made of words, and you want to understand its structure. Instead of starting from the sentence's main idea, bottom up parsing starts from the words (the smallest parts) and tries to combine them into bigger parts until it forms the whole sentence.

In technical terms, the parser reads the input from left to right and uses a stack to keep track of symbols. It performs two main actions: shift, which means reading the next input symbol and pushing it onto the stack, and reduce, which means replacing a group of symbols on the stack with a higher-level symbol according to grammar rules. This process continues until the stack contains only the start symbol, meaning the input is successfully parsed.

💻

Example

This example shows a simple bottom up parser simulation for the expression id + id using a grammar with rules for identifiers and addition.

python

grammar = {
    'E': ['E + E', 'id']
}

input_tokens = ['id', '+', 'id']
stack = []

# Simple function to check if top of stack matches a grammar rule's right side

def can_reduce(stack):
    for lhs, rhs_list in grammar.items():
        for rhs in rhs_list:
            rhs_tokens = rhs.split()
            if len(stack) >= len(rhs_tokens) and stack[-len(rhs_tokens):] == rhs_tokens:
                return lhs, len(rhs_tokens)
    return None, 0

print('Parsing steps:')
for token in input_tokens:
    stack.append(token)  # shift
    print(f'Shift: stack = {stack}')
    while True:
        lhs, length = can_reduce(stack)
        if lhs:
            # reduce
            for _ in range(length):
                stack.pop()
            stack.append(lhs)
            print(f'Reduce: stack = {stack}')
        else:
            break

if stack == ['E']:
    print('Input successfully parsed as E')
else:
    print('Parsing failed')

Output

Parsing steps: Shift: stack = ['id'] Reduce: stack = ['E'] Shift: stack = ['E', '+'] Shift: stack = ['E', '+', 'id'] Reduce: stack = ['E', '+', 'E'] Reduce: stack = ['E'] Input successfully parsed as E

🎯

When to Use

Bottom up parsing is used when you want a parser that can handle a wide range of programming languages and complex grammar rules. It is especially useful in compiler design because it can detect syntax errors as soon as possible and can parse languages that have ambiguous or complicated structures.

Real-world use cases include compilers for languages like C, C++, and Java, where bottom up parsers such as LR parsers are common. It is also used in tools that analyze or transform code, like interpreters and static analyzers.

✅

Key Points

Bottom up parsing builds the parse tree from the leaves (input symbols) up to the root (start symbol).
It uses a stack and two main actions: shift (read input) and reduce (apply grammar rules).
It can handle a larger class of grammars than top down parsing.
Common bottom up parsers include LR, SLR, and LALR parsers.

✅

Key Takeaways

Bottom up parsing starts from input symbols and builds up to the start symbol using shift and reduce actions.

It is powerful for parsing complex and ambiguous grammars in programming languages.

Bottom up parsers use a stack to manage symbols and apply grammar rules in reverse.

This parsing method is widely used in real-world compilers and language tools.

Understanding bottom up parsing helps in grasping how compilers analyze code syntax.