Compiler-designConceptBeginner · 4 min read

What is First and Follow Set in Compiler Design

In compiler design, the first set of a grammar symbol is the set of terminals that begin the strings derivable from that symbol. The follow set is the set of terminals that can appear immediately to the right of that symbol in some sentential form. These sets help parsers decide which grammar rule to apply next.

⚙️

How It Works

Imagine you are reading a recipe and want to know what ingredients might come first or what steps follow after a certain instruction. Similarly, in compiler design, the first set tells us what symbols can appear at the start of a part of the code, while the follow set tells us what symbols can come right after it.

The first set helps the parser predict which rule to use by looking at the next input symbol. The follow set helps decide when a rule has finished and what can come next. Together, they guide the parser to read the code correctly without confusion.

💻

Example

This example shows how to compute first and follow sets for a simple grammar.

python

grammar = {
    'E': ['T E\''],
    'E\'': ['+ T E\'', 'ε'],
    'T': ['F T\''],
    'T\'': ['* F T\'', 'ε'],
    'F': ['( E )', 'id']
}

first = {}
follow = {}

# Initialize first and follow sets
for non_terminal in grammar:
    first[non_terminal] = set()
    follow[non_terminal] = set()

# Function to compute first set

def compute_first(symbol):
    if symbol not in grammar:  # Terminal
        return {symbol}
    result = set()
    for production in grammar[symbol]:
        if production == 'ε':
            result.add('ε')
        else:
            for char in production.split():
                first_of_char = compute_first(char)
                result.update(first_of_char - {'ε'})
                if 'ε' not in first_of_char:
                    break
            else:
                result.add('ε')
    return result

# Compute first sets
for non_terminal in grammar:
    first[non_terminal] = compute_first(non_terminal)

# Initialize follow of start symbol
follow['E'].add('$')  # $ is end marker

# Function to compute follow sets
changed = True
while changed:
    changed = False
    for head in grammar:
        for production in grammar[head]:
            symbols = production.split()
            for i, symbol in enumerate(symbols):
                if symbol in grammar:
                    follow_before = follow[symbol].copy()
                    # Look at symbols after current
                    rest = symbols[i+1:]
                    if rest:
                        first_rest = set()
                        for r in rest:
                            first_r = compute_first(r)
                            first_rest.update(first_r - {'ε'})
                            if 'ε' not in first_r:
                                break
                        else:
                            first_rest.add('ε')
                        follow[symbol].update(first_rest - {'ε'})
                        if 'ε' in first_rest:
                            follow[symbol].update(follow[head])
                    else:
                        follow[symbol].update(follow[head])
                    if follow_before != follow[symbol]:
                        changed = True

print('First sets:')
for k, v in first.items():
    print(f'{k}: {v}')

print('\nFollow sets:')
for k, v in follow.items():
    print(f'{k}: {v}')

Output

First sets: E: {'(', 'id'} E': {'+', 'ε'} T: {'(', 'id'} T': {'*', 'ε'} F: {'(', 'id'} Follow sets: E: {'$', ')'} E': {'$', ')'} T: {'+', '$', ')'} T': {'+', '$', ')'} F: {'*', '+', '$', ')'}

🎯

When to Use

First and follow sets are essential in building predictive parsers, which read code from left to right and decide which grammar rule to apply without backtracking. They help detect syntax errors early and make parsing efficient.

In real-world compilers, these sets are used in LL(1) parsers to create parsing tables that guide the parsing process. They are also useful in designing new programming languages or tools that analyze code structure.

✅

Key Points

First set shows possible starting terminals of a grammar symbol.
Follow set shows terminals that can appear immediately after a grammar symbol.
They help parsers decide which rule to apply next.
Used mainly in predictive parsing and syntax analysis.
Computing them involves careful handling of empty (epsilon) productions.

✅

Key Takeaways

First sets identify which terminals can start strings from a grammar symbol.

Follow sets identify which terminals can come immediately after a grammar symbol.

They are crucial for building efficient predictive parsers without backtracking.

Used to create parsing tables in LL(1) parsers for syntax analysis.

Handling epsilon (empty) productions is important when computing these sets.