What is First and Follow Set in Compiler Design
first set of a grammar symbol is the set of terminals that begin the strings derivable from that symbol. The follow set is the set of terminals that can appear immediately to the right of that symbol in some sentential form. These sets help parsers decide which grammar rule to apply next.How It Works
Imagine you are reading a recipe and want to know what ingredients might come first or what steps follow after a certain instruction. Similarly, in compiler design, the first set tells us what symbols can appear at the start of a part of the code, while the follow set tells us what symbols can come right after it.
The first set helps the parser predict which rule to use by looking at the next input symbol. The follow set helps decide when a rule has finished and what can come next. Together, they guide the parser to read the code correctly without confusion.
Example
This example shows how to compute first and follow sets for a simple grammar.
grammar = {
'E': ['T E\''],
'E\'': ['+ T E\'', 'ε'],
'T': ['F T\''],
'T\'': ['* F T\'', 'ε'],
'F': ['( E )', 'id']
}
first = {}
follow = {}
# Initialize first and follow sets
for non_terminal in grammar:
first[non_terminal] = set()
follow[non_terminal] = set()
# Function to compute first set
def compute_first(symbol):
if symbol not in grammar: # Terminal
return {symbol}
result = set()
for production in grammar[symbol]:
if production == 'ε':
result.add('ε')
else:
for char in production.split():
first_of_char = compute_first(char)
result.update(first_of_char - {'ε'})
if 'ε' not in first_of_char:
break
else:
result.add('ε')
return result
# Compute first sets
for non_terminal in grammar:
first[non_terminal] = compute_first(non_terminal)
# Initialize follow of start symbol
follow['E'].add('$') # $ is end marker
# Function to compute follow sets
changed = True
while changed:
changed = False
for head in grammar:
for production in grammar[head]:
symbols = production.split()
for i, symbol in enumerate(symbols):
if symbol in grammar:
follow_before = follow[symbol].copy()
# Look at symbols after current
rest = symbols[i+1:]
if rest:
first_rest = set()
for r in rest:
first_r = compute_first(r)
first_rest.update(first_r - {'ε'})
if 'ε' not in first_r:
break
else:
first_rest.add('ε')
follow[symbol].update(first_rest - {'ε'})
if 'ε' in first_rest:
follow[symbol].update(follow[head])
else:
follow[symbol].update(follow[head])
if follow_before != follow[symbol]:
changed = True
print('First sets:')
for k, v in first.items():
print(f'{k}: {v}')
print('\nFollow sets:')
for k, v in follow.items():
print(f'{k}: {v}')When to Use
First and follow sets are essential in building predictive parsers, which read code from left to right and decide which grammar rule to apply without backtracking. They help detect syntax errors early and make parsing efficient.
In real-world compilers, these sets are used in LL(1) parsers to create parsing tables that guide the parsing process. They are also useful in designing new programming languages or tools that analyze code structure.
Key Points
- First set shows possible starting terminals of a grammar symbol.
- Follow set shows terminals that can appear immediately after a grammar symbol.
- They help parsers decide which rule to apply next.
- Used mainly in predictive parsing and syntax analysis.
- Computing them involves careful handling of empty (epsilon) productions.