How to Calculate First Set in Compiler Design
To calculate the
First set of a grammar symbol, find all terminals that can appear at the start of strings derived from that symbol. For a terminal, its First set is itself; for a non-terminal, recursively include First sets of its productions, adding ε if the production can derive an empty string.Syntax
The First set is defined for grammar symbols (terminals and non-terminals) as follows:
- First(X): The set of terminals that begin strings derived from symbol
X. - If
Xis a terminal,First(X) = {X}. - If
Xis a non-terminal,First(X)includes terminals from the start of its productions. - If a production can derive
ε(empty string), thenεis included inFirst(X).
plaintext
First(X) = {
if X is terminal: {X}
else if X is non-terminal:
for each production X -> Y1 Y2 ... Yn:
add First(Y1) excluding ε to First(X)
if First(Y1) contains ε, add First(Y2) excluding ε, and so on
if all Yi contain ε, add ε to First(X)
}Example
This example shows how to calculate First sets for a simple grammar:
Grammar: S -> A B A -> a | ε B -> b
Calculate First(S), First(A), and First(B).
python
grammar = {
'S': [['A', 'B']],
'A': [['a'], ['ε']],
'B': [['b']]
}
first = {}
# Initialize First sets for terminals
first['a'] = {'a'}
first['b'] = {'b'}
first['ε'] = {'ε'}
# Function to compute First sets
def compute_first(X):
if X in first:
return first[X]
first[X] = set()
for production in grammar.get(X, []):
for symbol in production:
symbol_first = compute_first(symbol)
first[X].update(symbol_first - {'ε'})
if 'ε' not in symbol_first:
break
else:
first[X].add('ε')
return first[X]
# Compute First sets
compute_first('S')
compute_first('A')
compute_first('B')
# Print results
for symbol in ['S', 'A', 'B']:
print(f"First({symbol}) = {{ {', '.join(sorted(first[symbol]))} }}")Output
First(S) = { a, b }
First(A) = { a, ε }
First(B) = { b }
Common Pitfalls
Common mistakes when calculating First sets include:
- Not including
εwhen a production can derive an empty string. - Failing to check all symbols in a production when earlier symbols can produce
ε. - Confusing
Firstsets withFollowsets. - Not handling terminals and non-terminals differently.
plaintext
Incorrect approach:
First(A) = { a } # Missing ε even though A -> ε
Correct approach:
First(A) = { a, ε } # Includes ε because A can be emptyQuick Reference
Summary tips for calculating First sets:
- Start with terminals:
First(terminal) = {terminal}. - For non-terminals, add
Firstof the first symbol in each production. - If the first symbol can produce
ε, includeFirstof the next symbol. - Include
εinFirstif all symbols in a production can produceε.
Key Takeaways
First sets contain terminals that can appear at the start of strings derived from a symbol.
Include ε in First sets if a production can derive the empty string.
Check all symbols in a production if earlier ones can produce ε.
Terminals have First sets containing only themselves.
Carefully distinguish First sets from Follow sets.