0
0
Compiler-designHow-ToBeginner · 4 min read

How to Calculate First Set in Compiler Design

To calculate the First set of a grammar symbol, find all terminals that can appear at the start of strings derived from that symbol. For a terminal, its First set is itself; for a non-terminal, recursively include First sets of its productions, adding ε if the production can derive an empty string.
📐

Syntax

The First set is defined for grammar symbols (terminals and non-terminals) as follows:

  • First(X): The set of terminals that begin strings derived from symbol X.
  • If X is a terminal, First(X) = {X}.
  • If X is a non-terminal, First(X) includes terminals from the start of its productions.
  • If a production can derive ε (empty string), then ε is included in First(X).
plaintext
First(X) = {
  if X is terminal: {X}
  else if X is non-terminal:
    for each production X -> Y1 Y2 ... Yn:
      add First(Y1) excluding ε to First(X)
      if First(Y1) contains ε, add First(Y2) excluding ε, and so on
      if all Yi contain ε, add ε to First(X)
}
💻

Example

This example shows how to calculate First sets for a simple grammar:

Grammar:
S -> A B
A -> a | ε
B -> b

Calculate First(S), First(A), and First(B).

python
grammar = {
  'S': [['A', 'B']],
  'A': [['a'], ['ε']],
  'B': [['b']]
}

first = {}

# Initialize First sets for terminals
first['a'] = {'a'}
first['b'] = {'b'}
first['ε'] = {'ε'}

# Function to compute First sets

def compute_first(X):
    if X in first:
        return first[X]
    first[X] = set()
    for production in grammar.get(X, []):
        for symbol in production:
            symbol_first = compute_first(symbol)
            first[X].update(symbol_first - {'ε'})
            if 'ε' not in symbol_first:
                break
        else:
            first[X].add('ε')
    return first[X]

# Compute First sets
compute_first('S')
compute_first('A')
compute_first('B')

# Print results
for symbol in ['S', 'A', 'B']:
    print(f"First({symbol}) = {{ {', '.join(sorted(first[symbol]))} }}")
Output
First(S) = { a, b } First(A) = { a, ε } First(B) = { b }
⚠️

Common Pitfalls

Common mistakes when calculating First sets include:

  • Not including ε when a production can derive an empty string.
  • Failing to check all symbols in a production when earlier symbols can produce ε.
  • Confusing First sets with Follow sets.
  • Not handling terminals and non-terminals differently.
plaintext
Incorrect approach:
First(A) = { a }  # Missing ε even though A -> ε

Correct approach:
First(A) = { a, ε }  # Includes ε because A can be empty
📊

Quick Reference

Summary tips for calculating First sets:

  • Start with terminals: First(terminal) = {terminal}.
  • For non-terminals, add First of the first symbol in each production.
  • If the first symbol can produce ε, include First of the next symbol.
  • Include ε in First if all symbols in a production can produce ε.

Key Takeaways

First sets contain terminals that can appear at the start of strings derived from a symbol.
Include ε in First sets if a production can derive the empty string.
Check all symbols in a production if earlier ones can produce ε.
Terminals have First sets containing only themselves.
Carefully distinguish First sets from Follow sets.