Compiler Design · knowledge · ~10 mins

Implementing a lexical analyzer in Compiler Design - Step-by-Step Execution

Concept Flow - Implementing a lexical analyzer
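Start: take the source code as input.
1. Read the next character.
2. Is the character whitespace? Yes: ignore it and go back to step 1. No: go to step 3.
3. Can the character be part of a token? No: report an error or skip it. Yes: go to step 4.
4. Add the character to the token string.
5. Is the token complete? No: read the next character and return to step 4. Yes: go to step 6.
6. Classify the token type.
7. Output the token.
8. Are there more characters? Yes: go back to step 1. No: end.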
The lexical analyzer reads characters one by one, groups them into tokens, classifies each token, and outputs tokens until the input ends.
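The flow above can be sketched as a single scanning loop. This is a minimal sketch, not a definitive implementation: it assumes a tiny language whose only keyword is int and whose only symbols are = and ;, and the names tokenize, KEYWORDS and SINGLE_CHAR are illustrative. On a character that cannot start any token it takes the "error" branch of step 3.

```python
# Minimal sketch of the concept flow. Assumptions: "int" is the only keyword,
# "=" and ";" are the only symbols, and the token-kind names are illustrative.
KEYWORDS = {"int"}                                     # hypothetical keyword set
SINGLE_CHAR = {"=": "operator", ";": "punctuation"}    # hypothetical symbol table


def tokenize(src: str) -> list[tuple[str, str]]:
    """Return (kind, lexeme) pairs for src, scanning one character at a time."""
    tokens = []
    pos = 0
    while pos < len(src):                  # More characters?
        ch = src[pos]                      # Read next character
        if ch.isspace():                   # Whitespace: ignore and read next
            pos += 1
        elif ch.isalpha():                 # Start of a keyword or identifier
            start = pos
            while pos < len(src) and src[pos].isalnum():
                pos += 1                   # Build the token until a non-alphanumeric char
            word = src[start:pos]
            kind = "keyword" if word in KEYWORDS else "identifier"   # Classify
            tokens.append((kind, word))    # Output token
        elif ch.isdigit():                 # Start of a number
            start = pos
            while pos < len(src) and src[pos].isdigit():
                pos += 1
            tokens.append(("number", src[start:pos]))
        elif ch in SINGLE_CHAR:            # A symbol is a complete token by itself
            tokens.append((SINGLE_CHAR[ch], ch))
            pos += 1
        else:                              # Not part of any token: report an error
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return tokens


print(tokenize("int x = 10;"))
# [('keyword', 'int'), ('identifier', 'x'), ('operator', '='), ('number', '10'), ('punctuation', ';')]
```

Each inner while loop is the "Is token complete?" check: it keeps consuming characters for as long as they can belong to the current token.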
Execution Sample
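input = "int x = 10;"
pos = 0
while pos < len(input):
  ch = input[pos]
  if ch.isalpha():
    token = ch  # start a new token with this letter
    pos += 1
    while pos < len(input) and input[pos].isalnum():
      token += input[pos]  # extend the token while characters still fit
      pos += 1
    print(token)  # prints "int", then "x"
  else:
    pos += 1  # this fragment skips every other character
This fragment reads characters from input. When it sees a letter, it starts a token and keeps appending letters or digits until it reaches a character that cannot belong to the token, then prints the finished token. Every other character is skipped here, so pos always advances and the loop terminates.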
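Digits follow the same pattern as letters. The sketch below uses a hypothetical helper, read_number, which scans a run of digits starting at pos and returns the lexeme together with the index of the first non-digit character. That character is deliberately left unconsumed so the caller can examine it again as the start of the next token.

```python
def read_number(src: str, pos: int) -> tuple[str, int]:
    """Hypothetical helper: scan a run of digits starting at pos.

    Returns the lexeme and the index of the first non-digit character,
    which is left unconsumed so the caller can examine it next.
    """
    start = pos
    while pos < len(src) and src[pos].isdigit():
        pos += 1
    return src[start:pos], pos


source = "int x = 10;"
lexeme, next_pos = read_number(source, 8)
print(lexeme, next_pos, repr(source[next_pos]))  # 10 10 ';'
```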
Analysis Table
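Step | pos | ch | Condition | Action | Token Built | Output Token
1 | 0 | i | ch.isalpha() = true | Start token with 'i' | i |
2 | 1 | n | ch.isalpha() = true | Add 'n' to token | in |
3 | 2 | t | ch.isalpha() = true | Add 't' to token | int |
4 | 3 | ' ' | ch.isalpha() = false | Token complete; skip the space | int | int (keyword)
5 | 4 | x | ch.isalpha() = true | Start token with 'x' | x |
6 | 5 | ' ' | ch.isalpha() = false | Token complete; skip the space | x | x (identifier)
7 | 6 | = | ch.isalpha() = false | Single-character token | = | = (operator)
8 | 7 | ' ' | whitespace | Ignore | |
9 | 8 | 1 | ch.isdigit() = true | Start token with '1' | 1 |
10 | 9 | 0 | ch.isdigit() = true | Add '0' to token | 10 |
11 | 10 | ; | ch.isdigit() = false | Token complete; keep ';' for the next step | 10 | 10 (number)
12 | 10 | ; | ch.isalpha() = false | Single-character token | ; | ; (punctuation)
13 | 11 | (end) | pos >= len(input) | End of input | |
💡 "int x = 10;" is 11 characters long. The ';' at pos 10 is examined twice: at step 11 it ends the number token without being consumed, and at step 12 it is emitted as its own token. At step 13, pos equals the input length 11, so there are no more characters to read.
State Tracker
Values after each step. ch is the character examined in that step; pos is the index of the next character to read.
After step | pos | ch | token
Start | 0 | (none) | (empty)
1 | 1 | i | i
2 | 2 | n | in
3 | 3 | t | int
4 | 4 | ' ' | int (output)
5 | 5 | x | x
6 | 6 | ' ' | x (output)
7 | 7 | = | = (output)
8 | 8 | ' ' | (empty)
9 | 9 | 1 | 1
10 | 10 | 0 | 10
11 | 10 | ; | 10 (output)
12 | 11 | ; | ; (output)
13 (final) | 11 | (end) | (empty)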
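A trace like the one in the tables can be generated mechanically. This is a sketch under assumptions: trace is a hypothetical function, a space that ends a token is skipped in the same step, and a symbol that ends a token is left in place and examined again. For "int x = 10;" it records 13 rows, the last at pos 11, which equals len(input).

```python
def trace(src: str) -> list[tuple[int, str, str, str]]:
    """Hypothetical tracer: record (pos, ch, action, token) for each step.

    Assumed conventions: a space that ends a token is consumed in the same
    step; a symbol that ends a token is left in place and examined again.
    """
    rows = []
    pos, token, kind = 0, "", ""
    while pos < len(src):
        ch = src[pos]
        if token:  # currently building a word or a number
            fits = ch.isalnum() if kind == "word" else ch.isdigit()
            if fits:
                token += ch
                rows.append((pos, ch, "add to token", token))
                pos += 1
            else:
                rows.append((pos, ch, "token complete", token))
                token = ""
                if ch.isspace():
                    pos += 1  # the terminating space is consumed
        elif ch.isspace():
            rows.append((pos, ch, "ignore", ""))
            pos += 1
        elif ch.isalnum():
            token, kind = ch, ("word" if ch.isalpha() else "number")
            rows.append((pos, ch, "start token", token))
            pos += 1
        else:
            rows.append((pos, ch, "single-character token", ch))
            pos += 1
    if token:  # input ended while a token was still being built
        rows.append((pos, "", "token complete", token))
    rows.append((pos, "", "end of input", ""))
    return rows


for step, row in enumerate(trace("int x = 10;"), start=1):
    print(step, *row)
```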
Key Insights
Why does the token 'int' complete when a space character is read?
Because a space cannot be part of an identifier or keyword, the lexical analyzer recognizes at step 4 of the Analysis Table that the token 'int' is complete.
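The space matters because the scanner keeps extending a token for as long as characters can belong to it (the longest-match or "maximal munch" rule) and only then checks the finished lexeme against the keyword list. A minimal sketch, assuming a one-entry keyword set and hypothetical helpers read_word and classify, shows that without the space "intx" becomes a single identifier.

```python
KEYWORDS = {"int"}  # assumption: a tiny keyword set for illustration


def read_word(src: str, pos: int) -> tuple[str, int]:
    """Hypothetical helper: read letters/digits from pos; stop at the first other character."""
    start = pos
    while pos < len(src) and src[pos].isalnum():
        pos += 1
    return src[start:pos], pos


def classify(word: str) -> str:
    # A word is a keyword only if the whole lexeme matches; otherwise it is an identifier.
    return "keyword" if word in KEYWORDS else "identifier"


for text in ("int x", "intx = 1"):
    word, end = read_word(text, 0)
    print(repr(word), end, classify(word))
# 'int' 3 keyword
# 'intx' 4 identifier
```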
How does the analyzer handle single character tokens like '=' and ';'?
At steps 7 and 12, the analyzer treats these characters as complete tokens immediately since they are operators or punctuation.
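Because a symbol like '=' or ';' is a whole token by itself, the analyzer can emit it as soon as it reads it. A minimal sketch, assuming a hypothetical lookup table whose kind names ASSIGN and SEMICOLON are illustrative:

```python
# Hypothetical table mapping each single-character symbol to a token kind.
SINGLE_CHAR_TOKENS = {
    "=": "ASSIGN",
    ";": "SEMICOLON",
}


def read_symbol(src: str, pos: int) -> tuple[str, str, int]:
    """Emit src[pos] as a complete token; no further characters are needed."""
    ch = src[pos]
    if ch not in SINGLE_CHAR_TOKENS:
        raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return SINGLE_CHAR_TOKENS[ch], ch, pos + 1


print(read_symbol("int x = 10;", 6))   # ('ASSIGN', '=', 7)
print(read_symbol("int x = 10;", 10))  # ('SEMICOLON', ';', 11)
```

This sketch assumes every symbol is one character long, as in the example input; a language that also has a two-character operator such as '==' would need to look at the next character before deciding.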
Why is whitespace ignored and not output as a token?
Whitespace only separates tokens and carries no meaning of its own, so the analyzer skips it, as shown at steps 4, 6, and 8 of the Analysis Table.
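Skipping whitespace amounts to advancing pos without producing anything. A minimal sketch with a hypothetical helper skip_whitespace, assuming str.isspace() defines what counts as whitespace (spaces, tabs and newlines alike):

```python
def skip_whitespace(src: str, pos: int) -> int:
    """Hypothetical helper: advance past whitespace and return the next non-space index."""
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


print(skip_whitespace("int   x", 3))    # 6
print(skip_whitespace("x =\n\t10", 3))  # 5
```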
Visual Quiz - 3 Questions
Test your understanding
Look at the Analysis Table at step 3. What token has been built so far?
A. i
B. in
C. int
D. n
💡 Hint
Check the 'Token Built' column at step 3 in the Analysis Table.
At which step does the lexical analyzer output the token 'x'?
A. Step 5
B. Step 6
C. Step 7
D. Step 4
💡 Hint
Look for the row whose 'Output Token' column shows 'x (identifier)' in the Analysis Table.
If the input had no spaces, how would the token completion change?
A. Tokens complete only on non-alphanumeric characters
B. Tokens would never complete
C. Tokens complete on whitespace only
D. Tokens complete after a fixed length
💡 Hint
Refer to the 'Condition' and 'Action' columns in the Analysis Table to see when tokens complete.
Concept Snapshot
Lexical Analyzer Basics:
- Reads input character by character
- Skips whitespace
- Groups characters into tokens
- Recognizes token types (keywords, identifiers, numbers, symbols)
- Outputs tokens sequentially
- Stops at end of input
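The same steps (skip whitespace, group characters, classify, emit in order, stop at the end) can also be expressed with a regular-expression scanner. This is an alternative technique to the character-by-character loop, modeled on the tokenizer example in Python's re module documentation; the kind names and the one-entry keyword set are assumptions for illustration.

```python
import re

# Alternative technique: one regular expression with a named group per token kind.
# Kind names and the keyword set are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z][A-Za-z0-9]*"),
    ("SYMBOL", r"[=;]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"int"}


def tokenize(src: str) -> list[tuple[str, str]]:
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {text!r} at position {m.start()}")
        if kind == "WORD":
            kind = "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, text))
    return tokens


print(tokenize("int x = 10;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('SYMBOL', '='), ('NUMBER', '10'), ('SYMBOL', ';')]
```

Order matters in the alternation: the catch-all MISMATCH pattern comes last so that it only matches a character no other pattern accepts.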
Full Transcript
A lexical analyzer reads source code one character at a time. It skips whitespace and groups letters or digits into tokens such as keywords, identifiers, or numbers. When it encounters a character that cannot be part of the current token, it finishes that token and outputs it. Single characters like '=' or ';' are tokens by themselves. This process repeats until all input is read. The Analysis Table shows each step with the current character, the token being built, and the point at which each token is output. Variables such as the position and the token string change as the analyzer moves through the input.