Which of the following best describes the relationship between tokens, patterns, and lexemes in lexical analysis?
Think about what each term represents in the scanning process.
A token is a type or category (like 'identifier' or 'number'), defined by a pattern (a rule or regular expression). A lexeme is the actual substring from the source code that matches the pattern.
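The distinction can be illustrated with a short Python sketch (the token name and regex here are illustrative choices, not part of any particular compiler):

```python
import re

# The pattern (a regular expression) defines the 'identifier' token category.
pattern = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
source = "count = 42"

match = pattern.match(source)
token = "identifier"    # the category
lexeme = match.group()  # the actual substring that matched
print(token, lexeme)    # identifier count
```

The token is the abstract category, the pattern is the rule, and the lexeme is the concrete text "count" pulled from the source.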
Given the source code snippet: int count = 42;, which of the following is NOT a lexeme?
Lexemes are actual substrings from the source code.
"number" is not present in the source code snippet, so it cannot be a lexeme. The others are exact substrings.
Consider the code snippet: sum = a + b * 10;. Which sequence of tokens correctly represents this snippet?
Focus on the categories of tokens, not the exact characters.
The snippet contains identifiers (sum, a, b), operators (assignment '=', plus '+', multiply '*'), a number (10), and a semicolon. Option A correctly names these token categories. Option B incorrectly labels 'sum' as a keyword. Option C uses character names instead of token categories. Option D misses the semicolon token.
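A small tokenizer sketch makes the expected token sequence concrete. The category names and patterns below are assumptions chosen to match the explanation, built with Python's named-group trick:

```python
import re

# Assumed token categories and their patterns.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number",     r"\d+"),
    ("assign",     r"="),
    ("plus",       r"\+"),
    ("times",      r"\*"),
    ("semicolon",  r";"),
    ("skip",       r"\s+"),   # whitespace is discarded
]
master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Return (category, lexeme) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in master.finditer(src)
            if m.lastgroup != "skip"]

print(tokenize("sum = a + b * 10;"))
```

Running it yields identifier, assign, identifier, plus, identifier, times, number, semicolon — the sequence of categories, not the raw characters.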
Which of the following patterns correctly describes a typical identifier in many programming languages?
Think about common rules for variable names.
Identifiers usually start with a letter or underscore and can contain letters, digits, or underscores afterward. Starting with a digit or special character is generally invalid.
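This rule can be written as a regular expression and checked directly (a sketch of the common convention, not the rule of any one language):

```python
import re

# Letter or underscore first, then letters, digits, or underscores.
identifier = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

print(bool(identifier.match("count_1")))  # True
print(bool(identifier.match("_tmp")))     # True
print(bool(identifier.match("1count")))   # False: starts with a digit
```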
Suppose a lexical analyzer has two patterns: one for keywords (e.g., "if", "while") and one for identifiers (any letter followed by letters or digits). If the input is "if", what will the lexical analyzer output and why?
Consider how lexical analyzers resolve conflicts between overlapping patterns.
Lexical analyzers typically assign higher priority to keywords over identifiers. Even though "if" matches the identifier pattern, it is recognized as a keyword token.
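One common way to implement this priority is to match with the identifier pattern first, then check the lexeme against a keyword table. A minimal sketch, assuming a two-keyword language:

```python
KEYWORDS = {"if", "while"}  # assumed keyword set for this example

def classify(lexeme):
    # Keywords take priority over the general identifier pattern.
    return "keyword" if lexeme in KEYWORDS else "identifier"

print(classify("if"))    # keyword
print(classify("ifx"))   # identifier: matches the pattern but is not reserved
```

Note that "ifx" stays an identifier: the longest match ("ifx") is found first, and only exact keyword matches are reclassified.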