Which of the following best describes the primary purpose of using regular expressions in defining token patterns during lexical analysis?
Think about how lexical analyzers identify meaningful units in code.
Regular expressions define patterns that describe valid tokens, such as identifiers, keywords, or numbers, allowing the lexical analyzer to recognize these tokens in the source code.
Which regular expression correctly matches a decimal integer literal consisting of one or more digits?
Consider that the integer must have at least one digit.
The pattern "[0-9]+" matches one or more digits, which correctly represents decimal integer literals. Other options either match letters or allow zero digits.
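The claim above can be checked with a minimal sketch using Python's `re` module. The helper name `is_decimal_integer` is hypothetical, chosen only for this illustration; `fullmatch` is used so the whole string must consist of digits.

```python
import re

# Pattern from the explanation above: one or more decimal digits.
DECIMAL_INT = re.compile(r"[0-9]+")

def is_decimal_integer(text: str) -> bool:
    """Hypothetical helper: True when the entire string is one or more digits."""
    return DECIMAL_INT.fullmatch(text) is not None

print(is_decimal_integer("42"))   # True
print(is_decimal_integer(""))     # False: '+' requires at least one digit
print(is_decimal_integer("4a2"))  # False: letters are not in [0-9]
```

Replacing `+` with `*` would make the empty string match, which is why a pattern that allows zero digits is wrong here.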
Given the regular expression "a(b|c)?d", which of the following strings will be matched by this pattern?
Analyze the pattern step-by-step: it starts with 'a', followed by zero or one 'b' or 'c', and ends with 'd'.
The pattern "a(b|c)?d" matches strings starting with 'a', optionally followed by exactly one 'b' or 'c', and ending with 'd'. "ad" matches because the optional part is absent. "abcd" ('bc'), "abbd" ('bb'), and "acbd" ('cb') have two characters between 'a' and 'd', which do not fit the pattern.
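The same step-by-step analysis can be sketched in Python. The candidate list below includes the strings discussed above plus "abd" and "acd", which are added here only to show the optional group being used.

```python
import re

# 'a', then at most one of 'b' or 'c', then 'd'.
PATTERN = re.compile(r"a(b|c)?d")

candidates = ["ad", "abd", "acd", "abcd", "abbd", "acbd"]

# fullmatch requires the whole string to fit the pattern, not just a prefix.
results = {s: PATTERN.fullmatch(s) is not None for s in candidates}

for s, ok in results.items():
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

Only "ad", "abd", and "acd" match; every string with two characters between 'a' and 'd' is rejected.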
Which regular expression correctly matches identifiers that start with a letter and are followed by any combination of letters and digits?
Remember that identifiers cannot start with digits.
The correct option requires the first character to be a letter, followed by zero or more letters or digits, which matches typical identifier rules. The other options are invalid: one starts with a digit, one allows the identifier to start with a digit, and one requires the identifier to end with digits while allowing zero letters at the start.
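A small sketch of the rule described above, assuming the usual character classes for ASCII letters and digits. The helper name `is_identifier` is hypothetical.

```python
import re

# One letter first, then zero or more letters or digits.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def is_identifier(text: str) -> bool:
    """Hypothetical helper: True when the whole string is a valid identifier."""
    return IDENTIFIER.fullmatch(text) is not None

print(is_identifier("x"))       # True: a single letter is enough
print(is_identifier("count1"))  # True: digits are allowed after the first letter
print(is_identifier("1count"))  # False: cannot start with a digit
```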
Consider two token patterns defined by regular expressions: ID = "[a-zA-Z][a-zA-Z0-9]*" and KEYWORD = "if|int|for". When scanning the input string "int", which token should the lexer produce and why?
Think about how lexers resolve conflicts when multiple patterns match the same input.
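Both patterns match the entire string "int", so match length alone cannot decide between them. Lexers typically break such ties by giving keywords higher priority than identifiers (for example, by listing the keyword rule first), which ensures reserved words are recognized correctly. Therefore, "int" is tokenized as a KEYWORD.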
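One common way to implement this conflict resolution is sketched below. It assumes the convention used by lex-style generators: prefer the longest match, and on a tie prefer the rule listed first. The names `RULES` and `next_token` are hypothetical, and this is an illustration under those assumptions, not a complete lexer.

```python
import re

# Rules in priority order: KEYWORD is listed before ID (assumed convention).
RULES = [
    ("KEYWORD", re.compile(r"if|int|for")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
]

def next_token(text, pos=0):
    """Return (kind, lexeme) for the longest match at pos, or None.

    The strict '>' comparison means a later rule only wins when its
    match is longer, so ties go to the earlier (higher-priority) rule.
    """
    best = None
    for kind, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best

print(next_token("int"))      # ('KEYWORD', 'int')
print(next_token("integer"))  # ('ID', 'integer')
```

The second call shows why priority alone is not enough: "integer" begins with "int", but the longer identifier match wins, so the keyword is not split off from the front of a longer name.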