This project provides an automatic lexical analyzer generator that processes lists of tokens defined by regular expressions. It is designed to facilitate the parsing and tokenization of strings according to user-defined patterns, making it a versatile tool for compiler construction, data parsing, and other applications requiring lexical analysis.
- Token Definition Parsing: Parse a list of token definitions provided in a specific format, including token names and their corresponding regular expressions.
- Input String Lexical Analysis: Perform lexical analysis on a given input string, breaking it down into a sequence of token-lexeme pairs based on the provided token definitions.
- Error Handling: Identify and report various types of errors, including syntax errors in the input, duplicate token names, and regular expressions that generate the empty string.
The input to the program consists of two parts:
- A list of token definitions, each comprising a token name and a token description (a regular expression), with definitions separated by commas and the list terminated by a hash (#) symbol.
- An input string composed of letters, digits, and space characters.
Example:
token1 reg_exp1, token2 reg_exp2, ... tokenN reg_expN #
The program outputs one of the following based on the input:
- A sequence of tokens and their corresponding lexemes if the input is correctly formatted and matches the token definitions.
- An error message if there are syntax errors, duplicate token names, or if a token's regular expression can generate the empty string.
Compile the program using GCC with C++11 support:
g++ -std=c++11 your_program.cpp -o lexer

Run the program and redirect input from a file:
./lexer < input_file.txt