This project performs lexical and syntactic analysis of a source input file using:
- Regular Expressions → Syntax Tree → AFD (deterministic finite automaton)
- Lexer to tokenize the source code
- SLR parser to check syntactic validity against a context-free grammar (CFG)
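The regex → syntax-tree stage typically starts by rewriting the expression in postfix form. The sketch below is a simplified illustration, not the project's actual code: it assumes concatenation is already written explicitly as `.` and ignores character classes and escapes.

```python
# Sketch: infix regex -> postfix via the shunting-yard algorithm.
# Hypothetical helper for illustration; assumes explicit '.' concatenation.
PRECEDENCE = {'*': 3, '.': 2, '|': 1}

def to_postfix(regex: str) -> str:
    output, stack = [], []
    for ch in regex:
        if ch == '(':
            stack.append(ch)
        elif ch == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()  # discard the matching '('
        elif ch in PRECEDENCE:
            # pop operators of greater-or-equal precedence first
            while (stack and stack[-1] != '('
                   and PRECEDENCE.get(stack[-1], 0) >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        else:  # literal symbol
            output.append(ch)
    while stack:
        output.append(stack.pop())
    return ''.join(output)
```

For example, `(a|b).a.b.b` becomes `ab|a.b.b.`, which a tree builder can then consume bottom-up.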
Make sure you are inside the T2-Formais\src directory and run:

```shell
python main.py
```

Python 3.8+ is recommended.
The behavior of the program is controlled by the variable ONLY_PARSER_MODE in main.py:

```python
ONLY_PARSER_MODE = False  # Full lexer + parser pipeline
ONLY_PARSER_MODE = True   # Only run the SLR parser from token_list_output.txt
```

- `ONLY_PARSER_MODE = False` runs everything from scratch: converts the regular expressions → builds the AFDs → lexes the source text → builds the parsing table → parses.
- `ONLY_PARSER_MODE = True` skips lexical analysis and uses the pre-existing file ./output/token_list_output.txt.
These files must exist before running the program:

- ./input/example_input_RE.txt – regular expression definitions, one per line (e.g., ID: (a|b)*abb#)
- ./input/example_test_input.txt – the source text to be analyzed by the lexer
- ./input/example_input_grammar.txt – the context-free grammar (CFG), one production per line (e.g., S ::= A B)
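Each definition line has the form NAME: pattern. A minimal loader (hypothetical helper, shown only to illustrate the format) splits on the first colon only, so patterns that themselves contain a colon are preserved:

```python
# Sketch: read "NAME: pattern" regex definitions, one per line.
# Hypothetical helper; the project may parse these differently.
def load_definitions(text: str):
    defs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, pattern = line.split(':', 1)  # split only on the first ':'
        defs.append((name.strip(), pattern.strip()))
    return defs
```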
All paths are hardcoded in main.py via global variables:

```python
INPUT_RE_FILE = "../example_input_RE.txt"
INPUT_USER_FILE = "../example_test_input.txt"
INPUT_GRAMMAR_FILE = "./input/example_input_grammar.txt"
```
The following files are generated automatically:

- ./output/afd_output_{i}.txt – AFD representation of the i-th regular expression
- ./output/token_list_output.txt – token list resulting from lexical analysis of the input source
- ./output/first_output.txt – FIRST sets for all non-terminals
- ./output/follow_output.txt – FOLLOW sets for all non-terminals
- ./output/lr0_states.txt – LR(0) item sets (states) generated by the canonical collection algorithm
- ./output/lr0_transitions.txt – state transitions between the LR(0) item sets
- ./output/slr_action_table.txt – ACTION table for the SLR parser
- ./output/slr_goto_table.txt – GOTO table for the SLR parser

Global output file variable in main.py:

```python
OUTPUT_TOKEN_LIST_FILE = "../token_list_output.txt"
```
1. Parse Regular Expressions – converts each line of example_input_RE.txt into a syntax tree.
2. Build AFDs and Merge Them – each regex → AFD → merged into a single AFD with ε-transitions.
3. Run Lexer – tokenizes example_test_input.txt into token_list_output.txt.
4. Load Grammar – the CFG is read from example_input_grammar.txt.
5. Compute FIRST and FOLLOW Sets – saved to first_output.txt and follow_output.txt.
6. Build LR(0) Item Sets – closure and goto operations → lr0_states.txt, lr0_transitions.txt.
7. Build SLR Table – ACTION and GOTO tables → slr_action_table.txt, slr_goto_table.txt.
8. SLR Parsing – parses the token list and prints either:
   - Sentence Accepted!
   - Sentence Rejected!
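The FIRST-set computation is a fixpoint iteration over the productions. A minimal sketch for an ε-free grammar (the full algorithm also propagates through nullable symbols), assuming the grammar is represented as a dict mapping each non-terminal to its productions as symbol lists:

```python
# Sketch: FIRST sets for an epsilon-free grammar, computed as a fixpoint.
# Assumed representation (not the project's actual data structures):
#   grammar = {nonterminal: [list of productions, each a list of symbols]}
def first_sets(grammar, terminals):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                X = prod[0]  # epsilon-free: only the first symbol matters
                add = {X} if X in terminals else first[X]
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first
```

For the example grammar in this README, this yields FIRST(S) = {for, if, id} and FIRST(E) = {id}.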
-
Regular Expressions (example_input_RE.txt):

```
for: for
if: if
else: else
id: [a-zA-Z][a-zA-Z0-9]*
```
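Definition order doubles as priority: for, if, and else are listed before id, so a keyword lexeme is reported as a keyword even though the id pattern also matches it. A rough simulation of this tie-breaking with Python's re module (hypothetical, not the project's lexer):

```python
import re

# Patterns tried in definition order, mirroring the example_input_RE.txt
# priorities above; this is an illustration, not the project's AFD-based lexer.
TOKEN_DEFS = [("for", r"for"), ("if", r"if"), ("else", r"else"),
              ("id", r"[a-zA-Z][a-zA-Z0-9]*")]

def classify(lexeme: str) -> str:
    for name, pattern in TOKEN_DEFS:
        if re.fullmatch(pattern, lexeme):
            return name  # first definition that matches wins
    raise ValueError(f"no token matches {lexeme!r}")
```

Note that `classify("forx")` returns `id`, since the keyword pattern must match the whole lexeme.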
Source Text (example_test_input.txt):

```
for
x
if
y
else
z
```
Grammar (example_input_grammar.txt):

```
S ::= for E
S ::= if E else E
S ::= id
E ::= id
```
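Once the ACTION and GOTO tables exist, the SLR parsing loop itself is short. Below is a sketch of the standard table-driven driver; the table encoding is hypothetical (the project's own tables are the ones written to slr_action_table.txt and slr_goto_table.txt):

```python
# Sketch: table-driven SLR(1) parsing loop.
# Assumed encodings (illustrative only):
#   action[(state, terminal)] -> ("shift", s) | ("reduce", lhs, len_rhs) | ("accept",)
#   goto[(state, nonterminal)] -> state
def slr_parse(tokens, action, goto):
    stack = [0]                 # stack of parser states
    stream = tokens + ["$"]     # '$' marks end of input
    i = 0
    while True:
        act = action.get((stack[-1], stream[i]))
        if act is None:
            return False        # no table entry -> Sentence Rejected!
        if act[0] == "shift":
            stack.append(act[1])
            i += 1
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            del stack[len(stack) - rhs_len:]   # pop |rhs| states
            stack.append(goto[(stack[-1], lhs)])
        else:                   # ("accept",)
            return True         # Sentence Accepted!
```

Returning True or False here corresponds to the "Sentence Accepted!" / "Sentence Rejected!" messages the program prints.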
```
C:\Users\samsung\Desktop\Formais\T2-Formais\src>python main.py
Starting regular expression to AFD conversion...
Parsing regEx 1: for: for
Parsing regEx 2: if: if
Parsing regEx 3: else: else
Parsing regEx 4: id: [a-zA-Z][a-zA-Z0-9]*
#1. Tokenize and create postfix format for regular expression
#2. Build syntax tree
#3. Computing nullable, firstpos, lastpos, and followpos
#4. Build AFD
File saved: ./output/afd_output_0.txt
#1. Tokenize and create postfix format for regular expression
#2. Build syntax tree
#3. Computing nullable, firstpos, lastpos, and followpos
#4. Build AFD
File saved: ./output/afd_output_1.txt
#1. Tokenize and create postfix format for regular expression
#2. Build syntax tree
#3. Computing nullable, firstpos, lastpos, and followpos
#4. Build AFD
File saved: ./output/afd_output_2.txt
#1. Tokenize and create postfix format for regular expression
#2. Build syntax tree
#3. Computing nullable, firstpos, lastpos, and followpos
#4. Build AFD
File saved: ./output/afd_output_3.txt
#5. Union with epsilon transitions
#6. Lexer Analysis
File saved: ./output/token_list_output.txt
#7. Interpret grammar
#8. Calculate FIRST and FOLLOW sets
File saved: ./output/first_output.txt
File saved: ./output/follow_output.txt
#9. Build set of LR(0) items (Closure & Goto)
File saved: ./output/lr0_states.txt
File saved: ./output/lr0_transitions.txt
#10. Build SLR parsing table
File saved: ./output/slr_action_table.txt
File saved: ./output/slr_goto_table.txt
#11. Run SLR parsing
Sentence accepted!
------ Symbol Table ------
1: for (PR)
2: x (ID)
Sentence Accepted!
```