Regular Expression to AFD Converter, Lexer, and SLR Parser

This project performs lexical and syntactic analysis for a source input file using:

Regular Expressions → Syntax Tree → AFD
Lexer to tokenize source code
SLR Parser to check syntactic validity via a context-free grammar (CFG)

How to Run

Make sure you're inside the \T2-Formais\src directory and run:

python main.py

Python 3.8+ is recommended.

Mode selection

The behavior of the program is controlled by the variable ONLY_PARSER_MODE located in main.py:

ONLY_PARSER_MODE = False  # Full lexer + parser pipeline
ONLY_PARSER_MODE = True   # Only run the SLR parser from token_list_output.txt

Full Pipeline Mode (False)

Runs from scratch: converts regex → builds AFDs → lexes source text → builds parsing table → parses.

Parser-Only Mode (True)

Skips lexical analysis and uses the pre-existing file:

./output/token_list_output.txt

Input Files

These files must exist before running the program:

./input/example_input_RE.txt – contains regular expression definitions, one per line (e.g., ID: (a|b)*abb#)
./input/example_test_input.txt – the source text to be analyzed by the lexer
./input/example_input_grammar.txt - The context-free grammar (CFG) in the format, one production per line (e.g., S ::= A B)

All paths are hardcoded in main.py via global variables:

INPUT_RE_FILE = "../example_input_RE.txt"
INPUT_USER_FILE = "../example_test_input.txt"
INPUT_GRAMMAR_FILE = "./input/example_input_grammar.txt"

Output files

The following files are generated automatically:

./output/afd_output_{i}.txt – AFD representation for the i-th regular expression
./output/token_list_output.txt – Token list resulting from lexical analysis of the input source
./output/first_output.txt – FIRST sets for all non-terminals.
./output/follow_output.txt – FOLLOW sets for all non-terminals.
./output/lr0_states.txt – LR(0) items (states) generated by the canonical collection algorithm.
./output/lr0_transitions.txt – State transitions for the LR(0) items.
./output/slr_action_table.txt – ACTION table for the SLR parser.
./output/slr_goto_table.txt – GOTO table for the SLR parser.

Global output file variable in main.py:

OUTPUT_TOKEN_LIST_FILE = "../token_list_output.txt"

Process Overview

Phase 1: Lexical Analysis (skipped if ONLY_PARSER_MODE = True)

Parse Regular Expressions Converts each line of example_input_RE.txt into a syntax tree.
Build AFDs and Merge Them Each regex → AFD → merged into a single AFD with ε-transitions.
Run Lexer Tokenizes example_test_input.txt into token_list_output.txt.

Phase 2: Syntactic Analysis

Load Grammar CFG is read from example_input_grammar.txt.
Compute FIRST and FOLLOW Sets Saves to first_output.txt and follow_output.txt.
Build LR(0) Item Sets Closure and goto operations → lr0_states.txt, lr0_transitions.txt.
Build SLR Table ACTION and GOTO tables built → slr_action_table.txt, slr_goto_table.txt.
SLR Parsing Parses the token list and prints:
- Sentence Accepted!
- Sentence Rejected!

Example Inputs

Regular Expressions (example_input_RE.txt)

for: for
if: if
else: else
id: [a-zA-Z][a-zA-Z0-9]*

Source Text (example_test_input.txt)

for 
x 
if 
y 
else 
z

Grammar (example_input_grammar.txt)

S ::= for E
S ::= if E else E
S ::= id
E ::= id

Example Output

C:\Users\samsung\Desktop\Formais\T2-Formais\src>python main.py

 Starting regular expression to AFD conversion...


      Parsing regEx 1: for: for


      Parsing regEx 2: if: if


      Parsing regEx 3: else: else


      Parsing regEx 4: id: [a-zA-Z][a-zA-Z0-9]*


 #1. Tokenize and create postfix format for regular expression


 #2. Build syntax tree


 #3. Computing nullable, firstpos, lastpos, and followpos


 #4. Build AFD

 File saved: ./output/afd_output_0.txt


 #1. Tokenize and create postfix format for regular expression


 #2. Build syntax tree


 #3. Computing nullable, firstpos, lastpos, and followpos


 #4. Build AFD

 File saved: ./output/afd_output_1.txt


 #1. Tokenize and create postfix format for regular expression


 #2. Build syntax tree


 #3. Computing nullable, firstpos, lastpos, and followpos


 #4. Build AFD

 File saved: ./output/afd_output_2.txt


 #1. Tokenize and create postfix format for regular expression


 #2. Build syntax tree


 #3. Computing nullable, firstpos, lastpos, and followpos


 #4. Build AFD

 File saved: ./output/afd_output_3.txt


 #5. Union with epsilon transitions


 #6. Lexer Analysis

 File saved: ./output/token_list_output.txt


 #7. Interpret grammar


 #8. Calculate FIRST and FOLLOW sets

 File saved: ./output/first_output.txt

 File saved: ./output/follow_output.txt


 #9. Build set of LR(0) items (Closure & Goto)

 File saved: ./output/lr0_states.txt

 File saved: ./output/lr0_transitions.txt


 #10. Build SLR parsing table

 File saved: ./output/slr_action_table.txt

 File saved: ./output/slr_goto_table.txt


 #11. Run SLR parsing

Sentence accepted!

------ Symbol Table ------
1: for (PR)
2: x (ID)

Sentence Accepted!

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.vscode		.vscode
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Regular Expression to AFD Converter, Lexer, and SLR Parser

How to Run

Mode selection

Full Pipeline Mode (False)

Parser-Only Mode (True)

Input Files

Output files

Process Overview

Phase 1: Lexical Analysis (skipped if ONLY_PARSER_MODE = True)

Phase 2: Syntactic Analysis

Example Inputs

Example Output

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Regular Expression to AFD Converter, Lexer, and SLR Parser

How to Run

Mode selection

Full Pipeline Mode (False)

Parser-Only Mode (True)

Input Files

Output files

Process Overview

Phase 1: Lexical Analysis (skipped if ONLY_PARSER_MODE = True)

Phase 2: Syntactic Analysis

Example Inputs

Example Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages