Regular Expression to AFD Converter and Lexer

This project parses regular expressions, builds a deterministic finite automaton for each one (abbreviated AFD here, from the Portuguese "autômato finito determinístico"; DFA in English), and performs lexical analysis over input text using the generated automaton.

How to Run

Make sure you're inside the \T1\src directory and run:

python main.py

Python 3.8+ is recommended.

Input Files

These files must exist before running the program:

../example_input_RE.txt – contains regular expression definitions, one per line (e.g., ID: (a|b)*abb#)
../example_test_input.txt – the source text to be analyzed by the lexer

Both paths are hardcoded in main.py via global variables:

INPUT_RE_FILE = "../example_input_RE.txt"
INPUT_USER_FILE = "../example_test_input.txt"

Output Files

The following files are generated automatically:

../afd_output_{i}.txt – AFD representation for the i-th regular expression
../token_list_output.txt – Token list resulting from lexical analysis of the input source

The token list path is set by a global variable in main.py:

OUTPUT_TOKEN_LIST_FILE = "../token_list_output.txt"

Process Overview

Read and Parse Regular Expressions

Parses each line of example_input_RE.txt into a RegularExpression object.
*This step produces no output.*
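As a rough illustration of this step, the sketch below splits each `name: pattern` line at the first colon. The `Definition` type and `parse_definitions` function are hypothetical names chosen for this example; the project's own `RegularExpression` class may store more than this.

```python
from typing import Iterable, List, NamedTuple


class Definition(NamedTuple):
    """Hypothetical stand-in for the project's RegularExpression object."""
    name: str
    pattern: str


def parse_definitions(lines: Iterable[str]) -> List[Definition]:
    """Turn 'name: pattern' lines into Definition records, skipping blank lines."""
    definitions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split at the first ':' only, so the pattern itself may contain colons.
        name, sep, pattern = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in definition: {line!r}")
        definitions.append(Definition(name.strip(), pattern.strip()))
    return definitions


definitions = parse_definitions(["id: [a-zA-Z]([a-zA-Z] | [0-9])*", "", "er1: a?(a | b)+"])
```

In practice the lines would come from reading `INPUT_RE_FILE`; a list literal is used here so the sketch runs without any files.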

Convert to Postfix Notation

Tokenizes and converts each regular expression to postfix.
*This step produces no output. To print the postfix expression, uncomment line 35 in main.py.*
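One common way to do this conversion is the shunting-yard algorithm with an explicit concatenation operator. The sketch below is a simplified version under several assumptions that may not match main.py: every operand is a single character, `.` is used as the concatenation operator, whitespace is ignored, and character classes such as `[a-z]` are not handled.

```python
PRECEDENCE = {"|": 1, ".": 2}  # '.' is an assumed explicit concatenation operator


def add_concat(regex):
    """Insert '.' wherever two operands are implicitly concatenated."""
    out = []
    prev = None
    for ch in regex:
        if ch.isspace():
            continue
        # Concatenation happens after an operand, ')' or a unary operator,
        # when the next token starts a new operand (a literal or '(').
        if prev is not None and prev not in "(|" and ch not in ")|*+?":
            out.append(".")
        out.append(ch)
        prev = ch
    return out


def to_postfix(regex):
    """Convert an infix regular expression to postfix notation."""
    output, stack = [], []
    for tok in add_concat(regex):
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the '('
        elif tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associative).
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            # Literals and the postfix unary operators * + ? go straight to the output.
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("(a|b)*abb#"))  # ab|*a.b.b.#.
```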

Build Syntax Tree

Constructs a syntax tree using the postfix expression.
*This step produces no output. To print the syntax tree, uncomment line 41 in main.py. Note that the tree grows with the size of the input regular expression.*
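Building a tree from a postfix expression is typically done with a stack: operands are pushed as leaves, and each operator pops its children and pushes a new node. The sketch below assumes single-character tokens and `.` as the concatenation operator; the `Node` class and `build_tree` function are illustrative names, not the project's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: leaves have no children, unary operators use only `left`."""
    symbol: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(postfix):
    """Build a syntax tree from a postfix regular expression."""
    stack = []
    for tok in postfix:
        if tok in "|.":
            # Binary operator: the right operand was pushed last.
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(tok, left, right))
        elif tok in "*+?":
            # Unary operator applies to the most recent subtree.
            stack.append(Node(tok, stack.pop()))
        else:
            stack.append(Node(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Postfix form of (a|b)*abb# with '.' as explicit concatenation.
tree = build_tree("ab|*a.b.b.#.")
```

For this input the root is a concatenation node whose right child is the end marker `#`, which is the shape the followpos-based construction expects.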

Compute Tree Properties

Calculates nullable, firstpos, lastpos, and followpos values.
*This step produces no output. To print the followpos table, uncomment line 46 in main.py. Note that the table grows with the size of the input regular expression.*
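The four properties follow the standard rules for the direct regex-to-DFA construction. The sketch below applies them to a hand-built tree for `(a|b)*abb#`; the tuple-based tree encoding and the `analyze` function are assumptions made to keep the example short, and only the `or`, `cat`, and `star` operators are covered.

```python
from collections import defaultdict


def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) for `node`, filling `followpos` in place."""
    kind = node[0]
    if kind == "leaf":
        _, _symbol, pos = node
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        # Every last position of the left side can be followed by a first position of the right.
        for p in l1:
            followpos[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyze(node[1], followpos)
        # Inside a star, the last positions loop back to the first positions.
        for p in l:
            followpos[p] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")


def leaf(symbol, pos):
    return ("leaf", symbol, pos)


# (a|b)*abb# with leaf positions 1..6
tree = ("cat",
        ("cat",
         ("cat",
          ("cat", ("star", ("or", leaf("a", 1), leaf("b", 2))), leaf("a", 3)),
          leaf("b", 4)),
         leaf("b", 5)),
        leaf("#", 6))

followpos = defaultdict(set)
nullable, firstpos, lastpos = analyze(tree, followpos)
```

For this expression the root has firstpos {1, 2, 3} and lastpos {6}, and positions 1 and 2 both have followpos {1, 2, 3}.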

Build AFD

Converts the syntax tree into a deterministic finite automaton (AFD).
*AFD is saved as afd_output_{i}.txt.*
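In the usual followpos-based construction, each AFD state is a set of leaf positions, the start state is the root's firstpos, and a state is accepting when it contains the position of the end marker `#`. The sketch below hard-codes the followpos table for `(a|b)*abb#` so that it runs on its own; `build_dfa` and `accepts` are hypothetical names, and the format the project writes to afd_output_{i}.txt is not shown here.

```python
def build_dfa(start, followpos, symbol_at, end_pos):
    """Build DFA transitions from followpos; states are numbered in discovery order."""
    start = frozenset(start)
    states = {start: 0}
    transitions = {}          # (state number, symbol) -> state number
    accepting = set()
    worklist = [start]
    while worklist:
        current = worklist.pop()
        if end_pos in current:
            accepting.add(states[current])
        # Group the positions of this state by the symbol they carry.
        targets = {}
        for p in sorted(current):
            if p == end_pos:
                continue
            targets.setdefault(symbol_at[p], set()).update(followpos.get(p, set()))
        for sym in sorted(targets):
            target = frozenset(targets[sym])
            if not target:
                continue
            if target not in states:
                states[target] = len(states)
                worklist.append(target)
            transitions[(states[current], sym)] = states[target]
    return transitions, accepting


def accepts(transitions, accepting, text):
    """Run the DFA from state 0 and report whether `text` is accepted."""
    state = 0
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting


# followpos table for (a|b)*abb# (positions 1..6, '#' at position 6)
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
transitions, accepting = build_dfa({1, 2, 3}, followpos, symbol_at, end_pos=6)
```

The resulting automaton has four states and accepts exactly the strings over {a, b} that end in `abb`.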

Create Union of AFDs

Unifies all AFDs into one.
*This step produces no output. To print the unified AFD, uncomment line 61 in main.py. Note that the AFD grows with the size of the input regular expressions.*
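The sample log further down labels this step "Union with epsilon transitions", which suggests a new start state with an epsilon edge to the start of each automaton. The sketch below shows only that joining step and is an assumption about the approach: it supposes each automaton numbers its states 0..n-1 with 0 as the start, uses `None` as the epsilon symbol, and leaves out any later determinization. `union_with_epsilon` is a hypothetical name.

```python
def union_with_epsilon(dfas):
    """Join automata under a new start state 0 using epsilon (None) transitions.

    `dfas` is a list of (token_name, transitions, accepting) triples, where
    transitions maps (state, symbol) -> state and each start state is 0.
    """
    nfa = {}          # (state, symbol) -> set of states
    accepting = {}    # state -> token name it recognizes
    next_free = 1     # state 0 is reserved for the shared start
    for name, transitions, accept in dfas:
        local = {0} | {s for s, _ in transitions} | set(transitions.values()) | set(accept)
        offset = next_free
        # Epsilon edge from the shared start to this automaton's old start state.
        nfa.setdefault((0, None), set()).add(offset)
        for (src, sym), dst in transitions.items():
            nfa.setdefault((src + offset, sym), set()).add(dst + offset)
        for s in accept:
            accepting[s + offset] = name
        next_free += max(local) + 1
    return nfa, accepting


# Two tiny automata: A accepts "a", B accepts one or more "b".
dfas = [
    ("A", {(0, "a"): 1}, {1}),
    ("B", {(0, "b"): 1, (1, "b"): 1}, {1}),
]
nfa, accepting = union_with_epsilon(dfas)
```

Keeping the token name on each accepting state lets the lexer report which regular expression matched.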

Run Lexer

Uses the unified AFD to tokenize the input text from example_test_input.txt.
*Writes the token list to token_list_output.txt.*
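A lexer driven by an automaton usually applies the longest-match rule: it keeps reading while a transition exists and remembers the last accepting state it passed through. The sketch below is a minimal version under assumptions that may differ from the project: whitespace separates lexemes, unmatched characters are reported as `ERROR`, and tokens are returned as `(lexeme, name)` pairs rather than written in the format of token_list_output.txt. The tiny automaton covers only the characters `a`, `b`, `0`, and `1`.

```python
def tokenize(text, transitions, accepting, start=0):
    """Split `text` into (lexeme, token name) pairs using the longest-match rule."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        state, j = start, i
        last_accept = None  # (end index, token name) of the longest match so far
        while j < len(text) and (state, text[j]) in transitions:
            state = transitions[(state, text[j])]
            j += 1
            if state in accepting:
                last_accept = (j, accepting[state])
        if last_accept is None:
            # No token starts here: report the character and move on.
            tokens.append((text[i], "ERROR"))
            i += 1
        else:
            end, name = last_accept
            tokens.append((text[i:end], name))
            i = end
    return tokens


# Hand-written automaton: state 1 recognizes identifiers, state 2 numbers.
transitions = {}
for ch in "ab":
    transitions[(0, ch)] = 1
for ch in "ab01":
    transitions[(1, ch)] = 1
for ch in "01":
    transitions[(0, ch)] = 2
    transitions[(2, ch)] = 2
accepting = {1: "id", 2: "num"}

print(tokenize("a1\n10\nab", transitions, accepting))
# [('a1', 'id'), ('10', 'num'), ('ab', 'id')]
```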

Example Output

Starting regular expression to AFD conversion...

Parsing regEx 1: id: [a-zA-Z]([a-zA-Z] | [0-9])*

#1.Tokenize and create postfix format for regular expression

     RegEx to Postfix done.

#2.Build syntax tree

     Build Syntax Tree done.

#3.Computing nullable, firstpos, lastpos, and followpos

     Nullable, firstpos, lastpos, and followpos done.

#4.Build AFD

     AFD built.

     File saved: ../afd_output_0.txt

#5.Union with epsilon transitions

     Union done.

#6.Lexer Analysis

     Token list built.

     File saved: ../token_list_output.txt

Input Examples

RegEx to be included in example_input_RE.txt

id: [a-zA-Z]([a-zA-Z] | [0-9])*
num: 1-9* | 0
er1: a?(a | b)+
er2: b?(a | b)+

Source text to be included in example_test_input.txt

a1
0
teste2
21
alpha123
3444
a43teste
aa
bbbba
ababab
bbbbb
