

@AbhishekRai456 commented Jan 26, 2026

Adds a regex tokenizer with support for:

  • Literals, operators, grouping and anchors (^, $)
  • Character classes with ranges and shorthands (\d, \w, \s and negations)
  • Quantifiers {m,n}, {m,}, {,n}, {m}
  • Implicit concatenation insertion
  • Error reporting with position tracking
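The implicit concatenation step can be pictured with the minimal sketch below. It assumes single-character operands and a hypothetical token set (`TokKind`, `Token`, `tokenize`, `add_concat` are illustrative names, not the PR's RegexTokenizer API), and it leaves out character classes, anchors and {m,n} for brevity. A concatenation token is inserted wherever a token that can end an operand is directly followed by one that can start an operand.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for illustration; the PR's tokenizer may differ.
enum class TokKind { Literal, Dot, LParen, RParen, Alt, Star, Plus, Question, Concat };

struct Token {
    TokKind kind;
    char value;       // the character as written ('\0' for inserted Concat tokens)
    std::size_t pos;  // offset in the pattern, used for error reporting
};

// Tokenizes literals, '.', '(', ')', '|', '*', '+', '?' and backslash escapes.
// Character classes, anchors and {m,n} quantifiers are omitted to keep this short.
std::vector<Token> tokenize(const std::string& pattern) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        switch (c) {
            case '(': out.push_back({TokKind::LParen, c, i}); break;
            case ')': out.push_back({TokKind::RParen, c, i}); break;
            case '|': out.push_back({TokKind::Alt, c, i}); break;
            case '*': out.push_back({TokKind::Star, c, i}); break;
            case '+': out.push_back({TokKind::Plus, c, i}); break;
            case '?': out.push_back({TokKind::Question, c, i}); break;
            case '.': out.push_back({TokKind::Dot, c, i}); break;
            case '\\':
                if (i + 1 == pattern.size())
                    throw std::runtime_error("dangling '\\' at position " + std::to_string(i));
                out.push_back({TokKind::Literal, pattern[i + 1], i});
                ++i;  // skip the escaped character
                break;
            default:
                out.push_back({TokKind::Literal, c, i});
        }
    }
    return out;
}

// A token can end an operand if it is an atom, ')' or a postfix quantifier.
static bool ends_operand(TokKind k) {
    return k == TokKind::Literal || k == TokKind::Dot || k == TokKind::RParen ||
           k == TokKind::Star || k == TokKind::Plus || k == TokKind::Question;
}

// A token can start an operand if it is an atom or '('.
static bool starts_operand(TokKind k) {
    return k == TokKind::Literal || k == TokKind::Dot || k == TokKind::LParen;
}

// Inserts explicit Concat tokens between adjacent operands,
// e.g. "a(b|c)*d" becomes: a CONCAT ( b | c ) * CONCAT d
std::vector<Token> add_concat(const std::vector<Token>& tokens) {
    std::vector<Token> out;
    out.reserve(tokens.size() * 2);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0 && ends_operand(tokens[i - 1].kind) && starts_operand(tokens[i].kind))
            out.push_back({TokKind::Concat, '\0', tokens[i].pos});
        out.push_back(tokens[i]);
    }
    return out;
}
```

Recording the original offset on every token (including inserted ones) is what lets later stages report errors against the user's pattern rather than the rewritten token stream.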

Adds infix → postfix (RPN) conversion for regex tokens:

  • Enforces operator precedence and associativity
  • Performs syntax validation during conversion (invalid operators, mismatched parentheses, trailing operators)
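A shunting-yard conversion that validates while it converts might look like the sketch below. It assumes concatenation has already been made explicit with a hypothetical '&' marker and that every other non-operator character is a single-character operand; `to_postfix` is an illustrative name, not the PR's function. Precedence is postfix quantifiers > concatenation > alternation, and binary operators are left-associative.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Precedence of binary operators on the stack; '(' gets 0 so it is never popped
// by an operator, only by a matching ')'.
static int precedence(char op) {
    switch (op) {
        case '|': return 1;  // alternation
        case '&': return 2;  // explicit concatenation (hypothetical marker)
        default: return 0;
    }
}

// Converts an infix regex with explicit '&' concatenation to postfix (RPN),
// throwing std::runtime_error on malformed input.
std::string to_postfix(const std::string& infix) {
    std::string out;
    std::vector<char> ops;       // operator stack
    bool expect_operand = true;  // true when the next token must begin an operand

    for (std::size_t i = 0; i < infix.size(); ++i) {
        const char c = infix[i];
        auto fail = [&](const std::string& msg) {
            throw std::runtime_error(msg + " at position " + std::to_string(i));
        };
        if (c == '(') {
            if (!expect_operand) fail("missing operator before '('");
            ops.push_back(c);
        } else if (c == ')') {
            if (expect_operand) fail("missing operand before ')'");
            while (!ops.empty() && ops.back() != '(') { out += ops.back(); ops.pop_back(); }
            if (ops.empty()) fail("unmatched ')'");
            ops.pop_back();  // discard the matching '('
        } else if (c == '*' || c == '+' || c == '?') {
            if (expect_operand) fail(std::string("quantifier '") + c + "' has no operand");
            out += c;  // highest-precedence postfix operator: emit immediately
        } else if (c == '|' || c == '&') {
            if (expect_operand) fail(std::string("operator '") + c + "' has no left operand");
            // Left associativity: pop operators with greater or equal precedence first.
            while (!ops.empty() && precedence(ops.back()) >= precedence(c)) {
                out += ops.back();
                ops.pop_back();
            }
            ops.push_back(c);
            expect_operand = true;
        } else {
            if (!expect_operand) fail("missing operator before operand");
            out += c;
            expect_operand = false;
        }
    }
    if (expect_operand)
        throw std::runtime_error("pattern is empty or ends with an operator");
    while (!ops.empty()) {
        if (ops.back() == '(') throw std::runtime_error("unmatched '('");
        out += ops.back();
        ops.pop_back();
    }
    return out;
}
```

For example, `a&(b|c)*&d` (that is, `a(b|c)*d`) becomes `abc|*&d&`. The single `expect_operand` flag covers the three error classes listed above: a stray operator, an unbalanced parenthesis, and a trailing operator.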

Adds the core ε-NFA representation (Nfa.hpp) and Thompson-style NFA construction (NfaBuilder.hpp):

  • Manages state ownership internally with safe lifetime guarantees
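As a rough illustration of Thompson construction with internally owned states, here is a minimal sketch that builds an ε-NFA from a postfix string (same hypothetical '&' concatenation marker, single-character literals) and simulates it for a full-string match. `State`, `Nfa`, `build_nfa` and `full_match` are illustrative names, not the API in Nfa.hpp/NfaBuilder.hpp. Here the `Nfa` owns every state through a `std::vector<std::unique_ptr<State>>`; because each `State` lives on the heap, the raw `State*` edges stay valid when the vector grows or the `Nfa` is moved.

```cpp
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct State {
    int ch = -1;              // character consumed on `next`, or -1 if none
    State* next = nullptr;    // target after consuming `ch`
    std::vector<State*> eps;  // epsilon transitions
};

struct Fragment { State* start; State* accept; };

class Nfa {
public:
    State* new_state() {  // the Nfa owns every state it hands out
        states_.push_back(std::make_unique<State>());
        return states_.back().get();
    }
    State* start = nullptr;
    State* accept = nullptr;
private:
    std::vector<std::unique_ptr<State>> states_;
};

Nfa build_nfa(const std::string& postfix) {
    Nfa nfa;
    std::vector<Fragment> stack;
    auto pop = [&]() {
        if (stack.empty()) throw std::runtime_error("malformed postfix expression");
        Fragment f = stack.back();
        stack.pop_back();
        return f;
    };
    for (char c : postfix) {
        if (c == '&') {                      // concatenation: a then b
            Fragment b = pop();
            Fragment a = pop();
            a.accept->eps.push_back(b.start);
            stack.push_back({a.start, b.accept});
        } else if (c == '|') {               // alternation: new split and join states
            Fragment b = pop();
            Fragment a = pop();
            State* s = nfa.new_state();
            State* t = nfa.new_state();
            s->eps = {a.start, b.start};
            a.accept->eps.push_back(t);
            b.accept->eps.push_back(t);
            stack.push_back({s, t});
        } else if (c == '*' || c == '+' || c == '?') {
            Fragment a = pop();
            State* s = nfa.new_state();
            State* t = nfa.new_state();
            s->eps.push_back(a.start);
            if (c != '+') s->eps.push_back(t);               // * and ? may skip
            a.accept->eps.push_back(t);
            if (c != '?') a.accept->eps.push_back(a.start);  // * and + may repeat
            stack.push_back({s, t});
        } else {                             // literal character
            State* s = nfa.new_state();
            State* t = nfa.new_state();
            s->ch = static_cast<unsigned char>(c);
            s->next = t;
            stack.push_back({s, t});
        }
    }
    Fragment f = pop();
    if (!stack.empty()) throw std::runtime_error("malformed postfix expression");
    nfa.start = f.start;
    nfa.accept = f.accept;
    return nfa;
}

// Adds `s` and everything reachable from it via epsilon edges.
static void add_closure(State* s, std::set<State*>& set) {
    if (!set.insert(s).second) return;  // already visited (handles cycles)
    for (State* e : s->eps) add_closure(e, set);
}

// Simulates the NFA over the whole text; true if the accept state is reached.
bool full_match(const Nfa& nfa, const std::string& text) {
    std::set<State*> current;
    add_closure(nfa.start, current);
    for (char c : text) {
        std::set<State*> next;
        for (State* s : current)
            if (s->ch == static_cast<unsigned char>(c)) add_closure(s->next, next);
        current = std::move(next);
    }
    return current.count(nfa.accept) > 0;
}
```

With this layout, nothing outside the `Nfa` ever deletes a state, so no fragment can outlive the storage its pointers refer to.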

This is a draft for review.

Revert accidental formatting changes

Revert accidental formatting changes in exact module

Final fixes

l
…at_tokens()

std::move in add_concat_tokens() in RegexTokenizer.cpp
@AbhishekRai456 changed the title from "Regex_Search : Add tokenizer" to "Regex_Search: Add tokenizer, postfix conversion, and NFA construction" on Jan 30, 2026

@Ovetsarilish left a comment


Also, the match method, which is the API users call, seems to be missing. Is the code for it done? Make sure match can find multiple occurrences of the pattern in the sample text.

I'm struggling to visualize the save_id part; once that is clear, I will review the rest.
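One way to meet the multiple-match requirement raised in this review is to drive a single-position matcher across the text. The sketch assumes a hypothetical `match_at` callback that returns the end offset of the longest match starting at a given position (in this module it would presumably be backed by the NFA simulation); `MatchAt` and `find_all` are illustrative names, not part of the PR.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical matcher: end offset of the longest match starting exactly at
// `pos`, or std::nullopt if no match starts there.
using MatchAt = std::function<std::optional<std::size_t>(const std::string&, std::size_t)>;

// Collects every non-overlapping, leftmost-longest match as [begin, end) pairs.
std::vector<std::pair<std::size_t, std::size_t>> find_all(const std::string& text,
                                                          const MatchAt& match_at) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    std::size_t pos = 0;
    while (pos <= text.size()) {  // <= so an empty match at the end is also tried
        if (auto end = match_at(text, pos)) {
            matches.emplace_back(pos, *end);
            // Resume after the match; step one character on an empty match so
            // the loop cannot stall at the same offset.
            pos = (*end > pos) ? *end : pos + 1;
        } else {
            ++pos;
        }
    }
    return matches;
}
```

The empty-match step matters for patterns such as a* that can match zero characters: without it, the scan would record the same empty match at the same offset forever.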

@Ovetsarilish marked this pull request as ready for review on February 1, 2026, 04:40
@Ovetsarilish merged commit f5b21be into Programming-Club-Org:dev on Feb 8, 2026
4 checks passed

3 participants