Open
Description
I want to use interegular for tokenization in a C-like programming language. In particular, I use the regex \/\*.*?\*\/ to match block comments. By using the lazy modifier ?, I expect the lexer to return a block comment as soon as it has seen a string that constitutes one.
from interegular import parse_pattern
fsm = parse_pattern(r'(?s:\/\*.*?\*\/)').to_fsm()
assert '/**/*/' not in fsm
However, this pattern is treated exactly as if the ? were removed. As a result, the string /**/*/ wrongly matches the regex, and under the longest-match ("maximal munch") principle the pattern keeps consuming characters even after a block comment has ended.
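For comparison, here is a small sketch of how a backtracking engine treats the same pattern, using only Python's standard-library re module (not interegular). It suggests that the lazy modifier changes which prefix is returned, but not which whole strings are accepted. The variable names are illustrative only.

```python
import re

# Same block-comment pattern as above, with and without the lazy modifier.
LAZY = re.compile(r'(?s:/\*.*?\*/)')
GREEDY = re.compile(r'(?s:/\*.*\*/)')

text = '/**/*/'

# Prefix matching: the lazy form stops at the first closing "*/",
# the greedy form runs on to the last one.
lazy_prefix = LAZY.match(text).group()
greedy_prefix = GREEDY.match(text).group()

# Whole-string matching: backtracking lets even the lazy form cover the input.
lazy_full = LAZY.fullmatch(text) is not None

print(lazy_prefix)    # /**/
print(greedy_prefix)  # /**/*/
print(lazy_full)      # True
```

If membership in an interegular FSM means whole-string acceptance (an assumption here), the behavior wanted for tokenization is closer to "shortest accepted prefix" than to a different accepted language.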
The source code shows that a ? following a quantifier is parsed but otherwise simply ignored:
interegular/interegular/patterns.py
Lines 601 to 612 in 758f837
if self.static_b("*"):
    if self.static_b("?"):
        pass
    return _Repeated(base, 0, None)
elif self.static_b("+"):
    if self.static_b("?"):
        pass
    return _Repeated(base, 1, None)
elif self.static_b("?"):
    if self.static_b("?"):
        pass
    return _Repeated(base, 0, 1)
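A possible workaround while lazy modifiers are ignored is to write the block-comment pattern without any lazy quantifier, so that the accepted language itself excludes strings that continue past the first */. The sketch below checks such a pattern with Python's standard-library re module only. Whether interegular handles this exact syntax the same way is an assumption that has not been verified here, and the name BLOCK_COMMENT is purely illustrative.

```python
import re

# Block comment without lazy quantifiers:
#   /\*                    opening "/*"
#   [^*]*\*+               text up to a run of stars
#   (?:[^/*][^*]*\*+)*     the run was not followed by "/", so keep going
#   /                      closing "/" right after a run of stars
BLOCK_COMMENT = re.compile(r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/')

accepts_simple = BLOCK_COMMENT.fullmatch('/**/') is not None
accepts_stars = BLOCK_COMMENT.fullmatch('/* a * b **/') is not None
accepts_overrun = BLOCK_COMMENT.fullmatch('/**/*/') is not None

# Even with longest-match semantics, only the first comment is consumed.
longest_prefix = BLOCK_COMMENT.match('/**/*/').group()

print(accepts_simple, accepts_stars, accepts_overrun)  # True True False
print(longest_prefix)                                  # /**/
```

Because no accepted string contains */ before its end, a longest-match lexer using this pattern cannot run past the end of the comment.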