Description
Currently, the tool uses a simple wrapping approach for user-provided regex patterns to maintain compatibility between JSON Schema and Lark CFG outputs.
Specifically, user-supplied regexes are wrapped as:
- JSON Schema:
^(<user_regex>)$
- Lark CFG:
/<user_regex>/
Anchors (^, $) are disallowed in user input to avoid conflicts or semantic drift between the two representations. This pragmatic solution works for most common regex patterns but does not yet handle advanced cases.
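The wrapping rule above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the names `has_anchor` and `wrap_user_regex` are hypothetical, and the anchor check assumes Python-style escaping and character-class syntax.

```python
from typing import Tuple


def has_anchor(pattern: str) -> bool:
    """Return True if pattern has an unescaped ^ or $ outside a character class."""
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # A leading ^ negates the class; a ] right after the opener is literal.
            if pattern[i + 1 : i + 2] == "^":
                i += 1
            if pattern[i + 1 : i + 2] == "]":
                i += 1
        elif ch in "^$":
            return True
        i += 1
    return False


def wrap_user_regex(user_regex: str) -> Tuple[str, str]:
    """Return (json_schema_pattern, lark_terminal) for an anchor-free regex."""
    if has_anchor(user_regex):
        raise ValueError("anchors (^, $) are not allowed in user-supplied regexes")
    return f"^({user_regex})$", f"/{user_regex}/"


print(wrap_user_regex("foo|bar"))
```

The parentheses in the JSON Schema form matter for alternations: without them, `^foo|bar$` would anchor each branch separately instead of the whole pattern.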
Problem
Some edge cases may cause semantic inconsistencies or invalid syntax:
- Regexes with inline flags ((?i), (?m), etc.)
- Partial anchors or subexpression anchoring
- Alternations (|) without explicit grouping
- Lookarounds or multiline modifiers
Example: (?i)foo|bar should ideally translate to (?i)^(foo|bar)$, not ^((?i)foo|bar)$.
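One possible way to get the preferred form is to hoist leading inline flag groups out of the user regex before wrapping. The sketch below is an assumption about how this could work, not the tool's implementation: `wrap_with_hoisted_flags` is a hypothetical name, it only handles flag groups at the very start of the pattern, and it uses Python's flag letters. Whether a given JSON Schema validator's regex engine accepts a leading flag group at all is a separate question.

```python
import re

# Matches one global inline flag group such as (?i) or (?im) at the start.
_LEADING_FLAGS = re.compile(r"^\(\?([aiLmsux]+)\)")


def wrap_with_hoisted_flags(user_regex: str) -> str:
    """Wrap a regex as ^(...)$ while keeping leading inline flags in front."""
    flags = ""
    body = user_regex
    match = _LEADING_FLAGS.match(body)
    while match:
        flags += match.group(1)      # collect the flag letters
        body = body[match.end():]    # strip the flag group from the body
        match = _LEADING_FLAGS.match(body)
    prefix = f"(?{flags})" if flags else ""
    return f"{prefix}^({body})$"


print(wrap_with_hoisted_flags("(?i)foo|bar"))
```

Keeping the flags at the front also avoids patterns that some engines reject outright; Python's `re` module, for one, warns about or refuses global flags that are not at the start of the expression, depending on the version.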