perf: Improve Regex Handling for Cross-Syntax Compatibility (JSON Schema ↔ Lark CFG) #124

@Ki-Seki

Description

Currently, the tool uses a simple wrapping approach for user-provided regex patterns to maintain compatibility between JSON Schema and Lark CFG outputs.
Specifically, user-supplied regexes are wrapped as:

  • JSON Schema: ^(<user_regex>)$
  • Lark CFG: /<user_regex>/

Anchors (^, $) are disallowed in user input to avoid conflicts or semantic drift between the two representations. This pragmatic solution works for most common regex patterns but does not yet handle advanced cases.
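
The wrapping and anchor check described above could be sketched roughly as follows. The function names (`has_unescaped_anchor`, `to_json_schema_pattern`, `to_lark_regex`) are hypothetical and not taken from the tool, and the anchor scan is a simplification: it treats `^` and `$` as literals when they follow a backslash or sit inside a character class, and flags them everywhere else.

```python
def has_unescaped_anchor(pattern: str) -> bool:
    """Return True if ^ or $ appears outside an escape or character class."""
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # A leading ^ negates the class; a leading ] is taken as a
            # literal (Python-style assumption, other dialects differ).
            if pattern[i + 1 : i + 2] == "^":
                i += 1
            if pattern[i + 1 : i + 2] == "]":
                i += 1
        elif ch in "^$":
            return True
        i += 1
    return False


def to_json_schema_pattern(user_regex: str) -> str:
    """Wrap a user regex as ^(<user_regex>)$ for a JSON Schema `pattern`."""
    if has_unescaped_anchor(user_regex):
        raise ValueError("anchors (^, $) are not allowed in user regexes")
    return f"^({user_regex})$"


def to_lark_regex(user_regex: str) -> str:
    """Wrap a user regex as /<user_regex>/ for a Lark terminal."""
    if has_unescaped_anchor(user_regex):
        raise ValueError("anchors (^, $) are not allowed in user regexes")
    return f"/{user_regex}/"
```

For example, `to_json_schema_pattern("[a-z]+")` gives `^([a-z]+)$` and `to_lark_regex("[a-z]+")` gives `/[a-z]+/`, while `to_lark_regex("^abc")` raises `ValueError`.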

Problem

Some edge cases may cause semantic inconsistencies or invalid syntax:

  • Regexes with inline flags ((?i), (?m), etc.)
  • Partial anchors or subexpression anchoring
  • Alternations (|) without explicit grouping
  • Lookarounds or multiline modifiers
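
To illustrate the alternation case, the snippet below uses Python's `re` module as a stand-in (an assumption: JSON Schema validators typically use an ECMA 262 style dialect, though `|` has the same low precedence there). It shows how anchoring an ungrouped alternation changes which strings are accepted, which is why the grouping parentheses in the JSON Schema wrapper matter.

```python
import re

user_regex = "foo|bar"

# Without grouping, | binds loosest: this parses as (^foo)|(bar$).
ungrouped = re.compile(f"^{user_regex}$")
# With grouping, both anchors apply to the whole alternation.
grouped = re.compile(f"^({user_regex})$")

samples = ["foo", "bar", "foobaz", "bazbar"]

ungrouped_hits = [s for s in samples if ungrouped.search(s)]
grouped_hits = [s for s in samples if grouped.search(s)]

print(ungrouped_hits)  # all four samples match
print(grouped_hits)    # only "foo" and "bar" match
```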

Example:

(?i)foo|bar

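should ideally translate to (?i)^(foo|bar)$, not ^((?i)foo|bar)$, which is what the current wrapping produces: the flag ends up inside the group instead of at the start of the pattern.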
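
One possible way to get the desired output for this example is to hoist leading inline flags out of the user regex before wrapping. The sketch below is an illustration only: `split_leading_flags` and `wrap_for_json_schema` are hypothetical names, only global flags at the very start of the pattern are handled (scoped groups such as `(?i:...)` are left untouched), and whether a given JSON Schema validator accepts a leading `(?i)` at all depends on its regex dialect.

```python
import re

# One or more global inline flag groups at the start, e.g. (?i) or (?i)(?m).
# The flag letters are the ones Python's re accepts; other dialects differ.
_LEADING_FLAGS = re.compile(r"(?:\(\?[aiLmsux]+\))+")


def split_leading_flags(user_regex: str):
    """Split a regex into (leading_flags, body); flags may be empty."""
    match = _LEADING_FLAGS.match(user_regex)
    if match is None:
        return "", user_regex
    return match.group(0), user_regex[match.end():]


def wrap_for_json_schema(user_regex: str) -> str:
    """Anchor and group the body, keeping leading flags in front."""
    flags, body = split_leading_flags(user_regex)
    return f"{flags}^({body})$"


wrapped = wrap_for_json_schema("(?i)foo|bar")
print(wrapped)  # (?i)^(foo|bar)$
```

With Python's `re`, the hoisted form compiles and matches `FOO` and `bar` case-insensitively as whole strings. The naive form `^((?i)foo|bar)$` is rejected outright by Python 3.11 and later, which only allow global inline flags at the start of the expression.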
