[Feat] Unicode categories support #11

@bulatovv

Context

This library is used in the SGLang inference server to constrain LLM outputs to a specific grammar, which enables features such as generating deterministic, parsable outputs from LLMs.

Use-case

Constrained decoding could also be used to fix sudden language switches in some LLMs. For example, a regex like [^\p{Han}]+ rejects any Han character, which could help prevent unwanted fallbacks to Chinese in Chinese-focused LLMs.

However, this library does not currently support Unicode categories (\p{...} escapes). Adding support for them would make such cases easier to handle.
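
Until such support exists, one possible workaround is to expand the class into explicit code-point ranges. The sketch below illustrates the idea with Python's standard re module, which also lacks \p{...} escapes. The names HAN_RANGES, han_class_body and is_han_free are hypothetical, and the ranges cover only the main CJK ideograph blocks, so this is an approximation of \p{Han} rather than an exact equivalent. Whether this library's regex parser accepts the same range syntax has not been verified here.

```python
import re

# Assumption: these blocks approximate the Han script. The real \p{Han}
# property covers more code points (further extensions, radicals, etc.).
HAN_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def han_class_body() -> str:
    """Build the inside of a character class from the ranges above."""
    return "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in HAN_RANGES)


# Stand-in for [^\p{Han}]+ using explicit ranges.
NON_HAN = re.compile(f"[^{han_class_body()}]+")


def is_han_free(text: str) -> bool:
    """Return True if text is non-empty and contains no listed Han code point."""
    return NON_HAN.fullmatch(text) is not None
```

Writing the ranges by hand is verbose and easy to get wrong, which is the main argument for native \p{...} support.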
