Compound words tokenization failure #8

@cathxiao

Description

Expected Behavior

Compound words (e.g. pick-me-up, hand-me-down, know-it-all, etc.) should be tokenized as single tokens.

Actual Behavior

Hyphens are treated as separators, so the components of each compound are tokenized separately.
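To make the two behaviors concrete, here is a minimal sketch in Python. It is not this project's tokenizer: `tokenize_split`, `tokenize_compound`, and both regex patterns are hypothetical names and assumptions chosen only to illustrate the difference between splitting on hyphens and keeping hyphen-joined words together.

```python
import re

# Hypothetical illustration only; the project's real tokenizer may differ.

# Approximates the reported behavior: only runs of word characters are
# tokens, so every hyphen acts as a separator.
SPLIT_RE = re.compile(r"\w+")

# Approximates the expected behavior: a word optionally followed by one or
# more "-word" parts is a single token; any other non-space symbol is its
# own token.
COMPOUND_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize_split(text):
    """Return tokens with hyphenated compounds broken into components."""
    return SPLIT_RE.findall(text)


def tokenize_compound(text):
    """Return tokens with hyphenated compounds kept whole."""
    return COMPOUND_RE.findall(text)


if __name__ == "__main__":
    sample = "She is a know-it-all."
    print(tokenize_split(sample))
    print(tokenize_compound(sample))
```

One caveat with this kind of pattern: it keeps every hyphen-joined sequence together, including ranges such as `1990-2000`, so a real fix would likely need a rule or lexicon to decide which hyphenated forms count as compounds.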
