[Robustness] Create Farsi Tokenizer Robustness Dataset #32

@Malikeh97

Description

  • Create canonical datasets in Farsi (matching the complexity level and overall format/categories of the Turkish version)

  • Generate Farsi-specific perturbations for the canonical examples

  • Add them to the dataset sheet

  • Create a HF dataset for the Farsi tokenizer robustness dataset and push it to the R3 HF space (the associated collection)
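The perturbation step above could be sketched as follows. This is a minimal, illustrative example, not the actual implementation: the function names and the set of perturbation types (Arabic look-alike letters, ZWNJ removal, Persian digits) are assumptions about what "Farsi-specific perturbations" might include.

```python
# Hypothetical sketch of Farsi-specific perturbations: character-level
# variants a tokenizer should be robust to. Names are illustrative.

ZWNJ = "\u200c"  # zero-width non-joiner, widely used in Farsi morphology


def arabic_char_swap(text: str) -> str:
    """Replace Persian letters with their Arabic look-alikes (ی→ي, ک→ك)."""
    return text.replace("\u06cc", "\u064a").replace("\u06a9", "\u0643")


def drop_zwnj(text: str) -> str:
    """Remove zero-width non-joiners (e.g. 'می‌روم' becomes 'میروم')."""
    return text.replace(ZWNJ, "")


def persian_digits(text: str) -> str:
    """Map ASCII digits to Extended Arabic-Indic (Persian) digits."""
    return text.translate(str.maketrans("0123456789", "۰۱۲۳۴۵۶۷۸۹"))


def perturb(canonical: str) -> dict:
    """Produce one record per perturbation type for a canonical example."""
    return {
        "canonical": canonical,
        "arabic_chars": arabic_char_swap(canonical),
        "no_zwnj": drop_zwnj(canonical),
        "persian_digits": persian_digits(canonical),
    }
```

A list of such records could then be converted with `datasets.Dataset.from_list(...)` and uploaded via `push_to_hub(...)` for the final step, assuming the standard Hugging Face `datasets` workflow.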
