[Robustness] Create Farsi Tokenizer Robustness Dataset #32

@Malikeh97

Description

  • Create canonical datasets in Farsi (matching the complexity level and overall format/categories of the Turkish version)

  • Generate Farsi-specific perturbations for the canonical examples

  • Add them to the dataset sheet

  • Create a HF dataset for the Farsi tokenizer robustness dataset and push it to the R3 HF space (the associated collection)
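The perturbation step above could be sketched as follows. This is a minimal, illustrative example, not the actual implementation: the function names and the set of perturbation types (Arabic look-alike letters, ZWNJ removal, Persian digits) are assumptions about what "Farsi-specific perturbations" might include.

```python
# Hypothetical sketch of Farsi-specific perturbations: character-level
# variants a tokenizer should be robust to. Names are illustrative.

ZWNJ = "\u200c"  # zero-width non-joiner, widely used in Farsi morphology


def arabic_char_swap(text: str) -> str:
    """Replace Persian letters with their Arabic look-alikes (ی→ي, ک→ك)."""
    return text.replace("\u06cc", "\u064a").replace("\u06a9", "\u0643")


def drop_zwnj(text: str) -> str:
    """Remove zero-width non-joiners (e.g. 'می‌روم' becomes 'میروم')."""
    return text.replace(ZWNJ, "")


def persian_digits(text: str) -> str:
    """Map ASCII digits to Extended Arabic-Indic (Persian) digits."""
    return text.translate(str.maketrans("0123456789", "۰۱۲۳۴۵۶۷۸۹"))


def perturb(canonical: str) -> dict:
    """Produce one record per perturbation type for a canonical example."""
    return {
        "canonical": canonical,
        "arabic_chars": arabic_char_swap(canonical),
        "no_zwnj": drop_zwnj(canonical),
        "persian_digits": persian_digits(canonical),
    }
```

A list of such records could then be converted with `datasets.Dataset.from_list(...)` and uploaded via `push_to_hub(...)` for the final step, assuming the standard Hugging Face `datasets` workflow.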
