We should implement a few training configuration files to train small models to do:
- String repetition (--dataset_text_template "<text> {text} <again> {text}")
- String repetition as above, additionally with --bytes_encoder_model_name_or_path None (i.e., without a bytes encoder)
- Word deconstruction into letter counts ("Strawberry S1 T1 R3 A1 W1 B1 E1 Y1") over a large word list, possibly English only
and then actually evaluate the trained models in generation; a sketch of how the target strings could be built follows below.
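
As a point of reference, here is a minimal Python sketch of how the target strings for these tasks could be constructed. The repetition template and the letter-count format are taken verbatim from the examples above; the function names and the standalone-script framing are illustrative assumptions, not part of any existing training code:

```python
def repetition_example(text: str) -> str:
    # Mirrors the --dataset_text_template "<text> {text} <again> {text}" setting.
    return f"<text> {text} <again> {text}"


def word_deconstruction(word: str) -> str:
    # Builds a letter-count target such as "Strawberry S1 T1 R3 A1 W1 B1 E1 Y1".
    counts: dict[str, int] = {}  # dict preserves first-appearance order (Python 3.7+)
    for ch in word.upper():
        if ch.isalpha():
            counts[ch] = counts.get(ch, 0) + 1
    return word + " " + " ".join(f"{ch}{n}" for ch, n in counts.items())


if __name__ == "__main__":
    print(repetition_example("hello world"))
    # -> <text> hello world <again> hello world
    print(word_deconstruction("Strawberry"))
    # -> Strawberry S1 T1 R3 A1 W1 B1 E1 Y1
```

Generation-time evaluation could then simply check that the decoded continuation matches these targets exactly.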