### ❓ The question

Hi! Is the raw (untokenized) data for the 50B, 100B, and 300B compositions of [Dolmino Mix 1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) available? I can see that `.npy` files are available to download [here](https://github.com/allenai/OLMo/blob/main/configs/official-0425/OLMo2-1B-stage2-seed42.yaml), but these have already been tokenized.

Thanks!