(260209) We are open-sourcing our codebase one component at a time, starting today and continuing over the coming days.
A codebase for pretraining multi-billion-scale sparse GPTs.
To set up the environment, see ./environment_setup.md. To set up the dataset, see ./standalone/fineweb_edu_10b/.
If you find this repository helpful, please cite our work:
@misc{cui2026multiheadlatentmoeheadparallel,
  title         = {Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism},
  author        = {Chenwei Cui and Rockwell Jackson and Benjamin Joseph Herrera and Ana María Tárano and Hannah Kerner},
  year          = {2026},
  eprint        = {2602.04870},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2602.04870},
}

@software{cui2025sparse,
  title  = {Sparse-GPT-Pretraining: A codebase for pretraining multi-billion-scale sparse GPTs},
  author = {Cui, Chenwei and Herrera, Benjamin Joseph and Jackson, Rockwell and Kerner, Hannah},
  url    = {https://github.com/kerner-lab/Sparse-GPT-Pretraining},
  month  = nov,
  year   = {2025},
}