(260209) We are open-sourcing our codebase one component at a time, starting today and continuing over the coming days.
A codebase for pretraining multi-billion-scale sparse GPTs.
To set up the environment, see ./environment_setup.md. To set up the dataset, see ./standalone/fineweb_edu_10b/.
If you find this repository helpful, please cite our work:
@misc{cui2026multiheadlatentmoeheadparallel,
  title         = {Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism},
  author        = {Chenwei Cui and Rockwell Jackson and Benjamin Joseph Herrera and Ana María Tárano and Hannah Kerner},
  year          = {2026},
  eprint        = {2602.04870},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2602.04870},
}

@software{cui2025sparse,
  title  = {Sparse-GPT-Pretraining: A codebase for pretraining multi-billion-scale sparse GPTs},
  author = {Cui, Chenwei and Herrera, Benjamin Joseph and Jackson, Rockwell and Kerner, Hannah},
  url    = {https://github.com/kerner-lab/Sparse-GPT-Pretraining},
  month  = nov,
  year   = {2025},
}