Sparse-GPT-Pretraining (open-sourcing in progress)

(260209) We are open-sourcing our codebase one component at a time, starting today and continuing over the following days.

A codebase for pretraining multi-billion-scale sparse GPTs.

To set up the environment, see ./environment_setup.md. To set up the dataset, see ./standalone/fineweb_edu_10b/.

Citation

If you find this repository helpful, please cite our work:

@misc{cui2026multiheadlatentmoeheadparallel,
      title={Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism}, 
      author={Chenwei Cui and Rockwell Jackson and Benjamin Joseph Herrera and Ana María Tárano and Hannah Kerner},
      year={2026},
      eprint={2602.04870},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.04870}, 
}

@software{cui2025sparse,
  title  = {Sparse-GPT-Pretraining: A codebase for pretraining multi-billion-scale sparse GPTs},
  author = {Cui, Chenwei and Herrera, Benjamin Joseph and Jackson, Rockwell and Kerner, Hannah},
  url    = {https://github.com/kerner-lab/Sparse-GPT-Pretraining},
  month  = nov,
  year   = {2025}
}
