Read this in other languages: English, 中文.
Continuously optimizing and updating...
🗓️ Changelog
- 📝 Added reward model training, PPO training, and GRPO training.
- 📝 Pretokenized corpus and updated cold start code.
- 📝 Uploaded trained tokenizer weights (compression: 1.56 chars/token); adjusted the tokenizer format and training method to align with the Qwen2 tokenizer style.
- 📝 Added LLM documentation covering technical points across the full LLM lifecycle.
- 📝 Added support for MoE models. Training resource usage is unstable: with Experts=8 and experts_per_tok=4, GPU memory fluctuated between 60% and 94%, and single-GPU utilization swung between 0% and 100%. After reducing the batch_size, memory usage stabilized around 90% and single-GPU utilization stabilized above 90%, occasionally dropping to around 30% (see the MoE routing sketch after this list).
- 📝 Added support for training with jsonl files (see the jsonl loading sketch after this list).
- 📝 Added support for training with DeepSpeed. Testing showed a maximum batch_size increase of 38% and a training speed increase of 9.6% (see the DeepSpeed config sketch after this list).
- 📝 Added pre-training code for dolly_llm. Conducted a pre-training test: 0.6B model, 500M corpus, on 4 × 46 GB GPUs.
- ✅ Released dolly_llm as a pip package for installation.
- 📝 Standardized modeling and configuration on the transformers format; designed and revised modeling_dolly v0.1 (11.5B parameters).
- 📝 Implemented the configuration_dolly class and added modeling_dolly v0.0.
- 📝 Added BBPE (byte-level BPE) training for the tokenizer (see the BBPE training sketch after this list).
- 📝 Added tokenizer construction code supporting sentencepiece and transformers' BPE; supports building from raw text and from existing tokenizers.
- ✅ Tested constructing custom LLM model architectures with transformers (see the custom-architecture sketch after this list).
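
The unstable MoE resource usage noted above is a natural consequence of top-k routing: tokens are dispatched unevenly across experts, so per-step load varies. Below is a minimal top-k MoE feed-forward sketch in PyTorch; it illustrates the routing pattern only and is not dolly_llm's actual implementation (layer shapes and the GELU expert MLP are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Minimal top-k MoE feed-forward block (illustrative sketch, not dolly_llm's code)."""

    def __init__(self, hidden_size: int, num_experts: int = 8, experts_per_tok: int = 4):
        super().__init__()
        self.experts_per_tok = experts_per_tok
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                  # (n_tokens, hidden)
        scores = self.gate(tokens)                          # (n_tokens, num_experts)
        weights, indices = scores.topk(self.experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # idle expert this step -> uneven GPU utilization
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)
```

Because routing is data-dependent, some experts can sit idle in a given batch, which matches the observed 0%–100% utilization swings; larger, steadier batches smooth the token distribution across experts.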
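For jsonl training data, the generic Hugging Face route is sketched below; the file path, tokenizer name, and the "text" field are assumptions about the data schema, and dolly_llm's own loader may differ.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Path and tokenizer name are placeholders; the "text" field is an assumed schema.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
dataset = load_dataset("json", data_files="data/pretrain.jsonl", split="train")


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)


dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
```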
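DeepSpeed integration can be sketched through transformers' `TrainingArguments`, which accepts a DeepSpeed config as a dict or a JSON path; the ZeRO stage-2 settings below are illustrative defaults, not the configuration that produced the 38% / 9.6% numbers above.

```python
from transformers import TrainingArguments

# Minimal ZeRO stage-2 config; values are illustrative, tune for your hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    deepspeed=ds_config,  # transformers builds the DeepSpeed engine from this
)
```

The training script is then launched with the `deepspeed` launcher (or `torchrun`) so that one process per GPU is spawned.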
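A byte-level BPE (BBPE) tokenizer can be trained from raw text with the tokenizers library as sketched below; the vocabulary size and the Qwen2-style special tokens are assumptions, not the exact settings used for the released weights.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Byte-level BPE trained from raw text; vocab_size and special tokens are assumptions.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=64000,
    special_tokens=["<|endoftext|>", "<|im_start|>", "<|im_end|>"],  # Qwen2-style
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```

The resulting tokenizer.json can then be wrapped with transformers' `PreTrainedTokenizerFast` for use in a training pipeline.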
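Finally, the configuration_dolly / modeling_dolly split follows the standard transformers pattern for custom architectures. The sketch below shows the registration mechanics with hypothetical field names; it is not the actual modeling_dolly code.

```python
import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class DollyConfig(PretrainedConfig):
    # Field names are hypothetical; the real configuration_dolly class may differ.
    model_type = "dolly"

    def __init__(self, vocab_size=64000, hidden_size=1024, num_hidden_layers=24, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        super().__init__(**kwargs)


class DollyModel(PreTrainedModel):
    config_class = DollyConfig

    def __init__(self, config: DollyConfig):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        # Transformer blocks would be stacked here in a real model.

    def forward(self, input_ids: torch.LongTensor):
        return self.embed(input_ids)


# Register the pair so AutoConfig/AutoModel resolve "dolly" like a built-in model type.
AutoConfig.register("dolly", DollyConfig)
AutoModel.register(DollyConfig, DollyModel)
```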
Special thanks to the following resources and articles for their assistance: