QuantGPT is a compact GPT-2-style language model (~124M parameters) with optimized training scripts, tuned for speed and efficiency. It ships with training code and Colab-friendly settings for fast prototyping on limited resources.
- Framework: PyTorch
- Dataset: FineWeb-Mini
- Training Tokens: ~100M
- Precision: `fp16` for Colab compatibility (`bf16` requires NVIDIA Ampere GPUs or newer)
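As a rough illustration of the precision rule above, the dtype can be chosen at runtime; the `pick_precision` helper below is hypothetical and not part of this repo:

```python
import torch

def pick_precision() -> torch.dtype:
    # bf16 needs an NVIDIA Ampere (or newer) GPU; on older cards such as
    # the Colab T4 we fall back to fp16.
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16

dtype = pick_precision()
print(f"Training precision: {dtype}")
```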
For training, we use the FineWeb-Mini dataset.

```python
from datasets import load_dataset

dataset = load_dataset("AryanNsc/FineWeb-Mini")
```

This dataset is already tokenized and sharded for optimized training.
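As a sketch of how the pre-tokenized data could be turned into training batches (the `train` split name and `tokens` column below are assumptions for illustration; check the dataset card for the actual schema):

```python
import torch
from datasets import load_dataset

dataset = load_dataset("AryanNsc/FineWeb-Mini")

def token_batches(split, batch_size=8, seq_len=1024):
    # Assumes each row exposes a pre-tokenized "tokens" list (hypothetical column name).
    need = batch_size * (seq_len + 1)
    buf = []
    for row in split:
        buf.extend(row["tokens"])
        while len(buf) >= need:
            chunk = torch.tensor(buf[:need], dtype=torch.long).view(batch_size, seq_len + 1)
            buf = buf[need:]
            yield chunk[:, :-1], chunk[:, 1:]  # inputs and next-token targets

for x, y in token_batches(dataset["train"]):
    print(x.shape, y.shape)
    break
```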
- First, prepare the data shards:

  ```bash
  python data_prep.py
  ```

- Then, start training:

  ```bash
  python train/gpt-2-arch.py
  ```

Colab uses a single T4 GPU; `fp16` precision is enabled by default.
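For reference, a minimal fp16 training step with PyTorch automatic mixed precision looks roughly like this (a generic sketch, not the repo's actual training loop; `model`, `optimizer`, `inputs`, and `targets` are assumed to exist):

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps fp16 gradients from underflowing

def train_step(model, optimizer, inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(inputs)                                    # (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```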
In addition to QuantGPT, we are also training QuantMobile-17M, a small, highly efficient, and deployable transformer-based language model.

Key highlights of QuantMobile-17M:
- Uses RMSNorm and Rotary Position Embeddings (RoPE) for better efficiency (see the RMSNorm sketch after this list).
- Implements multi-query attention for faster inference.
- Integrated safetensors for optimized checkpointing.
- Uses PyArrow Parquet data loaders for high-throughput token streaming.
- Fully supports DDP (Distributed Data Parallel) training out of the box.
- Integrated with Weights & Biases (`wandb`) for live monitoring.
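To illustrate the normalization choice, here is a standard RMSNorm layer in PyTorch (a generic formulation, not necessarily the exact layer QuantMobile uses):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: no mean-centering and no bias, so it is cheaper than LayerNorm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the reciprocal of its RMS, then apply a learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```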
```bash
python train/mobile_llm-arch.py
```

We save model checkpoints in the safetensors format for fast loading and secure storage (see the save/load sketch after the list below):
- Final trained model → `mobile_llm_final.safetensors`
- Intermediate checkpoints → `checkpoints/mobile_llm_step_<step>.safetensors`
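Saving and loading these checkpoints with the `safetensors` library looks roughly like this (a sketch assuming a `model` object already exists):

```python
from safetensors.torch import load_file, save_file

# Save: safetensors stores a flat dict of tensors, so we persist the state_dict directly.
state_dict = {k: v.contiguous() for k, v in model.state_dict().items()}
save_file(state_dict, "mobile_llm_final.safetensors")

# Load: tensors are memory-mapped and no pickled code is executed, hence "secure storage".
model.load_state_dict(load_file("mobile_llm_final.safetensors"))
```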
This project is licensed under the MIT License.
- Model: QuantGPT on HuggingFace
- QuantMobile: QuantMobile on HuggingFace
- Dataset: FineWeb-Mini