QuantGPT

Hugging Face

QuantGPT is a compact GPT-2 (124M) variant with optimized training scripts
and Colab-friendly settings for fast prototyping on limited resources.


πŸš€ Base Model

We provide a compact GPT-2-style language model (~124M parameters) optimized for speed and efficiency.

  • Framework: PyTorch
  • Dataset: FineWeb-Mini
  • Training Tokens: ~100M
  • Precision: fp16 by default for Colab compatibility (bf16 requires NVIDIA Ampere GPUs or newer); see the precision check sketch below.
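
As an illustrative sketch (not this repo's exact logic), choosing bf16 when the GPU supports it and falling back to fp16 otherwise comes down to a capability check:

import torch

# Pick bf16 on Ampere-or-newer GPUs, otherwise fall back to fp16.
# The training scripts here default to fp16 so they run on Colab's T4.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16
print(f"autocast dtype: {dtype}")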

πŸ“‚ Dataset

For training, we use the FineWeb-Mini dataset.

from datasets import load_dataset

dataset = load_dataset("AryanNsc/FineWeb-Mini")

This dataset is already tokenized and sharded for optimized training.
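
Before wiring the data into a training loop, it can help to confirm the splits and schema; this sketch just prints them rather than assuming particular column names:

from datasets import load_dataset

dataset = load_dataset("AryanNsc/FineWeb-Mini")

# Show the available splits, row counts, and column names,
# then peek at one pre-tokenized example from the first split.
print(dataset)
first_split = next(iter(dataset))
print(dataset[first_split][0])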


πŸ’» Training on Colab

  1. First, prepare data shards:
python data_prep.py
  2. Then, start training:
python train/gpt-2-arch.py

Colab's free tier provides a single T4 GPU, so fp16 precision is enabled by default.
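
The actual loop lives in train/gpt-2-arch.py; as a minimal sketch of what an fp16 step looks like with PyTorch automatic mixed precision (model, optimizer, x, and y are placeholders, not names from this repo):

import torch

scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

def train_step(model, optimizer, x, y):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in fp16 on the T4; master weights stay in fp32.
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(x)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1)
        )
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then runs optimizer.step()
    scaler.update()
    return loss.item()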


🧩 New Models Training

In addition to QuantGPT, we are also training QuantMobile-17M,
a small, highly efficient, and deployable transformer-based language model.

Key highlights of QuantMobile-17M:

  • Uses RMSNorm and Rotary Position Embeddings (RoPE) for better efficiency (see the sketch after this list).
  • Implements multi-query attention for faster inference.
  • Integrates safetensors for optimized checkpointing.
  • Uses PyArrow Parquet data loaders for high-throughput token streaming.
  • Fully supports DDP (Distributed Data Parallel) training out of the box.
  • Integrates Weights & Biases (wandb) for live training monitoring.

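The concrete layers live in train/mobile_llm-arch.py; the snippet below is only an illustrative sketch of the RMSNorm and RoPE building blocks named above, with shapes and names that are assumptions rather than the repo's exact code:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Rescales by the root-mean-square of the features: no mean subtraction
    # and no bias, which makes it cheaper than standard LayerNorm.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

def rope_tables(seq_len, head_dim, base=10000.0, device=None):
    # Precompute the cos/sin tables for rotary position embeddings.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len, device=device).float(), inv_freq)
    return angles.cos(), angles.sin()  # each (seq_len, head_dim // 2)

def apply_rope(x, cos, sin):
    # Rotate channel pairs by a position-dependent angle.
    # x: (batch, n_heads, seq_len, head_dim)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

In a multi-query attention block, queries and keys would pass through apply_rope before attention scores are computed, with a single key/value head shared across all query heads.
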
Training Example

python train/mobile_llm-arch.py
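
Since the script supports DDP, a multi-GPU launch would normally go through torchrun, assuming it reads the standard torch.distributed environment variables (single-GPU runs work with plain python as above):

torchrun --standalone --nproc_per_node=4 train/mobile_llm-arch.py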

πŸ“¦ Model Checkpoints

We save model checkpoints in the safetensors format for fast loading and secure storage (a minimal save/load sketch follows the list below).

  • Final trained model → mobile_llm_final.safetensors
  • Intermediate checkpoints → checkpoints/mobile_llm_step_<step>.safetensors
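
As a minimal sketch of saving and reloading such a checkpoint with the safetensors library (model and step are placeholders for the instantiated network and the current training step):

from safetensors.torch import save_file, load_file

# Save: safetensors stores a flat {name: tensor} dict with no pickled code.
step = 1000  # placeholder step number
save_file(model.state_dict(), f"checkpoints/mobile_llm_step_{step}.safetensors")

# Load: returns a plain tensor dict that feeds straight into load_state_dict.
state_dict = load_file("mobile_llm_final.safetensors")
model.load_state_dict(state_dict)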

πŸ“œ License

This project is licensed under the MIT License.


🌐 Links
