A high-performance JAX/Flax implementation of NanoGPT optimized for TPU training.
This project reimplements Andrej Karpathy's NanoGPT in JAX, focusing on performance and scalability. It leverages JAX's automatic differentiation and compilation along with Flax's neural network modules to provide an efficient, maintainable codebase that trains distributed across TPU cores.
- 🚀 Full JAX/Flax implementation optimized for TPUs
- 📈 Distributed training with @pmap
- 🔄 Gradient accumulation for larger effective batch sizes
- 📊 Integrated Weights & Biases logging
- 💾 Support for inference straight from pretrained weights
- 🎯 Cosine learning rate schedule with warmup
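As a point of reference, here is a minimal Optax sketch of how a warmup-plus-cosine-decay schedule can be combined with AdamW; the concrete numbers (peak learning rate, warmup/decay steps, weight decay) are illustrative placeholders rather than the settings used in train.py:

```python
import optax

# Linear warmup followed by cosine decay; all values below are placeholders.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,       # start from zero during warmup
    peak_value=6e-4,      # peak learning rate (illustrative)
    warmup_steps=2_000,
    decay_steps=300_000,
    end_value=6e-5,       # learning rate at the end of the decay
)

# AdamW driven by the schedule; global-norm clipping is a common companion.
optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),
    optax.adamw(learning_rate=schedule, weight_decay=0.1),
)
```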
We reached a validation loss of 3.17 after 270k steps, at which point the model had converged. Training took roughly 18 hours on a TPU v3-8.
During training, the TPU reported an average duty cycle of 77%, indicating good accelerator utilization.
- Clone the repository:

git clone https://github.com/plippmann/nanogpt-jax && cd nanogpt-jax

- Set up the environment:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/miniconda3/bin/activate
conda create -n myenv python=3.10
conda activate myenv

- Install JAX (TPU version):

pip install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

- Install dependencies:

pip install -r requirements.txt

nanogpt-jax/
├── nanogpt/
│   ├── model.py        # Core GPT-2 implementation
│   ├── train.py        # Training loop and configuration
│   ├── inference.py    # Text generation utilities
│   ├── pretrained.py   # Load pretrained weights
│   └── tests.py        # Sanity checks for model.py
├── data/
│   ├── openwebtext/
│   │   └── prepare.py  # Get data
│   └── shakespeare/
│       └── prepare.py  # Get data
└── requirements.txt
- Implements the GPT-2 architecture using Flax's neural network modules
- Supports configurable model sizes
- Provides both a hand-rolled causal self-attention implementation and Flax's built-in attention module
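For illustration, a minimal hand-rolled causal self-attention block in Flax Linen might look like the following; the class and field names are ours and may not match model.py exactly:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class CausalSelfAttention(nn.Module):
    """Sketch of multi-head causal self-attention (illustrative, not model.py)."""
    n_head: int
    n_embd: int

    @nn.compact
    def __call__(self, x):                      # x: [batch, seq_len, n_embd]
        B, T, C = x.shape
        head_dim = self.n_embd // self.n_head
        # Single projection to queries, keys and values.
        q, k, v = jnp.split(nn.Dense(3 * self.n_embd)(x), 3, axis=-1)
        split = lambda t: t.reshape(B, T, self.n_head, head_dim).transpose(0, 2, 1, 3)
        q, k, v = split(q), split(k), split(v)
        # Scaled dot-product attention with a lower-triangular (causal) mask.
        att = (q @ k.transpose(0, 1, 3, 2)) / jnp.sqrt(head_dim)
        mask = jnp.tril(jnp.ones((T, T), dtype=bool))
        att = jax.nn.softmax(jnp.where(mask, att, -jnp.inf), axis=-1)
        y = (att @ v).transpose(0, 2, 1, 3).reshape(B, T, C)
        return nn.Dense(self.n_embd)(y)         # output projection
```

Flax's built-in `nn.SelfAttention` together with `nn.make_causal_mask` can serve as the drop-in alternative mentioned above.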
- Distributed training across TPU cores using @pmap (see the sketch after this list)
- Gradient accumulation for higher effective batch sizes
- Learning rate scheduling with warmup and cosine decay
- AdamW optimizer for now
- Integrated W&B logging for training metrics
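To make the training-side bullets concrete, here is a condensed sketch of a pmap-ed train step that accumulates gradients over micro-batches with `lax.scan` and averages them across TPU cores with `lax.pmean`. It assumes `state` is a `flax.training.train_state.TrainState` replicated across devices and that `loss_fn(params, tokens)` is defined from the model elsewhere; names and shapes are illustrative rather than the exact signatures in train.py:

```python
from functools import partial
import jax
import jax.numpy as jnp

# Assumed elsewhere: loss_fn(params, tokens) -> scalar language-modeling loss.

@partial(jax.pmap, axis_name="devices")
def train_step(state, micro_batches):
    """micro_batches: int32 tokens of shape [accum_steps, per_device_batch, seq_len]."""

    def accumulate(carry, micro_batch):
        grads_acc, loss_acc = carry
        loss, grads = jax.value_and_grad(loss_fn)(state.params, micro_batch)
        grads_acc = jax.tree_util.tree_map(jnp.add, grads_acc, grads)
        return (grads_acc, loss_acc + loss), None

    zero_grads = jax.tree_util.tree_map(jnp.zeros_like, state.params)
    init = (zero_grads, jnp.zeros((), dtype=jnp.float32))
    (grads, loss), _ = jax.lax.scan(accumulate, init, micro_batches)

    # Average over accumulation steps, then across TPU cores.
    accum_steps = micro_batches.shape[0]
    grads = jax.tree_util.tree_map(lambda g: g / accum_steps, grads)
    grads = jax.lax.pmean(grads, axis_name="devices")
    loss = jax.lax.pmean(loss / accum_steps, axis_name="devices")

    return state.apply_gradients(grads=grads), loss
```

In this pattern the train state is replicated across devices (e.g. with `flax.jax_utils.replicate`) and each global batch is shaped as `[num_devices, accum_steps, per_device_batch, seq_len]` before the call.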
- Implement the model in JAX
- Write tests
- Load pretrained weights
- Perform inference from pretrained weights
- Train the model on TPUs
- Make it fast with @pmap/@jit
- Run inference on the trained model
- Post-training fun
  - Implement RoPE, Muon optimizer, and other improvements
- Prepare training data:
python data/openwebtext/prepare.py

Upload the resulting train.bin and val.bin to your GCP storage bucket.
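If prepare.py follows the original nanoGPT and writes the tokens as raw uint16 GPT-2 BPE ids, training batches can be sampled directly from the memory-mapped files. The sketch below rests on that assumption (adjust the dtype and paths if this repository's prepare.py differs):

```python
import numpy as np

# Assumption: prepare.py wrote flat arrays of uint16 token ids, as in the
# original nanoGPT. Adjust dtype/path if this repo's prepare.py differs.
data = np.memmap("data/openwebtext/train.bin", dtype=np.uint16, mode="r")

def get_batch(batch_size: int, block_size: int, rng: np.random.Generator):
    """Sample (input, target) windows of length block_size."""
    starts = rng.integers(0, len(data) - block_size - 1, size=batch_size)
    x = np.stack([data[s : s + block_size] for s in starts]).astype(np.int32)
    y = np.stack([data[s + 1 : s + 1 + block_size] for s in starts]).astype(np.int32)
    return x, y

x, y = get_batch(batch_size=8, block_size=1024, rng=np.random.default_rng(0))
```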
- Create a TPU VM:
ZONE=europe-west4-a
TPU_TYPE=v3-8
VM_NAME=jax-gpt-v3-8
gcloud alpha compute tpus tpu-vm create $VM_NAME \
--zone=$ZONE \
--accelerator-type=$TPU_TYPE \
--version=tpu-ubuntu2204-base \
--preemptible

- SSH into the VM:

gcloud alpha compute tpus tpu-vm ssh $VM_NAME --zone=$ZONE

- Start training:

python train.py

- Generate text from best checkpoint:

python inference.py --init_from resume --checkpoint_type best

- Clean up:

gcloud alpha compute tpus tpu-vm delete $VM_NAME --zone=$ZONE

- NanoGPT by Karpathy
- gpt-jax by Penn Jenks
- PyTorch to JAX blog by Douglas Jia
Feel free to reach out at p.lippmann@tudelft.nl.
