First, install the required packages:
```
pip install torchao liger_kernel pyarrow tensorboard
```

💡 **Note:** `torchao` and `liger_kernel` may require a recent version of PyTorch (≥ 2.3) and a CUDA-enabled environment for optimal performance.
- Download all files from this repository.
- Place them in a single working directory.
- Inside this directory, create a subfolder named `128`.
- Download the training data (Parquet files) into the `128/` folder:
  👉 TinyCorpus-v2
Your directory should look like this:
```
your-training-folder/
├── trainGPT-token.py
├── fast_self_attn_model.py
├── data_utils.py
├── dev_optim.py
└── 128/
    ├── tinycorpus-000-of-128.parquet
    ├── tinycorpus-001-of-128.parquet
    └── ...                      # all shard files
```
Run the training script from inside `your-training-folder`:

```
python trainGPT-token.py
```

This replicates MiniModel with 12 layers; the original model used 24 layers. Set `'layers': 24` in `trainGPT-token.py` if you wish to replicate the original model.
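The edit above amounts to changing one value in the script's hyperparameter dict. The exact structure of the config in `trainGPT-token.py` is not shown here, so this is a hypothetical sketch; only the `'layers'` key comes from the instructions above.

```python
# Hypothetical sketch of a dict-style config as used in trainGPT-token.py;
# keys other than 'layers' are illustrative placeholders.
config = {
    'layers': 12,   # default: 12-layer MiniModel
}

# To replicate the original model, set:
config['layers'] = 24
```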
By default, the script logs training loss and other metrics to a directory called `runs/` using PyTorch's `SummaryWriter`.
While training is running (or after it finishes), launch TensorBoard to visualize the loss curve:
```
tensorboard --logdir=runs
```

Then open your browser and go to:
👉 http://localhost:6006
You'll see real-time plots of the training loss (refreshed every 30 seconds).
If you encounter memory issues, open `trainGPT-token.py` and adjust one or both of the following:

- Reduce model size:
  ```
  'input_dims': 512   # default 768
  ```
- Reduce batch size:
  ```
  batch_size = 32     # default 64
  ```
Smaller values will lower VRAM usage at the cost of training speed or stability.
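To get a feel for why shrinking `input_dims` helps, here is a rough back-of-envelope parameter count for a GPT-style transformer. The ~12·d² per-layer approximation (attention ≈ 4·d², MLP ≈ 8·d²) is a standard rule of thumb, not the exact count for this repository's model, and the vocabulary size is an assumed placeholder.

```python
def approx_params(d_model: int, n_layers: int, vocab: int = 50_000) -> int:
    """Rough GPT-style count: ~12*d^2 per layer (attn 4*d^2 + MLP 8*d^2) plus embeddings."""
    per_layer = 12 * d_model ** 2
    return n_layers * per_layer + vocab * d_model

default = approx_params(768, 12)   # 'input_dims': 768
reduced = approx_params(512, 12)   # 'input_dims': 512
print(f"default: {default / 1e6:.1f}M params, reduced: {reduced / 1e6:.1f}M params")
```

Since the per-layer cost grows quadratically in `input_dims`, dropping from 768 to 512 roughly halves the transformer-block parameters (and the activations that dominate VRAM shrink with it).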