__ .__ .__ ________ __________ ___________
_/ |_ _____ _____ |__|| | / _____/ \______ \\__ ___/
\ __\\__ \ / \ | || | / \ ___ | ___/ | |
| | / __ \_| Y Y \| || |__\ \_\ \ | | | |
|__| (____ /|__|_| /|__||____/ \______ / |____| |____|
\/ \/ \/
Training a GPT in 4 hours on Tamil tokens.
An implementation of Andrej Karpathy's nanoGPT.
This repo tries to train a nanoGPT from scratch in under 3 minutes. Using modded-nanoGPT as a reference, we apply the following changes to nanoGPT:
- Rotary positional embeddings (RoPE); sketched below
- QK normalization (normalize queries and keys before attention); sketched below
- ReLU^2 activation in the MLP; sketched below
- Uniform weight initialization
- Skip connections between the encoding and decoding halves of the layer stack
- Embedding/vocab size rounded up to the nearest multiple of 128 (2^7); sketched below
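The first two changes live in the attention block. Here is a minimal sketch of how rotary embeddings and QK normalization might be combined, assuming a nanoGPT-style `(batch, heads, seq, head_dim)` layout; the repo's actual code in src/model.py may differ:

```python
import torch
import torch.nn.functional as F

def rotary(x, base=10000.0):
    # Rotate channel pairs by a position-dependent angle (RoPE).
    # x: (batch, heads, seq, head_dim)
    B, H, T, D = x.shape
    half = D // 2
    freqs = base ** (-torch.arange(half, device=x.device) / half)
    angles = torch.arange(T, device=x.device)[:, None] * freqs  # (T, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attend(q, k, v):
    # Normalize Q and K to unit norm so attention logits depend on
    # direction rather than magnitude, then apply rotary embeddings.
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    q, k = rotary(q), rotary(k)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```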
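ReLU^2 just squares the ReLU output: it keeps ReLU's cheap gating while producing a smoother activation. A minimal MLP sketch (module names follow nanoGPT conventions and are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.c_fc = nn.Linear(dim, 4 * dim)
        self.c_proj = nn.Linear(4 * dim, dim)

    def forward(self, x):
        # ReLU^2: relu(x) squared, in place of GELU.
        return self.c_proj(torch.relu(self.c_fc(x)) ** 2)
```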
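Rounding the embedding table up to a multiple of 128 wastes a few rows but lets the embedding and lm_head matmuls tile cleanly onto tensor cores. A one-liner sketch:

```python
def pad_to_multiple(vocab_size: int, multiple: int = 128) -> int:
    # Round up so matmul dimensions align with tensor-core tile sizes.
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_to_multiple(50257))  # 50304, the padded GPT-2 vocab nanoGPT uses
```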
Now you can train a GPT on a cheap NVIDIA GPU.
Download ai4bharat's dataset (ta.txt) and place it under data/. Run src/clean.py, then src/train.py. Adjust the batch size to your VRAM (src/model.py).
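The knob names below are hypothetical; check src/model.py for the actual ones. As a rough guide:

```python
# Hypothetical names: the real variables live in src/model.py.
batch_size = 32    # halve this if you hit CUDA out-of-memory
block_size = 1024  # context length in tokens; also affects memory use
```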
You can find the weights here.
- Zero weight initialization for lm_head and c_proj
Initializing lm_head to zero slows the gradient flow to the deeper layers. I don't know why it is used in modded-nanoGPT. If you know the answer: https://x.com/_smoke_y/status/1891013258032611364
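For reference, a minimal sketch of what zero-initializing these projections might look like on a nanoGPT-style module tree (the attribute names are assumptions):

```python
import torch.nn as nn

def zero_init_projections(model):
    # Zero the output head and each block's residual projections,
    # as modded-nanoGPT does. Attribute names follow nanoGPT's layout.
    nn.init.zeros_(model.lm_head.weight)
    for block in model.transformer.h:
        nn.init.zeros_(block.attn.c_proj.weight)
        nn.init.zeros_(block.mlp.c_proj.weight)
```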