__ .__ .__ ________ __________ ___________
_/ |_ _____ _____ |__|| | / _____/ \______ \\__ ___/
\ __\\__ \ / \ | || | / \ ___ | ___/ | |
| | / __ \_| Y Y \| || |__\ \_\ \ | | | |
|__| (____ /|__|_| /|__||____/ \______ / |____| |____|
\/ \/ \/
Training a GPT in 4 hours on Tamil tokens.
An implementation of Andrej Karpathy's nanoGPT.
This repo tries to train a nanoGPT from scratch in under 3 minutes. Using modded-nanoGPT as a reference, we apply the following changes to nanoGPT:
- Rotary positional embeddings (RoPE); sketched below
- QK normalization (normalize queries and keys before attention); sketched below
- ReLU^2 activation in the MLP; sketched below
- Uniform weight initialization
- Skip connections between the encoding and decoding halves of the layer stack
- Embedding/vocab size rounded up to the nearest multiple of 128 (2^7); sketched below
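The first two changes live in the attention block. Here is a minimal sketch of how rotary embeddings and QK normalization might be combined, assuming a nanoGPT-style `(batch, heads, seq, head_dim)` layout; the repo's actual code in src/model.py may differ:

```python
import torch
import torch.nn.functional as F

def rotary(x, base=10000.0):
    # Rotate channel pairs by a position-dependent angle (RoPE).
    # x: (batch, heads, seq, head_dim)
    B, H, T, D = x.shape
    half = D // 2
    freqs = base ** (-torch.arange(half, device=x.device) / half)
    angles = torch.arange(T, device=x.device)[:, None] * freqs  # (T, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attend(q, k, v):
    # Normalize Q and K to unit norm so attention logits depend on
    # direction rather than magnitude, then apply rotary embeddings.
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    q, k = rotary(q), rotary(k)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```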
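ReLU^2 just squares the ReLU output: it keeps ReLU's cheap gating while producing a smoother activation. A minimal MLP sketch (module names follow nanoGPT conventions and are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.c_fc = nn.Linear(dim, 4 * dim)
        self.c_proj = nn.Linear(4 * dim, dim)

    def forward(self, x):
        # ReLU^2: relu(x) squared, in place of GELU.
        return self.c_proj(torch.relu(self.c_fc(x)) ** 2)
```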
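Rounding the embedding table up to a multiple of 128 wastes a few rows but lets the embedding and lm_head matmuls tile cleanly onto tensor cores. A one-liner sketch:

```python
def pad_to_multiple(vocab_size: int, multiple: int = 128) -> int:
    # Round up so matmul dimensions align with tensor-core tile sizes.
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_to_multiple(50257))  # 50304, the padded GPT-2 vocab nanoGPT uses
```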
Now you can train a GPT on a cheap NVIDIA GPU.
Download ai4bharat's dataset (ta.txt) and place it under data/. Run src/clean.py, then src/train.py. Adjust the batch size to your VRAM (src/model.py).
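The knob names below are hypothetical; check src/model.py for the actual ones. As a rough guide:

```python
# Hypothetical names: the real variables live in src/model.py.
batch_size = 32    # halve this if you hit CUDA out-of-memory
block_size = 1024  # context length in tokens; also affects memory use
```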
You can find the weights here.
- Zero weight initialization for lm_head and c_proj
Initializing lm_head to zero slows the gradient flow to the deeper layers. I don't know why it is used in modded-nanoGPT. If you know the answer: https://x.com/_smoke_y/status/1891013258032611364
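For reference, a minimal sketch of what zero-initializing these projections might look like on a nanoGPT-style module tree (the attribute names are assumptions):

```python
import torch.nn as nn

def zero_init_projections(model):
    # Zero the output head and each block's residual projections,
    # as modded-nanoGPT does. Attribute names follow nanoGPT's layout.
    nn.init.zeros_(model.lm_head.weight)
    for block in model.transformer.h:
        nn.init.zeros_(block.attn.c_proj.weight)
        nn.init.zeros_(block.mlp.c_proj.weight)
```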