gpt

This project attempts to build the transformer architecture described in the Attention Is All You Need [1] paper, using PyTorch. It currently implements only the decoder side of the transformer; I am working on building the encoder block and adding further functionality. This was an exercise for me to learn and understand the architecture behind the transformer: in practice the code is slow, and while it produces output that is more meaningful than random characters, the output is not usable. bigram.py contains the code and comments, and gpt-trial.ipynb is the Colab notebook I used to train the model with GPU acceleration.

The important equation from this paper which is implemented in the code is as follows:

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

This makes each queried token attend to the keys and values of all previous tokens, so it can take into account, to varying degrees, how relevant everything before it is.
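As a minimal sketch of that equation (not the repository's exact code; the function name, tensor shapes, and the head_size dimension here are illustrative), scaled dot-product attention with the causal mask used in a decoder looks roughly like this in PyTorch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, time, head_size)
    B, T, head_size = q.shape
    # Affinities between queries and keys, scaled by sqrt(d_k)
    wei = q @ k.transpose(-2, -1) / (head_size ** 0.5)  # (B, T, T)
    # Causal mask: a token may only attend to itself and earlier positions
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
    wei = wei.masked_fill(~mask, float('-inf'))
    # Softmax over the key dimension turns affinities into attention weights
    wei = F.softmax(wei, dim=-1)
    # Weighted sum of the values
    return wei @ v  # (B, T, head_size)

# Example: batch of 2 sequences, 8 tokens, head size 16
q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16])
```

The masked positions receive -inf before the softmax, so each token's attention weights sum to 1 over only itself and earlier tokens.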

This project relied heavily on Andrej Karpathy's "building GPT" YouTube video, but I am continually learning and implementing new things to build on his project.

References: [1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
