Transformer architecture from scratch in Python using PyTorch

Overview:

This repository contains an implementation of the Transformer architecture from the paper 'Attention Is All You Need', built from scratch using the PyTorch library. The model was trained on a Game of Thrones screenplay dataset, a corpus of 50,000 words in total, for the task of next-word prediction.
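
For reference, here is a minimal sketch of the scaled dot-product attention at the core of the architecture; the function name and signature are illustrative and not taken from this repository's code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # A causal mask hides future positions, which is what makes
        # next-word prediction possible during training.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```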

Model Details:

  • Corpus size: 50,000 words (tokens)
  • Final loss: 1.73
  • Total epochs: 70
  • Learning rate: 0.005
  • Optimizer: Adam (see the training-loop sketch below)
  • Loss function: Cross-entropy
  • Total parameters: 25.1 million
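
A training loop consistent with these settings might look like the following; `model` and `dataloader` are hypothetical placeholders, since the actual training script is not shown here:

```python
import torch
import torch.nn as nn

# Hypothetical setup: `model` is the Transformer and `dataloader` yields
# (inputs, targets) batches of token indices; neither is defined in this README.
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)  # learning rate listed above
criterion = nn.CrossEntropyLoss()                           # cross-entropy loss

for epoch in range(70):                                     # 70 total epochs
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        logits = model(inputs)                              # (batch, seq_len, vocab_size)
        loss = criterion(
            logits.reshape(-1, logits.size(-1)),            # (batch*seq_len, vocab_size)
            targets.reshape(-1),                            # (batch*seq_len,)
        )
        loss.backward()
        optimizer.step()
```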

Model results after training:

(Screenshot of the model's generated text after training.)

Note:

Although the results are reasonable, there is still room for improvement. Accuracy could be improved by adding dropout layers or by reducing the total parameter count to avoid overfitting, and inference could be made more hardware-efficient by implementing a KV cache (sketched below), among other optimizations. Maybe I will do this later, but for now it's good enough.
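
For the curious, here is a minimal sketch of the KV cache idea, assuming single-head attention and batch-first tensors; none of these names come from this repository:

```python
import math
import torch

def attend_with_cache(q_new, k_new, v_new, cache):
    """One decoding step: append this step's key/value to the cache, then
    attend from the new query over everything cached so far, so earlier
    tokens' keys and values are never recomputed."""
    if cache["k"] is None:
        cache["k"], cache["v"] = k_new, v_new
    else:
        cache["k"] = torch.cat([cache["k"], k_new], dim=1)  # (batch, seq_so_far, d_k)
        cache["v"] = torch.cat([cache["v"], v_new], dim=1)
    scores = q_new @ cache["k"].transpose(-2, -1) / math.sqrt(q_new.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ cache["v"]

# Usage: reset the cache once per generated sequence, then call the function
# with only the newest token's projections at each decoding step.
cache = {"k": None, "v": None}
```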

Thanks!!
