This is a naive implementation of the Transformer architecture in PyTorch. Transformers address natural language processing tasks through self-attention, which lets the model relate tokens anywhere in a sequence and learn long-range patterns in text.
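For intuition, here is a minimal sketch of scaled dot-product self-attention in PyTorch. The function name, single-head setup, and weight shapes are illustrative simplifications, not this repo's actual modules:

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a batch of sequences.

    x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_model) projections.
    """
    q = x @ w_q                                                # queries
    k = x @ w_k                                                # keys
    v = x @ w_v                                                # values
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # pairwise token similarity
    weights = F.softmax(scores, dim=-1)                        # attention distribution per token
    return weights @ v                                         # weighted sum of values

# Example: a batch of 2 sequences, 5 tokens each, model width 16.
d_model = 16
x = torch.randn(2, 5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape (2, 5, 16)
```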
How to use
The code-generation experiment is implemented in train_code.py.
Code
To run the NMT-style code-generation experiment, run train_code.py.
To test your model, run run_inference.py, updating the model weights path on this line:
transformer.load_state_dict(torch.load(f"weights/transformer_code_50.pth"))
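A rough sketch of what testing could look like end to end. The Transformer class, its import path, its constructor, and its forward signature below are assumptions for illustration, not the actual contents of run_inference.py:

```python
import torch
import tiktoken

# Illustrative sketch: the model class, its constructor arguments, and its
# forward signature are assumptions, not this repo's actual API.
from model import Transformer  # hypothetical module/class name

enc = tiktoken.get_encoding("gpt2")                 # tokenizer; the actual encoding may differ
transformer = Transformer()                         # build with the same config used in training
transformer.load_state_dict(torch.load("weights/transformer_code_50.pth"))
transformer.eval()

prompt = torch.tensor([enc.encode("def add(a, b):")])   # shape (1, seq_len)
with torch.no_grad():
    logits = transformer(prompt)                         # assumed to return (1, seq_len, vocab_size)
next_id = logits[0, -1].argmax().item()                  # greedy choice of the next token
print(enc.decode([next_id]))
```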
Examples:
Preliminary Plan
- Read the following reference material on GitHub for implementation specifics by 9/17
  - nanoGPT, Andrej Karpathy's GPT-2 Transformer implementation with a custom vocabulary
  - X-Transformers, a transformer library that collects custom tooling from a variety of papers
  - HuggingFace Transformers' BERT, Hugging Face's implementation of BERT
- Write a system design document for the classes and methods needed for the implementation by 9/20
- Build a custom text vocabulary dataset for training and validation by 10/1
  - Use the tiktoken library to tokenize sentences and phrases (see the sketch after this list).
- Complete the core transformer classes and methods by 10/10
- Build a dataset loader module for running experiments that train our transformer on the custom dataset by 10/20 (also sketched below)
- Wrap up the experiment and document experimental results such as training performance by 11/1
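As a minimal sketch of the tokenization and data-loading items above (the corpus path, block size, and gpt2 encoding are assumptions, not settings from this repo), a tiktoken-backed dataset and loader could look like:

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class CodeDataset(Dataset):
    """Tokenizes a text corpus with tiktoken and serves fixed-length training blocks."""

    def __init__(self, path, block_size=128):
        enc = tiktoken.get_encoding("gpt2")      # BPE tokenizer; encoding choice is an assumption
        with open(path, encoding="utf-8") as f:
            tokens = enc.encode(f.read())
        self.data = torch.tensor(tokens, dtype=torch.long)
        self.block_size = block_size

    def __len__(self):
        return len(self.data) - self.block_size

    def __getitem__(self, idx):
        # Input is a block of tokens; target is the same block shifted by one (next-token prediction).
        chunk = self.data[idx : idx + self.block_size + 1]
        return chunk[:-1], chunk[1:]

# Usage (hypothetical corpus path):
# dataset = CodeDataset("data/code_corpus.txt", block_size=128)
# loader = DataLoader(dataset, batch_size=32, shuffle=True)
# x, y = next(iter(loader))   # x, y: (32, 128) tensors of token ids
```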
Papers
- Attention Is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Evaluating Large Language Models Trained on Code
- StarCoder
- CodeLlama
- CodeT5
- Autoformalization with Large Language Models
Adapting for Code-Generation
Online Resources

