Training a small transformer to learn 3-digit addition with carry-balanced data.
data_gen.py: Tokenizer and carry-balanced data generation
train.py: Model config and training loop (logs to wandb)
eval.py: Evaluation on in-distribution and out-of-distribution digit lengths
Learning_Addition_Elizabeth_Pavlova.pdf: Full write-up with results and discussion
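
For readers unfamiliar with the term, here is a minimal sketch of what carry-balanced sampling could look like. It is not the code in data_gen.py. It assumes that "carry-balanced" means every carry count (0 to 3 for 3-digit operands) appears equally often, and that operands are drawn from 0 to 999. The names `count_carries` and `sample_carry_balanced` are hypothetical.

```python
import random
from collections import Counter


def count_carries(a: int, b: int) -> int:
    """Count the carries produced by schoolbook addition of a and b."""
    carries, carry = 0, 0
    while a > 0 or b > 0:
        # A carry occurs when the digit sum plus the incoming carry reaches 10.
        carry = 1 if (a % 10 + b % 10 + carry) >= 10 else 0
        carries += carry
        a //= 10
        b //= 10
    return carries


def sample_carry_balanced(n_per_bucket: int, n_digits: int = 3, seed: int = 0):
    """Rejection-sample (a, b) pairs until each carry count has n_per_bucket pairs.

    Assumption: balance is defined over the number of carries (0..n_digits).
    """
    rng = random.Random(seed)
    buckets = {k: [] for k in range(n_digits + 1)}
    hi = 10 ** n_digits - 1
    while any(len(v) < n_per_bucket for v in buckets.values()):
        a, b = rng.randint(0, hi), rng.randint(0, hi)
        k = count_carries(a, b)
        if len(buckets[k]) < n_per_bucket:
            buckets[k].append((a, b))
    pairs = [p for v in buckets.values() for p in v]
    rng.shuffle(pairs)
    return pairs


pairs = sample_carry_balanced(n_per_bucket=50)
print(Counter(count_carries(a, b) for a, b in pairs))
```

Uniform sampling of operands would leave the carry counts unevenly represented, so the sketch rejects pairs once their bucket is full. The actual balancing scheme is described in the write-up.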
pip install -r requirements.txt
python train.py
python eval.py

Tested on Google Colab (T4 GPU).