Tiny GPT-2 Implementation

A "tiny" implementation of GPT-2 trained on a Haiku dataset, designed to run on limited resources (like a single T4 GPU on Google Colab).

Project Description

This project demonstrates how to build and train a small-scale GPT-2 model from scratch. The goal is to understand the architecture and mechanics of Transformers by implementing them, rather than just using pre-trained models.

Key features:

  • Byte-Level BPE Tokenization: encoding raw text as byte-level BPE tokens, so any input string can be represented without out-of-vocabulary issues.
  • Transformer Architecture: implementing Self-Attention, LayerNorm, and FeedForward networks from scratch (see the sketch below).
  • Training: autoregressive (next-token prediction) training on a dataset of haikus.
  • Generation: sampling new haikus from the trained model.
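
To make the architecture and generation bullets concrete, here is a minimal PyTorch sketch of a GPT-2-style pre-norm transformer block (causal self-attention, LayerNorm, feed-forward MLP) together with a simple sampling loop. Module names, hyperparameters, and the model interface below are illustrative assumptions, not the notebook's exact code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=256, n_head=4, block_size=128):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint projection to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)
        # lower-triangular mask enforces autoregressive (left-to-right) attention
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)   # re-assemble heads
        return self.proj(y)

class Block(nn.Module):
    """Pre-norm transformer block: LayerNorm -> sublayer -> residual add."""
    def __init__(self, n_embd=256, n_head=4, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x

@torch.no_grad()
def sample(model, idx, max_new_tokens=64, temperature=1.0, top_k=40, block_size=128):
    """Illustrative autoregressive sampling; assumes model(idx) returns (B, T, vocab) logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # crop to the context window
        logits = model(idx_cond)[:, -1, :] / temperature   # logits for the next token
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")    # keep only the top-k candidates
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample instead of argmax
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

GPT-2 applies LayerNorm before the attention and MLP sub-layers ("pre-norm"), which tends to make training more stable for small models like this one.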

Links

Usage

The easiest way to run this project is via the Google Colab link above.

If you wish to run it locally, ensure you have the necessary dependencies installed (PyTorch, Transformers, etc.) and run the notebook GPT_2_Implementation.ipynb.
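
A typical local setup might look like the following; the exact package set and versions are an assumption based on the description above, so adjust as needed:

```bash
pip install torch transformers notebook   # assumed core dependencies
jupyter notebook GPT_2_Implementation.ipynb
```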

References

  1. Vaswani et al., "Attention Is All You Need" (2017)
  2. Radford et al., "Language Models are Unsupervised Multitask Learners" (the OpenAI GPT-2 paper, 2019)
