An educational from-scratch implementation of the Transformer architecture in PyTorch, based on the paper "Attention Is All You Need".
This repository demonstrates how the encoder–decoder Transformer works by building all of its core components step by step.
```
Transformers-Implementation/
├── src/
│   ├── add_and_norm.py          # Add & Norm layer
│   ├── attention.py             # Scaled Dot-Product Attention + Multi-Head Attention
│   ├── decoder.py               # Transformer Decoder
│   ├── encoder.py               # Transformer Encoder
│   ├── positional_encoding.py   # Sinusoidal Positional Encoding
│   ├── position_wise_fnn.py     # Position-wise Feed-Forward Network
│   └── transformer.py           # Full Transformer model
├── test/                        # Unit tests
├── pytest.ini                   # Pytest configuration
├── .gitignore
└── README.md
```
- Add & Norm Layer – residual connection followed by layer normalization.
- Scaled Dot-Product Attention – with optional causal masking (see the sketch after this list).
- Multi-Head Attention – parallel attention across multiple heads.
- Position-wise Feed-Forward Network – two linear layers with a ReLU in between (sketched below).
- Encoder & Decoder Layers – stacked with attention and feed-forward sublayers.
- Positional Encoding – sinusoidal encodings to inject sequence order (sketched below).
- Full Transformer Model – combines encoder, decoder, and positional encoding.
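The core routine in `attention.py` is scaled dot-product attention. As a rough sketch of the idea (the function name, tensor shapes, and keyword argument here are illustrative, not the repository's exact API):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, causal=False):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep the softmax well-conditioned
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if causal:
        # Causal mask: position i may only attend to positions <= i
        seq_len = scores.size(-1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```

Multi-head attention then splits the model dimension into several heads, runs this routine on each head in parallel, and concatenates the results before a final linear projection.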
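The position-wise feed-forward network applies the same two linear maps, with a ReLU in between, to every position independently. A minimal sketch, assuming the paper's default sizes (`d_model=512`, `d_ff=2048`) rather than the repository's actual class name or defaults:

```python
import torch.nn as nn

class PositionWiseFFN(nn.Module):
    """Illustrative position-wise FFN: Linear -> ReLU -> Linear."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model); the same weights are applied at every position
        return self.net(x)
```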
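The positional encodings follow the sinusoidal scheme from the paper: even dimensions use sine, odd dimensions use cosine, with geometrically spaced frequencies. A sketch of that computation (assuming an even `d_model`; not the repository's exact interface):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len).unsqueeze(1).float()          # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )                                                              # (d_model / 2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to token embeddings to inject order information
```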
- Python 3.7+
- PyTorch
- pytest (for running tests)
Install dependencies:

```bash
pip install -r requirements.txt
```
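With pytest installed and the provided `pytest.ini` picked up automatically, the unit tests under `test/` can be run from the project root:

```bash
pytest
```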