Educational implementation of the Transformer architecture from the "Attention Is All You Need" paper, built with PyTorch.
- Complete Encoder-Decoder Architecture with cross-attention
- Modular Design - each component can be studied independently
- Multiple Tokenizers - SentencePiece and word-level tokenization
- WMT14 Dataset Integration - German-English translation
- Educational Focus - well-documented code with comprehensive docstrings
- Production-Quality Code - proper error handling, logging, and testing
git clone https://github.com/MayukhSobo/Transformer.git
cd Transformer
# Using uv (recommended)
uv sync
# Or using pip
pip install -r requirements.txt

from model import build_transformer
from config import Config
config = Config(config_file="config.toml")
transformer, dataset = build_transformer(config)
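# The forward pass below expects integer token-ID batches plus padding masks.
# The shapes, dtypes, and mask convention here are illustrative assumptions,
# not guarantees about this repository's API.
import torch
batch_size, src_len, tgt_len = 2, 16, 16
src_batch = torch.randint(0, 37000, (batch_size, src_len))        # 37000 = [model] vocab_size
tgt_batch = torch.randint(0, 37000, (batch_size, tgt_len))
src_pad_mask = torch.ones(batch_size, src_len, dtype=torch.bool)  # assume True marks real tokens
tgt_pad_mask = torch.ones(batch_size, tgt_len, dtype=torch.bool)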
# Forward pass
output = transformer.forward(src_batch, tgt_batch, src_pad_mask, tgt_pad_mask)

python main.py # Train with default config
python main.py --config custom.toml # Train with custom config

python test_runner.py # Run all tests
python test_runner.py pytest # Run with pytest
python test_runner.py coverage # Generate coverage report

Transformer/
├── arch/ # Core transformer modules
│ ├── attentions/ # Self, multi-head, and cross-attention
│ ├── encoder/ # Encoder components
│ ├── decoder/ # Decoder components
│ ├── embedding.py # Token embeddings
│ ├── positional_encoding.py
│ ├── feed_forward.py
│ └── residual_add_norm.py
├── tokenizer/ # Tokenization utilities
├── tests/ # Test suite
├── data/ # Dataset directory
├── config.toml # Model configuration
├── model.py # Model creation and orchestration
├── train.py # Training implementation
├── dataset.py # Dataset loading and preprocessing
└── main.py # CLI entry point
Default model configuration (~101 million parameters, using distinct source and target embeddings):
[model]
vocab_size = 37000
hidden_size = 512
max_seq_len = 512
n_heads = 8
n_layers = 6
ff_hidden_size = 2048
dropout_pe = 0.1
[tokenizer]
kind = "sentencepiece" # or "word"
algorithm = "bpe" # or "unigram"
vocab_size = 32000
[training]
batch_size = 32
epochs = 10
learning_rate = 0.0005
[dataset]
path = "./data"
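As a rough sanity check on the ~101 million figure, the sketch below counts parameters for these hyperparameters, assuming untied source embeddings, target embeddings, and output projection, with biases and post-norm LayerNorms included; the exact total depends on implementation details.

```python
# Back-of-the-envelope parameter count for the default config (assumptions:
# untied source/target embeddings and output projection, biases included).
V, d, L, d_ff = 37000, 512, 6, 2048

attn = 4 * (d * d + d)                   # Q, K, V, and output projections (with biases)
ffn = d * d_ff + d_ff + d_ff * d + d     # two linear layers (512 -> 2048 -> 512)
ln = 2 * d                               # LayerNorm scale and shift

encoder = L * (attn + ffn + 2 * ln)      # self-attention + FFN per layer
decoder = L * (2 * attn + ffn + 3 * ln)  # self-attention + cross-attention + FFN
embeddings = 2 * V * d                   # distinct source and target embeddings
output_proj = d * V + V                  # untied output projection with bias

total = encoder + decoder + embeddings + output_proj
print(f"{total / 1e6:.1f}M")             # ~101.0M
```

Tying the output projection to the target embedding would bring the total down to roughly 82M.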
- Multi-Head Attention: 8 heads with 64 dimensions each
- Positional Encoding: Fixed (non-learnable) sinusoidal encoding (see the sketch after this list)
- Feed-Forward: Two-layer MLP (512 → 2048 → 512)
- Residual Connections: Post-norm architecture with LayerNorm
- Cross-Attention: Full encoder-decoder interaction
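For reference, here is a minimal sketch of the fixed sinusoidal encoding, following the formulation in the original paper; it is not necessarily line-for-line identical to arch/positional_encoding.py.

```python
import math
import torch

def sinusoidal_positional_encoding(max_seq_len: int, hidden_size: int) -> torch.Tensor:
    """Fixed (non-learnable) sinusoidal encodings, shape (max_seq_len, hidden_size)."""
    position = torch.arange(max_seq_len, dtype=torch.float32).unsqueeze(1)   # (max_seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, hidden_size, 2, dtype=torch.float32)
        * (-math.log(10000.0) / hidden_size)
    )
    pe = torch.zeros(max_seq_len, hidden_size)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_seq_len=512, hidden_size=512)  # matches the default config
```

In the standard setup, this table is added to the token embeddings and dropout (the dropout_pe setting above) is applied to the sum.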
- ✅ Complete Architecture: Encoder, decoder, and cross-attention implemented
- ✅ Tokenization: SentencePiece and word-level tokenizers
- ✅ Dataset Integration: WMT14 German-English with streaming support (see the loading sketch after this list)
- ⚠️ Training Pipeline: Forward pass implemented, optimization in progress
- ✅ Testing: Comprehensive test suite with 10.00/10 pylint score
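One common way to stream WMT14 German-English is via the Hugging Face datasets library, as sketched below; this is illustrative only and may differ from what dataset.py actually does.

```python
from itertools import islice
from datasets import load_dataset

# Stream WMT14 German-English without downloading the full corpus up front.
# Illustrative only; the repository's dataset.py may load the data differently.
wmt14 = load_dataset("wmt14", "de-en", split="train", streaming=True)

for example in islice(wmt14, 3):
    pair = example["translation"]        # {"de": ..., "en": ...}
    print(pair["de"], "->", pair["en"])
```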
# Run tests
python test_runner.py
# Run with coverage
python test_runner.py coverage
# Check code quality
pylint $(git ls-files '*.py')
# Format code
black .

- Attention Is All You Need - Original paper
- The Illustrated Transformer - Visual explanation
- The Annotated Transformer - Implementation guide
MIT License - Free to use for educational purposes.
Educational transformer implementation with complete encoder-decoder architecture and cross-attention, ready for sequence-to-sequence tasks.