
🤖 Build a GPT Model from Scratch in Pure Python


A complete educational implementation of GPT (Generative Pre-trained Transformer) in pure Python

Features • Quick Start • Documentation • Examples • Study Guide


📚 Overview

This repository contains three progressive versions of a GPT model implementation, each designed for different learning needs:

| Version     | Lines | Description                                   | Best For                   |
|-------------|-------|-----------------------------------------------|----------------------------|
| Original    | 243   | Andrej Karpathy's minimal implementation      | Quick reference            |
| Refactored  | 850   | Well-structured with Mermaid diagrams         | Understanding architecture |
| Educational | 1,200 | Professor-style teaching with detailed prints | Learning from scratch      |

All versions maintain 100% functional equivalence while progressively improving readability and educational value.


✨ Features

  • 🧠 Complete GPT Implementation: Multi-head attention, transformer layers, autograd
  • 📖 Educational Focus: Every component explained with intuition and math
  • 🎨 Visual Diagrams: 11 Mermaid diagrams for visual understanding
  • 🔬 Workflow Visualization: Detailed prints showing data flow during training
  • 📊 Study Guide: Comprehensive component deep dives with examples
  • 🚀 Pure Python: No dependencies beyond the standard library
  • 📝 Well-Documented: Extensive comments and docstrings

🎯 What You'll Learn

By studying this code, you will understand:

  • βœ… How neural networks compute (forward pass)
  • βœ… How they learn (backward pass & automatic differentiation)
  • βœ… How to optimize them (Adam optimizer)
  • βœ… The transformer architecture (attention is all you need)
  • βœ… Language modeling and text generation
  • βœ… Why each design decision was made

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/andresveraf/Build-GPT-model-with-Python.git
cd Build-GPT-model-with-Python

# No additional dependencies needed! (Pure Python)

Run the Educational Version (Recommended for Learning)

python3 script_gpt_educational.py

Expected Output:

================================================================================
DEEP LEARNING FROM SCRATCH: GPT Implementation
================================================================================

📚 Welcome! Let's build a GPT model step by step, understanding every detail.

✓ Random seed set to 42 for reproducibility

================================================================================
PART 2: CONFIGURING THE MODEL - HYPERPARAMETERS
================================================================================

πŸ“ MODEL ARCHITECTURE:
   β€’ Embedding dimension: 16
   β€’ Attention heads: 4 (each with 4 dimensions)
   β€’ Transformer layers: 1
   β€’ Context window: 16 tokens

...

🎲 GENERATING 20 SAMPLES:
Sample  1: kamon
Sample  2: ann
Sample  3: karai
Sample  4: jaire
Sample  5: vialan
...

Run Other Versions

# Refactored version (clean, documented)
python3 script_gpt_refactored.py

# Original version (compact)
python3 script_gpt.py

📖 Documentation

Core Documents

  1. REFACTORING_SUMMARY.md - Complete study guide with:

    • Component deep dives (11 major components)
    • Step-by-step examples with actual numbers
    • Mathematical formulations
    • Formula cheat sheet
    • Dimension tracking guide
    • Study checklist (4 levels)
  2. This README - Quick start and overview

  3. Inline Documentation - Each Python file contains extensive comments

Component Explanations

Each component is thoroughly explained:

  • Multi-Head Attention: How the model learns relationships between tokens
  • RMS Normalization: Why we normalize activations
  • Softmax: Converting logits to probabilities (with numerical stability)
  • Adam Optimizer: Adaptive learning with momentum
  • Matrix Multiplication: The fundamental operation in neural networks
  • Loss Calculation: Cross-entropy and perplexity
  • Training Loop: How the model learns from data

💡 Examples

Example 1: Understanding Attention

# Multi-head attention allows the model to focus on different aspects
# Head 0: Previous character dependency
# Head 1: Position-based patterns
# Head 2: Consonant clusters
# Head 3: Vowel patterns
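The roles listed above are illustrative: what each head actually attends to emerges during training. Mechanically, every head turns raw query-key scores into causal, softmax-normalized weights. A hypothetical pure-Python sketch of that step (not the scripts' implementation):

```python
import math

def attention_weights(scores):
    """scores[i][j] is the raw compatibility of query i with key j (T x T)."""
    T = len(scores)
    weights = []
    for i in range(T):
        # Causal mask: token i may only attend to positions 0..i.
        row = [scores[i][j] for j in range(i + 1)]
        m = max(row)
        exps = [math.exp(s - m) for s in row]   # stable softmax over the unmasked part
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (T - i - 1))
    return weights

w = attention_weights([[0.0, 0.0, 0.0],
                       [1.0, 2.0, 0.0],
                       [0.5, 0.5, 0.5]])
# Each row sums to 1; entries above the diagonal are masked to 0.
```

The attended output is then the weight-averaged sum of the value vectors, computed independently per head and concatenated.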

Example 2: Training Progress

Step    1 / 1000 | Loss: 3.3660 | Perplexity: 28.94
Step  100 / 1000 | Loss: 2.8945 | Perplexity: 18.07
Step  200 / 1000 | Loss: 2.7123 | Perplexity: 15.07
Step  500 / 1000 | Loss: 2.6543 | Perplexity: 14.22
Step 1000 / 1000 | Loss: 2.6501 | Perplexity: 14.16
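Perplexity in that log is just the exponential of the cross-entropy loss, so you can sanity-check the numbers yourself (they match the log up to rounding of the printed loss):

```python
import math

loss = 3.3660
perplexity = math.exp(loss)
# ~28.96: the model is roughly as uncertain as a uniform choice over 29 tokens.
print(round(perplexity, 2))
```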

Example 3: Generated Names

After training on 32,033 names, the model generates realistic names:

kamon, ann, karai, jaire, vialan, mari, jalen, etc.

🎓 Study Guide

Learning Path

Level 1: Beginner (1-2 days)

  • Read the README
  • Run the educational version
  • Understand the basic flow
  • Read hyperparameter explanations

Level 2: Intermediate (1 week)

  • Study component deep dives in REFACTORING_SUMMARY.md
  • Understand attention mechanism
  • Learn about normalization and softmax
  • Follow the training loop

Level 3: Advanced (2-3 weeks)

  • Implement components from scratch
  • Experiment with hyperparameters
  • Debug training issues
  • Modify the architecture

Level 4: Expert (ongoing)

  • Read original papers (Attention Is All You Need, GPT-2, GPT-3)
  • Implement from memory
  • Design experiments
  • Contribute to research

Key Concepts

| Concept         | Importance | Difficulty |
|-----------------|------------|------------|
| Tokenization    | ⭐⭐⭐     | Easy       |
| Embeddings      | ⭐⭐⭐⭐   | Medium     |
| Attention       | ⭐⭐⭐⭐⭐ | Hard       |
| Normalization   | ⭐⭐⭐⭐   | Medium     |
| Backpropagation | ⭐⭐⭐⭐⭐ | Hard       |
| Optimization    | ⭐⭐⭐⭐   | Medium     |

📂 Project Structure

Build-GPT-model-with-Python/
│
├── README.md                           # This file
├── REFACTORING_SUMMARY.md              # Complete study guide
├── input.txt                           # Training data (names)
│
├── script_gpt.py                       # Original (243 lines)
├── script_gpt_refactored.py            # Refactored (850 lines)
└── script_gpt_educational.py           # Educational (1,200 lines)

🔧 Customization

Change Model Architecture

# In script_gpt_educational.py, modify:

N_EMBD = 16          # Try: 32, 64, 128
N_HEAD = 4           # Try: 2, 8
N_LAYER = 1          # Try: 2, 3, 4
BLOCK_SIZE = 16      # Try: 32, 64

Adjust Training

LEARNING_RATE = 0.01  # Try: 0.001, 0.005, 0.02
NUM_STEPS = 1000      # Try: 500, 2000, 5000
TEMPERATURE = 0.5     # Try: 0.3 (conservative), 0.8 (creative)
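TEMPERATURE controls how sharply the output distribution is sampled during generation. A quick illustrative sketch of its effect (not the script's own sampling code):

```python
import math

def apply_temperature(logits, temperature):
    # Divide logits by T before softmax: T < 1 sharpens the distribution
    # toward the top token, T > 1 flattens it toward uniform.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(max(apply_temperature(logits, 0.3)))  # near-greedy: top token dominates
print(max(apply_temperature(logits, 0.8)))  # flatter: more varied, "creative" samples
```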

Use Your Own Data

Replace input.txt with your own text file (one item per line):

word1
word2
word3
...
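With one item per line, character-level tokenization reduces to collecting the characters that appear in your file. A hypothetical sketch of that preprocessing step (the scripts' actual loading code may differ):

```python
def build_vocab(lines):
    # Collect every character that appears in the training data,
    # sorted so token ids are deterministic across runs.
    chars = sorted(set("".join(lines)))
    stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer id
    itos = {i: ch for ch, i in stoi.items()}      # integer id -> string
    return stoi, itos

lines = ["emma", "olivia", "ava"]   # stand-ins for input.txt contents
stoi, itos = build_vocab(lines)
encoded = [stoi[ch] for ch in "ava"]
decoded = "".join(itos[i] for i in encoded)
print(decoded)  # round-trips back to "ava"
```

Note that a larger character set (capital letters, punctuation, non-Latin scripts) grows the vocabulary and therefore the embedding and output layers.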

📊 Model Architecture

Input Token
    ↓
┌──────────────────────────────────────┐
│  Embedding Layer                     │
│  • Token Embedding                   │
│  • Position Embedding                │
│  • RMS Normalization                 │
└──────────────────────────────────────┘
    ↓
┌──────────────────────────────────────┐
│  Transformer Layer                   │
│  • Multi-Head Self-Attention         │
│  • Residual Connection               │
│  • Feed-Forward Network (MLP)        │
│  • Residual Connection               │
└──────────────────────────────────────┘
    ↓
┌──────────────────────────────────────┐
│  Output Projection                   │
│  • Linear to Vocabulary Size         │
└──────────────────────────────────────┘
    ↓
Logits → Softmax → Probabilities

🎓 Educational Resources

  • Papers
  • Videos
  • Courses

🤝 Contributing

This is an educational project. Contributions are welcome!

Ways to Contribute

  1. Add More Examples: Create new training datasets
  2. Improve Documentation: Clarify explanations
  3. Add Visualizations: Create more diagrams
  4. Fix Bugs: Report and fix issues
  5. Share Your Learning: Write blog posts or tutorials

Development

# Run tests (if you add them)
python3 -m pytest tests/

# Format code
black script_gpt_*.py

# Check style
flake8 script_gpt_*.py

πŸ“ License

This project is open source and available under the MIT License.


πŸ™ Acknowledgments

  • Andrej Karpathy - Original minimal GPT implementation
  • OpenAI - GPT architecture and research
  • Google Brain - Transformer architecture
  • DeepLearning.AI - Educational resources

📧 Contact

Have questions? Feel free to:

  • Open an issue on GitHub
  • Start a discussion
  • Contact me directly

Made with ❤️ for educational purposes
