"If you really learn all of these, you'll know 90% of what matters today" - Ilya Sutskever
Complete implementations • Interactive notebooks • Beginner-friendly • 100% Free
30u30 is an open-source study guide that takes you through the 30 foundational AI papers recommended by Ilya Sutskever — one paper per day.
Each paper comes with a full implementation from scratch, detailed notes, interactive exercises with solutions, and visualizations.
Think of it as Rustlings, but for deep learning fundamentals. You read the paper, understand the math, and build it yourself.
Day 1: "The Unreasonable Effectiveness of Recurrent Neural Networks"
- Character-level RNN from scratch (pure NumPy)
- Interactive Jupyter notebook with visualizations
- 5 progressive exercises + solutions
Start Day 1 →
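To give a taste of what Day 1 builds, here is a minimal sketch of one forward step of a character-level RNN in NumPy. The weight names (`Wxh`, `Whh`, `Why`) follow the common min-char-rnn convention; the shapes and values are illustrative, not the repo's actual code:

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
    """One forward step of a vanilla character-level RNN.

    x      : one-hot input vector, shape (V,)
    h_prev : previous hidden state, shape (H,)
    Returns the new hidden state and unnormalized next-character scores.
    """
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)  # new hidden state
    y = Why @ h + by                          # scores over the vocabulary
    return h, y

# Tiny example: vocabulary of 4 characters, hidden size 8
rng = np.random.default_rng(0)
V, H = 4, 8
Wxh = rng.normal(0, 0.1, (H, V))
Whh = rng.normal(0, 0.1, (H, H))
Why = rng.normal(0, 0.1, (V, H))
bh, by = np.zeros(H), np.zeros(V)

x = np.eye(V)[2]  # one-hot vector for character index 2
h, y = rnn_step(x, np.zeros(H), Wxh, Whh, Why, bh, by)
```

Unrolling this single step over a text, plus a softmax and backprop through time, is essentially the whole Day 1 model.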
Day 2: "Understanding LSTM Networks"
- Complete LSTM with 4 gates from scratch
- Gate activation analysis & visualizations
- 5 exercises including LSTM vs GRU comparison
Start Day 2 →
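The four gates from Day 2 fit in a few lines. Below is a hedged sketch of a single LSTM step with the gate weights stacked into one matrix `W` (a common implementation trick; the names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; the four gates are stacked in W, shape (4H, D+H)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*H:1*H])   # forget gate: what to erase from the cell
    i = sigmoid(z[1*H:2*H])   # input gate: what to write
    g = np.tanh(z[2*H:3*H])   # candidate cell contents
    o = sigmoid(z[3*H:4*H])   # output gate: what to expose
    c = f * c_prev + i * g    # new cell state
    h = o * np.tanh(c)        # new hidden state
    return h, c

rng = np.random.default_rng(1)
D, H = 3, 5
W, b = rng.normal(0, 0.1, (4*H, D+H)), np.zeros(4*H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

The additive update `c = f * c_prev + i * g` is the "mechanics of memory": gradients flow through the cell state without being repeatedly squashed.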
Day 3: "RNN Regularization"
- Dropout, Layer Norm, Weight Decay, Early Stopping
- Complete regularization pipeline from scratch
- 5 exercises on preventing overfitting
Start Day 3 →
Day 4: "Minimizing Description Length"
- Bayesian / noisy-weight networks, MDL intuition
- Uncertainty envelopes, compression analysis, Pareto frontier
- 5 exercises that demonstrate gaps, beta tuning, MC inference
Start Day 4 →
Day 5: "MDL Principle Tutorial"
- Two-Part Codes, Prequential MDL, NML Complexity
- MDL vs AIC vs BIC comparison, compression analysis
- 5 exercises: from basic MDL to model selection showdown
Start Day 5 →
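The MDL vs. AIC vs. BIC comparison from Day 5 boils down to two textbook formulas. A minimal sketch (the likelihoods here are made-up numbers for illustration):

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * np.log(n) - 2 * log_likelihood

# Two hypothetical fits to n = 100 points: a 2-parameter model and a
# 10-parameter model with a slightly better likelihood.
n = 100
scores = {
    "simple (k=2)":   (aic(-120.0, 2),  bic(-120.0, 2, n)),
    "complex (k=10)": (aic(-118.0, 10), bic(-118.0, 10, n)),
}
```

Both criteria charge the complex model more for its extra parameters than its likelihood gain is worth, which is the MDL trade-off in miniature.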
Day 6: "The First Law of Complexodynamics"
- Information equilibration, channel capacity, evolutionary dynamics
- Complete complexity evolution simulator with 7 visualizations
- 5 exercises: from Shannon entropy to real genome analysis
Start Day 6 →
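Day 6 starts from Shannon entropy, which is small enough to sketch directly (the genome-flavored example strings below are illustrative):

```python
import numpy as np

def shannon_entropy(seq):
    """Entropy in bits/symbol of the empirical distribution of seq."""
    _, counts = np.unique(list(seq), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A uniform 4-symbol string carries 2 bits per symbol; a constant one, 0.
h_uniform = shannon_entropy("ACGT" * 10)   # -> 2.0
h_const   = shannon_entropy("AAAA" * 10)   # -> 0.0
```

Entropy measures disorder, but the paper's point is that *interesting* complexity peaks somewhere between these two extremes.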
Day 7: "The Coffee Automaton"
- Cellular automaton complexity, heat diffusion, emergent behavior
- Chaos theory, Lyapunov exponents, information flow analysis
- 5 exercises: edge of chaos, pattern classification, neural network initialization
Start Day 7 →
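The coffee-automaton idea, cream mixing into coffee under local update rules, can be sketched as a toy nearest-neighbor averaging automaton (this is a simplified stand-in for the repo's simulator, with assumed grid sizes and a crude boundary treatment):

```python
import numpy as np

def diffuse(grid, steps=1):
    """Toy diffusion automaton: each cell averages with its 4 neighbors."""
    g = grid.astype(float)
    for _ in range(steps):
        p = np.pad(g, 1, mode="edge")  # replicate borders (no-flux walls)
        g = (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
             + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0
    return g

# Start fully separated: cream (1) on the top half, coffee (0) below.
grid = np.zeros((8, 8))
grid[:4, :] = 1.0
mixed = diffuse(grid, steps=50)
```

Entropy rises monotonically as the grid smooths out, but visually interesting structure (the swirling interface) appears and then vanishes in between, which is exactly the rise-and-fall of complexity the paper studies.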
Day 8: "ImageNet Classification with Deep CNNs (AlexNet)"
- The paper that sparked the deep learning revolution
- GPU-accelerated training, ReLU activations, dropout regularization
- 5 exercises: GPU impact analysis, activation functions, data augmentation
Start Day 8 →
Day 9: "Deep Residual Learning for Image Recognition (ResNet)"
- Skip connections that enable 100+ layer networks
- Identity mappings, residual blocks, gradient highway
- 5 exercises: vanishing gradients, skip connection ablation, depth analysis
Start Day 9 →
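The ResNet insight fits in one line: compute `y = x + F(x)` instead of `y = F(x)`. A minimal dense-layer sketch (the real blocks use convolutions and BatchNorm; weight scales here are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = x + F(x): the skip connection gives gradients an identity path."""
    return x + W2 @ relu(W1 @ x)

rng = np.random.default_rng(2)
D = 16
W1 = rng.normal(0, 0.01, (D, D))
W2 = rng.normal(0, 0.01, (D, D))
x = rng.normal(size=D)
y = residual_block(x, W1, W2)
# With near-zero weights the block is close to the identity - the property
# that lets 100+ of them stack without destroying the signal.
```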
Day 10: "Identity Mappings in Deep Residual Networks (ResNet v2)"
- Pre-activation design: BN → ReLU → Conv
- Why order matters for 1000+ layer networks
- 5 exercises: pre vs post activation, information flow, extreme depth
Start Day 10 →
Day 11: "Multi-Scale Context Aggregation by Dilated Convolutions"
- Exponentially expanding receptive fields without pooling
- Dense prediction, semantic segmentation, WaveNet foundations
- 5 exercises: receptive field analysis, dilation patterns, context modules
Start Day 11 →
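The key arithmetic of Day 11, exponential receptive-field growth from doubling dilations, can be checked in a few lines (stride 1, 1-D case for simplicity):

```python
def receptive_field(kernel_size, dilations):
    """1-D receptive field of a stack of dilated convolutions (stride 1)."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k-1)*d
    return rf

# Doubling dilations: exponential context growth with linear depth
rf_dilated = receptive_field(3, [1, 2, 4, 8])   # -> 31
rf_plain   = receptive_field(3, [1, 1, 1, 1])   # -> 9
```

Four 3-wide layers see 31 positions with doubling dilations versus 9 without, and no pooling or resolution loss was needed.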
Day 12: "Dropout: A Simple Way to Prevent Overfitting"
- The standard regularization technique for neural networks
- Inverted dropout, MC Dropout for uncertainty, ensemble interpretation
- 5 exercises: implement dropout, rate sweep, spatial dropout, MC uncertainty
Start Day 12 →
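The "inverted" variant mentioned above is the one modern frameworks use; a minimal sketch (the rate and array sizes are illustrative):

```python
import numpy as np

def inverted_dropout(x, p_drop, rng):
    """Training-time inverted dropout: scale kept units by 1/(1-p)
    so the test-time forward pass needs no change at all."""
    keep = 1.0 - p_drop
    mask = (rng.random(x.shape) < keep) / keep
    return x * mask

rng = np.random.default_rng(3)
x = np.ones(100_000)
y = inverted_dropout(x, p_drop=0.5, rng=rng)
# E[y] equals x, so inference just skips this function entirely.
```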
Day 13: "Attention Is All You Need"
- The paper that revolutionized NLP and beyond - the Transformer
- Self-attention, multi-head attention, positional encoding
- 5 exercises: from scaled dot-product to full Transformer + interactive visualization
Start Day 13 →
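The core of Day 13 is one formula: softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of scaled dot-product attention (single head, no masking; shapes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # query-key similarities
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights

rng = np.random.default_rng(4)
Q = rng.normal(size=(2, 8))  # 2 queries
K = rng.normal(size=(5, 8))  # 5 keys
V = rng.normal(size=(5, 8))  # 5 values
out, w = attention(Q, K, V)
```

Multi-head attention is just this function run h times on learned projections of Q, K, V, with the outputs concatenated.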
Day 14: "The Annotated Transformer"
- Code-level understanding of the Transformer - from math to PyTorch
- Production-quality implementation with all training infrastructure
- 5 exercises: attention, multi-head, encoder, training, inference
Start Day 14 →
Day 15: "Neural Machine Translation by Jointly Learning to Align and Translate"
- The original attention mechanism - before Transformers existed!
- Bahdanau (additive) attention, bidirectional encoder, alignment visualization
- 5 exercises: attention from scratch, encoder-decoder, beam search, visualization
Start Day 15 →
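Bahdanau's additive score, eᵢ = vᵀ tanh(W s + U hᵢ), contrasts with the dot-product score of Day 13. A hedged sketch of computing alignments and a context vector (dimension names and random values are illustrative):

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """Bahdanau-style attention: score each encoder state h_i against
    decoder state s, softmax into alignment weights, return the context."""
    e = np.tanh(Wa @ s + H @ Ua.T) @ va    # (T,) one score per encoder state
    a = np.exp(e - e.max())
    a /= a.sum()                           # alignment weights sum to 1
    return a @ H, a                        # weighted context vector, weights

rng = np.random.default_rng(5)
T, d, d_att = 6, 4, 8
Wa = rng.normal(size=(d_att, d))
Ua = rng.normal(size=(d_att, d))
va = rng.normal(size=d_att)
context, align = additive_attention(rng.normal(size=d),
                                    rng.normal(size=(T, d)), Wa, Ua, va)
```

Plotting `align` over decoding steps reproduces the famous alignment heatmaps from the paper.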
Day 16: "Order Matters: Sequence to Sequence for Sets"
- Pointer Networks - process sets, output sequences by pointing!
- Order-invariant encoding, Read-Process-Write framework
- 5 exercises: pointer attention, set encoder, sorting, convex hull, TSP
Start Day 16 →
Day 17: "Neural Turing Machines"
- Differentiable external memory for neural networks
- Addressing mechanics: Content-based, Interpolation, Shift, Sharpening
- 5 exercises: addressing logic, circular convolution, memory updates
Start Day 17 →
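The shift step of NTM addressing is a circular convolution of the attention weights with a small shift distribution. A minimal sketch over shifts {-1, 0, +1} (the repo's exercises generalize this):

```python
import numpy as np

def circular_shift(w, s):
    """NTM shift addressing: circularly convolve head weights w with a
    probability distribution s over shifts [-1, 0, +1]."""
    out = np.zeros(len(w))
    for shift, p in zip([-1, 0, 1], s):
        out += p * np.roll(w, shift)
    return out

w = np.array([0.0, 1.0, 0.0, 0.0])           # head focused on memory slot 1
shifted = circular_shift(w, [0.0, 0.0, 1.0])  # a certain shift of +1
# -> focus moves entirely to slot 2
```

Because the shift is a convex combination, the result is still a valid attention distribution, which is what keeps the whole memory controller differentiable.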
Day 18: "Pointer Networks"
- Networks that can "point" to their input (essential for combinatorial problems)
- Laser pointer attention, sampling without replacement, combinatorial optimization
- 5 exercises: pointer attention, convex hull formatting, TSP cost analysis
Start Day 18 →
Day 19: "Relational Reasoning"
- Pairwise object processing for VQA and physical reasoning
- g_theta and f_phi modules, set-based inductive bias
- 5 exercises: pair generation, sort-of-CLEVR logic, masking
Start Day 19 →
Day 20: "Relational Recurrent Neural Networks"
- Multi-head dot-product attention inside a recurrent cell (MHDPA)
- Relational memory core: memory slots interact via self-attention at each timestep
- 5 exercises: memory attention, slot interactions, sequence modeling
Start Day 20 →
Day 21: "Neural Message Passing for Quantum Chemistry"
- Unifying framework for graph neural networks: message, update, readout
- Edge networks, GRU update, Set2Set readout, QM9 benchmark
- 5 exercises: message functions, graph construction, property prediction
Start Day 21 →
Day 22: "Deep Speech 2: End-to-End Speech Recognition"
- End-to-end speech recognition replacing traditional ASR pipelines
- Conv + bidirectional GRU + CTC loss, sequence-wise BatchNorm, SortaGrad
- 5 exercises: spectrogram features, CTC decoding, RNN BatchNorm, curriculum learning, full pipeline
Start Day 22 →
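CTC's decoding rule, collapse repeats, then drop blanks, is simple enough to sketch in its greedy form (the label mapping below is made up for illustration; the repo covers beam search too):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: merge adjacent repeats, then remove blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Per-frame argmax labels for "cat" (1=c, 2=a, 3=t, 0=blank):
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
decoded = ctc_greedy_decode(frames)   # -> [1, 2, 3]
```

Note that only *adjacent* repeats merge: a blank between two identical labels, as in `[1, 0, 1]`, preserves a genuine double letter. This is why CTC needs the blank symbol at all.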
Day 23: "Variational Lossy Autoencoder"
- Curing posterior collapse in VAEs with powerful decoders
- Restricted receptive field (PixelCNN) + Inverse Autoregressive Flows (IAF)
- 5 exercises: from masked convolutions to full flow priors
Start Day 23 →
Day 24: "GPipe: Efficient Training of Giant Neural Networks"
- Pipeline parallelism + micro-batching + activation checkpointing
- Training giant 6B+ parameter models on limited hardware
- 5 exercises: from micro-batching to full pipeline integration
Start Day 24 →
Day 25: "Scaling Laws for Neural Language Models"
- The power-law relationships between model size, compute, and performance
- Scaling compute budget vs. model size vs. dataset size
- 5 exercises: scaling law calculations, compute-optimal training, dataset scaling
Start Day 25 →
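The headline result of Day 25 is a pure power law in parameter count. A sketch with constants roughly those reported by Kaplan et al. for the parameter-count law (treat them as illustrative, not exact):

```python
def scaling_loss(N, N_c=8.8e13, alpha=0.076):
    """Kaplan-style power law L(N) = (N_c / N)^alpha, valid when data
    and compute are not the bottleneck."""
    return (N_c / N) ** alpha

# Loss keeps falling smoothly as a power law in model size:
l_small = scaling_loss(1e8)    # 100M parameters
l_large = scaling_loss(1e10)   # 10B parameters
```

On a log-log plot this is a straight line of slope -alpha, which is why the paper can extrapolate across several orders of magnitude.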
Day 26: "Kolmogorov Complexity and Algorithmic Randomness"
- The mathematical bedrock of information theory: Compression = Intelligence
- From-scratch implementation of Huffman and Arithmetic coding
- 5 exercises: entropy comparison, NCD similarity clustering, and incompressibility
Start Day 26 →
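Kolmogorov complexity is uncomputable, but Day 26's NCD exercise approximates it with a real compressor. A minimal sketch using zlib as the stand-in for K(·) (the example strings are illustrative):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: how much does knowing x help
    compress y? Smaller means more similar."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"pack my box with five dozen liquor jugs " * 20
similar = ncd(a, a)     # near 0: a string shares everything with itself
different = ncd(a, b)   # larger: unrelated strings share little
```

The same trick, swapping the ideal compressor for a practical one, powers the exercise on clustering texts by similarity with no features at all.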
Day 27: "Machine Super Intelligence (Shane Legg)"
- Universal Intelligence (Υ), Kolmogorov Complexity proxies, and the Agent-Environment loop
- Formal benchmarking of Random vs. RL vs. Predictive agents
- 5 exercises on Upsilon calculation, environment design, and complexity invariance
Start Day 27 →
Day 28: "CS231n: CNNs for Visual Recognition"
- Conv layers (naive + im2col), pooling, ReLU, FC — full CNN from scratch in NumPy
- VGGNet-16 parameter analysis, spatial dimension progression, architecture case studies
- 5 exercises: conv forward, pooling backprop, output sizes, parameter counting, feature viz
Start Day 28 →
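The output-size and parameter-counting exercises rest on one formula drilled throughout CS231n: a sketch of it, with VGG-style sanity checks:

```python
def conv_output_size(W, F, P, S):
    """Spatial output size of a conv/pool layer: (W - F + 2P) / S + 1."""
    out = (W - F + 2 * P) / S + 1
    assert out == int(out), "hyperparameters don't tile the input evenly"
    return int(out)

# 224x224 input, 3x3 conv, pad 1, stride 1 preserves spatial size:
assert conv_output_size(224, 3, 1, 1) == 224
# 2x2 max-pool with stride 2 halves it:
assert conv_output_size(224, 2, 0, 2) == 112
```

Chaining this function layer by layer reproduces the spatial-dimension progression of VGGNet-16 that the notes analyze.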
Day 29: "Proximal Policy Optimization (PPO)"
- The algorithm behind ChatGPT (RLHF)
- Clipped surrogate objective & GAE from scratch
- 5 exercises on policy constraints and stability
Start Day 29 →
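PPO's clipped surrogate objective is a one-liner once you have probability ratios and advantages. A minimal sketch (single-sample arrays for illustration; the real thing averages over a batch of rollouts):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).
    This is maximized; the training loss is its negative."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# A large ratio with positive advantage gets clipped - no runaway updates:
capped = ppo_clip_objective(np.array([5.0]), np.array([1.0]))   # -> 1.2
normal = ppo_clip_objective(np.array([1.1]), np.array([1.0]))   # -> 1.1
```

The clip removes the incentive to push the policy more than eps away from the one that collected the data, which is the whole stability story behind PPO (and, downstream, RLHF).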
Day 30: "Deep Reinforcement Learning from Human Feedback (RLHF)"
- Aligning AI with human preferences
- Reward Modeling, Preference Loss, and PPO integration
- Synthetic Oracle and full training loop from scratch
Start Day 30 →
This is the most comprehensive, beginner-friendly, open-source journey through the papers that defined modern AI. No paywalls. No gatekeeping. Just pure knowledge.
Whether you're pivoting to AI, a student, or a curious mind - this is your roadmap.
Each paper gets the full treatment:
- 📖 Deep-dive README - Complete explanations with real-world analogies
- 💡 ELI5 Notes - "Explain Like I'm 5" summaries
- 💻 Implementation - Clean, commented, CPU-friendly code
- 🎨 Visualizations - See the concepts come alive
- 🏋️ Exercises - Build it yourself (with solutions)
- 📓 Notebooks - Interactive Jupyter walkthroughs
- ⚡ Quick-start - Minimal training scripts that run in minutes
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 1 | The Unreasonable Effectiveness of RNNs | 🚀 LIVE | Why predicting = intelligence |
| 2 | Understanding LSTM Networks | 🚀 LIVE | The mechanics of memory |
| 3 | RNN Regularization | 🚀 LIVE | Making RNNs generalize |
| 4 | Minimizing Description Length | 🚀 LIVE | Compression = Intelligence |
| 5 | MDL Principle Tutorial | 🚀 LIVE | Math of compression |
| 6 | The First Law of Complexodynamics | 🚀 LIVE | Physics of complexity |
| 7 | The Coffee Automaton | 🚀 LIVE | Why intelligence exists |
Vision, depth, and the techniques that changed everything
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 8 | ImageNet Classification (AlexNet) | 🚀 LIVE | Deep learning revolution |
| 9 | Deep Residual Learning (ResNet) | 🚀 LIVE | Skip connections |
| 10 | Identity Mappings in ResNets | 🚀 LIVE | Pre-activation design |
| 11 | Multi-Scale Context (Dilated Conv) | 🚀 LIVE | Dilated convolutions |
| 12 | Dropout (Srivastava et al.) | 🚀 LIVE | Preventing overfitting |
The architecture that ate the world
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 13 | Attention Is All You Need | 🚀 LIVE | Self-attention, Transformer |
| 14 | The Annotated Transformer | 🚀 LIVE | Code-level Transformer |
| 15 | Bahdanau Attention (NMT) | 🚀 LIVE | Original attention mechanism |
| 16 | Order Matters (Pointer Networks) | 🚀 LIVE | Set-to-sequence problems |
Memory, graphs, and reasoning
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 17 | Neural Turing Machines | 🚀 LIVE | Differentiable external memory |
| 18 | Pointer Networks | 🚀 LIVE | Selecting input via attention |
| 19 | Relational Reasoning | 🚀 LIVE | Pairwise object relations; g_theta & f_phi modules |
| 20 | Relational RNNs | 🚀 LIVE | Self-attention inside recurrence |
| 21 | Neural Message Passing | 🚀 LIVE | MPNN framework for graph neural networks |
| 22 | Deep Speech 2 | 🚀 LIVE | End-to-end speech recognition with CTC |
From theory to massive models
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 23 | Variational Lossy Autoencoder | 🚀 LIVE | Curing posterior collapse with IAF |
| 24 | GPipe: Efficient Training of Giant Neural Networks | 🚀 LIVE | Pipeline parallelism |
| 25 | Scaling Laws for Neural Language Models | 🚀 LIVE | The physics of AI scaling |
| 26 | Kolmogorov Complexity | 🚀 LIVE | Math of compression & randomness |
| 27 | Machine Super Intelligence | 🚀 LIVE | Safety & intelligence definitions |
| 28 | CS231n: CNNs for Visual Recognition | 🚀 LIVE | CNN layers from scratch, VGGNet analysis |
From Policy Gradients to RLHF
| Day | Paper | Status | Core Concept |
|---|---|---|---|
| 29 | Proximal Policy Optimization (PPO) | 🚀 LIVE | The algorithm behind ChatGPT (RLHF) |
| 30 | Deep Reinforcement Learning from Human Feedback | 🚀 LIVE | The birth of "Human Feedback" (RLHF) |
The modern era of LLMs
| Paper | Status |
|---|---|
| BERT: Pre-training of Deep Bidirectional Transformers | Coming Soon |
| GPT-2: Language Models are Unsupervised Multitask Learners | Coming Soon |
| GPT-3: Language Models are Few-Shot Learners | Coming Soon |
| Chinchilla: Training Compute-Optimal Large Language Models | Coming Soon |
Complete paper list with links →
```bash
# Clone the repo
git clone https://github.com/yourusername/30u30.git
cd 30u30

# Start with Day 1
cd papers/01_Unreasonable_Effectiveness
```

For beginners: Basic Python knowledge. We'll teach you the rest.
For practitioners: Jump to any paper that interests you.
🎯 The 30-Day Challenge:
- One paper per day
- Read the README
- Run the code
- Complete exercises
- Share your progress with #30u30
🔀 Choose Your Path:
- Theory-First: README → Notes → Code
- Code-First: Notebook → Implementation → README
- Practice-First: Exercises → Solutions → Deep-dive
There are many paper summaries online. But this is different:
- You build everything - No "import magic_ai_library"
- Multiple learning paths - Theory-first, code-first, or interactive
- Production-quality - Code that actually works and teaches
- Beginner-friendly - Real-world analogies + rigorous math
- Community-driven - Your feedback shapes future days
Goal: The best free resource for learning AI fundamentals.
If this helps you, ⭐ star the repo and share it with others!
We'd love your help making this better!
- 🐛 Found a bug? Open an issue
- 💡 Have an idea? Open an issue with the "enhancement" label or start a discussion
- 📝 Want to contribute code? See CONTRIBUTING.md
Every contribution helps thousands of learners.
CC BY-NC-ND 4.0 — Free to read, learn, and share with attribution. Not for commercial use.
- Ilya Sutskever for the original reading list
- All paper authors for advancing the field
- You for taking this journey
- 🐦 Twitter: Share progress with #30u30
- 📧 Issues: Report bugs or request features
- 💬 Discussions: Join the conversation
- ⭐ Star the repo to stay updated on new releases!
Ready to start?
→ Day 1: Character-Level RNN
→ Day 2: Understanding LSTMs
→ Day 3: RNN Regularization
→ Day 4: Minimizing Description Length
→ Day 5: MDL Principle Tutorial
→ Day 6: The First Law of Complexodynamics
→ Day 7: The Coffee Automaton
→ Day 8: ImageNet Classification (AlexNet)
→ Day 9: Deep Residual Learning (ResNet)
→ Day 10: Identity Mappings (ResNet v2)
→ Day 11: Dilated Convolutions
→ Day 12: Dropout
→ Day 13: Attention Is All You Need
→ Day 14: The Annotated Transformer
→ Day 15: Bahdanau Attention (NMT)
→ Day 16: Order Matters (Pointer Networks)
→ Day 17: Neural Turing Machines
→ Day 18: Pointer Networks
→ Day 19: Relational Reasoning
→ Day 20: Relational RNNs
→ Day 21: Neural Message Passing
→ Day 22: Deep Speech 2
→ Day 23: Variational Lossy Autoencoder
→ Day 24: GPipe (Giant Neural Networks)
→ Day 25: Scaling Laws for Neural Language Models
→ Day 26: Kolmogorov Complexity
→ Day 27: Machine Super Intelligence
→ Day 28: CS231n — CNNs for Visual Recognition
→ Day 29: Proximal Policy Optimization (PPO)
→ Day 30: RLHF 🆕 ← Start here!
Let's build something amazing together! 🚀