A neural network playground for understanding character-level language models through name generation.
NeuralNameForge is an educational project that implements various neural network architectures to generate names. It serves as a practical introduction to different types of neural networks, from simple bigram models to complex transformers. By focusing on the specific task of name generation, it provides a concrete way to understand how different neural architectures process and generate sequential data.
The project implements several neural network architectures, each representing a different approach to sequence modeling:
- **Bigram Model**
  - Simplest form of statistical language model
  - Based on character-pair probabilities
- **Multi-Layer Perceptron (MLP)**
  - Implementation based on Bengio et al. 2003
  - Demonstrates basic neural network concepts
- **Recurrent Neural Networks (RNN)**
  - Vanilla RNN
  - GRU (Gated Recurrent Unit)
  - LSTM (Long Short-Term Memory)
- **Transformer**
  - Based on the architecture used in GPT
  - Implements the self-attention mechanism
- **Bag of Words (BoW)**
  - Alternative approach to sequence modeling
  - Demonstrates context aggregation
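The bigram model above is the simplest starting point: it just counts how often each character follows each other character, then samples from those counts. A minimal sketch of the idea (the tiny `names` list and variable names here are hypothetical stand-ins; the real project reads its data from `names.txt`):

```python
import torch

# Hypothetical tiny corpus; the real project reads names.txt.
names = ["emma", "olivia", "ava"]

# Build the character vocabulary with '.' as a start/end token.
chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

# Count how often each character pair occurs in the corpus.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for name in names:
    seq = ["."] + list(name) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize counts into next-character probabilities (add-one smoothing
# so no transition has probability exactly zero).
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)

# Sample a new name character by character until the end token.
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```

Every other architecture in the list replaces the count table `P` with a learned neural network, but the sampling loop stays essentially the same.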
- **Setup**

  ```bash
  git clone https://github.com/yourusername/NeuralNameForge
  cd NeuralNameForge
  pip install torch tensorboard
  ```

- **Basic Usage**

  ```bash
  # Train a transformer model (default)
  python makemore.py -i names.txt -o out

  # Try different architectures
  python makemore.py -i names.txt -o out --type lstm
  python makemore.py -i names.txt -o out --type gru
  python makemore.py -i names.txt -o out --type bigram

  # Generate names without training
  python makemore.py -i names.txt -o out --sample-only
  ```
- **Monitor Training**

  ```bash
  tensorboard --logdir out
  ```
Working through the models introduces several core concepts:

- **Embeddings**
  - Convert discrete characters into continuous vectors
  - Learned representations of characters
- **Sequential Processing**
  - RNN: processes data one step at a time, maintaining a hidden state
  - LSTM/GRU: advanced RNNs with gating mechanisms
  - Transformer: processes the entire sequence using attention
- **Attention Mechanism**
  - Self-attention in the transformer
  - Allows the model to focus on relevant parts of the input
  - Parallel processing advantage over RNNs
- **Loss Functions**
  - Cross-entropy loss for next-character prediction
  - Measures how well the predicted distribution matches the true next character
- **Model Architecture Components**
  - Linear layers
  - Activation functions (Tanh, ReLU, Sigmoid)
  - Layer normalization
  - Residual connections
- **Training Dynamics**
  - Gradient descent optimization
  - Importance of the learning rate
  - Batch processing
  - Model evaluation and validation
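Several of these concepts come together in the transformer's self-attention layer: embedded characters are projected into queries, keys, and values, and each position computes a softmax-weighted mix over the positions before it. A minimal single-head sketch (all dimensions here are hypothetical, chosen small for readability, not taken from the project's configuration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

B, T, C = 1, 5, 16        # batch, sequence length, embedding size (hypothetical)
x = torch.randn(B, T, C)  # stand-in for a sequence of character embeddings

# Single-head self-attention: project the input into queries, keys, values.
Wq = torch.randn(C, C) / C**0.5
Wk = torch.randn(C, C) / C**0.5
Wv = torch.randn(C, C) / C**0.5
q, k, v = x @ Wq, x @ Wk, x @ Wv

# Scaled dot-product scores, with a causal mask so position t
# attends only to positions <= t (as in GPT-style decoders).
scores = (q @ k.transpose(-2, -1)) / C**0.5
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float("-inf"))

weights = F.softmax(scores, dim=-1)  # each row sums to 1
out = weights @ v                    # weighted mix of value vectors

print(out.shape)  # same shape as the input: (1, 5, 16)
```

Because every position's output is computed with matrix multiplies rather than a step-by-step recurrence, the whole sequence is processed in parallel; this is the advantage over RNNs noted above.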
How the models compare:

- Bigram: simplest model, uses counting statistics
- MLP: feedforward network with a fixed context window
- RNN: processes sequences recurrently
- LSTM/GRU: gating mechanisms mitigate the vanishing gradient problem
- Transformer: uses attention for global context
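Despite their differences, all of these models are trained the same way: map a context of characters to logits over the vocabulary, score them with cross-entropy, and take gradient descent steps. One training step in the spirit of the Bengio et al. 2003 MLP might look like this (the layer sizes and random stand-in batch are illustrative, not the project's actual hyperparameters):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, emb_dim, context = 27, 8, 3  # hypothetical sizes

# A tiny MLP: embed a fixed context window of characters,
# then predict logits for the next character.
model = nn.Sequential(
    nn.Embedding(vocab_size, emb_dim),
    nn.Flatten(),                       # (B, context * emb_dim)
    nn.Linear(context * emb_dim, 64),
    nn.Tanh(),
    nn.Linear(64, vocab_size),
)

X = torch.randint(0, vocab_size, (32, context))  # random stand-in contexts
Y = torch.randint(0, vocab_size, (32,))          # next-character targets

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One gradient-descent step; a real training run loops over many batches.
logits = model(X)
loss = loss_fn(logits, Y)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```

Swapping the `nn.Sequential` for an RNN, LSTM, GRU, or transformer changes how the context is processed, but the loss and the optimization loop stay the same.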
MIT License - See LICENSE file for details.
This project is a learning-focused reimagining of Andrej Karpathy's makemore project. Key papers that influenced this implementation:
- Bengio et al. 2003 (Neural Probabilistic Language Models)
- Graves et al. 2014 (LSTM)
- Cho et al. 2014 (GRU)
- Vaswani et al. 2017 (Transformer)
Special thanks to the PyTorch team for their excellent deep learning framework.
Note: This project is designed for learning purposes. While it can generate names, its primary value lies in understanding neural network architectures and their implementations.