Train a 1M parameter LLM to play chess!
Design and train a transformer-based language model to predict chess moves. Your model must:
- Stay under 1M parameters - This is the hard constraint!
- Use a custom tokenizer - Design an efficient move-level tokenizer
- Play legal chess - The model should learn to generate valid moves
- Beat Stockfish - Your ELO will be measured against Stockfish Level 1
We use the Lichess dataset: dlouapre/lichess_2025-01_1M
The dataset uses an extended UCI notation:
- `W`/`B` prefix for White/Black
- Piece letter: `P`=Pawn, `N`=Knight, `B`=Bishop, `R`=Rook, `Q`=Queen, `K`=King
- Source and destination squares (e.g., `e2e4`)
- Special suffixes: `(x)`=capture, `(+)`=check, `(+*)`=checkmate, `(o)`/`(O)`=castling
Example game:
```
WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
```
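For preprocessing or evaluation you may want to map these tokens back to plain UCI. A minimal sketch, assuming the format above (color prefix, piece letter, source/destination squares, optional suffix); it is illustrative only, and promotions, if present in the data, would need extra handling:

```python
import re

# Matches e.g. "WBb5c6(x)": color prefix, piece letter, from/to squares,
# and an optional suffix such as (x), (+), (+*), (o), (O).
TOKEN_RE = re.compile(r"^[WB][PNBRQK]([a-h][1-8][a-h][1-8])(\([^)]*\))?$")

def to_uci(token: str) -> str:
    """Strip the extended-notation decorations, leaving plain UCI."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError(f"not an extended-UCI token: {token!r}")
    return m.group(1)

print(to_uci("WBb5c6(x)"))  # b5c6
```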
```bash
# Clone the template
git clone https://github.com/nathanael-fijalkow/ChessChallengeTemplate.git
cd ChessChallengeTemplate

# Create virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -e .
```

```bash
# Build the image
docker build -t chess-challenge .

# Run with GPU (if available) and mount your workspace
docker run --rm -it --gpus all \
  -u $(id -u):$(id -g) \
  -v "$PWD:/workspace" -w /workspace \
  chess-challenge bash
```

```bash
# Basic training
python -m src.train \
  --output_dir ./my_model \
  --num_train_epochs 3 \
  --per_device_train_batch_size 32
```

Evaluation happens in two phases:
```bash
# Phase 1: Legal Move Evaluation (quick sanity check)
python -m src.evaluate \
  --model_path ./my_model/final_model \
  --mode legal \
  --n_positions 500

# Phase 2: Win Rate Evaluation (full games against Stockfish)
python -m src.evaluate \
  --model_path ./my_model/final_model \
  --mode winrate \
  --n_games 100 \
  --stockfish_level 1

# Or run both phases:
python -m src.evaluate \
  --model_path ./my_model/final_model \
  --mode both
```

You can fine-tune a trained baseline with Stockfish-guided rewards using `RL.py`:
```bash
python RL.py \
  --model_path ./my_model/final_model \
  --output_dir ./rl_model \
  --stockfish_path /path/to/stockfish \
  --steps 500 \
  --batch_size 8 \
  --group_size 4
```

This requires `python-chess` and a Stockfish binary on your PATH (or `--stockfish_path`).
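The `--group_size` flag suggests a group-relative (GRPO-style) objective, where several candidate continuations per position are scored by Stockfish and their rewards are normalized within each group. The actual `RL.py` may work differently; this is only a self-contained sketch of that normalization step:

```python
def group_advantages(rewards, group_size):
    """Normalize rewards within each group of sampled continuations.

    Each group of `group_size` rewards is centered on its mean and scaled
    by its standard deviation (falling back to 1.0 for zero-variance groups).
    """
    advs = []
    for i in range(0, len(rewards), group_size):
        group = rewards[i:i + group_size]
        mean = sum(group) / len(group)
        var = sum((r - mean) ** 2 for r in group) / len(group)
        std = var ** 0.5 or 1.0  # avoid dividing by zero
        advs.extend((r - mean) / std for r in group)
    return advs

print(group_advantages([1.0, 0.0, 0.0, 1.0], group_size=4))
```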
Use the utility function to check your budget:
```python
from src import ChessConfig, print_parameter_budget

config = ChessConfig(
    vocab_size=1200,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
print_parameter_budget(config)
```

- Weight Tying: The default config ties the embedding and output layer weights, saving ~154k parameters
- Vocabulary Size: Keep it small! ~1200 tokens covers all moves
- Depth vs Width: With limited parameters, experiment with shallow-but-wide vs deep-but-narrow
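To see why these numbers fit, you can estimate the budget by hand. Below is a rough GPT-2-style count (illustrative only; it ignores positional embeddings and assumes biases everywhere, so it will differ slightly from `print_parameter_budget`):

```python
def estimate_params(vocab_size, n_embd, n_layer, n_inner, tie_weights=True):
    """Back-of-envelope parameter count for a small decoder-only transformer."""
    emb = vocab_size * n_embd                       # token embedding matrix
    attn = 4 * n_embd * n_embd + 4 * n_embd         # Q, K, V, output proj + biases
    mlp = 2 * n_embd * n_inner + n_inner + n_embd   # up/down projections + biases
    ln = 2 * 2 * n_embd                             # two layernorms (scale + shift)
    final_ln = 2 * n_embd
    head = 0 if tie_weights else vocab_size * n_embd  # untied output layer
    return emb + n_layer * (attn + mlp + ln) + final_ln + head

print(estimate_params(1200, 128, 4, 384))  # roughly 815k, under the 1M cap
```

With these defaults, the `1200 × 128 = 153,600` embedding matrix is exactly the ~154k saved by weight tying.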
The template provides a move-level tokenizer that builds vocabulary from the actual dataset. Feel free to try different approaches!
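As a starting point, a move-level tokenizer can be as simple as whitespace splitting plus a vocabulary built from the training games. This sketch is illustrative, not the template's implementation:

```python
class MoveTokenizer:
    """One token per move string; vocabulary built from the games seen."""

    def __init__(self, games, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
        moves = sorted({m for g in games for m in g.split()})
        self.vocab = {tok: i for i, tok in enumerate((*specials, *moves))}
        self.inv = {i: t for t, i in self.vocab.items()}

    def encode(self, game):
        unk = self.vocab["<unk>"]
        return [self.vocab.get(m, unk) for m in game.split()]

    def decode(self, ids):
        return " ".join(self.inv[i] for i in ids)

tok = MoveTokenizer(["WPe2e4 BPe7e5", "WPe2e4 BNb8c6"])
print(tok.encode("WPe2e4 BPe7e5"))
```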
Modify the model in src/model.py:
```python
from src import ChessConfig, ChessForCausalLM

# Customize configuration
config = ChessConfig(
    vocab_size=1200,
    n_embd=128,    # Try 96, 128, or 192
    n_layer=4,     # Try 3, 4, or 6
    n_head=4,      # Try 4 or 8
    n_inner=384,   # Feed-forward dimension (default: 3*n_embd)
    dropout=0.1,
    tie_weights=True,
)
model = ChessForCausalLM(config)
```

Tests if your model generates valid chess moves:
| Metric | Description |
|---|---|
| Legal Rate (1st try) | % of legal moves on first attempt |
| Legal Rate (with retry) | % of legal moves within 3 attempts |
Target: >90% legal rate before proceeding to Phase 2
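The retry metric can be implemented as a thin wrapper around sampling: draw up to three candidates and accept the first legal one. Below is a sketch with stubbed sampling and legality functions; in practice `python-chess`'s `board.legal_moves` would supply the predicate:

```python
def sample_legal_move(sample_fn, is_legal, max_attempts=3):
    """Return (move, attempt_number), or (None, max_attempts) if all fail."""
    for attempt in range(1, max_attempts + 1):
        move = sample_fn()
        if is_legal(move):
            return move, attempt
    return None, max_attempts

# Demo with a stub that produces an illegal move first, then a legal one:
candidates = iter(["e2e5", "e2e4"])
print(sample_legal_move(lambda: next(candidates), lambda m: m == "e2e4"))
```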
Full games against Stockfish to measure playing strength:
| Metric | Description |
|---|---|
| Win Rate | % of games won against Stockfish |
| ELO Rating | Estimated rating based on game results |
| Avg Game Length | Average number of moves per game |
| Illegal Move Rate | % of illegal moves during games |
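The Elo estimate follows from the standard logistic model: with score s (wins plus half the draws, divided by games played), the rating difference is d = 400·log10(s/(1−s)). The template's actual estimator may differ (e.g. clamping or confidence intervals); a minimal sketch:

```python
import math

def elo_diff(score: float, eps: float = 1e-6) -> float:
    """Rating difference implied by a score in [0, 1] under the Elo model."""
    s = min(max(score, eps), 1 - eps)  # clamp: 0% or 100% would be infinite
    return 400 * math.log10(s / (1 - s))

print(round(elo_diff(0.75)))  # winning 75% implies roughly +191 Elo
```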
- Train your model
- Log in to Hugging Face:

  ```bash
  hf auth login
  ```

- Submit your model using the submission script:

  ```bash
  python submit.py --model_path ./my_model/final_model --model_name your-model-name
  ```

The script will:
- Upload your model to the LLM-course organization
- Include your HF username in the model card for tracking