filomenerqf/ChessChallengeTemplate
Chess Challenge

Train a 1M parameter LLM to play chess!

Objective

Design and train a transformer-based language model to predict chess moves. Your model must:

  1. Stay under 1M parameters - This is the hard constraint!
  2. Use a custom tokenizer - Design an efficient move-level tokenizer
  3. Play legal chess - The model should learn to generate valid moves
  4. Beat Stockfish - Your Elo rating will be measured against Stockfish Level 1

Dataset

We use the Lichess dataset: dlouapre/lichess_2025-01_1M

The dataset uses an extended UCI notation:

  • W/B prefix for White/Black
  • Piece letter: P=Pawn, N=Knight, B=Bishop, R=Rook, Q=Queen, K=King
  • Source and destination squares (e.g., e2e4)
  • Special suffixes: (x)=capture, (+)=check, (+*)=checkmate, (o)/(O)=castling

Example game:

WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
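Since the notation is whitespace-delimited, a game string splits cleanly into move tokens. The regex below is a sketch of the format described above, not part of the template; `split_game` is a hypothetical helper.

```python
import re

# One move token: color prefix, piece letter, source and destination squares,
# then zero or more suffixes: (x) capture, (+) check, (+*) checkmate, (o)/(O) castling.
MOVE_PATTERN = re.compile(
    r"^([WB])"                       # W/B prefix for White/Black
    r"([PNBRQK])"                    # piece letter
    r"([a-h][1-8])"                  # source square
    r"([a-h][1-8])"                  # destination square
    r"((?:\(\+\*\)|\([x+oO]\))*)$"   # optional suffixes
)

def split_game(game: str) -> list[str]:
    """Split a game string into move tokens, sanity-checking each one."""
    moves = game.split()
    for move in moves:
        if not MOVE_PATTERN.match(move):
            raise ValueError(f"Unrecognized move token: {move}")
    return moves

game = "WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x)"
print(split_game(game)[:3])  # ['WPe2e4', 'BPe7e5', 'WNg1f3']
```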

Quick Start

Installation

# Clone the template
git clone https://github.com/nathanael-fijalkow/ChessChallengeTemplate.git
cd ChessChallengeTemplate

# Create virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -e .

Train a Model

# Basic training
python -m src.train \
    --output_dir ./my_model \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32

Evaluate Your Model

Evaluation happens in two phases:

# Phase 1: Legal Move Evaluation (quick sanity check)
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode legal \
    --n_positions 500

# Phase 2: Win Rate Evaluation (full games against Stockfish)
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode winrate \
    --n_games 100 \
    --stockfish_level 1

# Or run both phases:
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode both

Parameter Budget

Use the utility function to check your budget:

from src import ChessConfig, print_parameter_budget

config = ChessConfig(
    vocab_size=1200,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
print_parameter_budget(config)
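As a rough cross-check on `print_parameter_budget`, a GPT-style decoder's parameter count can be estimated by hand. This sketch ignores layer norms and biases and assumes learned positional embeddings; the template's exact count may differ slightly. Note that `n_head` does not affect the total (it only splits `n_embd` across heads).

```python
def estimate_params(vocab_size, n_embd, n_layer, n_head, n_inner=None,
                    n_positions=512, tie_weights=True):
    """Rough GPT-style parameter estimate (layer norms and biases ignored)."""
    n_inner = n_inner or 3 * n_embd
    embed = vocab_size * n_embd        # token embeddings
    pos = n_positions * n_embd         # learned positional embeddings (assumed)
    attn = 4 * n_embd * n_embd         # Q, K, V and output projections
    mlp = 2 * n_embd * n_inner         # feed-forward up- and down-projections
    blocks = n_layer * (attn + mlp)
    head = 0 if tie_weights else vocab_size * n_embd  # tying saves this term
    return embed + pos + blocks + head

print(estimate_params(vocab_size=1200, n_embd=128, n_layer=4, n_head=4))  # 874496
```

With these settings the estimate lands comfortably under 1M; untying the head adds back 1200 × 128 ≈ 154k parameters and blows the budget.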

Pro Tips

  1. Weight Tying: The default config ties the embedding and output layer weights, saving ~154k parameters
  2. Vocabulary Size: Keep it small! ~1200 tokens covers all moves
  3. Depth vs Width: With limited parameters, experiment with shallow-but-wide vs deep-but-narrow
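Weight tying in PyTorch is a one-line assignment: the output projection reuses the embedding matrix, so the vocab_size × n_embd weights are stored only once. This is a minimal illustration, not the template's actual model class.

```python
import torch.nn as nn

class TiedHead(nn.Module):
    """Toy module showing embedding/output weight tying."""
    def __init__(self, vocab_size=1200, n_embd=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tie: both layers share one tensor

    def forward(self, ids):
        return self.lm_head(self.embed(ids))

m = TiedHead()
# .parameters() deduplicates shared tensors, so only one weight is counted.
print(sum(p.numel() for p in m.parameters()))  # 153600 = 1200 * 128
```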

Customization

Custom Tokenizer

The template provides a move-level tokenizer that builds vocabulary from the actual dataset. Feel free to try different approaches!
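A minimal vocabulary builder along those lines might look like the following. The special tokens and class layout here are assumptions for illustration; check the template's tokenizer module for the real interface.

```python
from collections import Counter

SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]  # assumed special tokens

def build_vocab(games: list[str], min_freq: int = 1) -> dict[str, int]:
    """Map every move token seen in the dataset to an integer id."""
    counts = Counter(tok for game in games for tok in game.split())
    tokens = SPECIALS + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(tokens)}

def encode(game: str, vocab: dict[str, int]) -> list[int]:
    """Wrap a game in <bos>/<eos> and map unknown tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab["<bos>"]] + [vocab.get(t, unk) for t in game.split()] + [vocab["<eos>"]]

vocab = build_vocab(["WPe2e4 BPe7e5", "WPe2e4 BPc7c5"])
print(encode("WPe2e4 BPe7e5", vocab))  # [1, 6, 5, 2]
```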

Custom Architecture

Modify the model in src/model.py:

from src import ChessConfig, ChessForCausalLM

# Customize configuration
config = ChessConfig(
    vocab_size=1200,
    n_embd=128,      # Try 96, 128, or 192
    n_layer=4,       # Try 3, 4, or 6
    n_head=4,        # Try 4 or 8
    n_inner=384,     # Feed-forward dimension (default: 3*n_embd)
    dropout=0.1,
    tie_weights=True,
)

model = ChessForCausalLM(config)

Evaluation Metrics

Phase 1: Legal Move Evaluation

Tests if your model generates valid chess moves:

  • Legal Rate (1st try): % of legal moves on first attempt
  • Legal Rate (with retry): % of legal moves within 3 attempts

Target: >90% legal rate before proceeding to Phase 2
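The retry logic can be sketched with the python-chess library (assuming it is a dependency, since legality must be checked somehow). `to_uci` and `first_legal` are hypothetical stand-ins for the template's code; promotions are ignored for brevity.

```python
import chess

def to_uci(token: str) -> str:
    """Strip the color/piece prefix and any suffixes: 'WBb5c6(x)' -> 'b5c6'."""
    return token[2:6]

def first_legal(board: chess.Board, candidates: list[str], max_tries: int = 3):
    """Return the first candidate (model sample) that is legal, else None."""
    for token in candidates[:max_tries]:
        move = chess.Move.from_uci(to_uci(token))
        if move in board.legal_moves:
            return move
    return None

board = chess.Board()
# e2e5 is illegal from the start position, so the second sample is used.
print(first_legal(board, ["WPe2e5", "WPe2e4"]))
```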

Phase 2: Win Rate Evaluation

Full games against Stockfish to measure playing strength:

  • Win Rate: % of games won against Stockfish
  • Elo Rating: estimated rating based on game results
  • Avg Game Length: average number of moves per game
  • Illegal Move Rate: % of illegal moves during games
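Under the standard Elo model, an average score s against an opponent implies a rating difference of 400·log10(s / (1 − s)). The template's estimator may differ (draw handling, confidence intervals), and the anchor rating for Stockfish Level 1 below is a placeholder, not an official figure.

```python
import math

def elo_diff(score: float) -> float:
    """Rating difference implied by an average score in (0, 1)."""
    return 400 * math.log10(score / (1 - score))

def estimated_elo(wins: int, draws: int, losses: int, opponent_elo: int = 1500) -> float:
    """Estimate Elo from game results; opponent_elo is an assumed anchor."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return opponent_elo + elo_diff(score)

print(round(elo_diff(0.75)))  # 191: scoring 75% implies ~191 Elo above the opponent
```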

Submission

  1. Train your model
  2. Log in to Hugging Face: hf auth login
  3. Submit your model using the submission script:
python submit.py --model_path ./my_model/final_model --model_name your-model-name

The script will:

  • Upload your model to the LLM-course organization
  • Include your HF username in the model card for tracking
