Chess AI Tutor

Work in progress. The web UI and coaching pipeline are functional. The two-stage training pipeline (Phase 2 line-gen SFT → Phase 3 GRPO) is actively being developed.

A pedagogical chess analysis system combining Stockfish 17 NNUE evaluations with fine-tuned LLM coaching via a two-stage training pipeline (SFT → GRPO).

Features

Move Analysis: Classify moves as Best/Great/Good/Inaccuracy/Mistake/Blunder
Natural Language Coaching: Human-like explanations of chess concepts
Web UI: Browser-based game review with chess.com integration
MCP Integration: Stockfish tools accessible via Model Context Protocol
Two-Stage Training: Line-generator SFT cold-start → GRPO with verifiable chess rewards
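
As a rough illustration of the move classification feature, one common approach is to bucket moves by centipawn loss relative to the engine's best move. The sketch below assumes that approach; the threshold values and the `classify_move` name are hypothetical and are not taken from this project.

```python
# Illustrative cutoffs only (centipawn loss, inclusive upper bound).
# The project's real thresholds are not stated in this README.
THRESHOLDS = [
    (0, "Best"),
    (10, "Great"),
    (30, "Good"),
    (80, "Inaccuracy"),
    (200, "Mistake"),
]


def classify_move(best_eval_cp: int, played_eval_cp: int) -> str:
    """Label a move by how much evaluation it gives up.

    Both evaluations are in centipawns from the mover's point of view.
    """
    loss = max(0, best_eval_cp - played_eval_cp)
    for max_loss, label in THRESHOLDS:
        if loss <= max_loss:
            return label
    return "Blunder"
```

A mate score would need separate handling, which this sketch leaves out.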

Installation

# Install dependencies
uv sync

# Install Stockfish 17
./scripts/install_stockfish.sh

# Run tests
./scripts/test.sh -v

Quick Start

# Game review (fetches chess.com games, opens browser UI)
uv run chess-review <username>

# CLI tutor
STOCKFISH_PATH="$HOME/.local/bin/stockfish" uv run python -c \
  "import sys; sys.path.insert(0,'src'); from tutor.cli import main; main()"

Project Structure

`src/chess_mcp/` — MCP Server & Stockfish Integration ✅

stockfish.py — Async Stockfish UCI wrapper
tools.py — 6 MCP tools: get_best_move, get_eval, get_threats, compare_moves, get_legal_moves, validate_move
server.py — MCP server entry point
representations.py — FEN / ASCII / SVG / PNG converters
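
To make the FEN to ASCII conversion concrete, here is a minimal standard-library sketch. It is not the code in representations.py; `fen_to_ascii` is a hypothetical name, and only the piece-placement field of the FEN string is rendered.

```python
def fen_to_ascii(fen: str) -> str:
    """Render the piece-placement field of a FEN string as an 8x8 text board."""
    placement = fen.split()[0]
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")
    rows = []
    for rank in ranks:
        squares = []
        for ch in rank:
            if ch.isdigit():
                # A digit encodes that many consecutive empty squares.
                squares.extend("." * int(ch))
            else:
                squares.append(ch)
        if len(squares) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        rows.append(" ".join(squares))
    return "\n".join(rows)
```

Rank 8 is printed first, so the board appears from White's side.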

`src/verification/` — Move Validation & GRPO Rewards ✅

legality.py — Move legality validation (UCI, SAN, LAN)
tactical_loop.py — LLM output verification against engine
rewards.py — GRPO reward functions: R1 legality, R3 eval accuracy, R4a annotations, R5 relevance
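
A cheap first step before any board-aware legality check is to reject strings that cannot be UCI moves at all. The sketch below shows only that syntactic filter; `is_uci_syntax` is a hypothetical helper, not a function from legality.py, and true legality still depends on the position.

```python
import re

# Source square, target square, optional promotion piece (q, r, b, or n).
UCI_MOVE_RE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def is_uci_syntax(move: str) -> bool:
    """Return True if the string has the shape of a UCI move, e.g. 'e2e4' or 'e7e8q'.

    This checks syntax only; it says nothing about whether the move is legal.
    """
    return UCI_MOVE_RE.fullmatch(move) is not None
```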

`src/tutor/` — Web UI & Coaching ✅

web.py — FastAPI server with model toggle and compare mode
prompts.py — Shared prompts: SYSTEM_PROMPT, LINE_GENERATOR_SYSTEM_PROMPT, formatting helpers
review.py — CLI entry: fetches chess.com games → starts web server → opens browser
chesscom.py — chess.com public API client + PGN parser
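
For readers unfamiliar with PGN, the header section is a series of tag pairs such as `[White "alice"]`. The following is a minimal sketch of extracting them with a regular expression; it is an assumption about how such parsing could look, not the parser in chesscom.py, and `parse_pgn_headers` is a hypothetical name.

```python
import re

# One tag pair per line: [Name "value"]
TAG_PAIR_RE = re.compile(r'^\[(\w+)\s+"(.*)"\][ \t]*$', re.MULTILINE)


def parse_pgn_headers(pgn: str) -> dict[str, str]:
    """Collect PGN tag pairs into a dict; the movetext is ignored."""
    return {name: value for name, value in TAG_PAIR_RE.findall(pgn)}
```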

`data/pipeline/` — Training Data Pipeline ✅

prepare_datasets.py — Main pipeline: Stockfish analysis + LLM coaching annotations
convert_lines_to_sft.py — Converts lines_30k.jsonl to <line> tag SFT format
Outputs: data/processed/lines_sft.jsonl (28k train), data/processed/lines_sft_eval.jsonl (1.4k eval)

`training/` — Shared Training Utilities

train.py — Base SFT script (QLoRA, DDP, response-only masking)
lib.py — Dataset helpers: load_jsonl, load_jsonl_lines, format_dataset, strip_think_from_target
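
Response-only masking means the loss is computed on the assistant's tokens and not on the prompt. A minimal sketch of the idea, assuming the prompt length in tokens is already known; `mask_prompt_labels` is a hypothetical helper and not necessarily how train.py does it.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Build labels that ignore the prompt and keep the response tokens."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be within the sequence")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])
```

With labels built this way, prompt positions contribute nothing to the gradient.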

Training Pipeline

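Three phases, all using Qwen/Qwen3-4B-Thinking-2507 with 8-bit QLoRA on 2× RTX 5090. Phase 1 trains the coach; Phases 2 and 3 form the two-stage line-generator pipeline (SFT → GRPO) described above.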

Phase 1 — Coach SFT (`recipes-train/qwen3-4b-phase1-coach-sft/`)

Teaches the model to explain chess moves in natural language.

./recipes-train/qwen3-4b-phase1-coach-sft/start.sh

Data: data/processed/train.jsonl (coach annotations)
Output: checkpoints/qwen3-4b-phase1-coach-sft/

Phase 2 — Line Generator SFT (`recipes-train/qwen3-4b-phase2-lines-sft/`)

Cold-starts the model on the <line> output format before RL.

# Start training
./recipes-train/qwen3-4b-phase2-lines-sft/start.sh
# Stop training
./recipes-train/qwen3-4b-phase2-lines-sft/stop.sh
# Logs: /tmp/chess-lines-train.log

Data: data/processed/lines_sft.jsonl (28k samples)
Output: checkpoints/qwen3-4b-phase2-lines-sft/
Format: <line>LINE N: move (purpose) → move (purpose) | eval: <label></line>
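
The format above is regular enough to parse with two regular expressions. The sketch below is one possible reading of it, assuming each step is a single move token followed by a parenthesised purpose; `parse_lines` is a hypothetical helper, and the `equal` label in the usage note is a made-up example value.

```python
import re

LINE_RE = re.compile(
    r"<line>\s*LINE\s+(\d+):\s*(.*?)\s*\|\s*eval:\s*(.*?)\s*</line>",
    re.DOTALL,
)
STEP_RE = re.compile(r"^(\S+)\s*\((.*)\)$")


def parse_lines(text: str) -> list[dict]:
    """Extract every <line> block as {'index', 'steps', 'eval'}."""
    parsed = []
    for number, body, label in LINE_RE.findall(text):
        steps = []
        for chunk in body.split("→"):
            match = STEP_RE.match(chunk.strip())
            if match is None:
                continue  # skip fragments that are not "move (purpose)"
            steps.append({"move": match.group(1), "purpose": match.group(2)})
        parsed.append({"index": int(number), "steps": steps, "eval": label})
    return parsed
```

For example, `parse_lines("<line>LINE 1: e4 (claim the center) → e5 (mirror) | eval: equal</line>")` yields one entry with two steps.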

Phase 3 — GRPO (`recipes-train/qwen3-4b-phase3-grpo/`)

Reinforcement learning with verifiable chess rewards. Starts from Phase 2 checkpoint.

./recipes-train/qwen3-4b-phase3-grpo/start.sh

Rewards: R1 legality (0.25), R3 eval accuracy (0.30), R4a annotations (0.15), R5 relevance (0.10)
Output: checkpoints/qwen3-4b-phase3-grpo/
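
To show how the listed weights could feed into GRPO, the sketch below combines per-component scores into a scalar reward and then normalises rewards within a sampled group, which is the group-relative advantage GRPO is named for. Treat it as an assumption-laden illustration: the component keys, a plain weighted sum, and scores in [0, 1] are all assumed, and none of it is taken from rewards.py.

```python
from statistics import mean, pstdev

# Weights as listed above. The four listed values sum to 0.80; whether
# other reward terms exist is not stated in this README.
WEIGHTS = {
    "legality": 0.25,       # R1
    "eval_accuracy": 0.30,  # R3
    "annotations": 0.15,    # R4a
    "relevance": 0.10,      # R5
}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items())


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalise rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.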

Tests

./scripts/test.sh -v        # runs all 99 tests with STOCKFISH_PATH set

File	Tests
`test_stockfish.py`	16
`test_mcp_tools.py`	17
`test_verification.py`	40
`test_representations.py`	25
`test_chesscom.py`	1

Configuration

STOCKFISH_PATH=/home/zheng/.local/bin/stockfish
STOCKFISH_DEPTH=20
STOCKFISH_THREADS=4
STOCKFISH_HASH_MB=256
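
A minimal sketch of reading these variables in Python, assuming they are plain environment variables. The `StockfishConfig` and `load_stockfish_config` names are hypothetical, and the fallback values simply mirror the ones shown above (with `stockfish` on `PATH` as an assumed fallback for the binary).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StockfishConfig:
    path: str
    depth: int
    threads: int
    hash_mb: int


def load_stockfish_config(env=None) -> StockfishConfig:
    """Read engine settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return StockfishConfig(
        path=env.get("STOCKFISH_PATH", "stockfish"),  # assumed fallback
        depth=int(env.get("STOCKFISH_DEPTH", "20")),
        threads=int(env.get("STOCKFISH_THREADS", "4")),
        hash_mb=int(env.get("STOCKFISH_HASH_MB", "256")),
    )
```

Passing a dict instead of relying on `os.environ` keeps the function easy to test.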

vLLM server (Qwen3.5-35B-A3B, port 8100) is used for coaching data generation and the web UI's LLM backend. See docker-compose.yml.

