Work in progress. The web UI and coaching pipeline are functional. The two-stage training pipeline (Phase 2 line-gen SFT → Phase 3 GRPO) is actively being developed.
A pedagogical chess analysis system combining Stockfish 17 NNUE evaluations with fine-tuned LLM coaching via a two-stage training pipeline (SFT → GRPO).
- Move Analysis: Classify moves as Best/Great/Good/Inaccuracy/Mistake/Blunder
- Natural Language Coaching: Human-like explanations of chess concepts
- Web UI: Browser-based game review with chess.com integration
- MCP Integration: Stockfish tools accessible via Model Context Protocol
- Two-Stage Training: Line-generator SFT cold-start → GRPO with verifiable chess rewards
# Install dependencies
uv sync
# Install Stockfish 17
./scripts/install_stockfish.sh
# Run tests
./scripts/test.sh -v# Game review (fetches chess.com games, opens browser UI)
uv run chess-review <username>
# CLI tutor
STOCKFISH_PATH="$HOME/.local/bin/stockfish" uv run python -c \
"import sys; sys.path.insert(0,'src'); from tutor.cli import main; main()"stockfish.py— Async Stockfish UCI wrappertools.py— 6 MCP tools:get_best_move,get_eval,get_threats,compare_moves,get_legal_moves,validate_moveserver.py— MCP server entry pointrepresentations.py— FEN / ASCII / SVG / PNG converters
legality.py— Move legality validation (UCI, SAN, LAN)tactical_loop.py— LLM output verification against enginerewards.py— GRPO reward functions: R1 legality, R3 eval accuracy, R4a annotations, R5 relevance
web.py— FastAPI server with model toggle and compare modeprompts.py— Shared prompts:SYSTEM_PROMPT,LINE_GENERATOR_SYSTEM_PROMPT, formatting helpersreview.py— CLI entry: fetches chess.com games → starts web server → opens browserchesscom.py— chess.com public API client + PGN parser
prepare_datasets.py— Main pipeline: Stockfish analysis + LLM coaching annotationsconvert_lines_to_sft.py— Convertslines_30k.jsonlto<line>tag SFT format- Outputs:
data/processed/lines_sft.jsonl(28k train),data/processed/lines_sft_eval.jsonl(1.4k eval)
train.py— Base SFT script (QLoRA, DDP, response-only masking)lib.py— Dataset helpers:load_jsonl,load_jsonl_lines,format_dataset,strip_think_from_target
Three phases, all using Qwen/Qwen3-4B-Thinking-2507 + QLoRA 8-bit on 2× RTX 5090:
Teaches the model to explain chess moves in natural language.
./recipes-train/qwen3-4b-phase1-coach-sft/start.sh- Data:
data/processed/train.jsonl(coach annotations) - Output:
checkpoints/qwen3-4b-phase1-coach-sft/
Cold-starts the model on the <line> output format before RL.
./recipes-train/qwen3-4b-phase2-lines-sft/start.sh
./recipes-train/qwen3-4b-phase2-lines-sft/stop.sh
# Logs: /tmp/chess-lines-train.log- Data:
data/processed/lines_sft.jsonl(28k samples) - Output:
checkpoints/qwen3-4b-phase2-lines-sft/ - Format:
<line>LINE N: move (purpose) → move (purpose) | eval: <label></line>
Reinforcement learning with verifiable chess rewards. Starts from Phase 2 checkpoint.
./recipes-train/qwen3-4b-phase3-grpo/start.sh- Rewards: R1 legality (0.25), R3 eval accuracy (0.30), R4a annotations (0.15), R5 relevance (0.10)
- Output:
checkpoints/qwen3-4b-phase3-grpo/
./scripts/test.sh -v # runs all 99 tests with STOCKFISH_PATH set| File | Tests |
|---|---|
test_stockfish.py |
16 |
test_mcp_tools.py |
17 |
test_verification.py |
40 |
test_representations.py |
25 |
test_chesscom.py |
1 |
STOCKFISH_PATH=/home/zheng/.local/bin/stockfish
STOCKFISH_DEPTH=20
STOCKFISH_THREADS=4
STOCKFISH_HASH_MB=256vLLM server (Qwen3.5-35B-A3B, port 8100) is used for coaching data generation and the web UI's LLM backend. See docker-compose.yml.