Chess Policy Engine (AlphaZero-Style)

A Python implementation of a chess neural network engine with policy and value heads, trained using supervised learning from PGN games. Supports AlphaZero-style architecture with PUCT search using neural network value estimates.

Features

  • 4672 Move Encoding: 64 squares × 73 planes (56 queen-like, 8 knight, 9 underpromotion)
  • Board Encoding: 18 channels (12 pieces, side-to-move, castling rights, en passant); a short encoding sketch follows this list
  • Dual-Head ResNet Architecture:
    • Policy Head: outputs [4672] move logits (move probabilities after masking and softmax)
    • Value Head: outputs a position evaluation in [-1, 1] (expected outcome from the current player's perspective)
  • Training:
    • Supervised learning from PGN games (policy + value from game outcomes)
    • Combined loss: Cross-entropy (policy) + MSE (value)
    • Label smoothing, mixed precision (AMP), gradient clipping
  • PUCT Search:
    • Uses neural network policy for move priors
    • Uses neural network value for position evaluation (replaces material heuristic)
    • Configurable simulations and exploration constant
  • UCI Interface: Standard chess engine protocol for GUI integration
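
To make the 18-channel board encoding concrete, here is a minimal, illustrative encoder: 12 piece planes, one side-to-move plane, four castling-rights planes, and one en-passant plane. This is a sketch only; the repo's encoding.board_to_tensor may order channels differently.

import chess
import numpy as np

def board_to_planes(board: chess.Board) -> np.ndarray:
    """Illustrative 18x8x8 encoding; the channel order here is an assumption."""
    planes = np.zeros((18, 8, 8), dtype=np.float32)
    for square, piece in board.piece_map().items():
        # Channels 0-5: white P, N, B, R, Q, K; channels 6-11: the black pieces.
        channel = piece.piece_type - 1 + (0 if piece.color == chess.WHITE else 6)
        planes[channel, chess.square_rank(square), chess.square_file(square)] = 1.0
    planes[12, :, :] = 1.0 if board.turn == chess.WHITE else 0.0   # side to move
    planes[13, :, :] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[14, :, :] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[15, :, :] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[16, :, :] = float(board.has_queenside_castling_rights(chess.BLACK))
    if board.ep_square is not None:                                # en passant target square
        planes[17, chess.square_rank(board.ep_square), chess.square_file(board.ep_square)] = 1.0
    return planes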

Install

Using Poetry:

  • Install Poetry if needed, then install deps
    • pipx install poetry (or see poetry docs)
    • poetry install

Or using pip:

  • pip install python-chess torch numpy tqdm

Python 3.9+ recommended.

Project Layout

  • src/chess_policy/encoding.py — Board → tensor encoding (18 channels) and legal move masks
  • src/chess_policy/move_index.py — Fixed 4672 index mapping and move conversions
  • src/chess_policy/model.py — ResNet models:
    • PolicyOnlyResNet: Policy-only (backward compatibility)
    • PolicyValueResNet: Policy + Value heads (AlphaZero-style, default)
  • src/chess_policy/data.py — PGN dataset with value targets extracted from game results
  • src/chess_policy/train.py — Supervised training (policy + value losses)
  • src/chess_policy/infer.py — Inference helpers (policy logits, value extraction)
  • src/chess_policy/puct.py — PUCT search with neural network value integration
  • src/chess_policy/uci.py — UCI engine interface
  • scripts/train_policy.py — Train on PGN files (policy + value)
  • scripts/play_uci.py — Run UCI engine with PUCT search

Architecture

Network Architecture

The default training setup uses a shared-residual ResNet with dual heads (a code sketch appears at the end of this subsection):

  • Input: [B, 18, 8, 8] tensor from encoding.board_to_tensor (board, side-to-move, castling, en passant)
  • Stem: Conv2d(18 → width, 3×3) + BatchNorm + ReLU
  • Trunk: n_blocks × ResidualBlock(width) (each block = 3×3 Conv → BN → ReLU → 3×3 Conv → BN + skip connection)
  • Policy head:
    • 1×1 Conv (width → width) → BN → ReLU
    • 1×1 Conv (width → 73) → reshape to [B, 73, 8, 8] → flatten to [B, 4672]
  • Value head:
    • 1×1 Conv (width → 32) → BN → ReLU
    • Global pooling: AdaptiveAvgPool2d(1) → flatten to [B, 32]
    • Linear (32 → 1) → tanh → [B, 1] in [-1, 1]

The scripts/train_policy.py entrypoint builds a PolicyValueResNet with:

  • Default: width=64, blocks=8 (≈1–2M params)
  • Tunable via CLI: --width and --blocks
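
The PyTorch sketch below mirrors the layer description above with the default width=64, blocks=8; the repo's model.PolicyValueResNet may differ in naming and details.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.conv1 = nn.Conv2d(width, width, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)                # skip connection

class PolicyValueNet(nn.Module):
    """Illustrative dual-head ResNet: [B, 18, 8, 8] -> policy [B, 4672], value [B, 1]."""
    def __init__(self, width: int = 64, blocks: int = 8):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(18, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True))
        self.trunk = nn.Sequential(*[ResidualBlock(width) for _ in range(blocks)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(width, width, 1, bias=False), nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, 73, 1))
        self.value_head = nn.Sequential(
            nn.Conv2d(width, 32, 1, bias=False), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Tanh())

    def forward(self, x):
        x = self.trunk(self.stem(x))
        policy = self.policy_head(x).flatten(1)   # [B, 73, 8, 8] -> [B, 4672]
        value = self.value_head(x)                # [B, 1] in [-1, 1]
        return policy, value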

Model Output

The PolicyValueResNet model outputs both policy and value:

  • Policy: [B, 4672] logits over the move space (softmax over legal moves gives probabilities)
  • Value: [B, 1] in range [-1, 1], position evaluation from the current player's perspective:
    • +1.0 = current player is winning
    • -1.0 = current player is losing
    • 0.0 = draw
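
For a quick sanity check of these shapes, run a dummy batch through the sketch from the Architecture section (the repo's PolicyValueResNet should produce the same shapes):

import torch

net = PolicyValueNet(width=64, blocks=8)
net.eval()
dummy = torch.zeros(4, 18, 8, 8)            # a batch of 4 encoded positions
with torch.no_grad():
    policy_logits, value = net(dummy)
print(policy_logits.shape, value.shape)     # torch.Size([4, 4672]) torch.Size([4, 1])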

PUCT Search

PUCT (Predictor + UCT) search uses:

  • Policy priors: From neural network policy head
  • Value estimates: From neural network value head (replaces material heuristic)
  • MCTS: Monte Carlo Tree Search with UCB formula

The search is configured via:

  • --sims: Number of simulations (typical: 200-400)
  • --c_puct: Exploration constant (default: 1.2)
  • --use_nn_value: Use neural network value (default: True)
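
At each node the search picks the child maximizing Q + U with the standard PUCT bonus U = c_puct * P(s, a) * sqrt(N_parent) / (1 + N(s, a)). A minimal sketch of that selection rule, assuming a simple dict-based node layout (the repo's puct.py is structured differently):

import math

def puct_select(children: dict, c_puct: float = 1.2):
    """Pick the move maximizing Q + U over a node's children.
    `children` maps move -> {'prior': P, 'visits': N, 'value_sum': W} (assumed layout)."""
    parent_visits = max(1, sum(c["visits"] for c in children.values()))  # keep priors active at N=0
    best_move, best_score = None, -float("inf")
    for move, child in children.items():
        q = child["value_sum"] / child["visits"] if child["visits"] > 0 else 0.0
        u = c_puct * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move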

Inference

Python API (single move)

Run the model directly from Python using the helpers in chess_policy:

import chess
import torch

from chess_policy.train import load_checkpoint
from chess_policy.infer import choose_move

device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_checkpoint("models/best.pt")  # or your .pt file

board = chess.Board()  # start position (or set FEN)
move, probs, value = choose_move(board, model, device=device, temperature=1.0, sample=False)

print("Chosen move:", move.uci() if move else None)
print("Position value (current side):", f"{value:+.3f}")

choose_move takes care of:

  • Encoding the board to [1, 18, 8, 8]
  • Running the model (policy-only or policy+value)
  • Masking illegal moves and sampling/argmax over the 4672 logits
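
The legal-move masking step can be reproduced by hand if you need custom sampling. A sketch, assuming a hypothetical move_to_index mapping (the repo provides the real one in move_index.py):

import torch

def masked_move_distribution(logits: torch.Tensor, legal_moves, move_to_index, temperature: float = 1.0):
    """Softmax over the 4672 logits with illegal moves forced to probability 0.
    `move_to_index` is a placeholder for whatever maps a chess.Move to its slot in [0, 4672)."""
    masked = torch.full_like(logits, float("-inf"))
    for move in legal_moves:
        idx = move_to_index(move)
        masked[idx] = logits[idx]
    return torch.softmax(masked / temperature, dim=-1)

Argmax over this distribution is the greedy choice; torch.multinomial draws a sampled move, which is what temperature and sample control in choose_move.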

Quickstart

1. Train on PGN Games (Policy + Value)

Train a model with both policy and value heads on PGN game data:

poetry run python scripts/train_policy.py \
    games.pgn \
    --epochs 10 \
    --batch_size 256 \
    --width 64 \
    --blocks 8 \
    --lr 3e-4 \
    --value_loss_weight 1.0 \
    --out policy_value.pt

Key Parameters:

  • --value_loss_weight: Weight for value loss vs policy loss (default: 1.0)
  • The model automatically extracts value targets from PGN game results
  • Every position in a decisive game gets a ±1.0 target, signed by whether the side to move went on to win (see Training Details)

2. Run UCI Engine with PUCT Search

poetry run python scripts/play_uci.py \
    --model policy_value.pt \
    --puct \
    --sims 400 \
    --c_puct 1.2

The engine uses:

  • Neural network policy for move priors
  • Neural network value for position evaluation
  • PUCT search for move selection

UCI Engine

The engine supports two modes:

  1. Greedy Policy: Direct move selection from policy (fast, weaker)

    poetry run python scripts/play_uci.py --model policy_value.pt
  2. PUCT Search: Monte Carlo Tree Search with neural network guidance (slower, stronger)

    poetry run python scripts/play_uci.py --model policy_value.pt --puct --sims 400

PUCT Parameters:

  • --sims: Number of search simulations (more = stronger but slower)
  • --c_puct: Exploration constant (higher = more exploration)
  • --use_nn_value: Use neural network value (default: True)

UCI Commands:

  • uci → Engine info
  • isready → Check readiness
  • position startpos moves e2e4 e7e5 → Set position
  • go → Get best move
  • quit → Exit
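
For scripting or quick tests without a GUI, you can drive the engine over stdin/stdout. A rough sketch (the command line follows the examples above; the exact reply lines depend on the engine):

import subprocess

# Launch the engine as a UCI subprocess.
engine = subprocess.Popen(
    ["poetry", "run", "python", "scripts/play_uci.py",
     "--model", "policy_value.pt", "--puct", "--sims", "400"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send(cmd: str) -> None:
    engine.stdin.write(cmd + "\n")
    engine.stdin.flush()

send("uci")                                  # engine identifies itself and replies "uciok"
send("isready")                              # engine replies "readyok"
send("position startpos moves e2e4 e7e5")    # set the position after 1. e4 e5
send("go")                                   # ask for a move
for line in engine.stdout:
    print(line.rstrip())
    if line.startswith("bestmove"):          # UCI-mandated reply to "go"
        break
send("quit")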

Training Details

Value Head Learning

The value head learns from PGN game results:

  • Extracts game outcome from PGN headers ([Result "1-0"], etc.)
  • Converts to value target from each position's perspective:
    • If White won: +1.0 when White to move, -1.0 when Black to move
    • If Black won: -1.0 when White to move, +1.0 when Black to move
    • If draw: 0.0 for both sides
  • Trained with MSE loss: value_loss = MSE(value_pred, value_target)
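
Roughly, the result-to-target conversion looks like the following sketch, assuming positions are walked along the game's mainline (the repo does this inside data.py):

import chess
import chess.pgn

RESULT_TO_WHITE_SCORE = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}

def value_targets(game: chess.pgn.Game):
    """Yield (board, value) pairs, with value from the side to move's perspective."""
    white_score = RESULT_TO_WHITE_SCORE.get(game.headers.get("Result", "*"))
    if white_score is None:
        return                               # skip games with an unknown result
    board = game.board()
    for move in game.mainline_moves():
        value = white_score if board.turn == chess.WHITE else -white_score
        yield board.copy(), value
        board.push(move)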

Combined Training

Total loss combines policy and value:

loss = policy_loss + value_loss_weight * value_loss
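
In PyTorch terms, that objective can be written as the sketch below, assuming policy targets are move indices in [0, 4672) and value targets are scalars in [-1, 1]:

import torch.nn.functional as F

def combined_loss(policy_logits, value_pred, move_targets, value_targets,
                  value_loss_weight: float = 1.0, label_smoothing: float = 0.05):
    # Cross-entropy over the 4672 move classes (label_smoothing within the range recommended below).
    policy_loss = F.cross_entropy(policy_logits, move_targets, label_smoothing=label_smoothing)
    # MSE between the predicted value [B, 1] and the game-outcome target [B].
    value_loss = F.mse_loss(value_pred.squeeze(-1), value_targets)
    return policy_loss + value_loss_weight * value_loss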

Recommended Hyperparameters:

  • Model: width=64-128, blocks=8-12 (~1-3M params)
  • Optimizer: AdamW, lr=2e-4 to 3e-4
  • Batch size: 256-512
  • Value loss weight: 0.5-2.0 (default: 1.0)
  • Label smoothing: 0.0-0.1
  • PUCT: sims=200-800, c_puct=1.0-1.5

Limitations

  • Value Targets: Uses final game outcome for all positions (simplified but effective)
  • Search Depth: PUCT uses limited simulations compared to strong engines
  • No Self-Play: Currently supervised learning only (self-play coming soon)
  • Estimated Strength: ~1200-1800 Elo (depends on training data and model size)

See TRAINING_GUIDE.md for comprehensive documentation.

License

MIT (or adjust as you prefer)
