A Python implementation of a chess neural network engine with policy and value heads, trained with supervised learning from PGN games. It uses an AlphaZero-style architecture with PUCT search driven by the network's value estimates.
- 4672 Move Encoding: 64 squares × 73 planes (56 queen-like, 8 knight, 9 underpromotion)
- Board Encoding: 18 channels (12 pieces, side-to-move, castling rights, en passant)
- Dual-Head ResNet Architecture:
  - Policy Head: outputs move probabilities as `[4672]` logits
  - Value Head: outputs a position evaluation in `[-1, 1]` (win probability from the current player's perspective)
- Training:
- Supervised learning from PGN games (policy + value from game outcomes)
- Combined loss: Cross-entropy (policy) + MSE (value)
- Label smoothing, mixed precision (AMP), gradient clipping
- PUCT Search:
- Uses neural network policy for move priors
- Uses neural network value for position evaluation (replaces material heuristic)
- Configurable simulations and exploration constant
- UCI Interface: Standard chess engine protocol for GUI integration
Using Poetry:
- Install Poetry if needed (`pipx install poetry`, or see the Poetry docs), then install the dependencies: `poetry install`
Or using pip (Linux/macOS): `pip install -r <(printf "python-chess\ntorch\nnumpy\ntqdm\n")`
Python 3.9+ recommended.
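A quick sanity check that the core dependencies import correctly (this is just an environment check, not part of the project scripts):

```python
# Verify that python-chess and PyTorch are installed and importable.
import chess
import torch

print("python-chess", chess.__version__)
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```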
- `src/chess_policy/encoding.py` — Board → tensor encoding (18 channels) and legal move masks
- `src/chess_policy/move_index.py` — Fixed 4672 index mapping and move conversions
- `src/chess_policy/model.py` — ResNet models: `PolicyOnlyResNet` (policy-only, for backward compatibility) and `PolicyValueResNet` (policy + value heads, AlphaZero-style, default)
- `src/chess_policy/data.py` — PGN dataset with value targets extracted from game results
- `src/chess_policy/train.py` — Supervised training (policy + value losses)
- `src/chess_policy/infer.py` — Inference helpers (policy logits, value extraction)
- `src/chess_policy/puct.py` — PUCT search with neural network value integration
- `src/chess_policy/uci.py` — UCI engine interface
- `scripts/train_policy.py` — Train on PGN files (policy + value)
- `scripts/play_uci.py` — Run UCI engine with PUCT search
The default training setup uses a shared-residual ResNet with dual heads:
- Input: `[B, 18, 8, 8]` tensor from `encoding.board_to_tensor` (board, side-to-move, castling, en passant)
- Stem: `Conv2d(18 → width, 3×3)` + BatchNorm + ReLU
- Trunk: `n_blocks` × `ResidualBlock(width)` (each block = 3×3 Conv → BN → ReLU → 3×3 Conv → BN + skip connection)
- Policy head: 1×1 `Conv(width → width)` → BN → ReLU → 1×1 `Conv(width → 73)` → reshape to `[B, 73, 8, 8]` → flatten to `[B, 4672]`
- Value head: 1×1 `Conv(width → 32)` → BN → ReLU → global pooling `AdaptiveAvgPool2d(1)` → flatten to `[B, 32]` → `Linear(32 → 1)` + `tanh` → `[B, 1]` in `[-1, 1]`
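For reference, the layout above corresponds roughly to the following PyTorch module. This is a hand-written sketch of the described architecture, not the actual `PolicyValueResNet` from `src/chess_policy/model.py`; class and attribute names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """3x3 Conv -> BN -> ReLU -> 3x3 Conv -> BN, plus a skip connection."""

    def __init__(self, width: int):
        super().__init__()
        self.conv1 = nn.Conv2d(width, width, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)


class PolicyValueSketch(nn.Module):
    """Illustrative dual-head ResNet matching the layout above (not the repo's exact class)."""

    def __init__(self, width: int = 64, blocks: int = 8):
        super().__init__()
        # Stem: 18 input planes -> width channels.
        self.stem = nn.Sequential(
            nn.Conv2d(18, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        self.trunk = nn.Sequential(*[ResidualBlock(width) for _ in range(blocks)])
        # Policy head: 1x1 convs down to 73 move planes -> 64 * 73 = 4672 logits.
        self.policy_head = nn.Sequential(
            nn.Conv2d(width, width, 1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, 73, 1),
        )
        # Value head: 1x1 conv -> global average pooling -> linear -> tanh.
        self.value_conv = nn.Sequential(
            nn.Conv2d(width, 32, 1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.value_fc = nn.Linear(32, 1)

    def forward(self, x):                            # x: [B, 18, 8, 8]
        h = self.trunk(self.stem(x))                 # [B, width, 8, 8]
        policy = self.policy_head(h).flatten(1)      # [B, 4672]
        v = self.value_conv(h)                       # [B, 32, 8, 8]
        v = F.adaptive_avg_pool2d(v, 1).flatten(1)   # [B, 32]
        value = torch.tanh(self.value_fc(v))         # [B, 1] in [-1, 1]
        return policy, value
```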
The `scripts/train_policy.py` entrypoint builds a `PolicyValueResNet` with:
- Default: `width=64`, `blocks=8` (≈1–2M params)
- Tunable via CLI: `--width` and `--blocks`
The `PolicyValueResNet` model outputs both policy and value:
- Policy: `[B, 4672]` logits (move probabilities)
- Value: `[B, 1]` in the range `[-1, 1]`, the position evaluation from the current player's perspective:
  - `+1.0` = current player is winning
  - `-1.0` = current player is losing
  - `0.0` = draw
PUCT (Predictor + UCT) search uses:
- Policy priors: From neural network policy head
- Value estimates: From neural network value head (replaces material heuristic)
- MCTS: Monte Carlo Tree Search guided by a PUCT (UCB-style) selection formula (see the sketch after the parameter list below)
The search is configured via:
- `--sims`: Number of simulations (default: 200–400)
- `--c_puct`: Exploration constant (default: 1.2)
- `use_nn_value`: Use neural network value (default: True)
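The node-selection rule is the standard PUCT formula, `Q(s, a) + c_puct · P(s, a) · sqrt(N(s)) / (1 + N(s, a))`. A minimal sketch of per-child scoring is shown below; the statistic names (`parent_visits`, `child_value_sum`, `prior`) are illustrative and need not match those in `puct.py`.

```python
import math


def puct_score(parent_visits: int, child_visits: int, child_value_sum: float,
               prior: float, c_puct: float = 1.2) -> float:
    """PUCT score: exploitation term Q plus prior-weighted exploration bonus U."""
    # Q: mean backed-up value of the child (0 for an unvisited child).
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    # U: large for high-prior moves that have been visited rarely.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u


# At each node, the search descends into the legal move maximizing puct_score,
# then backs the network's value estimate up along the visited path.
```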
Run the model directly from Python using the helpers in `chess_policy`:

```python
import chess
import torch

from chess_policy.train import load_checkpoint
from chess_policy.infer import choose_move

device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_checkpoint("models/best.pt")  # or your .pt file

board = chess.Board()  # start position (or set FEN)
move, probs, value = choose_move(board, model, device=device, temperature=1.0, sample=False)

print("Chosen move:", move.uci() if move else None)
print("Position value (current side):", f"{value:+.3f}")
```

`choose_move` takes care of:
- Encoding the board to `[1, 18, 8, 8]`
- Running the model (policy-only or policy+value)
- Masking illegal moves and sampling/argmax over the 4672 logits
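The masking step means the 4672-way softmax is effectively restricted to the current legal moves. A minimal sketch of that idea is below; `move_to_index` stands in for whatever move-to-index mapping `move_index.py` provides (the name and signature are assumed here).

```python
import chess
import torch


def masked_move_probs(board: chess.Board, logits: torch.Tensor, move_to_index):
    """Restrict the [4672] policy logits to legal moves and renormalize.

    `move_to_index(board, move)` is an assumed helper returning the fixed
    policy index of a move; illegal moves simply get probability zero.
    """
    legal = list(board.legal_moves)
    idx = torch.tensor([move_to_index(board, m) for m in legal])
    probs = torch.softmax(logits[idx], dim=0)  # softmax over legal entries only
    return {m: p.item() for m, p in zip(legal, probs)}
```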
Train a model with both policy and value heads on PGN game data:
```bash
poetry run python scripts/train_policy.py \
  games.pgn \
  --epochs 10 \
  --batch_size 256 \
  --width 64 \
  --blocks 8 \
  --lr 3e-4 \
  --value_loss_weight 1.0 \
  --out policy_value.pt
```

Key Parameters:
- `--value_loss_weight`: Weight for value loss vs. policy loss (default: 1.0)
- The model automatically extracts value targets from PGN game results
- All positions from a winning game get value `+1.0` (from the winner's perspective)
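Conceptually, the dataset built by `data.py` pairs every position with the move actually played and the final result. A simplified sketch of that extraction with `python-chess` (not the repo's actual code):

```python
import chess
import chess.pgn

RESULT_TO_WHITE_SCORE = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}


def iter_samples(pgn_path: str):
    """Yield (fen, played_move_uci, value_from_side_to_move) for every position."""
    with open(pgn_path, encoding="utf-8", errors="replace") as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:
                break
            result = game.headers.get("Result", "*")
            if result not in RESULT_TO_WHITE_SCORE:
                continue  # skip unfinished or unknown-result games
            white_score = RESULT_TO_WHITE_SCORE[result]
            board = game.board()
            for move in game.mainline_moves():
                # Game outcome from the current player's perspective.
                value = white_score if board.turn == chess.WHITE else -white_score
                yield board.fen(), move.uci(), value
                board.push(move)
```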
Run the engine over UCI with PUCT search:

```bash
poetry run python scripts/play_uci.py \
  --model policy_value.pt \
  --puct \
  --sims 400 \
  --c_puct 1.2
```

The engine uses:
- Neural network policy for move priors
- Neural network value for position evaluation
- PUCT search for move selection
The engine supports two modes:
- Greedy Policy: Direct move selection from the policy head (fast, weaker):
  `poetry run python scripts/play_uci.py --model policy_value.pt`
- PUCT Search: Monte Carlo Tree Search with neural network guidance (slower, stronger):
  `poetry run python scripts/play_uci.py --model policy_value.pt --puct --sims 400`
PUCT Parameters:
- `--sims`: Number of search simulations (more = stronger but slower)
- `--c_puct`: Exploration constant (higher = more exploration)
- `--use_nn_value`: Use neural network value (default: True)
UCI Commands:
- `uci` → Engine info
- `isready` → Check readiness
- `position startpos moves e2e4 e7e5` → Set position
- `go` → Get best move
- `quit` → Exit
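You can also drive the engine programmatically through `python-chess` instead of typing UCI commands by hand. The launch command below mirrors the PUCT invocation above; note that the engine may run its configured number of simulations regardless of the time limit passed here.

```python
import chess
import chess.engine

# Start the UCI engine as a subprocess (same command as the manual invocation).
engine = chess.engine.SimpleEngine.popen_uci(
    ["poetry", "run", "python", "scripts/play_uci.py",
     "--model", "policy_value.pt", "--puct", "--sims", "400"]
)

board = chess.Board()
result = engine.play(board, chess.engine.Limit(time=5.0))  # sends position + go
print("Engine plays:", result.move.uci())

engine.quit()
```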
The value head learns from PGN game results:
- Extracts the game outcome from PGN headers (`[Result "1-0"]`, etc.)
- Converts it to a value target from each position's perspective:
  - If White won: `+1.0` when White to move, `-1.0` when Black to move
  - If Black won: `-1.0` when White to move, `+1.0` when Black to move
  - If draw: `0.0` for both sides
- Trained with MSE loss: `value_loss = MSE(value_pred, value_target)`
Total loss combines policy and value: `loss = policy_loss + value_loss_weight * value_loss`
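A minimal sketch of this combined objective (the argument names mirror the CLI flags; the actual `train.py` may differ, e.g. in its label-smoothing default):

```python
import torch
import torch.nn.functional as F


def combined_loss(policy_logits: torch.Tensor, value_pred: torch.Tensor,
                  target_move_idx: torch.Tensor, value_target: torch.Tensor,
                  value_loss_weight: float = 1.0, label_smoothing: float = 0.05):
    """Cross-entropy over the 4672 move classes plus weighted MSE on the value head."""
    policy_loss = F.cross_entropy(policy_logits, target_move_idx,
                                  label_smoothing=label_smoothing)
    value_loss = F.mse_loss(value_pred.squeeze(-1), value_target)
    return policy_loss + value_loss_weight * value_loss
```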
Recommended Hyperparameters:
- Model: `width=64-128`, `blocks=8-12` (~1–3M params)
- Optimizer: AdamW, `lr=2e-4` to `3e-4`
- Batch size: 256–512
- Value loss weight: 0.5–2.0 (default: 1.0)
- Label smoothing: 0.0–0.1
- PUCT: `sims=200-800`, `c_puct=1.0-1.5`
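Putting these pieces together, one training step with AdamW, mixed precision (AMP), and gradient clipping might look like the sketch below; it reuses the `PolicyValueSketch` and `combined_loss` sketches from earlier sections and is illustrative rather than the repo's `train.py`.

```python
import torch
from torch.nn.utils import clip_grad_norm_

device = "cuda" if torch.cuda.is_available() else "cpu"
model = PolicyValueSketch(width=64, blocks=8).to(device)   # sketch class from above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))


def train_step(boards, move_targets, value_targets):
    """One supervised step: AMP forward pass, combined loss, clipped backward, AdamW update."""
    boards = boards.to(device)
    move_targets = move_targets.to(device)
    value_targets = value_targets.to(device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        policy_logits, value_pred = model(boards)
        loss = combined_loss(policy_logits, value_pred, move_targets, value_targets)

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                      # unscale so clipping sees true gradient norms
    clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```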
- Value Targets: Uses final game outcome for all positions (simplified but effective)
- Search Depth: PUCT uses limited simulations compared to strong engines
- No Self-Play: Currently supervised learning only (self-play coming soon)
- Estimated Strength: ~1200-1800 Elo (depends on training data and model size)
See TRAINING_GUIDE.md for comprehensive documentation.
MIT (or adjust as you prefer)