Illustrated AlphaZero

A Python implementation of the AlphaZero algorithm for board games, with detailed illustrations and explanations.

TicTacToe (3×3)

Connect Four (7×6)

Overview

AlphaZero combines Monte Carlo Tree Search (MCTS) with deep neural networks and, in DeepMind's published results, reached superhuman performance in chess, shogi, and Go. This project provides a detailed implementation of AlphaZero for small board games, including the core components and a graphical interface for interactive play.

The implementation covers:

Monte Carlo Tree Search (MCTS)
Deep Neural Network (policy and value heads)
Self-play Training
Board Game Environment (e.g., TicTacToe)

Project Structure

illustrated-alphazero/
├── src/
│   ├── agent.py        # MCTS Agent implementation
│   ├── config.py       # Configuration settings
│   ├── environment.py  # Game environment
│   ├── interact.py     # GUI for human interaction
│   ├── network.py      # Neural network architecture
│   ├── search.py       # Monte Carlo Tree Search
│   ├── transform.py    # Board state transformations
│   └── utils.py        # Utility functions
├── checkpoints/        # Model checkpoints
├── main.py             # Training entry point
├── playground.py       # Interactive game environment
└── README.md

Key Components

MCTS Search

Selection: Choose promising nodes using UCB scores
Expansion: Create child nodes for unexplored states
Evaluation: Use neural network to evaluate positions
Backpropagation: Update node statistics
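
The selection and backpropagation steps above can be sketched as follows. This is a minimal illustration, not the code in src/search.py: the `Node` class, its field names, and the exploration constant `c_puct` are assumptions, and the score shown is the PUCT variant of UCB described in the AlphaZero papers.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative."""
    prior: float                      # P(s, a) from the policy head
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # W(s, a), sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean value Q(s, a); unvisited nodes default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    """Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    exploration = (
        c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    )
    return child.q_value() + exploration


def select_child(parent: Node, c_puct: float = 1.0):
    """Return the (action, child) pair with the highest PUCT score."""
    return max(
        parent.children.items(),
        key=lambda item: puct_score(parent, item[1], c_puct),
    )


def backpropagate(path: list, value: float) -> None:
    """Update statistics along `path` (root first, leaf last).

    Assumes `value` is from the viewpoint of the player who made the move
    leading into the last node; the sign flips each ply because the
    players alternate.
    """
    for node in reversed(path):
        node.visit_count += 1
        node.value_sum += value
        value = -value
```

With this convention, a child's `q_value()` is expressed from the viewpoint of the player choosing at the parent, so `select_child` can compare children directly.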

Neural Network

Policy Head: Predicts move probabilities
Value Head: Estimates position value
Convolutional Features: Extracts board patterns
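
Before the policy head's output is used as MCTS priors, illegal moves are usually masked out and the remaining probabilities renormalized. The sketch below shows that step in plain Python. It assumes a flat list of logits with one entry per action, which may not match the tensor layout in src/network.py; `masked_policy` and `squash_value` are hypothetical helper names.

```python
import math


def masked_policy(logits, legal_mask):
    """Turn raw policy logits into probabilities over legal moves only.

    logits:     one float per action (hypothetical flat layout)
    legal_mask: one bool per action, True where the move is legal
    """
    if not any(legal_mask):
        raise ValueError("no legal moves")
    # Subtract the largest legal logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, legal_mask) if ok)
    exps = [
        math.exp(l - max_logit) if ok else 0.0
        for l, ok in zip(logits, legal_mask)
    ]
    total = sum(exps)
    return [e / total for e in exps]


def squash_value(raw_value):
    """Map an unbounded scalar into [-1, 1], as a tanh value head would."""
    return math.tanh(raw_value)
```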

Self-play Training

Game simulation through MCTS
Data collection from self-play games
Neural network training with collected examples
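
The data-collection step can be illustrated with a small helper that labels each recorded position with the final game result. This is a sketch under assumptions: the `(state, player, policy)` tuple layout and the `label_game` name are hypothetical, and players are encoded as +1 and -1.

```python
def label_game(history, winner):
    """Attach the final outcome to every position of one finished game.

    history: list of (state, player, policy) tuples recorded during
             self-play, where player is +1 or -1 and policy is the MCTS
             visit distribution for that state
    winner:  +1 or -1 for the winning player, 0 for a draw

    Returns a list of (state, policy, z) where z is the outcome from the
    viewpoint of the player to move in that state.
    """
    return [
        (state, policy, winner * player)
        for state, player, policy in history
    ]
```

The resulting `(state, policy, z)` triples are the training targets: the policy head is fit to `policy` and the value head to `z`.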

Installation

git clone https://github.com/deepbiolab/illustrated-alphazero.git
cd illustrated-alphazero
pip install -r requirements.txt

Usage

Training AlphaZero

python main.py

The main.py script trains the AlphaZero model using the settings in src/config.py. It simulates self-play games, collects the resulting data, and trains the neural network on it.

Playing Against Trained Model

python playground.py

The playground.py script provides a graphical interface for playing against the trained model. You can interact with the game using the provided GUI in different game modes:

Human vs Human: Two players take turns making moves.
Human vs Random: Play against a random AI opponent.
Human vs AI: Play against the trained AlphaZero model.
AI vs AI: Watch the trained AlphaZero model play against itself.

Available Game Modes:
1: Red: Human vs Blue: Human
2: Red: Human vs Blue: Random
3: Red: Human vs Blue: AI
4: Red: AI vs Blue: AI

Select game mode (1-4):

Configuration

Key parameters are set in config.py. The examples below show settings for a 3 x 3 board and a 7 x 6 board.

Settings for `3 x 3` board and connect `3` to win

# Environment settings

ENV_SETTINGS = {
	"size": (3, 3),  # Board size
	"N": 3,          # Number in a row to win
}

# Agent settings
LEARNING_RATE = 0.01
WEIGHT_DECAY = 1.0e-4

# Training settings
NUM_EPISODES = 400
SIMULATIONS_PER_MOVE = 100
EVAL_FREQUENCY = 50  # Episodes between evaluations

# MCTS settings
MCTS_SIMULATIONS = 50  # Number of MCTS simulations for action selection
TEMPERATURE = 0.1     # Temperature for move selection
...
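
To show what a temperature setting such as `TEMPERATURE` typically controls, here is a minimal sketch of temperature-scaled move selection from MCTS visit counts, following the rule from the AlphaZero papers. How the agent in this repository applies the value is an assumption, and the function names are hypothetical.

```python
import random


def visit_counts_to_policy(visit_counts, temperature):
    """Return pi(a) proportional to N(a) ** (1 / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    max_count = max(visit_counts)
    if max_count == 0:
        raise ValueError("at least one action must have been visited")
    # Normalize by the largest count first so the power cannot overflow.
    scaled = [(n / max_count) ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_action(visit_counts, temperature, rng=random):
    """Sample an action index according to the temperature-scaled policy."""
    probs = visit_counts_to_policy(visit_counts, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A temperature of 1.0 keeps the policy proportional to the visit counts, while a small value such as 0.1 concentrates almost all probability on the most-visited move.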

Evaluating model performance during training

Settings for `7 x 6` board and connect `4` to win

# Environment settings

ENV_SETTINGS = {
	"size": (7, 6),  # Board size
	"N": 4,          # Number in a row to win
}

# Agent settings
LEARNING_RATE = 0.01
WEIGHT_DECAY = 1.0e-4

# Training settings
NUM_EPISODES = 1000
SIMULATIONS_PER_MOVE = 150
EVAL_FREQUENCY = 50  # Episodes between evaluations

# MCTS settings
MCTS_SIMULATIONS = 150  # Number of MCTS simulations for action selection
TEMPERATURE = 0.2     # Temperature for move selection
...
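
The `"N"` entry in `ENV_SETTINGS` is the number of marks in a row needed to win. A win check for that rule might look like the sketch below. The board layout (a list of rows) and the `has_n_in_a_row` name are assumptions and may differ from src/environment.py.

```python
def has_n_in_a_row(board, player, n):
    """Return True if `player` has `n` consecutive marks in any direction.

    board: list of rows, each a list of cell values (hypothetical layout)
    """
    rows, cols = len(board), len(board[0])
    # Right, down, down-right diagonal, down-left diagonal.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in directions:
                end_r = r + (n - 1) * dr
                end_c = c + (n - 1) * dc
                # Skip lines that would run off the board.
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue
                if all(board[r + i * dr][c + i * dc] == player for i in range(n)):
                    return True
    return False
```

The same function covers both configurations: `n=3` on a 3 x 3 board and `n=4` on the Connect Four board.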

Evaluating model performance during training

Blog

A detailed blog-style explanation of the AlphaZero algorithm can be found in the repository's blog directory.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Inspired by DeepMind's AlphaZero papers
Built with PyTorch deep learning framework

References

Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815.
Silver, D., et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.
