An AlphaZero-inspired project that learns to play Tic-Tac-Toe optimally from scratch using self-play reinforcement learning, Monte Carlo Tree Search (PUCT), and a ResNet policy-value network in PyTorch.
This mini-project reproduces the core ideas of DeepMind’s AlphaZero in a lightweight setting:
- Self-Play — the agent improves by playing itself; no human labels needed.
- MCTS (PUCT) — search balances exploration vs. exploitation using network priors.
- Policy-Value Net (ResNet) — a single network outputs both move probabilities (policy) and a scalar estimate of the expected outcome (value).
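Such a two-headed network might look like the following minimal PyTorch sketch (the class name `PolicyValueNet`, the 3-plane input, and the layer sizes are illustrative assumptions, not necessarily this repo's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """One residual block: conv -> BN -> ReLU -> conv -> BN, plus a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b1 = nn.BatchNorm2d(ch)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        y = F.relu(self.b1(self.c1(x)))
        y = self.b2(self.c2(y))
        return F.relu(x + y)

class PolicyValueNet(nn.Module):
    """Tiny ResNet with a policy head (9 move logits) and a value head (scalar)."""
    def __init__(self, planes=3, ch=32, blocks=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.policy = nn.Linear(ch * 9, 9)   # logits over the 9 squares
        self.value = nn.Linear(ch * 9, 1)    # squashed to [-1, 1] by tanh

    def forward(self, x):                    # x: (batch, planes, 3, 3)
        h = self.body(self.stem(x)).flatten(1)
        return self.policy(h), torch.tanh(self.value(h)).squeeze(-1)
```

The two heads share the residual trunk, so the policy and value estimates are learned from the same features, as in AlphaZero.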
- Encode state → tensor (e.g., 2 or 3 planes for current player, opponent, empties).
- Neural net forward pass → returns `(policy_logits, value)` for the position.
- MCTS uses these priors to guide search and produce an improved policy.
- Self-play game: sample moves from the MCTS policy; store `(state, π, z)`, where `π` is the MCTS policy and `z ∈ {-1, 0, +1}` is the final game outcome from the current player's perspective.
- Train the network to fit `π` (policy head, cross-entropy loss) and `z` (value head, MSE loss).
- Repeat: newer networks generate stronger training data.
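The PUCT rule that MCTS uses to pick which child to explore can be sketched in plain Python (the function names, the dict-based node representation, and the `c_puct=1.5` constant are assumptions for illustration, not this repo's actual code):

```python
import math

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """PUCT: mean value Q plus an exploration bonus U scaled by the network prior P."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(children, parent_visits, c_puct=1.5):
    """Pick the child with the highest PUCT score.

    children: list of dicts with 'visits', 'value_sum', 'prior' keys.
    """
    return max(range(len(children)),
               key=lambda i: puct_score(parent_visits,
                                        children[i]["visits"],
                                        children[i]["value_sum"],
                                        children[i]["prior"],
                                        c_puct))
```

Unvisited children start with `Q = 0` but a large bonus `U` proportional to their prior, which is how the network's policy steers early exploration.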

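The combined training objective, cross-entropy against the soft MCTS policy `π` plus MSE on the outcome `z`, can be sketched as follows (the helper name `alphazero_loss` is an assumption; the repo may weight or regularize the terms differently):

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, value, pi_target, z_target):
    """AlphaZero-style loss: cross-entropy vs. the soft MCTS policy + MSE on z."""
    # pi_target is a full distribution, so use log-softmax against soft targets.
    policy_loss = -(pi_target * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    value_loss = F.mse_loss(value, z_target)
    return policy_loss + value_loss
```

With uniform logits and a uniform target over 9 moves, the policy term equals `ln 9 ≈ 2.197`, a useful sanity check at the start of training.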