TOMWANGZZ1236/MTCS-Algorithm

♟️ AlphaZero Tic-Tac-Toe

An AlphaZero-inspired project that learns to play Tic-Tac-Toe optimally from scratch, using self-play reinforcement learning, Monte Carlo Tree Search (PUCT), and a ResNet policy-value network in PyTorch.


🚀 Overview

This mini-project reproduces the core ideas of DeepMind’s AlphaZero in a lightweight setting:

  • Self-Play — the agent improves by playing itself; no human labels needed.
  • MCTS (PUCT) — search balances exploration vs. exploitation using network priors.
  • Policy-Value Net (ResNet) — a single network outputs move probabilities (policy) and a scalar outcome estimate (value) for each position.
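
The PUCT balance between exploitation and prior-guided exploration can be sketched as follows. This is a minimal illustration, not this repo's actual code; the function name and the `c_puct` constant are assumptions:

```python
import math

def puct_score(prior, total_value, visits, parent_visits, c_puct=1.5):
    """PUCT score for one child node: mean value Q plus a prior-weighted
    exploration bonus U that shrinks as the child accumulates visits."""
    q = total_value / visits if visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

# During selection, MCTS descends by repeatedly picking the child with the
# highest score, e.g.:
#   best = max(children, key=lambda ch: puct_score(ch.prior, ch.value_sum,
#                                                  ch.visits, parent_visits))
```

Unvisited moves with a high network prior get a large bonus, so the search tries them early; frequently visited moves are judged mostly by their averaged value.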

🧠 How it works (high-level)

  1. Encode state → tensor (e.g., 2 or 3 planes for current player, opponent, empties).
  2. Neural net forward pass → returns (policy_logits, value) for the position.
  3. MCTS uses these priors to guide search and produce an improved policy.
  4. Self-play game: sample moves from the MCTS policy; store (state, π, z) where π is the MCTS policy and z ∈ {-1, 0, +1} is the final outcome from the current player’s perspective.
  5. Train the network to fit π (policy head with cross-entropy) and z (value head with MSE).
  6. Repeat: newer nets generate stronger data.
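
Step 1's encoding might look like the following 3-plane sketch (the function name and exact plane layout are assumptions; the repo may use 2 planes instead):

```python
import numpy as np

def encode_board(board, player):
    """Encode a 3x3 board (entries +1, -1, 0) into three binary planes
    from `player`'s perspective: own stones, opponent stones, empties."""
    board = np.asarray(board)
    return np.stack([
        (board == player),
        (board == -player),
        (board == 0),
    ]).astype(np.float32)
```

Encoding relative to the side to move means the network never needs to know whose turn it is; the same position flips planes 0 and 1 when viewed from the other player.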
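
The combined objective from step 5 (policy cross-entropy plus value MSE) can be written out in NumPy to show the math; the repo presumably implements the equivalent with PyTorch ops, so all names here are illustrative:

```python
import numpy as np

def alphazero_loss(policy_logits, values, target_pi, target_z):
    """L = CE(pi, softmax(logits)) + MSE(z, v), averaged over the batch."""
    # Numerically stable log-softmax over the move dimension.
    shifted = policy_logits - policy_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    policy_loss = -(target_pi * log_probs).sum(axis=1).mean()
    value_loss = ((values - target_z) ** 2).mean()
    return policy_loss + value_loss
```

Because π is a full distribution from MCTS (not a one-hot label), the policy term is a cross-entropy against soft targets, which is what makes the search's improved policy transfer into the network.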
