Course project for Advanced Foundations of Machine Learning
Clone the repository
$ git clone https://github.com/SujayKarpur/AFML-project.git
$ cd AFML-project

Set up and activate a Python virtual environment
$ python3 -m venv .venv
$ source .venv/bin/activate

Set up libraries and packages:
$ pip install -r requirements.txt
$ pip install -U pip setuptools wheel
$ pip install -e .

As AI systems are deployed in high-stakes domains, from healthcare to autonomous systems, understanding how they make decisions is critical for safety and reliability.
Though recent work in mechanistic interpretability has made progress on understanding language models and vision systems, reinforcement learning agents remain particularly opaque because their internal strategies emerge from interaction with the environment.
This project aims to narrow that gap by exploring interpretability techniques for AlphaZero-style agents on toy games. We believe toy games are a natural starting point: they provide a tractable testbed where we can validate our methods against known ground truth before scaling up.
While we work with simplified domains, our goal is to develop rigorous methods for understanding RL decision-making that can eventually generalize to more complex systems.
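As a concrete illustration of the kind of technique we have in mind, the sketch below trains a linear probe on the hidden activations of a small policy/value network for Tic-Tac-Toe, asking whether a ground-truth concept ("the current player has an immediate winning move") is linearly decodable. Everything in it is hypothetical: `TinyPolicyValueNet`, the board encoding, and the PyTorch/scikit-learn dependencies are illustrative placeholders, not part of this repository.

```python
# Illustrative sketch only: these classes and functions do not exist in this repo.
# Assumes torch, numpy, and scikit-learn are available (check requirements.txt).
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

class TinyPolicyValueNet(nn.Module):
    """Stand-in for an AlphaZero-style network on 3x3 Tic-Tac-Toe."""
    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(9, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, 9)   # logits over the 9 squares
        self.value_head = nn.Linear(hidden, 1)    # scalar value estimate

    def forward(self, x):
        h = self.body(x)                          # hidden activations we will probe
        return self.policy_head(h), torch.tanh(self.value_head(h)), h

def has_immediate_win(board):
    """Ground-truth concept: the current player (+1) can win on the next move."""
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for a, b, c in lines:
        if sorted([board[a], board[b], board[c]]) == [0, 1, 1]:  # two own pieces + empty square
            return 1
    return 0

# Sample random board positions (+1 = ours, -1 = opponent, 0 = empty).
rng = np.random.default_rng(0)
boards = rng.choice([-1, 0, 1], size=(4000, 9)).astype(np.float32)
labels = np.array([has_immediate_win(b) for b in boards])

# Collect hidden activations (here from an untrained network, just to show the pipeline).
net = TinyPolicyValueNet()
with torch.no_grad():
    _, _, acts = net(torch.from_numpy(boards))
acts = acts.numpy()

# Fit a linear probe: can the concept be read off the activations linearly?
X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```

In the actual project the activations would come from a trained AlphaZero-style agent rather than a randomly initialized network, and probe accuracy would be compared across layers and against suitable baselines.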
Reinforcement Learning Course by David Silver
AlphaZero
AlphaZero Implementation tutorial
Introduction to Mechanistic Interpretability
Acquisition of Chess Knowledge in AlphaZero
Project Idea from 200 Concrete Open Problems in Mechanistic Interpretability (8.3)