SujayKarpur/inside-zero

Interpreting AlphaZero using Toy Games

Course project for Advanced Foundations of Machine Learning

Usage

Clone the repository

$ git clone https://github.com/SujayKarpur/AFML-project.git
$ cd AFML-project

Set up and activate a Python virtual environment

$ python3 -m venv .venv
$ source .venv/bin/activate

Set up libraries and packages

$ pip install -U pip setuptools wheel
$ pip install -r requirements.txt
$ pip install -e .

Motivation

As AI systems are deployed in high-stakes domains, from healthcare to autonomous systems, understanding how they make decisions is critical to ensuring safety and reliability.

Although mechanistic interpretability has recently made progress on language models and vision systems, reinforcement learning agents remain particularly opaque, because their complex internal strategies emerge from interaction with an environment.

This project aims to bridge some of that gap by exploring interpretability techniques for AlphaZero-style agents on toy games. Toy games are a useful starting point because they provide a tractable testbed: we can validate our approaches against known ground truth before scaling up.
Although we work with simplified domains, our goal is to develop rigorous methods for understanding RL decision-making that can eventually generalize to more complex systems.
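
To make "known ground truth" concrete, here is a minimal sketch of an exhaustive solver. It is not code from this repository. It assumes tic-tac-toe as the toy game (the README does not name specific games), and the names `LINES`, `winner`, and `solve` are hypothetical. Exact values like these could serve as labels against which an agent's value predictions or internal representations are checked.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8 in row-major order.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board, player):
    """Exact value for `player`, who is to move: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        # In a legal position only the previous mover can have completed
        # a line, so the player to move has lost.
        return -1
    if '.' not in board:
        return 0  # full board with no line: draw
    opponent = 'O' if player == 'X' else 'X'
    best = -1
    for i, cell in enumerate(board):
        if cell == '.':
            child = board[:i] + player + board[i + 1:]
            # Negamax: the child's value is from the opponent's perspective.
            best = max(best, -solve(child, opponent))
    return best


empty = '.' * 9
print(solve(empty, 'X'))  # 0: perfect play from the empty board is a draw
```

Because the full game tree is small enough to enumerate, every reachable position gets an exact label, which is what makes a toy game usable as a testbed in the sense described above.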

References

Reinforcement Learning Course by David Silver
AlphaZero
AlphaZero Implementation tutorial
Introduction to Mechanistic Interpretability
Acquisition of Chess Knowledge in AlphaZero

Project idea from 200 Concrete Open Problems in Mechanistic Interpretability (8.3)
