Course project for Advanced Foundations of Machine Learning
Clone the repository
$ git clone https://github.com/SujayKarpur/AFML-project.git
$ cd AFML-project

Set up and activate a Python virtual environment
$ python3 -m venv .venv
$ source .venv/bin/activate

Set up libraries and packages:
$ pip install -r requirements.txt
$ pip install -U pip setuptools wheel
$ pip install -e .

As AI systems are deployed in high-stakes domains, from healthcare to autonomous systems, understanding how they make decisions is critical for safety and reliability.
Though recent work in mechanistic interpretability has made progress on understanding language models and vision systems, reinforcement learning agents remain particularly opaque because their internal strategies emerge from interaction with the environment.
This project aims to narrow that gap by exploring interpretability techniques for AlphaZero-style agents on toy games. We believe toy games are a natural starting point: they provide a tractable testbed where we can validate our methods against known ground truth before scaling up.
While we work with simplified domains, our goal is to develop rigorous methods for understanding RL decision-making that can eventually generalize to more complex systems.
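As a concrete illustration of the kind of technique we have in mind, the sketch below trains a linear probe on the hidden activations of a small policy/value network for Tic-Tac-Toe, asking whether a ground-truth concept ("the current player has an immediate winning move") is linearly decodable. Everything in it is hypothetical: `TinyPolicyValueNet`, the board encoding, and the PyTorch/scikit-learn dependencies are illustrative placeholders, not part of this repository.

```python
# Illustrative sketch only: these classes and functions do not exist in this repo.
# Assumes torch, numpy, and scikit-learn are available (check requirements.txt).
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

class TinyPolicyValueNet(nn.Module):
    """Stand-in for an AlphaZero-style network on 3x3 Tic-Tac-Toe."""
    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(9, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, 9)   # logits over the 9 squares
        self.value_head = nn.Linear(hidden, 1)    # scalar value estimate

    def forward(self, x):
        h = self.body(x)                          # hidden activations we will probe
        return self.policy_head(h), torch.tanh(self.value_head(h)), h

def has_immediate_win(board):
    """Ground-truth concept: the current player (+1) can win on the next move."""
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for a, b, c in lines:
        if sorted([board[a], board[b], board[c]]) == [0, 1, 1]:  # two own pieces + empty square
            return 1
    return 0

# Sample random board positions (+1 = ours, -1 = opponent, 0 = empty).
rng = np.random.default_rng(0)
boards = rng.choice([-1, 0, 1], size=(4000, 9)).astype(np.float32)
labels = np.array([has_immediate_win(b) for b in boards])

# Collect hidden activations (here from an untrained network, just to show the pipeline).
net = TinyPolicyValueNet()
with torch.no_grad():
    _, _, acts = net(torch.from_numpy(boards))
acts = acts.numpy()

# Fit a linear probe: can the concept be read off the activations linearly?
X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```

In the actual project the activations would come from a trained AlphaZero-style agent rather than a randomly initialized network, and probe accuracy would be compared across layers and against suitable baselines.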
Reinforcement Learning Course by David Silver
AlphaZero
AlphaZero Implementation tutorial
Introduction to Mechanistic Interpretability
Acquisition of Chess Knowledge in AlphaZero
Project Idea from 200 Concrete Open Problems in Mechanistic Interpretability (8.3)