This repository provides an implementation of Proximal Policy Optimization (PPO) applied to the classic Super Mario Bros environments. It enables training and evaluating reinforcement learning agents that learn to play Mario using PyTorch.
📖 For a brief introduction to PPO, see docs/PPO.md.
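As a quick reference, the heart of PPO is the clipped surrogate objective. The sketch below is illustrative only (the variable names and function are not taken from this repo's code) and shows the objective in PyTorch:

```python
import torch

def clipped_surrogate_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, returned as a loss (negated, to minimize).

    log_probs:     log pi_theta(a_t | s_t) under the current policy
    old_log_probs: log pi_theta_old(a_t | s_t) from the rollout policy (detached)
    advantages:    advantage estimates A_t (e.g., from GAE)
    """
    ratio = torch.exp(log_probs - old_log_probs)                       # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                       # negate so an optimizer can minimize it
```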
We recommend creating a virtual environment (e.g., venv):
```bash
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
```

If you plan to use CUDA for acceleration, make sure to install the appropriate PyTorch build for your GPU and CUDA toolkit.
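To confirm that your PyTorch build can actually see the GPU, a quick check from a Python shell is:

```python
import torch

print(torch.__version__)             # installed PyTorch version
print(torch.cuda.is_available())     # True if a CUDA-capable GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```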
To train an agent on Stage 1-1:
```bash
python train.py --name test1-1 --world 1 --stage 1 --device cuda:0 --version 0 --frame_size 64
```

- The `--version` flag specifies the environment mode (see gym-super-mario-bros for details).
- Training logs are written to TensorBoard. To monitor progress:
```bash
tensorboard --logdir ./experiments/test1-1/runs
```

After training, you can evaluate the trained agent:
```bash
python test.py --name test1-1 --ckpt best_model
```

This opens a display window so you can watch your agent play Mario in real time.
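If you want to poke at the environment directly (outside of `test.py`), a minimal rendering loop looks roughly like the sketch below. It assumes the gym-super-mario-bros and nes-py packages and the older 4-tuple Gym step API, and uses random actions in place of the trained policy:

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

# Build the Stage 1-1 environment; the "v0" suffix corresponds to the --version flag above.
env = JoypadSpace(gym_super_mario_bros.make("SuperMarioBros-1-1-v0"), SIMPLE_MOVEMENT)

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()              # random action; test.py would query the trained agent
    state, reward, done, info = env.step(action)
    env.render()                                    # opens the display window
env.close()
```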
We provide a trained checkpoint and TensorBoard logs for Stage 1-1. The following metrics are visualized: average return, clipped surrogate objective, and critic loss.
The gameplay result from the trained agent is shown at the top of this README. (He can clear Stage 1-1 faster than me!)



