This repository provides an implementation of Proximal Policy Optimization (PPO) applied to the classic Super Mario Bros environments. It enables training and evaluating reinforcement learning agents that learn to play Mario using PyTorch.
📖 For a brief introduction to PPO, see docs/PPO.md.
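As a quick reference, the heart of PPO is the clipped surrogate objective. The sketch below is illustrative only (the variable names and function are not taken from this repo's code) and shows the objective in PyTorch:

```python
import torch

def clipped_surrogate_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, returned as a loss (negated, to minimize).

    log_probs:     log pi_theta(a_t | s_t) under the current policy
    old_log_probs: log pi_theta_old(a_t | s_t) from the rollout policy (detached)
    advantages:    advantage estimates A_t (e.g., from GAE)
    """
    ratio = torch.exp(log_probs - old_log_probs)                       # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                       # negate so an optimizer can minimize it
```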
We recommend creating a virtual environment (e.g., venv):
```bash
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
```

If you plan to use CUDA for acceleration, make sure to install the appropriate PyTorch build for your GPU and CUDA toolkit.
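To confirm that your PyTorch build can actually see the GPU, a quick check from a Python shell is:

```python
import torch

print(torch.__version__)             # installed PyTorch version
print(torch.cuda.is_available())     # True if a CUDA-capable GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```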
To train an agent on Stage 1-1:
```bash
python train.py --name test1-1 --world 1 --stage 1 --device cuda:0 --version 0 --frame_size 64
```

- The `--version` flag specifies the environment mode (see gym-super-mario-bros for details).
- Training logs are written to TensorBoard. To monitor progress:
```bash
tensorboard --logdir ./experiments/test1-1/runs
```

After training, you can evaluate the trained agent:
```bash
python test.py --name test1-1 --ckpt best_model
```

This opens a display window so you can watch your agent play Mario in real time.
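If you want to poke at the environment directly (outside of `test.py`), a minimal rendering loop looks roughly like the sketch below. It assumes the gym-super-mario-bros and nes-py packages and the older 4-tuple Gym step API, and uses random actions in place of the trained policy:

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

# Build the Stage 1-1 environment; the "v0" suffix corresponds to the --version flag above.
env = JoypadSpace(gym_super_mario_bros.make("SuperMarioBros-1-1-v0"), SIMPLE_MOVEMENT)

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()              # random action; test.py would query the trained agent
    state, reward, done, info = env.step(action)
    env.render()                                    # opens the display window
env.close()
```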
We provide a trained checkpoint and TensorBoard logs for Stage 1-1. The following metrics are visualized: average return, clipped surrogate objective, and critic loss.
The gameplay result from the trained agent is shown at the top of this README. (He can clear Stage 1-1 faster than me!)



