A simple 2D game created as a research challenge for Reinforcement Learning algorithms. Here is a blog post explaining the modelling and training processes in finer detail.
The rules are simple: move the hero ball around the 2D plane and hit all the Green balls while avoiding the Red ones. Each ball's initial coordinates and velocity vector are randomly generated at the start of every game, and the balls bounce off the four sides of the environment.
The game terminates if:
- All Green balls have been hit
- One of the Red balls has been hit
- 1000 time steps have elapsed
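For concreteness, here is a minimal sketch of the movement and termination logic described above, assuming a unit-square arena and NumPy arrays of shape `(n_balls, 2)`; the function and constant names are hypothetical, not the repository's actual ones:

```python
import numpy as np

MAX_STEPS = 1000  # matches the 1000-step limit above

def step_balls(positions, velocities, dt=1.0):
    """Advance every ball and reflect its velocity off the four walls
    of a unit-square arena."""
    positions = positions + velocities * dt
    for axis in range(2):
        # A ball past a wall gets that velocity component flipped and
        # its position clamped back inside the arena.
        out = (positions[:, axis] < 0.0) | (positions[:, axis] > 1.0)
        velocities[out, axis] *= -1.0
        positions[:, axis] = np.clip(positions[:, axis], 0.0, 1.0)
    return positions, velocities

def is_terminal(greens_left, red_hit, t):
    # The three termination conditions listed above.
    return greens_left == 0 or red_hit or t >= MAX_STEPS
```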
Despite its simplicity, the game's environment has a high-dimensional observation space and a fairly sparse reward structure. Solving it efficiently required training a Proximal Policy Optimisation (PPO) model, first initialised with Behavioural Cloning of a heuristic-based expert agent.
The PPO implementation comes from the excellent StableBaselines3 Python library, which requires the game environment to implement the Gymnasium API standard.
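In practice the Gymnasium contract amounts to declaring an observation space and an action space and implementing `reset` and `step` with fixed signatures. Below is a minimal skeleton of such an environment; the class name, space shapes and action set are illustrative assumptions, not the repository's actual definitions:

```python
import gymnasium as gym
import numpy as np

class BallGameEnv(gym.Env):
    """Skeleton of the game wrapped in the Gymnasium API."""

    def __init__(self, n_green=3, n_red=3):
        super().__init__()
        # One (x, y, vx, vy) quadruple per ball, hero included.
        n_features = 4 * (1 + n_green + n_red)
        self.observation_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(n_features,), dtype=np.float32
        )
        # For example, a discrete set of movement directions for the hero.
        self.action_space = gym.spaces.Discrete(5)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.observation_space.sample()  # placeholder state
        return obs, {}

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder state
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```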
The whole training process can be reproduced on your machine:
1. Run `bc_training.py` to record the expert agent's actions over 128 game simulations and run the Behavioural Cloning algorithm with a balanced loss function (see the behavioural-cloning sketch after this list).
2. Run `train.py` to finalise the training of the PPO model on multiple online simulations and save versions of the trained model (see the PPO training sketch after this list).
3. Run `compare_performance.py` to compare your RL model with the expert agent. If the RL model does not outperform the expert, go back to step 2.
4. Run `simulate.py` to watch and record your trained agent in action.
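The behavioural-cloning step can be pictured as supervised learning on the recorded (observation, expert action) pairs. The sketch below assumes a discrete action space and interprets the balanced loss as inverse-frequency class weighting; the actual scheme in `bc_training.py` may differ, and all names here are hypothetical:

```python
import torch
import torch.nn as nn

def behavioural_cloning(policy_net, expert_obs, expert_acts,
                        n_actions, epochs=20, lr=3e-4):
    """Fit policy_net to the expert's actions with a class-balanced
    cross-entropy loss."""
    # Weight each action inversely to its frequency in the expert data,
    # so rare manoeuvres are not drowned out by the common ones.
    counts = torch.bincount(expert_acts, minlength=n_actions).float()
    weights = counts.sum() / (n_actions * counts.clamp(min=1.0))
    loss_fn = nn.CrossEntropyLoss(weight=weights)
    optimiser = torch.optim.Adam(policy_net.parameters(), lr=lr)
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(policy_net(expert_obs), expert_acts)
        loss.backward()
        optimiser.step()
    return policy_net
```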
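The online PPO stage maps directly onto StableBaselines3's standard training loop. A minimal sketch, reusing the hypothetical `BallGameEnv` from the earlier skeleton, with an illustrative timestep budget and save path:

```python
from stable_baselines3 import PPO

# BallGameEnv is the hypothetical Gymnasium environment sketched above.
env = BallGameEnv()

# Train PPO on the environment and save a checkpoint.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_ball_game")
```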
The animated GIF at the top of the page shows simulations of a trained Deep RL model with the following performance statistics:
DRL model: 1.975 ± 1.076
Expert Agent: 1.462 ± 1.304
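Statistics of this form (mean ± standard deviation of the per-episode score over many games, which is an assumption about the figures above) can be computed with an evaluation loop like this sketch, which assumes both agents expose an SB3-style `predict` method:

```python
import numpy as np

def evaluate(agent, env, n_episodes=100):
    """Mean and standard deviation of the episode score over n_episodes
    (helper names hypothetical)."""
    scores = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action, _ = agent.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        scores.append(total)
    return float(np.mean(scores)), float(np.std(scores))
```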
Generally, the more training epochs, the better the Deep RL agent performs.
