This repository contains code for training a reinforcement learning agent on the Car Racing environment from Gymnasium (formerly OpenAI Gym). It uses Proximal Policy Optimization (PPO) from Stable Baselines3 for training the agent. PPO is compared to POEM, our adaptation of PPO.
- Download and install Anaconda or Miniconda
- Open Anaconda Prompt (Windows) or Terminal (Mac/Linux)
- Create a new conda environment:
```
conda create -n car-racing python=3.10
conda activate car-racing
```
- Install Microsoft Build Tools (Windows users only):
- Download and install from Visual Studio Build Tools
- During installation, select "C++ build tools" or "Desktop development with C++"
- Install required packages:
```
pip install gymnasium[classic-control]
pip install swig
pip install gymnasium[box2d]
pip install stable-baselines3
```
- Clone this repository:
```
git clone [your-repository-url]
cd [repository-name]
```
- Ensure your conda environment is active:
```
conda activate car-racing
```
- For manual control of the vehicle:
```
python car_racing.py
```
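If you want a quick sanity check that the Box2D environment is installed correctly, a minimal Gymnasium snippet like the one below should open a window and drive with random actions. This is only an illustration; the environment ID may be `CarRacing-v2` or `CarRacing-v3` depending on your Gymnasium version.

```python
import gymnasium as gym

# Quick installation check: create the Car Racing environment and drive it
# with random actions. Depending on your Gymnasium version the environment
# ID may be "CarRacing-v2" or "CarRacing-v3".
env = gym.make("CarRacing-v2", render_mode="human")
obs, info = env.reset(seed=0)
for _ in range(200):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```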
- To train the RL agents, specify the model and set TRAIN to True. To evaluate an existing model, set the model path and set TRAIN to False:
```
python training.py
```
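The exact variable and path names live in `training.py`; as a rough, hedged sketch of how the TRAIN toggle might look (the names and values here are illustrative assumptions, not necessarily the script's actual ones):

```python
import gymnasium as gym
from stable_baselines3 import PPO

TRAIN = True                                   # True: train a new model; False: load an existing one
MODEL_PATH = "trained_models/ppo_car_racing"   # illustrative path, used when TRAIN is False
TIMESTEPS = 100_000                            # illustrative training budget

env = gym.make("CarRacing-v2", continuous=True)

if TRAIN:
    # Train a fresh PPO agent and save it for later evaluation.
    model = PPO("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=TIMESTEPS)
    model.save(MODEL_PATH)
else:
    # Load a previously trained model instead of training.
    model = PPO.load(MODEL_PATH, env=env)
```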
The training algorithm relies on a modified version of the Car Racing environment. You need to:
- Locate the original environment file in your conda environment:
```
[conda-path]/envs/car-racing/Lib/site-packages/gymnasium/envs/box2d/car_racing.py
```
- Replace it with the custom car_racing.py file from this repository. Note: this manual replacement is required for the training algorithm to work correctly. We're working on a more elegant solution for future updates.
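One way to find the exact file to replace is to ask Python for the module's location; a short snippet like this prints the full path of the installed `car_racing.py`:

```python
# Print the location of the installed Car Racing environment module,
# i.e. the file to replace with the car_racing.py from this repository.
from gymnasium.envs.box2d import car_racing

print(car_racing.__file__)
```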
Each environment has an associated training script named using the format `<environment>_<algorithm>.py`. For example, to run PPO on the Car Racing environment:
```
python car_racing_ppo.py
```
- Model Creation: A model is initialized using either PPO or POEM. POEM is an extension of Stable Baselines3's PPO implementation, with five additional hyperparameters based on the research paper. These hyperparameters have been tuned, but users can modify them as needed (see the sketch after this list).
- Training: The model is trained for a set number of timesteps, defined by the TIMESTEPS variable.
- Initial Evaluation: After training, the model is evaluated on 10 randomly generated episodes. Visual feedback is provided to help debug and analyze agent behavior. Note: deterministic evaluation and reward data collection are handled by a separate script.
- Running Pretrained Models: If a model has already been trained, you can view it by setting TRAIN = False. This will load the saved model and run it in random environments for visual inspection and debugging.
- Model Storage: Trained models are saved in the trained_models/ directory. These saved models are later used by the evaluate_model script for deterministic evaluation and performance comparison. This folder also stores the TensorBoard logs showing the training progress.
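As a minimal sketch of this flow for PPO (POEM would be constructed the same way with its extra hyperparameters; the timestep value, environment ID, and file names here are illustrative, not the script's exact values):

```python
import gymnasium as gym
from stable_baselines3 import PPO

TIMESTEPS = 500_000   # illustrative; the real value is set by the TIMESTEPS variable in the script

# Model creation: PPO with an image-based policy; TensorBoard logs go next to the saved models.
env = gym.make("CarRacing-v2", continuous=True)
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log="trained_models/")

# Training and storage.
model.learn(total_timesteps=TIMESTEPS)
model.save("trained_models/ppo_car_racing")

# Initial, non-deterministic visual check on 10 randomly generated episodes.
eval_env = gym.make("CarRacing-v2", continuous=True, render_mode="human")
for episode in range(10):
    obs, info = eval_env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = eval_env.step(action)
        done = terminated or truncated
eval_env.close()
```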
To evaluate a model, you need both a PPO and a POEM model trained for the same environment. The script is run using the format `evaluate_model.py --env <environment>`, with an optional `--human` flag if you want to visualize what is happening.
- POEM/PPO folders: Contain the respective training performance, stepwise reward, and average action space graphs, along with CSV files for each model's individual performance.
- Comparison Graphs: The results directory contains two graphs showing how the models compare to one another: the frequency of specific actions and a comparison of total rewards.
- T-test results: Contains a text file with the t-test and p-values for the two sets of rewards during training.
- Methodology: Each algorithm is evaluated using a pre-seeded environment. The number of environments a model is tested on is determined by the variable LONG_TRAINING_EVAL_EPISODES (see the sketch below).
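A minimal sketch of this pre-seeded, deterministic evaluation loop (the episode count, environment ID, and model path here are illustrative; the actual script reads LONG_TRAINING_EVAL_EPISODES):

```python
import gymnasium as gym
from stable_baselines3 import PPO

LONG_TRAINING_EVAL_EPISODES = 10   # illustrative number of pre-seeded evaluation episodes

env = gym.make("CarRacing-v2", continuous=True)
model = PPO.load("trained_models/ppo_car_racing")

episode_rewards = []
for episode in range(LONG_TRAINING_EVAL_EPISODES):
    # Seeding each episode with its index gives both models the same tracks.
    obs, info = env.reset(seed=episode)
    done, total_reward = False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    episode_rewards.append(total_reward)

print("Mean reward:", sum(episode_rewards) / len(episode_rewards))
```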
There are two ways to tune hyperparameters: we provide a grid search and HPO tuning.
- Grid Search: Done by a script called tuning_grid_search, which iterates through the hyperparameters and finds the best combination. GRIDSEARCH_TIMESTEP determines how long a model is trained before being evaluated, and GRIDSEARCH_EVAL_EPISODES determines how many environments a model is run on to determine its average reward. Once run, the data is saved in the tuning_grid_search folder, and the script tuning_review_grid_search displays which model had the best hyperparameters.
- To tune hyperparameters for POEM and PPO using Grid Search:
```
python tuning.py
```
- To review the results of tuning hyperparameters using Grid Search:
```
python tuning_review.py
```
- HPO: Done through Optuna, which uses an HPO algorithm to approximate the optimal hyperparameters. To use this script, use the following syntax (see the sketch after this list):
```
python tuning_HPO.py --model <model_type> --env <environment> --trials <num_trials> --timestep <timesteps>
```
This will generate a folder in the tuning_hpo directory containing graphs showing which hyperparameters were most critical and which hyperparameter values resulted in the best rewards. In addition, a text file will be provided listing the optimal set of hyperparameters.
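As a rough illustration of the Optuna-based HPO loop (the hyperparameters sampled here, their ranges, and the per-trial budgets are illustrative, not the set tuned by the repository's script):

```python
import gymnasium as gym
import optuna
from stable_baselines3 import PPO


def objective(trial: optuna.Trial) -> float:
    # Sample a few PPO hyperparameters; the real script tunes more of them.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.999)
    n_steps = trial.suggest_categorical("n_steps", [512, 1024, 2048])

    env = gym.make("CarRacing-v2", continuous=True)
    model = PPO("CnnPolicy", env, learning_rate=learning_rate,
                gamma=gamma, n_steps=n_steps, verbose=0)
    model.learn(total_timesteps=50_000)          # short training budget per trial

    # Evaluate on a few seeded episodes and return the mean reward.
    rewards = []
    for episode in range(3):
        obs, info = env.reset(seed=episode)
        done, total = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        rewards.append(total)
    env.close()
    return sum(rewards) / len(rewards)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```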
For a tutorial on setting up the project, see https://www.youtube.com/watch?v=gMgj4pSHLww&ab_channel=JohnnyCode
- Gymnasium (formerly OpenAI Gym) for the Car Racing environment
- Stable Baselines3 for the PPO implementation
- Optuna for hyperparameter tuning