Jordan-Haidee/sb3-hppo

SB3-HPPO

A Hybrid PPO (HPPO) implementation for discrete-continuous hybrid action spaces, built on stable-baselines3 and benchmarked on the standard Gymnasium-Hybrid environments.
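Hybrid PPO handles actions that combine a discrete choice with continuous parameters. The sketch below shows one common way to sample and score such an action: a categorical head picks the discrete action, independent Gaussians produce the continuous parameters, and the joint log-probability (the sum of both terms, assuming the heads are conditionally independent) feeds the usual PPO clipped ratio. It uses only the standard library; the sizes (3 discrete actions, 2 parameters) and all function names are illustrative assumptions, not this repo's code.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    # Log-density of x under a univariate normal N(mean, std**2).
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def sample_hybrid_action(
    logits: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    rng: random.Random,
) -> Tuple[int, List[float], float]:
    """Sample a discrete action id plus continuous parameters.

    Returns (action_id, params, joint_log_prob), where the joint log-prob is
    log pi_d(k) + sum_i log pi_c(x_i), assuming the two heads are
    conditionally independent given the observation.
    """
    probs = softmax(logits)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    params = [rng.gauss(mu, sd) for mu, sd in zip(means, stds)]
    log_prob = math.log(probs[k]) + sum(
        gaussian_log_prob(x, mu, sd) for x, mu, sd in zip(params, means, stds)
    )
    return k, params, log_prob


def ppo_clipped_objective(
    new_log_prob: float, old_log_prob: float, advantage: float, clip_eps: float = 0.2
) -> float:
    # Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative sizes: 3 discrete actions, 2 continuous parameters.
    action_id, params, logp = sample_hybrid_action([0.1, 0.5, -0.2], [0.0, 0.3], [0.5, 0.5], rng)
    print(action_id, params, logp)
```

In a real environment the sampled parameters would typically be clipped to the action-space bounds before stepping, and some HPPO variants compute separate clipped objectives for the discrete and continuous heads instead of one joint ratio.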

Test demo:

[Demo animations: SB3-HPPO in Moving-v0 | SB3-HPPO in Sliding-v0]

Reward curve: [figure: reward_curve]

Usage

1. Clone this repo

git clone https://github.com/Jordan-Haidee/sb3-hppo.git
cd sb3-hppo

2. Install dependencies

uv sync # recommended
# or `pip install -r requirements.txt`
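After installing, a quick way to confirm that the key packages resolve in the active environment is to look them up with importlib.util.find_spec, which locates a module without importing it. The module names below (stable_baselines3, gymnasium, gymnasium_hybrid) are assumptions based on the project description; adjust them to match requirements.txt.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed top-level module names; edit to match the repo's dependency list.
DEFAULT_MODULES = ("stable_baselines3", "gymnasium", "gymnasium_hybrid")


def check_modules(names: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be found on the current path."""
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

Running it through uv (for example `uv run python check_deps.py`, with a hypothetical file name) makes sure it checks the environment that uv sync created.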

3. Run training

$ python train.py --help
usage: train.py [-h] [OPTIONS]

╭─ options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help              show this help message and exit                                                          │
│ --env STR               env id from gymnasium_hybrid (Moving-v0 / Sliding-v0 / HardMove-v0) (default: Moving-v0) │
│ --n-envs INT            number of parallel environments (default: 8)                                             │
│ --seed INT              random seed (default: 42)                                                                │
│ --save-path {None}|PATH                                                                                          │
│                         path to save model and logs (default: None)                                              │
│ --total-timesteps INT   total timesteps to train (default: 5000000)                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
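For scripting around train.py, it can help to have the documented options and defaults in code. The sketch below reproduces them with the standard library's argparse; it is a hypothetical mirror of the help text above, not the parser train.py actually uses. The optional_path helper (also hypothetical) treats the literal string "None" as "no path", imitating the {None}|PATH notation.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def optional_path(value: str) -> Optional[Path]:
    # Accept the literal "None" as "no path", mirroring `{None}|PATH`.
    return None if value == "None" else Path(value)


def build_train_parser() -> argparse.ArgumentParser:
    # Options and defaults copied from the `python train.py --help` output.
    p = argparse.ArgumentParser(prog="train.py")
    p.add_argument("--env", default="Moving-v0",
                   help="env id from gymnasium_hybrid (Moving-v0 / Sliding-v0 / HardMove-v0)")
    p.add_argument("--n-envs", type=int, default=8, help="number of parallel environments")
    p.add_argument("--seed", type=int, default=42, help="random seed")
    p.add_argument("--save-path", type=optional_path, default=None,
                   help="path to save model and logs")
    p.add_argument("--total-timesteps", type=int, default=5_000_000,
                   help="total timesteps to train")
    return p


def parse_train_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse converts dashes to underscores: --n-envs -> args.n_envs, etc.
    return build_train_parser().parse_args(argv)
```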

Example:

python train.py --env Moving-v0 # or Sliding-v0, HardMove-v0

The trained policy and the TensorBoard logs are saved to output/sb3hppo_xxx/model.zip and output/sb3hppo_xxx/tb_log, respectively.
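Because the xxx part of the run directory differs between runs, a small helper that picks the most recently modified output/sb3hppo_* directory saves copying paths by hand. This is a sketch assuming the output/sb3hppo_xxx/model.zip and tb_log layout described above; latest_run_dir and run_artifacts are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Optional


def latest_run_dir(output_root: Path = Path("output"), prefix: str = "sb3hppo_") -> Optional[Path]:
    """Return the most recently modified run directory, or None if none exist."""
    if not output_root.is_dir():
        return None
    runs = [d for d in output_root.iterdir() if d.is_dir() and d.name.startswith(prefix)]
    if not runs:
        return None
    # Modification time is used rather than the name, so runs for
    # different envs are compared fairly.
    return max(runs, key=lambda d: d.stat().st_mtime)


def run_artifacts(run_dir: Path) -> Dict[str, Path]:
    # Layout described in the README: model.zip and tb_log inside each run dir.
    return {"model": run_dir / "model.zip", "tb_log": run_dir / "tb_log"}
```

The returned model path can be passed to test.py via --ckpt, and the tb_log directory to `tensorboard --logdir`.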

4. Test your policy

$ python test.py --help
usage: test.py [-h] [OPTIONS]

╭─ options ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help              show this help message and exit                                                │
│ --env STR               Env id from gymnasium_hybrid (Moving-v0 / Sliding-v0 / HardMove-v0) (required) │
│ --ckpt PATH             Path to the checkpoint file (*.zip) (required)                                 │
│ --render, --no-render   Whether to render the environment (default: False)                             │
│ --save-video {None}|PATH                                                                               │
│                         Path to save the video (None to disable) (default: None)                       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Example:

python test.py --env "Moving-v0" --ckpt output/sb3hppo_Moving-v0_20250515_114301/model.zip --render
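To evaluate several checkpoints without typing each command, the documented test.py flags can be assembled programmatically. The sketch below builds the argument list (passing --no-render when rendering is off, matching the --render/--no-render pair in the help) and runs it with subprocess. build_test_command and evaluate are hypothetical helpers, and the script is assumed to be run from the repo root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_test_command(env: str, ckpt: Path, render: bool = False,
                       save_video: Optional[Path] = None) -> List[str]:
    """Assemble a test.py invocation from the documented flags."""
    cmd = [sys.executable, "test.py", "--env", env, "--ckpt", str(ckpt)]
    cmd.append("--render" if render else "--no-render")
    if save_video is not None:
        # Omitting --save-video keeps the documented default (None, disabled).
        cmd += ["--save-video", str(save_video)]
    return cmd


def evaluate(env: str, ckpt: Path, **kwargs) -> int:
    # Run test.py and return its exit code without raising on failure.
    return subprocess.run(build_test_command(env, ckpt, **kwargs), check=False).returncode
```

For example, `evaluate("Moving-v0", Path("output/sb3hppo_xxx/model.zip"), render=True)` is equivalent to the command above, with xxx standing for the actual run directory suffix.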

Acknowledgement

Thanks to @wild-firefox and @CAI23sbP! This repo depends heavily on their earlier work.
