- BRO Paper (NeurIPS 2024, spotlight)
- Project website
Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices underlying our proposed BRO (Bigger, Regularized, Optimistic) algorithm. The key innovation behind BRO is that strong regularization allows for effective scaling of the critic networks, which, paired with optimistic exploration, leads to superior performance. BRO achieves state-of-the-art results, significantly outperforming the leading model-based and model-free algorithms across 40 complex tasks from the DeepMind Control, MetaWorld, and MyoSuite benchmarks. BRO is the first model-free algorithm to achieve near-optimal policies in the notoriously challenging Dog and Humanoid tasks.
This repository contains our implementation of the BRO algorithm. The codebase is heavily inspired by JaxRL and Parallel JaxRL.
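The central claim, that strong regularization allows the critic to scale effectively, can be illustrated with a short Flax sketch: a dense encoder followed by residual blocks in which every dense layer is paired with LayerNorm. This is only an illustrative approximation of the BroNet-style critic; the widths, depths, and module names below are placeholders, and the exact architecture and hyperparameters are defined in the paper and codebase.

```python
import jax.numpy as jnp
import flax.linen as nn


class ResidualLayerNormBlock(nn.Module):
    """Residual MLP block with LayerNorm after each dense layer (illustrative)."""
    hidden_dim: int

    @nn.compact
    def __call__(self, x):
        residual = x
        x = nn.Dense(self.hidden_dim)(x)
        x = nn.LayerNorm()(x)
        x = nn.relu(x)
        x = nn.Dense(self.hidden_dim)(x)
        x = nn.LayerNorm()(x)
        return x + residual


class RegularizedCritic(nn.Module):
    """Q-network sketch: widening hidden_dim / adding blocks is the 'Bigger' part,
    LayerNorm in every block is the 'Regularized' part."""
    hidden_dim: int = 512   # placeholder width
    num_blocks: int = 2     # placeholder depth

    @nn.compact
    def __call__(self, obs, action):
        x = jnp.concatenate([obs, action], axis=-1)
        x = nn.Dense(self.hidden_dim)(x)
        x = nn.LayerNorm()(x)
        x = nn.relu(x)
        for _ in range(self.num_blocks):
            x = ResidualLayerNormBlock(self.hidden_dim)(x)
        return nn.Dense(1)(x)   # scalar Q-value estimate
```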
updates_per_step is the most important parameter: it determines how many updates are performed per environment step and thus controls the trade-off between sample efficiency and computational cost. We propose updates_per_step=10 as the default, as we found higher values to bring only marginal gains. (A sketch of the loop this parameter controls follows the example commands below.)
BRO with default settings (updates_per_step=10):
python3 train_parallel.py --benchmark=dmc --env_name=dog-run
BRO (fast): in many cases, the lower value updates_per_step=2 already yields excellent performance while being much faster.
python3 train_parallel.py --benchmark=dmc --env_name=dog-run --updates_per_step=2
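For reference, here is a minimal sketch of the generic off-policy loop that updates_per_step controls. The agent, env, and replay_buffer interfaces below are purely hypothetical placeholders; the actual training loop is implemented in the codebase.

```python
def train(env, agent, replay_buffer, num_env_steps, updates_per_step=10, batch_size=256):
    """Generic off-policy training loop sketch (interfaces are hypothetical).

    updates_per_step is the number of gradient updates performed per environment
    step: higher values improve sample efficiency at higher computational cost."""
    obs, _ = env.reset()
    for _ in range(num_env_steps):
        action = agent.sample_action(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, terminated)
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
        for _ in range(updates_per_step):  # 10 by default, 2 in the fast variant
            batch = replay_buffer.sample(batch_size)
            agent.update(batch)            # critic and actor gradient step
```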
To install the dependencies for the DMC experiments, run pip install -r requirements_dmc.txt. Since MyoSuite requires an older version of Gym, we recommend installing it in a separate environment from the one used for DMC.
If your system throws an error while installing Gym, run the following commands before installing requirements_dmc.txt:
pip install setuptools==65
pip install wheel==0.38.4
pip install pip==24.0
- A torch version -- an educational implementation of BRO in PyTorch.
- An implementation in Stable Baselines -- a version compliant with SBX.
@inproceedings{
nauman2024bigger,
title={Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control},
author={Michal Nauman and Mateusz Ostaszewski and Krzysztof Jankowski and Piotr Miłoś and Marek Cygan},
booktitle={Advances in Neural Information Processing Systems},
year={2024},
url={https://arxiv.org/pdf/2405.16158},
}
