This repository is the official implementation of "Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM".
To install requirements:
pip install -r requirements.txt
pip install git+https://github.com/mila-iqia/atari-representation-learning.git
d3rlpy install d4rl
Install Mujoco: https://gist.github.com/saratrajput/60b1310fe9d9df664f9983b38b50d5da
To collect data from the LLM, choose the model name and environment by editing the 'hyperparams' variable in llm_main.py. Then run this command:
python llm_main.py
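As an illustration, the 'hyperparams' variable in llm_main.py might be edited along these lines. The key names and values below are assumptions for illustration only; check the variable in the script for the exact fields it expects.

hyperparams = {
    'model_name': 'gpt-3.5-turbo',  # illustrative: name of the LLM to query
    'env_name': 'CartPole-v1',      # illustrative: OpenAI Gym environment to collect data for
}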
To pretrain the model(s) in the paper, choose the number of pretraining episodes and the number of pretraining steps by editing the 'hyperparams' variable (n_pretrain_eps, n_pretrain_steps) in pretrain_from_llm.py. Change the file paths from the previous step in the 'get_llm_data_paths' function in pretrain_from_llm.py. Then run this command:
python pretrain_from_llm.py
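A hypothetical sketch of the relevant edits in pretrain_from_llm.py is shown below; the exact keys, the signature and body of get_llm_data_paths, and the file path are illustrative and may differ from the script.

hyperparams = {
    'env_name': 'CartPole-v1',    # illustrative environment name
    'n_pretrain_eps': 100,        # illustrative: number of LLM episodes used for pretraining
    'n_pretrain_steps': 10000,    # illustrative: number of offline pretraining steps
}

def get_llm_data_paths():
    # Point these at the files written by llm_main.py in the previous step.
    # The path below is a made-up example, not a file shipped with the repo.
    return ['data/llm/CartPole-v1_gpt-3.5-turbo.pkl']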
To fine-tune the RL algorithm on top of the pretrained models, after completing the previous two steps, choose the environment and the numbers of pretraining and online episodes by editing the 'hyperparams' variable in online_main.py, then run this command:
python online_main.py
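For example, the 'hyperparams' variable in online_main.py might be set roughly as follows; the key names and values are illustrative, not the repository's exact configuration.

hyperparams = {
    'env_name': 'CartPole-v1',  # illustrative: environment to fine-tune on
    'n_pretrain_eps': 100,      # illustrative: should match the pretraining run from the previous step
    'n_online_eps': 200,        # illustrative: number of online RL episodes
}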
You can follow the same steps above to collect on-policy (pure RL) data by running the on_policy_pretrain_exp.py file.
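For example, to launch the on-policy data collection run:

python on_policy_pretrain_exp.py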
To visualize the results, use the visualization.ipynb notebook. You can run this directly to visualize the results shown in the paper.
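For example, assuming Jupyter is installed, you can open the notebook with:

jupyter notebook visualization.ipynb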
Our model achieves the following performance on six OpenAI Gym environments:
We investigate the use of a Large Language Model (LLM) to collect high-quality data for warm-starting Reinforcement Learning (RL) algorithms in classical Markov Decision Process (MDP) environments. In this work, we focus on using the LLM to generate an off-policy dataset that sufficiently covers the state-actions visited by optimal policies, and then using an RL algorithm to explore the environment and improve on the policy suggested by the LLM. Our algorithm, LORO, both converges to an optimal policy and achieves high sample efficiency thanks to the LLM's good starting policy. On multiple OpenAI Gym environments, such as CartPole and Pendulum, we empirically demonstrate that LORO outperforms baseline algorithms such as pure LLM-based policies, pure RL, and a naive combination of the two, achieving up to four times the cumulative rewards of the pure RL baseline.
The code references this repo. The environment descriptions are referenced from this repo.

