The Wasserstein Believer

Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

ICLR 2024 paper: https://openreview.net/forum?id=KrtGfTGaGe

Wasserstein Belief Updater (WBU) is an RNN-free reinforcement learning (RL) algorithm for POMDPs. It learns a representation of the history by approximating the belief update in a reliable latent space model, which provides theoretical guarantees for learning the optimal value.

This work concerns agents that learn how to behave, i.e., their control policy, through reinforcement learning (RL). In real-world scenarios, the environment's state is very often perceived through noisy sensors, cameras, or, more generally, imperfect observations (e.g., a visual observation instead of exact coordinates on a map). In that case, the observation is non-Markovian and the environment is partially observable. This complicates learning compared to the idealized, fully observable RL setting (i.e., with Markovian observations). To act optimally, the agent must then base its decisions either on (a) the full observation-action history, or (b) the probability distribution over the true states the environment could be in at each time step. The latter is called the agent's belief and is a sufficient statistic for optimizing the agent's return.
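To make the notion of belief concrete, the sketch below implements the textbook Bayes-filter belief update for a small, finite POMDP: predict with the transition model, weight by the observation likelihood, then renormalize. It is an illustration only, not code from this repository; the function name `belief_update`, the array layouts of `T` and `O`, and the two-state "listen" example with 85% accurate observations are assumptions made up for the example. Note that the update needs the true dynamics (`T`, `O`) and sums over every state, which is exactly what becomes impractical in real environments.

```python
import numpy as np


def belief_update(belief, action, observation, T, O):
    """Exact Bayes-filter belief update for a finite POMDP (illustrative sketch).

    belief: shape (S,), probability of each state.
    T: shape (A, S, S), T[a, s, s2] = P(s2 | s, a).
    O: shape (A, S, Z), O[a, s2, z] = P(z | s2, a).
    """
    predicted = belief @ T[action]                    # sum_s b(s) * P(s2 | s, a)
    unnormalized = predicted * O[action][:, observation]  # weight by P(o | s2, a)
    total = unnormalized.sum()                        # P(o | b, a)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


# Hypothetical toy problem: one "listen" action, the state never changes,
# and the observation matches the true state 85% of the time.
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
O = np.array([[[0.85, 0.15], [0.15, 0.85]]])

b = np.array([0.5, 0.5])  # uniform initial belief
b = belief_update(b, action=0, observation=0, T=T, O=O)
print(b)  # [0.85 0.15]
```

Repeating the same observation keeps sharpening the belief towards the first state, since each update multiplies in another likelihood factor.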

The simplest way to tackle partial observability is to process the full history with an RNN to obtain a compressed hidden state that is fed to the agent's policy. While appealing, RNNs give no guarantee that the learned representation is actually useful (i.e., a sufficient statistic) for optimizing the agent's return.

With WBU, we instead propose to learn a representation of the belief. Belief learning is difficult in RL because (1) the environment's dynamics must be known to compute the belief exactly, and (2) it does not scale, as it requires integrating over the full state space (usually intractable). To tackle these challenges, WBU:

- learns a world model through Wasserstein auto-encoded MDPs (WAE-MDPs). This model comes with theoretical guarantees on the quality of the abstraction. It is learned with a discrete latent space, which eases the computation of the belief over latent states;
- minimizes the discrepancy between the latent belief prescribed by the theoretical belief update rule and the latent belief it computes. This yields theoretical guarantees on the quality of the representation: beliefs that are close in the representation space are guaranteed to yield close expected returns (Lipschitz continuity). These guarantees make the learned representation a sound basis for policy learning.
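The Lipschitz-continuity guarantee mentioned above can be illustrated on a toy example. By Kantorovich-Rubinstein duality, if the value of each latent state is a K-Lipschitz function of the state, then the values of two beliefs b and b' differ by at most K times their Wasserstein-1 distance W1(b, b'). The sketch below checks this on four latent states placed on a line, where W1 reduces to the summed gap between the two cumulative distributions. Everything here is a hypothetical illustration (the function names, the 1-D latent metric, the state values, and both beliefs are made up); it is not the discrepancy or the latent space used in WBU's implementation.

```python
import numpy as np


def wasserstein_1d(p, q, positions):
    """W1 distance between two distributions supported on sorted 1-D points."""
    p, q, positions = map(np.asarray, (p, q, positions))
    gaps = np.diff(positions)  # distance between neighbouring support points
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]  # |F_p - F_q| on each interval
    return float(np.sum(cdf_gap * gaps))


def expected_value(belief, values):
    """Value of a belief when latent state s has value values[s]."""
    return float(np.dot(belief, values))


positions = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical latent states on a line
values = 2.0 * positions                      # a 2-Lipschitz value function (K = 2)
b_exact = np.array([0.1, 0.6, 0.3, 0.0])      # e.g. belief from the exact update rule
b_learned = np.array([0.1, 0.5, 0.3, 0.1])    # e.g. belief from a learned updater

w1 = wasserstein_1d(b_exact, b_learned, positions)
gap = abs(expected_value(b_exact, values) - expected_value(b_learned, values))
print(w1, gap)  # the value gap is at most K * w1 (up to float rounding)
```

Driving W1(b, b') towards zero therefore also drives the value gap towards zero, which is the intuition behind using a Wasserstein-type discrepancy for the belief updater.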

Installation

Warning: this code was tested on Linux with Python 3.9.

python3.9 -m venv venv
source venv/bin/activate
pip install --no-deps -r requirements.txt
pip install --no-deps -e modules/popgym
pip install --no-deps -e modules/POMinAtar
pip install --no-deps -e modules/bwu

Usage

Repeat Previous

python run.py --config modules/bwu/belief_learner/config/repeat_previous.toml

Stateless Cartpole

python run.py --config modules/bwu/belief_learner/config/stateless_cartpole.toml

Noisy Stateless Cartpole

python run.py --config modules/bwu/belief_learner/config/noisy_stateless_cartpole.toml

Space Invaders

python run.py --config modules/bwu/belief_learner/config/space_invaders.toml --timesteps 2e6

Noisy Space Invaders

python run.py --config modules/bwu/belief_learner/config/noisy_space_invaders.toml --timesteps 2e6
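To run all of the experiments above one after another, a small driver script along the following lines could be used. It is a hypothetical helper, not part of the repository: `RUNS`, `build_commands`, and the `--execute` switch are made up for this sketch, and it only reuses the config paths and the `--timesteps 2e6` flag listed above. By default it only prints the commands (dry run); it uses `sys.executable` so that the interpreter of the active virtual environment is the one invoked. Run it from the repository root.

```python
import shlex
import subprocess
import sys

# Config files and extra flags copied from the Usage section.
# The helper itself (RUNS, build_commands, --execute) is hypothetical.
CONFIG_DIR = "modules/bwu/belief_learner/config"
RUNS = {
    "repeat_previous": [],
    "stateless_cartpole": [],
    "noisy_stateless_cartpole": [],
    "space_invaders": ["--timesteps", "2e6"],
    "noisy_space_invaders": ["--timesteps", "2e6"],
}


def build_commands(python: str = "python") -> list[list[str]]:
    """Return one argv list per experiment, in the order listed in RUNS."""
    return [
        [python, "run.py", "--config", f"{CONFIG_DIR}/{name}.toml", *extra]
        for name, extra in RUNS.items()
    ]


if __name__ == "__main__":
    execute = "--execute" in sys.argv[1:]
    for cmd in build_commands(sys.executable):
        print(shlex.join(cmd))  # always show the command that would be run
        if execute:
            # Runs sequentially and stops at the first failing experiment.
            subprocess.run(cmd, check=True)
```

Keeping the dry run as the default makes it easy to check the exact commands before launching the longer Space Invaders runs.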

Results

Cite

If you use this code, please cite it as:

@inproceedings{
avalosdelgrange2024wbu,
title={The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models},
author={Rapha{\"e}l Avalos and Florent Delgrange and Ann Nowe and Guillermo Perez and Diederik M Roijers},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=KrtGfTGaGe}
}

Acknowledgements

MinAtar: https://github.com/kenjyoung/MinAtar
PopGym: https://github.com/proroklab/popgym
