This is the official repository for Causal Reinforcement Learning (CRL) baseline algorithms built on top of CausalGym. The package exposes causal-aware bandit and sequential algorithms, data-processing utilities, and environment wrappers that embrace Structural Causal Models (SCMs) and Pearl's Causal Hierarchy (see, do, ctf_do).
Please install CausalGym first by following this link.
Then, install this package via:

```bash
pip install -e .
```

The editable install pulls in the dependencies declared in `setup.py` (Gymnasium, Minigrid, MuJoCo Robot Envs, etc.). You will also need the `causal_gym` package that defines the environments referenced by the algorithms and wrappers.
```
causalrl/
├── causal_rl/              # Python package with algorithms and wrappers
│   ├── algo/
│   │   ├── baselines/      # Standard baseline algorithms (UCB, RCT, IPW) for comparison
│   │   ├── cool/           # Causal Offline-to-Online Learning (COOL)
│   │   ├── ctf_do/         # Counterfactual Decision Making
│   │   ├── imitation/      # (Sequential) Causal Imitation Learning
│   │   ├── reward_shaping/ # Confounding-Robust Reward Shaping
│   │   └── where_do/       # Where to Intervene
│   └── wrappers/           # Gymnasium wrappers
├── examples/               # Jupyter notebooks for each task
│   ├── baselines/
│   ├── cool/
│   ├── ctf_do/
│   ├── imitation/
│   ├── reward_shaping/
│   └── where_do/
├── setup.py                # Packaging metadata and dependency pins
└── README.md               # (this file)
```
The `causal_rl` package is deliberately thin: algorithms expect an environment that follows the `causal_gym.core.PCH` API (exposes `reset`, `see`, `do`, `ctf_do`, `get_graph`, etc.). The notebooks under `examples/` demonstrate how to apply those algorithms to environments such as windy CartPole, Lava grid world, etc.
```python
import causal_gym as cgym
from causal_gym.core.task import Task, LearningRegime

# 1. Configure the environment to allow see + do + ctf_do
task = Task(learning_regime=LearningRegime.ctf_do)
env = cgym.envs.CartPoleWindPCH(task=task, wind_std=0.05)
observation, info = env.reset(seed=0)

# Observe the behaviour (Level 1: see)
obs, reward, terminated, truncated, info = env.see()
natural_action = info.get("natural_action")

# Intervene with your own policy (Level 2: do)
def greedy_push_right(observation):
    return 1  # action index (env-specific)

obs, reward, terminated, truncated, info = env.do(greedy_push_right)

# Counterfactual action (Level 3: ctf_do)
def counterfactual_policy(observation, natural_action):
    # invert the behaviour policy: push left if it intends right
    return 0 if natural_action == 1 else natural_action

obs, reward, terminated, truncated, info = env.ctf_do(counterfactual_policy)
env.close()
```

To train/evaluate a causal RL agent, collect data via the PCH interface. You can refer to `examples/` for more detailed usage of each supported algorithm.
We organise the codebase around the causal decision-making tasks from Causal Artificial Intelligence (Tasks 1–10). The table below lists the implemented tasks along with references to the corresponding book sections or papers.
| Task (ID) | Learning Regime | Modules | Highlights | Reference |
|---|---|---|---|---|
| Off-policy Learning (1) | `see` | `causal_rl/algo/baselines/ipw.py` | Inverse propensity weighting for off-policy evaluation using observational trajectories collected via `see()`. | CAI Book §8.2 |
| Online Learning (2) | `do` | `causal_rl/algo/baselines/ucb.py`, `causal_rl/algo/baselines/rct.py` | Online learners with UCB exploration and an RCT baseline with interventional access in causal environments. | CAI Book §8.2 |
| Causal Identification (3) | `see` | CAI textbook Code Companion | Graphical identification procedures (front-/back-door, transport) from the companion repository. | CAI Book §8.2 |
| Causal Offline-to-Online Learning (4) | `see` + `do` | `causal_rl/algo/cool/cool.py` | COOL algorithms that warm-start UCB using observational data contaminated with confounding bias. | CAI Book §9.2 |
| Where to Do & What to Look For (5) | `do` | `causal_rl/algo/where_do/` | `WhereDo` solver to locate minimal intervention sets and interventional borders on DAG SCMs. | CAI Book §9.3 |
| Counterfactual Decision Making (6) | `ctf_do` | `causal_rl/algo/ctf_do/` | Counterfactual UCB variants (UCBVI, UCBQ, CtfUCB) maintaining optimistic estimates over intended vs. executed actions. | CAI Book §9.4 |
| Causal Imitation Learning (7) | `see` | `causal_rl/algo/imitation/` | Sequential π-backdoor criterion, expert dataset utilities, and GAN-based policy learners under causal assumptions. | CAI Book §9.5 |
| Causally Aligned Curriculum Learning (8) | `do` | To Be Implemented | Planned curricula that coordinate interventions across SCM families. | ICLR 2024 |
| Reward Shaping (9) | `see` + `do` | `causal_rl/algo/reward_shaping/` | Optimistic shaping and offline value bounds for Windy MiniGrid-style SCMs. | ICML 2025 |
| Causal Game Theory (10) | `do` | To Be Implemented | Placeholder for causal game-theoretic solvers operating over SCMs. | Tech Report |
Tasks 8 and 10 are planned additions. Links point to the corresponding book sections or research papers.
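To make Task 1 concrete, here is a minimal sketch of an inverse propensity weighting estimator on logged bandit data. The function name and data layout are illustrative assumptions; the packaged implementation lives in `causal_rl/algo/baselines/ipw.py`.

```python
import numpy as np

def ipw_value_estimate(actions, rewards, propensities, target_policy_probs):
    """Illustrative IPW off-policy value estimate (not the packaged API).

    actions, rewards: logged choices and outcomes from `see()` trajectories.
    propensities: behaviour-policy probabilities of the logged actions.
    target_policy_probs: target-policy probabilities of the same actions.
    """
    weights = target_policy_probs / propensities  # importance ratios
    return np.mean(weights * rewards)             # unbiased given overlap

# Toy logged data: the behaviour policy picks arm 1 with probability 0.8
rng = np.random.default_rng(0)
actions = rng.choice([0, 1], size=1000, p=[0.2, 0.8])
rewards = rng.binomial(1, np.where(actions == 1, 0.4, 0.7))
propensities = np.where(actions == 1, 0.8, 0.2)

# Evaluate a target policy that always pulls arm 0 (true value: 0.7)
target_probs = np.where(actions == 0, 1.0, 0.0)
print(ipw_value_estimate(actions, rewards, propensities, target_probs))
```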
Each subdirectory in examples/ contains a notebook (plus occasional figures) that walks through a causal RL task:
- Task 1 – Off-policy Learning: `examples/baselines/test_ipw.ipynb` runs inverse propensity weighting for observational evaluation using logged trajectories.
- Task 2 – Online Learning: `examples/baselines/test_{rct,ucb}.ipynb` benchmark UCB and RCT-style learners on bandits.
- Task 4 – Causal Offline-to-Online: `examples/cool/test_cool.ipynb` contrasts causal offline-to-online algorithms with standard online learners starting from scratch in confounded bandits.
- Task 5 – Where to Do / What to Look For: `examples/where_do/test_where_do_bookexamples.ipynb` reproduces the Chapter 9 exercises using the `WhereDo` solver.
- Task 6 – Counterfactual Decision Making: `examples/ctf_do/test_ctf_do_cartpole.ipynb` trains `UCBVI` and `UCBQ` on a windy CartPole SCM, visualising regret curves and policy snapshots.
- Task 7 – Causal Imitation Learning: `examples/imitation/test_race_imitation.ipynb` applies sequential causal imitability checks to ensure learned policies generalise under interventions.
- Task 9 – Reward Shaping: `examples/reward_shaping/test_reward_shaping_{lavacross,robotwalk}.ipynb` benchmarks optimistic shaping strategies on Windy MiniGrid and RobotWalk.
The notebooks assume you have `jupyter` installed and that the `causal_gym` environments are available. Visual assets (`.png`, `.gif`) illustrate policy trajectories and causal diagrams.
See this section from our CausalGym repo.
For an exhaustive tour of available environments, graph semantics, and task configuration, see CausalGym for details on:
- How SCMs are constructed (endogenous/exogenous variables, structural equations, and causal graphs).
- The meaning of `Task` and `LearningRegime` objects (e.g., `see_do`, `do_only`, `ctf_do`).
- Domain-specific details for each registered environment (CartPole, LunarLander, Highway, Windy MiniGrid, bandits, etc.).
The algorithms and wrappers in this repository build directly on those abstractions: use the same `Task` regime for both the environment and the learning procedure to guarantee that the available data aligns with the assumptions of your causal RL method.
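For instance, a COOL-style workflow that warm-starts from observational data and then explores online needs both `see` and `do` access. The sketch below reuses the `CartPoleWindPCH` environment from the quick-start and the `see_do` regime named above; treat it as an illustration of the contract, not a complete training script (the actual agent classes live under `causal_rl/algo/`).

```python
import causal_gym as cgym
from causal_gym.core.task import Task, LearningRegime

# Grant both observational (see) and interventional (do) access, matching
# what a COOL-style learner consumes.
task = Task(learning_regime=LearningRegime.see_do)
env = cgym.envs.CartPoleWindPCH(task=task, wind_std=0.05)

# Phase 1 (see): log behaviour-policy transitions for the offline warm start.
logged = []
env.reset(seed=0)
for _ in range(200):
    obs, reward, terminated, truncated, info = env.see()
    logged.append((obs, reward, info.get("natural_action")))
    if terminated or truncated:
        env.reset()

# Phase 2 (do): the online learner now interacts through env.do(policy).
# A counterfactual method would instead require constructing the Task with
# LearningRegime.ctf_do, on both the environment and the algorithm side.
env.close()
```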
Feel free to open issues or pull requests if you have new causal RL algorithms, environments, or example experiment notebooks to contribute. Please adhere to the SCM-PCH interface so that contributions remain compatible with the broader CausalGym ecosystem.
We are also happy to engage with passionate research/engineering interns throughout the year on a rolling basis. If you are interested, fill out this form to kick-start your application to the Causal Artificial Intelligence Lab @ Columbia University.