🎉 Our paper has been accepted at ICLR 2026! 🎉
- **Install uv**: download from https://github.com/astral-sh/uv or install via:

  ```shell
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- **Create a virtual environment**:

  ```shell
  uv venv
  ```

- **Install dependencies**:

  ```shell
  uv sync
  ```

- **Make the experiment scripts executable**:

  ```shell
  chmod +x run_atari_paper.sh run_procgen_paper.sh
  ```
- **Wandb tracking**: replace `your-entity` in all config files with your wandb entity name:

  ```shell
  find config -name "*.yaml" -exec sed -i 's/your-entity/YOUR_ENTITY_NAME/g' {} +
  ```

  Replace `YOUR_ENTITY_NAME` with your actual wandb entity.
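To confirm the replacement worked, a quick check can help; this is a minimal sketch that assumes the `config/` directory layout used above:

```shell
# Count config files still containing the "your-entity" placeholder
# (assumes the config/ directory from this repo; tr strips wc padding).
leftover=$(grep -rl "your-entity" config 2>/dev/null | wc -l | tr -d ' ')
if [ "$leftover" -eq 0 ]; then
  echo "all configs updated"
else
  echo "$leftover file(s) still contain your-entity"
fi
```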
- **Atari experiments**:

  ```shell
  ./run_atari_paper.sh
  ```

- **Procgen experiments**:

  ```shell
  ./run_procgen_paper.sh
  ```
The `config/atari_paper` directory contains configs for the Atari benchmark with DDQN:

- `hyperpp.yaml` - HYPER++ (ours): Hyperboloid model with RMSNorm, learned scaling, and categorical loss
- `hyper_paper.yaml` - Hyper+S-RYM: Poincaré Ball with SpectralNorm and 1/√d scaling
- `euclidean.yaml` - Euclidean baseline: standard Euclidean representations
To run a specific config:

```shell
uv run run_ddqn.py -cd=config/atari_paper -cn=hyperpp experiment.gpu=0 env_id="NameThisGameNoFrameskip-v4"
```

Atari-5 environments: `NameThisGameNoFrameskip-v4`, `PhoenixNoFrameskip-v4`, `BattleZoneNoFrameskip-v4`, `QbertNoFrameskip-v4`, `DoubleDunkNoFrameskip-v4`
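The command above can be looped over all five Atari-5 environments; a minimal sketch (the `echo` prints each command instead of launching it; drop it to actually run, and adjust the GPU id as needed):

```shell
# Queue one DDQN run per Atari-5 environment (GPU id is illustrative).
ATARI5="NameThisGameNoFrameskip-v4 PhoenixNoFrameskip-v4 BattleZoneNoFrameskip-v4 QbertNoFrameskip-v4 DoubleDunkNoFrameskip-v4"
for env in $ATARI5; do
  echo uv run run_ddqn.py -cd=config/atari_paper -cn=hyperpp experiment.gpu=0 env_id="$env"
done
```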
The `config/procgen_paper` directory contains configs for all 16 Procgen environments with PPO:

- `hyperpp.yaml` - HYPER++ (ours)
- `hyper_paper.yaml` - Hyper+S-RYM
- `euclidean_baseline.yaml` - Euclidean baseline
To run a specific config:

```shell
uv run run_ppo.py -cd=config/procgen_paper -cn=hyperpp experiment.gpu=0 env_id=bigfish
```

Available Procgen environments: bigfish, bossfight, caveflyer, chaser, climber, coinrun, dodgeball, fruitbot, heist, jumper, leaper, maze, miner, ninja, plunder, starpilot
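To compare the three configs on a single environment, the same command can be looped; a sketch under the paths above (`echo` prints the commands rather than launching them):

```shell
# One PPO run per config on the same environment, for a direct comparison.
CONFIGS="hyperpp hyper_paper euclidean_baseline"
for cfg in $CONFIGS; do
  echo uv run run_ppo.py -cd=config/procgen_paper -cn="$cfg" experiment.gpu=0 env_id=bigfish
done
```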
The `config/ppo_ablations` directory contains ablation studies for individual HYPER++ components:

- `procgen_hyperpp.yaml` - full HYPER++ (baseline for ablations)
- `procgen_hyperpp_no_rms.yaml` - removing RMSNorm
- `procgen_hyperpp_noscale.yaml` - removing learned scaling
- `procgen_hyperpp_nohlgauss.yaml` - using MSE instead of categorical loss
- `procgen_hyperpp_poincare.yaml` - using Poincaré Ball instead of Hyperboloid
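Running every ablation on one environment can be scripted; a minimal sketch (`echo` prints the commands; drop it to launch, and pick any environment from the list above):

```shell
# Sweep all ablation configs on a single Procgen environment.
ABLATIONS="procgen_hyperpp procgen_hyperpp_no_rms procgen_hyperpp_noscale procgen_hyperpp_nohlgauss procgen_hyperpp_poincare"
for cfg in $ABLATIONS; do
  echo uv run run_ppo.py -cd=config/ppo_ablations -cn="$cfg" experiment.gpu=0 env_id=bigfish
done
```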
```shell
uv run run_ppo.py -cd=config/ppo_ablations -cn=procgen_hyperpp_no_rms experiment.gpu=0 env_id=bigfish
```

Common parameters you might want to adjust:
Environment & GPU:

- `env_id=<name>` - change environment (see lists above)
- `experiment.gpu=<id>` - GPU device ID (e.g., `0`, `1`, `2`)
- `experiment.track=<bool>` - activate/deactivate wandb tracking
Training:

- `num_envs=<int>` - number of parallel environments (default: 64 for Procgen)
- `total_timesteps=<int>` - training duration (default: 25M for Procgen, 10M for Atari)
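The overrides above combine freely on one command line; a sketch of a short debug run (the parameter values here are illustrative, not the paper's settings; `echo` prints the command rather than launching it):

```shell
# Short Procgen debug run: fewer envs, 1M steps, wandb tracking off.
cmd="uv run run_ppo.py -cd=config/procgen_paper -cn=hyperpp experiment.gpu=0 env_id=bigfish num_envs=8 total_timesteps=1000000 experiment.track=false"
echo "$cmd"
```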
If you use our code for your research, please cite our paper:
```bibtex
@article{klein2025hyperrl,
  title={Understanding and Improving Hyperbolic Deep Reinforcement Learning},
  author={Klein, Timo and Lang, Thomas and Shkabrii, Andrii and Sturm, Alexander and Sidak, Kevin and Miklautz, Lukas and Velaj, Yllka and Plant, Claudia and Tschiatschek, Sebastian},
  journal={arXiv preprint arXiv:2512.14202},
  year={2025}
}
```