PyTorch implementation of DreamerV3, Mastering Diverse Domains through World Models.
Clone GitHub repository and set up environment
git clone https://github.com/burchim/DreamerV3-PyTorch.git && cd DreamerV3-PyTorch
pip install -r requirements.txt
Train agent on a specific task:
env_name=dmc-Acrobot-swingup python3 main.py -c configs/DreamerV3/dreamer_v3.py
Train agent on all tasks:
./train_dreamerv3_dmc.sh
./train_dreamerv3_atari100k.sh
Visualize experiments
tensorboard --logdir ./callbacks
Overriding model config hyperparameters
override_config='{"num_envs": 4, "eval_episode_saving_path": "./videos"}' env_name=dmc-Acrobot-swingup python3 main.py -c configs/DreamerV3/dreamer_v3.py
env_name=dmc-Acrobot-swingup python3 main.py -c configs/DreamerV3/dreamer_v3.py --mode evaluation
We averaged the evaluation score over 10 episodes and used 3 seeds per experiment.
| Task | DreamerV3 | DreamerV3-PyTorch (this repo) |
|---|---|---|
| Env Steps | 1M | 1M |
| Acrobot Swingup | 210.0 | 410.8 |
| Cartpole Balance | 996.4 | 999.3 |
| Cartpole Balance Sparse | 1000.0 | 1000.0 |
| Cartpole Swingup | 819.1 | 865.1 |
| Cartpole Swingup Sparse | 792.9 | 525.6 |
| Cheetah Run | 728.7 | 886.6 |
| Cup Catch | 957.1 | 741.1 |
| Finger Spin | 818.5 | 547.6 |
| Finger Turn Easy | 787.7 | 819.4 |
| Finger Turn Hard | 810.8 | 832.2 |
| Hopper Hop | 369.6 | 369.7 |
| Hopper Stand | 900.6 | 944.6 |
| Pendulum Swingup | 806.3 | 791.8 |
| Quadruped Run | 352.3 | 683.7 |
| Quadruped Walk | 352.6 | 733.4 |
| Reacher Easy | 898.9 | 831.5 |
| Reacher Hard | 499.2 | 597.2 |
| Walker Run | 757.8 | 701.1 |
| Walker Stand | 976.7 | 900.0 |
| Walker Walk | 955.8 | 956.0 |
| Mean | 739.6 | 756.8 |
| Median | 808.5 | 814.0 |
We averaged the evaluation score over 10 episodes and used 3 seeds per experiment.
| Task | Random | Human | DreamerV3 | DreamerV3-PyTorch (this repo) |
|---|---|---|---|---|
| Env Steps | - | - | 400K | 400K |
| Alien | 228 | 7128 | 959 | 1093 |
| Amidar | 6 | 1720 | 139 | 115 |
| Assault | 222 | 742 | 706 | 604 |
| Asterix | 210 | 8503 | 932 | 1500 |
| Bank Heist | 14 | 753 | 649 | 639 |
| Battle Zone | 2360 | 37188 | 12250 | 13867 |
| Boxing | 0 | 12 | 78 | 78 |
| Breakout | 2 | 30 | 31 | 65 |
| Chopper Com. | 811 | 7388 | 420 | 1127 |
| Crazy Climber | 10780 | 35829 | 97190 | 79647 |
| Demon Attack | 152 | 1971 | 303 | 233 |
| Freeway | 0 | 30 | 0 | 10 |
| Frostbite | 65 | 4335 | 909 | 364 |
| Gopher | 258 | 2412 | 3730 | 3285 |
| Hero | 1027 | 30826 | 11161 | 9610 |
| James Bond | 29 | 303 | 445 | 655 |
| Kangaroo | 52 | 3035 | 4098 | 4120 |
| Krull | 1598 | 2666 | 7782 | 8144 |
| Kung Fu Master | 258 | 22736 | 21420 | 26047 |
| Ms Pacman | 307 | 6952 | 1327 | 1649 |
| Pong | –21 | 15 | 18 | 20 |
| Private Eye | 25 | 69571 | 882 | 1141 |
| Qbert | 164 | 13455 | 3405 | 1978 |
| Road Runner | 12 | 7845 | 15565 | 12913 |
| Seaquest | 68 | 42055 | 618 | 786 |
| Up N Down | 533 | 11693 | 7600 | 14986 |
| Human Mean | 0% | 100% | 112% | 120% |
| Human Median | 0% | 100% | 49% | 44% |
| Task | Random | Human | DreamerV3 | DreamerV3-PyTorch (this repo) |
|---|---|---|---|---|
| Env Steps | - | - | 200M | 200M |
| Breakout | 2 | 30 | 300 | 396 |
Original work trains for 100M environment steps and uses DreamerV2, current results are for 10M steps. We used the same hyperparameters as Atari 200M experiments in the DreamerV3 paper.
| Task | DreamerV2 | DreamerV3-PyTorch (this repo) |
|---|---|---|
| Env Steps | 100M | 10M |
| Memory 9x9 | 28.2 | 26.2 |
Official DreamerV3 Implementation: https://github.com/danijar/dreamerv3
Memory Maze (Evaluating Long-Term Memory in 3D Mazes): https://github.com/jurgisp/memory-maze




















































