A Deep Q-Network (DQN) implementation that learns to play Q*bert, the classic Atari 2600 game, using reinforcement learning.
- About Q*bert
- Why This Game?
- Game Mechanics
- Installation
- Quick Start
- Project Structure
- Benchmarks
- Configuration
- Troubleshooting
- References
Q*bert is an excellent benchmark for deep reinforcement learning with:
- 6 discrete actions - tractable action space
- Visual input - 210×160 pixel screens
- Progressive difficulty - 9+ levels with increasing complexity
- Clear rewards - +25 points per cube color change
- Strategic gameplay - requires planning and spatial awareness
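These properties can be confirmed directly from the environment (a minimal sketch using the classic `gym` API from the Installation section below):

```python
import gym

# Create the Q*bert environment and inspect its spaces
env = gym.make("Qbert-v4")

print(env.action_space)              # Discrete(6)
print(env.observation_space.shape)   # (210, 160, 3) RGB screen
print(env.unwrapped.get_action_meanings())
# ['NOOP', 'FIRE', 'UP', 'RIGHT', 'LEFT', 'DOWN']
```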
Q*bert provides unique challenges for RL agents:
- Dynamic Environment: Game rules change between levels
- Multi-objective: Must balance cube completion with enemy evasion
- Long-term Planning: Strategic disc usage for +500 point bonuses
- Non-stationary: Later levels revert cube colors if stepped on again
Change all pyramid cubes to the target color (shown top-left) by hopping on them.
| Action | Points |
|---|---|
| Change cube color | +25 |
| Catch green ball | +100 |
| Defeat Slick/Sam | +300 |
| Defeat Coily via disc | +500 |
| Complete screen | 1,000 + (250 × level) |
| Unused disc | +50 to +100 |
- Level 1: Hop once to change color
- Level 2: Hop twice (intermediate → target)
- Level 3+: Cubes revert if stepped on again after completion
- Coily (snake): Actively pursues Q*bert
- Ugg & Wrong Way: Move horizontally
- Slick & Sam: Change cube colors back
```bash
# Install dependencies
pip install gym[atari]
pip install ale-py
pip install torch torchvision torchaudio
pip install numpy matplotlib

# Download Atari ROMs
pip install autorom[accept-rom-license]
```

```bash
# Vanilla DQN
python train.py --algorithm dqn --episodes 5000

# Double DQN (recommended)
python train.py --algorithm double_dqn --episodes 5000

# Dueling DQN (best performance)
python train.py --algorithm dueling_dqn --episodes 6000
```

```bash
python evaluate.py --checkpoint checkpoints/dqn_best.pth --episodes 10
```

```python
import gym

env = gym.make("Qbert-v4")

# Action space (6 actions)
# 0: NOOP
# 1: FIRE
# 2: UP
# 3: RIGHT
# 4: LEFT
# 5: DOWN
```
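As a quick sanity check before training, the environment can be driven with random actions (a minimal sketch using the classic `gym` step API shown above; newer `gymnasium` releases return five values from `step` and a tuple from `reset`):

```python
import gym

env = gym.make("Qbert-v4")
obs = env.reset()
total_reward = 0.0

for _ in range(1000):
    action = env.action_space.sample()          # uniform random action
    obs, reward, done, info = env.step(action)  # classic 4-tuple API
    total_reward += reward
    if done:
        break

print(f"Random-agent episode reward: {total_reward}")
env.close()
```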
```
Q-bert/
├── README.md            # This file
├── requirements.txt     # Dependencies
├── config.py            # Hyperparameters
├── model.py             # DQN architectures
├── agent.py             # DQN agent
├── environment.py       # Environment wrapper
├── train.py             # Training loop
├── evaluate.py          # Evaluation script
├── utils/
│   ├── replay_buffer.py # Experience replay
│   ├── preprocessing.py # Frame preprocessing
│   └── visualization.py # Plotting tools
└── checkpoints/         # Saved models
```
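For orientation, `utils/replay_buffer.py` only needs to store transitions and sample uniform mini-batches; the sketch below is one minimal way to do that and is not necessarily the exact code in this repository:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random mini-batch, returned as stacked numpy arrays
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```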
| Algorithm | Avg Score | Episodes | Improvement (vs. previous row) |
|---|---|---|---|
| Vanilla DQN | 734 | 3,601 | Baseline |
| Double DQN | 1,428 | 4,718 | +94% |
| Dueling DQN | 2,256 | 6,369 | +58% |
| Algorithm | Avg Score |
|---|---|
| Vanilla DQN | 700 |
| DRQN (Recurrent) | 850 |
| Hardware | Vanilla DQN | Double DQN | Dueling DQN |
|---|---|---|---|
| RTX 3080+ | 2-3 hours | 3-4 hours | 4-5 hours |
| RTX 2080 | 4-6 hours | 6-8 hours | 8-10 hours |
| CPU Only | 24-48 hours | Not recommended | Not recommended |
| Parameter | Default | Range | Notes |
|---|---|---|---|
| Learning Rate | 0.0001 | [1e-5, 1e-3] | Standard: 1e-4 |
| Discount (γ) | 0.99 | [0.95, 0.999] | Higher = longer horizon |
| Epsilon Start | 1.0 | [0.5, 1.0] | Initial exploration |
| Epsilon Final | 0.01 | [0.001, 0.1] | Min exploration |
| Epsilon Decay | 500k | [100k, 1M] | Decay steps |
| Replay Buffer | 100k | [50k, 1M] | Memory usage |
| Batch Size | 32 | [16, 64] | Gradient samples |
| Update Freq | 4 | [1, 10] | Steps between updates |
| Target Update | 1000 | [500, 5000] | Network sync |
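A `config.py` that mirrors these defaults might look like the following; the constant names are illustrative assumptions, so check the actual file for the identifiers `train.py` expects:

```python
# Hypothetical config.py mirroring the defaults in the table above
LEARNING_RATE = 1e-4        # optimizer step size
GAMMA = 0.99                # discount factor
EPSILON_START = 1.0         # initial exploration rate
EPSILON_FINAL = 0.01        # minimum exploration rate
EPSILON_DECAY_STEPS = 500_000
REPLAY_BUFFER_SIZE = 100_000
BATCH_SIZE = 32
UPDATE_FREQ = 4             # env steps between gradient updates
TARGET_UPDATE_FREQ = 1_000  # steps between target-network syncs
```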
```python
targets = {
    "vanilla_dqn": 734,   # ± 100
    "double_dqn": 1428,   # ± 150
    "dueling_dqn": 2256,  # ± 200
}
```

If scores are significantly lower, check:
- Frame preprocessing (stacking, grayscale, normalization); see the sketch after this list
- Network architecture (Conv2D layers)
- Hyperparameters (learning rate, update frequency)
- Replay buffer implementation
- Target network synchronization
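For the frame-preprocessing item in particular, the standard Atari pipeline (grayscale, downsample to 84×84, scale to [0, 1], stack the last 4 frames) looks roughly like this sketch; it assumes `opencv-python` (`cv2`), which is not in the install list above:

```python
from collections import deque

import cv2
import numpy as np


def preprocess_frame(frame):
    """Convert a 210x160x3 RGB frame to an 84x84 float32 grayscale image in [0, 1]."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0


class FrameStack:
    """Keep the last k preprocessed frames stacked along the channel axis."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        processed = preprocess_frame(frame)
        for _ in range(self.k):
            self.frames.append(processed)
        return np.stack(self.frames, axis=0)   # shape (4, 84, 84)

    def step(self, frame):
        self.frames.append(preprocess_frame(frame))
        return np.stack(self.frames, axis=0)
```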
- Cause: Disc mechanics not implemented properly
- Fix: Verify disc logic and enemy collision detection
- Cause: Learning rate too high or reward scaling issues
- Fix: Reduce LR to 1e-5; normalize rewards to [-1, 1]
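Reward normalization can be as simple as the sign clipping used by Mnih et al. (2015); a sketch (note that Q*bert rewards are all positive, so this maps every scoring event to +1):

```python
import numpy as np

def clip_reward(reward):
    # Map raw Atari scores to {-1, 0, +1}, as in the original DQN paper
    return float(np.sign(reward))
```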
- Cause: Insufficient exploration
- Fix: Increase initial ε or add noise to action selection; a linear ε decay sketch follows
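A linear ε schedule matching the defaults in the Configuration table might look like this sketch (not necessarily the repo's exact decay function):

```python
def epsilon_by_step(step, eps_start=1.0, eps_final=0.01, decay_steps=500_000):
    """Linearly anneal epsilon from eps_start to eps_final over decay_steps."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_final - eps_start)
```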
- Cause: Replay buffer or batch too large
- Fix: Reduce buffer to 50k; reduce batch size to 16
- Vanilla DQN
- Experience Replay
- Target Network
- Double DQN (TD-target sketch after this list)
- Dueling DQN
- Prioritized Experience Replay
- Rainbow DQN
- DRQN (LSTM)
- Noisy Networks
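To make the difference between the vanilla and Double DQN variants above concrete, the TD-target computation differs by one line; a PyTorch sketch with assumed names `online_net` and `target_net` (not necessarily those used in `agent.py`):

```python
import torch

@torch.no_grad()
def td_targets(rewards, next_states, dones, online_net, target_net,
               gamma=0.99, double=True):
    # rewards and dones are float tensors of shape (batch,)
    if double:
        # Double DQN: online net selects the next action, target net evaluates it
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    else:
        # Vanilla DQN: target net both selects and evaluates
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q
```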
- Learning curves with confidence intervals
- Attention heatmaps
- Action value visualization
- Gameplay recordings
- DQN: Mnih et al. (2015) - Human-level control through deep reinforcement learning
- Double DQN: Van Hasselt et al. (2015) - Deep Reinforcement Learning with Double Q-learning
- Dueling DQN: Wang et al. (2015) - Dueling Network Architectures
- Stanford CS229 (2016) - Deep Q-Learning with Recurrent Neural Networks
- Stanford CS224R (2024) - Q*bert Baseline Performance Comparison
MIT License - See LICENSE file for details.
For questions or issues:
- Check the Troubleshooting section
- Review the Benchmarks
- Open an issue with:
- Your hyperparameters
- Training logs (first 10 lines)
- Hardware specs
- Expected vs actual performance
```bibtex
@article{mnih2015human,
  title={Human-level control through deep reinforcement learning},
  author={Mnih, Volodymyr and others},
  journal={Nature},
  year={2015}
}
```

```bibtex
@misc{dqn_qbert_2026,
  title={DQN Agent for Q*bert},
  author={cudnah124},
  year={2026},
  howpublished={\url{https://github.com/cudnah124/Q-bert}}
}
```

Made by PDN for Deep Reinforcement Learning