DQN Agent for Q*bert - Atari Game

Deep Q-Network (DQN) implementation for learning to play the classic Q*bert Atari 2600 game using reinforcement learning.

Table of Contents

  • About Q*bert
  • Game Mechanics
  • Installation
  • Quick Start
  • Project Structure
  • Benchmarks
  • Configuration
  • Troubleshooting
  • Roadmap
  • References
  • License
  • Contact
  • Citation

About Q*bert

Q*bert is an excellent benchmark for deep reinforcement learning with:

  • 6 discrete actions - tractable action space
  • Visual input - 210×160 pixel screens
  • Progressive difficulty - 9+ levels with increasing complexity
  • Clear rewards - +25 points per cube color change
  • Strategic gameplay - requires planning and spatial awareness

Why This Game?

Q*bert provides unique challenges for RL agents:

  1. Dynamic Environment: Game rules change between levels
  2. Multi-objective: Must balance cube completion with enemy evasion
  3. Long-term Planning: Strategic disc usage for +500 point bonuses
  4. Non-stationary: Later levels revert cube colors if stepped on again

Game Mechanics

Objective

Change all pyramid cubes to the target color (shown top-left) by hopping on them.

Reward Structure

Action                   Points
Change cube color        +25
Catch green ball         +100
Defeat Slick/Sam         +300
Defeat Coily via disc    +500
Complete screen          1,000 + (250 × level)
Unused disc              50-100
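
These raw rewards range from +25 to well over 1,000 per event, so it is common practice (following the original DQN paper, and echoed in the Troubleshooting section below) to clip rewards to [-1, 1] during training to keep TD targets stable. A minimal sketch of such a wrapper, assuming the Gym API used elsewhere in this README:

import gym
import numpy as np

class ClipReward(gym.RewardWrapper):
    """Clip raw Q*bert rewards to [-1, 1] so large bonuses (e.g. +500 for
    defeating Coily) do not dominate the TD targets."""
    def reward(self, reward):
        return float(np.clip(reward, -1.0, 1.0))

env = ClipReward(gym.make("Qbert-v4"))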

Level Progression

  • Level 1: Hop once to change color
  • Level 2: Hop twice (intermediate → target)
  • Level 3+: Cubes revert if stepped on again after completion

Enemies

  • Coily (snake): Actively pursues Q*bert
  • Ugg & Wrong Way: Move horizontally
  • Slick & Sam: Change cube colors back

Installation

# Install dependencies
pip install gym[atari]
pip install ale-py
pip install torch torchvision torchaudio
pip install numpy matplotlib

# Download Atari ROMs
pip install autorom[accept-rom-license]
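
A quick sanity check after installation confirms that the ROMs were downloaded and the environment can be created; the env ID matches the one used later in this README:

# verify_install.py - check that ALE and the Q*bert ROM are available
import gym

env = gym.make("Qbert-v4")
env.reset()
print("Observation space:", env.observation_space.shape)  # expected (210, 160, 3)
print("Number of actions:", env.action_space.n)           # expected 6
env.close()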

πŸƒ Quick Start

1. Train a Model

# Vanilla DQN
python train.py --algorithm dqn --episodes 5000

# Double DQN (recommended)
python train.py --algorithm double_dqn --episodes 5000

# Dueling DQN (best performance)
python train.py --algorithm dueling_dqn --episodes 6000
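
The three variants differ only in how the TD target is formed and in the network head: vanilla DQN both selects and evaluates the next action with the target network, while Double DQN selects with the online network and evaluates with the target network, which reduces overestimation. A minimal PyTorch sketch of that difference, assuming online_net and target_net map states to [batch, n_actions] Q-values:

import torch

def td_target(rewards, next_states, dones, online_net, target_net,
              gamma=0.99, double=True):
    """Bootstrapped TD target for a batch of transitions."""
    with torch.no_grad():
        next_q_target = target_net(next_states)                  # [B, n_actions]
        if double:
            # Double DQN: choose the next action with the online network ...
            next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
            # ... but evaluate it with the target network
            next_q = next_q_target.gather(1, next_actions).squeeze(1)
        else:
            # Vanilla DQN: max over the target network's own estimates
            next_q = next_q_target.max(dim=1).values
        return rewards + gamma * (1.0 - dones.float()) * next_q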

2. Evaluate Trained Agent

python evaluate.py --checkpoint checkpoints/dqn_best.pth --episodes 10

3. Environment Setup

import gym

env = gym.make("Qbert-v4")

# Action space (6 actions)
# 0: NOOP
# 1: FIRE
# 2: UP
# 3: RIGHT
# 4: LEFT
# 5: DOWN
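
Before reaching the network, the raw 210×160 RGB frames are typically converted to grayscale, resized to 84×84, and stacked four at a time (this is the job of utils/preprocessing.py). A minimal sketch of that pipeline using torchvision; the helper names here are illustrative, not the repo's actual API:

from collections import deque

import numpy as np
import torch
import torchvision.transforms.functional as TF

def preprocess(frame: np.ndarray) -> torch.Tensor:
    """210x160x3 uint8 frame -> 1x84x84 float tensor in [0, 1]."""
    t = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0  # CHW, [0, 1]
    t = TF.rgb_to_grayscale(t)                                    # 1x210x160
    return TF.resize(t, [84, 84], antialias=True)                 # 1x84x84

class FrameStack:
    """Keep the k most recent preprocessed frames as the agent's state."""
    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> torch.Tensor:
        first = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(first)
        return torch.cat(tuple(self.frames))      # k x 84 x 84

    def step(self, frame: np.ndarray) -> torch.Tensor:
        self.frames.append(preprocess(frame))
        return torch.cat(tuple(self.frames))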

Project Structure

Q-bert/
├── README.md                 # This file
├── requirements.txt          # Dependencies
├── config.py                 # Hyperparameters
├── model.py                  # DQN architectures
├── agent.py                  # DQN agent
├── environment.py            # Environment wrapper
├── train.py                  # Training loop
├── evaluate.py               # Evaluation script
├── utils/
│   ├── replay_buffer.py      # Experience replay
│   ├── preprocessing.py      # Frame preprocessing
│   └── visualization.py      # Plotting tools
└── checkpoints/              # Saved models

Benchmarks

Algorithm Comparison (Stanford CS224R 2024)

Algorithm      Avg Score    Episodes    Improvement (vs. previous row)
Vanilla DQN    734          3,601       Baseline
Double DQN     1,428        4,718       +94%
Dueling DQN    2,256        6,369       +58%
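
The dueling variant's edge in the table above comes from its network head rather than its update rule: features are split into a state-value stream and an advantage stream, then recombined into Q-values. A minimal PyTorch sketch of that head, assuming a shared convolutional feature extractor that outputs 512-dimensional features (layer sizes here are illustrative):

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, feature_dim: int = 512, n_actions: int = 6):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                      # [B, 1]
        a = self.advantage(features)                  # [B, n_actions]
        return v + a - a.mean(dim=1, keepdim=True)    # [B, n_actions]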

Historical Baselines (Stanford CS229 2016)

Algorithm           Avg Score
Vanilla DQN         700
DRQN (Recurrent)    850

Expected Training Time

Hardware     Vanilla DQN    Double DQN         Dueling DQN
RTX 3080+    2-3 hours      3-4 hours          4-5 hours
RTX 2080     4-6 hours      6-8 hours          8-10 hours
CPU Only     24-48 hours    Not recommended    Not recommended

Configuration

Key Hyperparameters

Parameter        Default    Range            Notes
Learning Rate    0.0001     [1e-5, 1e-3]     Standard: 1e-4
Discount (γ)     0.99       [0.95, 0.999]    Higher = longer horizon
Epsilon Start    1.0        [0.5, 1.0]       Initial exploration
Epsilon Final    0.01       [0.001, 0.1]     Min exploration
Epsilon Decay    500k       [100k, 1M]       Decay steps
Replay Buffer    100k       [50k, 1M]        Memory usage
Batch Size       32         [16, 64]         Gradient samples
Update Freq      4          [1, 10]          Steps between updates
Target Update    1000       [500, 5000]      Network sync
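
The defaults above map directly onto a configuration object; a sketch of what config.py might contain (field names are illustrative, not necessarily the repo's):

from dataclasses import dataclass

@dataclass
class DQNConfig:
    # Defaults taken from the table above
    learning_rate: float = 1e-4
    gamma: float = 0.99                    # discount factor
    epsilon_start: float = 1.0
    epsilon_final: float = 0.01
    epsilon_decay_steps: int = 500_000
    replay_buffer_size: int = 100_000
    batch_size: int = 32
    update_frequency: int = 4              # env steps between gradient updates
    target_update_frequency: int = 1_000   # steps between target-network syncs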

Validation Targets

targets = {
    "vanilla_dqn": 734,      # ± 100
    "double_dqn": 1428,      # ± 150
    "dueling_dqn": 2256,     # ± 200
}

If scores are significantly lower, check:

  • ✓ Frame preprocessing (stacking, grayscale, normalization)
  • ✓ Network architecture (Conv2D layers)
  • ✓ Hyperparameters (learning rate, update frequency)
  • ✓ Replay buffer implementation
  • ✓ Target network synchronization (see the sketch below)
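
For the last two items, the essential pattern is: sample uniformly from the replay buffer every update_frequency steps, and copy the online weights into the target network every target_update_frequency steps. A minimal sketch, assuming PyTorch networks that map states to [batch, n_actions] Q-values, a buffer with a sample(batch_size) method, and the DQNConfig sketched above:

import torch

def training_step(step, online_net, target_net, optimizer, buffer, cfg):
    """One bookkeeping step: periodic gradient update and target-network sync."""
    if step % cfg.update_frequency == 0 and len(buffer) >= cfg.batch_size:
        states, actions, rewards, next_states, dones = buffer.sample(cfg.batch_size)
        q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values
            target = rewards + cfg.gamma * (1.0 - dones.float()) * next_q
        loss = torch.nn.functional.smooth_l1_loss(q, target)  # Huber loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if step % cfg.target_update_frequency == 0:
        # Hard sync: copy online weights into the frozen target network
        target_net.load_state_dict(online_net.state_dict())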

Troubleshooting

Agent camps in corners

  • Cause: Disc mechanics not implemented properly
  • Fix: Verify disc logic and enemy collision detection

Training diverges (NaN loss)

  • Cause: Learning rate too high or reward scaling issues
  • Fix: Reduce LR to 1e-5; normalize rewards to [-1, 1]

Stuck at low score (1000+ episodes)

  • Cause: Insufficient exploration
  • Fix: Increase initial ε or add noise to action selection (see the ε-schedule sketch below)
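
A linear schedule built from the defaults in the hyperparameter table (ε annealed from 1.0 to 0.01 over 500k steps) makes the exploration knobs explicit; raising the start value or the decay length keeps the agent exploring longer. A minimal sketch:

def epsilon(step: int, start: float = 1.0, final: float = 0.01,
            decay_steps: int = 500_000) -> float:
    """Linearly anneal epsilon from `start` to `final` over `decay_steps` steps."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (final - start)

# epsilon(0) == 1.0, epsilon(250_000) == 0.505, epsilon(500_000) == 0.01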

Memory errors

  • Cause: Replay buffer or batch too large
  • Fix: Reduce buffer to 50k; reduce batch size to 16

Roadmap

Level 1: Core Implementation

  • Vanilla DQN
  • Experience Replay
  • Target Network

Level 2: Algorithm Improvements

  • Double DQN
  • Dueling DQN
  • Prioritized Experience Replay

Level 3: Advanced Methods

  • Rainbow DQN
  • DRQN (LSTM)
  • Noisy Networks

Level 4: Analysis

  • Learning curves with confidence intervals
  • Attention heatmaps
  • Action value visualization
  • Gameplay recordings

References

Original Papers

  • Mnih et al. (2015) - Human-level control through deep reinforcement learning (Nature) - vanilla DQN
  • van Hasselt et al. (2016) - Deep Reinforcement Learning with Double Q-learning (AAAI) - Double DQN
  • Wang et al. (2016) - Dueling Network Architectures for Deep Reinforcement Learning (ICML) - Dueling DQN

Benchmarks

  • Stanford CS229 (2016) - Deep Q-Learning with Recurrent Neural Networks
  • Stanford CS224R (2024) - Q*bert Baseline Performance Comparison

Resources

License

MIT License - See LICENSE file for details.

Contact

For questions or issues:

  1. Check the Troubleshooting section
  2. Review the Benchmarks
  3. Open an issue with:
    • Your hyperparameters
    • Training logs (first 10 lines)
    • Hardware specs
    • Expected vs actual performance

Citation

@article{mnih2015human,
  title={Human-level control through deep reinforcement learning},
  author={Mnih, Volodymyr and others},
  journal={Nature},
  volume={518},
  number={7540},
  pages={529--533},
  year={2015}
}

@misc{dqn_qbert_2026,
  title={DQN Agent for Q*bert},
  author={cudnah124},
  year={2026},
  howpublished={\url{https://github.com/cudnah124/Q-bert}}
}

Made by PDN for Deep Reinforcement Learning
