This repository implements a multi-agent reinforcement learning system for the Atari Boxing environment. The goal of this project is to create two intelligent agents that compete against each other in the boxing game, learn from their interactions, and improve their performance over time. The agents are trained using deep Q-learning and work in parallel to ensure dynamic and competitive gameplay.
- Multi-Agent System: Two agents trained simultaneously in a competitive environment.
- Deep Q-Learning: Uses convolutional neural networks (CNNs) to process game frames and predict actions.
- Replay Buffer: Stores past experiences for efficient training and stability.
- Environment Wrapper: Simplifies interaction with the PettingZoo Atari Boxing environment.
- Dynamic Exploration: Implements epsilon-greedy exploration with adaptive decay (see the sketch after this list).
- Evaluation Pipeline: Robust tools to evaluate agent performance and visualize results.
- Checkpointing: Save and load models to resume training or evaluation.
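The exploration schedule itself lives in `agents/agent.py`; as a rough illustration only, epsilon-greedy action selection with multiplicative decay could look like the sketch below (the names `select_action`, `decay_epsilon`, `epsilon_min`, and `epsilon_decay` are assumptions, not necessarily the repository's identifiers):

```python
import random
import torch

def select_action(q_network, state, epsilon, num_actions):
    """Pick a random action with probability epsilon, else the greedy action."""
    # state: torch tensor of stacked, preprocessed frames (C, H, W)
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        q_values = q_network(state.unsqueeze(0))  # add a batch dimension
        return int(q_values.argmax(dim=1).item())

def decay_epsilon(epsilon, epsilon_min=0.01, epsilon_decay=0.995):
    """Multiplicative decay toward a floor, applied e.g. once per episode."""
    return max(epsilon_min, epsilon * epsilon_decay)
```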
```
AtariBoxingProject/
├── agents/
│   ├── __init__.py        # Initializes the agents package
│   ├── agent.py           # DQNAgent class for single-agent functionality
│   ├── multi_agent.py     # Multi-agent logic for training two agents
├── checkpoints/           # Directory for saving and loading model checkpoints
├── env_setup/
│   ├── __init__.py        # Initializes the environment setup package
│   ├── env_wrapper.py     # Wrapper for PettingZoo Atari environment
│   ├── utils.py           # Helper functions for environment setup
├── models/
│   ├── __init__.py        # Initializes the models package
│   ├── cnn_model.py       # CNN architecture for feature extraction
├── roms/                  # Contains the Atari Boxing game ROM files
├── training/
│   ├── __init__.py        # Initializes the training package
│   ├── evaluation.py      # Evaluates trained agents' performance
│   ├── replay_buffer.py   # Experience replay buffer implementation
│   ├── train.py           # Training script for the multi-agent system
│   ├── utils.py           # Helper functions for the training process
├── main.py                # Entry point for running the project
├── README.md              # Project documentation
├── requirements.txt       # Python dependencies
└── test.py                # Unit tests for components
```
- Clone the Repository

  ```bash
  git clone https://github.com/your-username/AtariBoxingProject.git
  cd AtariBoxingProject
  ```

- Install Dependencies

  Use the `requirements.txt` file to install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Add ROMs

  Place the necessary Atari Boxing ROM files in the `roms/` directory. Ensure you have legal access to these files.
- Operating System Limitations

  - This project works only on Mac/Linux/Unix systems. Windows systems are not supported due to compatibility issues with the PettingZoo Atari environment.
  - The default code is configured for Mac systems. For Linux/Unix, ensure you replace the following line:

    ```python
    env = boxing_v2.env(render_mode="human")
    ```

    with:

    ```python
    env = boxing_v2.env(render_mode="human", auto_rom_install_path="roms")
    ```

    Update this in the following files: `multi_agent.py`, `cnn_model.py`, `train.py`, `evaluation.py`.
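To sanity-check that the environment and ROM files are wired up correctly before training, a minimal random-action loop against the raw PettingZoo AEC API might look like the following (a sketch only, independent of this project's wrapper; it assumes the Linux/Unix-style call shown above, so drop `auto_rom_install_path` on Mac):

```python
from pettingzoo.atari import boxing_v2

env = boxing_v2.env(render_mode="human", auto_rom_install_path="roms")
env.reset(seed=42)

# PettingZoo's AEC API steps one agent at a time.
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # finished agents must receive a None action
    else:
        action = env.action_space(agent).sample()
    env.step(action)
env.close()
```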
To train the agents, run:

```bash
python main.py --mode train --num_episodes 1000 --batch_size 64 --target_update_freq 10 --gamma 0.99 --checkpoint_dir checkpoints
```

To evaluate the agents after training, run:

```bash
python main.py --mode evaluate --num_eval_games 10 --checkpoint_dir checkpoints
```

Run unit tests to ensure code functionality:

```bash
python test.py
```

| Argument | Description | Default |
|---|---|---|
| `--mode` | Mode to run the script: `train` or `evaluate` | `train` |
| `--num_episodes` | Number of episodes to train | 0 |
| `--batch_size` | Batch size for experience replay during training | 32 |
| `--target_update_freq` | Frequency (in episodes) at which the target network is updated | 10 |
| `--gamma` | Discount factor for Q-learning | 0.99 |
| `--num_eval_games` | Number of games to play during evaluation | 10 |
| `--checkpoint_dir` | Directory to save/load model checkpoints | `checkpoints` |
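`main.py` presumably exposes these flags through `argparse`; a minimal parser consistent with the table above (the actual code in `main.py` may differ) would be:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Multi-agent DQN for Atari Boxing")
    parser.add_argument("--mode", choices=["train", "evaluate"], default="train")
    parser.add_argument("--num_episodes", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--target_update_freq", type=int, default=10)
    parser.add_argument("--gamma", type=float, default=0.99)
    parser.add_argument("--num_eval_games", type=int, default=10)
    parser.add_argument("--checkpoint_dir", type=str, default="checkpoints")
    return parser.parse_args()
```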
The project uses a multi-agent version of DQN where each agent is represented by a CNN and trains independently while competing in the shared environment.
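Concretely, each agent's network is fit toward a bootstrapped target computed from a periodically synced target network. The sketch below shows one such gradient step under common DQN conventions; the function and tensor names are assumptions, and the actual update lives in `agents/agent.py`:

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a sampled mini-batch of transitions."""
    # batch holds stacked tensors; dones is 1.0 for terminal transitions, else 0.0
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```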
The CNN processes game frames and extracts features. The architecture includes:
- 3 convolutional layers for feature extraction.
- Fully connected layers for decision-making.
- Rectified Linear Unit (ReLU) activation for non-linearity.
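A representative PyTorch version of such a network is sketched below; the specific layer sizes follow the classic Atari DQN convention and assume 84×84 preprocessed frames, so they may differ from what `models/cnn_model.py` actually uses:

```python
import torch.nn as nn

class CNNModel(nn.Module):
    def __init__(self, in_channels=4, num_actions=18):
        super().__init__()
        # Three convolutional layers extract spatial features from stacked frames.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Fully connected layers map features to one Q-value per action.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 assumes 84x84 inputs
            nn.Linear(512, num_actions),            # 18 = full Atari action set
        )

    def forward(self, x):
        return self.head(self.features(x))
```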
Stores past experiences (state, action, reward, next_state) and samples mini-batches for training to ensure better convergence.
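A compact sketch of such a buffer follows; the real `training/replay_buffer.py` may store and batch transitions differently:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```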
Simplifies interaction with the PettingZoo environment by:
- Normalizing observations.
- Stacking consecutive frames for temporal awareness.
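In spirit, that preprocessing amounts to something like the following sketch, assuming grayscale frames and a stack depth of 4; `env_setup/env_wrapper.py` is the authoritative version:

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Normalizes observations to [0, 1] and keeps the most recent frames stacked."""

    def __init__(self, num_frames=4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_obs):
        # Fill the stack with the first frame so the shape is valid from step one.
        obs = first_obs.astype(np.float32) / 255.0
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return np.stack(self.frames, axis=0)

    def step(self, obs):
        self.frames.append(obs.astype(np.float32) / 255.0)
        return np.stack(self.frames, axis=0)
```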
- Training Progress: Agents progressively learn strategies to maximize their scores.
- Competitive Dynamics: Both agents adapt to each other's strategies, leading to engaging matches.
- Model Performance: After sufficient training, the agents consistently select higher-value actions, as reflected in their win rates and average rewards.
Sample Training Log:
| Metric | Value {White Agent, Black Agent} |
|---|---|
| Episodes Trained | 1000 |
| Average Win Rate | {0.3521, 0.6749} |
| Final Epsilon | 0.01 |
| Average Reward | {-1.7, 1.7} |
- Implement advanced algorithms like Proximal Policy Optimization (PPO).
- Extend to cooperative multi-agent environments.
- Add visualization tools for real-time performance tracking.