My Summer 2023 machine learning research project for training autonomous agents to perform accurate bow-and-arrow shots in Minecraft using Mineflayer bots, computer vision, and reinforcement learning.
Project Type: Summer 2023 Research Project
Status: Paused Development
Key Features: POV Recording, Barrier Detection, Archery Data Collection, DQN/DDPG RL Training
- Project Overview
- System Architecture
- Getting Started
- How It Works
- Training the Models
- Bot Commands and Operations
- Data Collection
- Future Work
- Project Structure
- References
The primary objective of this project is to develop an AI-driven Minecraft bot capable of accurately shooting moving targets with a bow and arrow. The bot learns to predict projectile trajectories and adjust aim parameters (yaw and pitch) in real-time using deep reinforcement learning.
Long-term Vision: Create a system where the bot can:
- Predict target movement patterns
- Calculate predictive lead (lead shots ahead of moving targets)
- Dynamically adjust aim based on distance, height difference, and target velocity
- Continuously improve accuracy through reinforcement learning
Implemented:
- Bot spawning and server connection (Mineflayer)
- First-person POV recording and streaming
- Target player detection and tracking
- Arrow trajectory tracking
- Gravity and physics data collection
- Discrete action DQN training
- Continuous action DDPG training
- YOLOv8-based object detection for barriers
- Reward calculation based on shot accuracy
- Model persistence (save/load training state)
In Development:
- Movement prediction models
- Predictive shot calculation
- Multi-target engagement
- Real-time trajectory optimization
┌─────────────────────────────────────────────────────┐
│              Minecraft Server (1.12)                │
│    ├─ Bot Agent (Mineflayer)                        │
│    └─ Target Player                                 │
└────────────┬──────────────────────────────────────┬─┘
             │                                      │
             ▼                                      ▼
    ┌─────────────────┐                    ┌──────────────────┐
    │   Prismarine    │                    │  Physics Engine  │
    │     Viewer      │                    │  (Arrow/Gravity) │
    │  (POV Stream)   │                    └──────────────────┘
    └────────┬────────┘
             │
    ┌────────▼──────────────────────┐
    │     RL Training Pipeline      │
    │  ├─ State: Screenshot         │
    │  ├─ Action: Yaw/Pitch Change  │
    │  ├─ Reward: Hit Distance      │
    │  └─ Training: DQN/DDPG        │
    └───────────────────────────────┘
Mineflayer:
- JavaScript library for Minecraft bot control
- Handles server connection, chat, movement, and item usage
- Provides entity tracking and world state information
Discrete RL (discrete_rl.py):
- DQN architecture with fully-connected layers
- Discrete action space (fixed yaw/pitch increments)
- Action space: 15 pitch changes × 1 yaw change = 15 actions
- Batch size: 64, Gamma: 0.99, Learning rate: 1e-4
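The discrete network described above can be sketched in PyTorch as follows. This is a minimal illustration, not the exact architecture in discrete_rl.py: the hidden width of 128 and the two-layer depth are assumptions.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Fully-connected Q-network: state vector in, one Q-value per action out."""
    def __init__(self, state_size=4, action_dim=15, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),   # one Q-value per discrete pitch change
        )

    def forward(self, x):
        return self.net(x)

q_net = DQN()
state = torch.randn(1, 4)            # a batched 4-dimensional state
q_values = q_net(state)              # shape (1, 15)
action = q_values.argmax(dim=1)      # greedy action index
```

The greedy `argmax` here would be wrapped in an epsilon-greedy policy during training.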
DDPG RL (ddpg_rl.py):
- Deep Deterministic Policy Gradient for continuous actions
- Separate actor and critic networks using TensorFlow/Keras
- Continuous yaw and pitch adjustments
- Replay buffer with capacity 2000, batch size 64
Computer Vision:
- OpenCV for image processing
- YOLOv8 for barrier/obstacle detection
- POV screenshots captured from Prismarine viewer
- Image normalization and template matching for target tracking
Physics Tracking:
- Gravity calculation from empirical data
- Arrow trajectory tracking in real-time
- Hit detection and reward computation
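Determining the gravity constant empirically amounts to fitting h = ½gt² to measured fall-time/fall-height pairs. A hedged NumPy sketch (the synthetic measurements below are illustrative, not values from gravity-data.txt):

```python
import numpy as np

def estimate_gravity(fall_times, fall_heights):
    """Least-squares fit of h = 0.5 * g * t^2, returning the estimate of g."""
    half_t2 = 0.5 * np.asarray(fall_times) ** 2
    h = np.asarray(fall_heights)
    # Solve h = g * (0.5 * t^2) in the least-squares sense
    g, *_ = np.linalg.lstsq(half_t2.reshape(-1, 1), h, rcond=None)
    return float(g[0])

# Synthetic measurements generated with g = 9.8 blocks/s^2 (illustrative only)
times = np.array([0.5, 1.0, 1.5, 2.0])
heights = 0.5 * 9.8 * times ** 2
g_est = estimate_gravity(times, heights)   # recovers ~9.8
```

With real, noisy measurements the least-squares fit averages out per-sample error rather than recovering g exactly.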
Software Requirements:
- Python 3.8+
- Node.js 14+ (for Mineflayer)
- Minecraft Java Edition 1.12 server
Python Libraries:
numpy
torch / tensorflow
opencv-python
pillow
scikit-learn
matplotlib
seaborn
pandas
ultralytics (YOLO)
Node.js Modules:
mineflayer
minecraft-data
prismarine-viewer
1. Clone the repository:

   git clone <repo-url>
   cd mlmcbot

2. Install Python dependencies:

   pip install -r requirements.txt

3. Install Node.js dependencies:

   npm install mineflayer minecraft-data prismarine-viewer

4. Set up the Minecraft server:
   - Launch a Minecraft 1.12 server
   - Ensure it's accessible at 127.0.0.1:25565 (configurable)
   - Create a flat/prepared testing world

5. Configure paths. Update file paths in the bot scripts to match your system:
   - SAVE_PATH: directory for saving trained models
   - RECORD_PATH: directory for recording RL training data
   - Output paths in bot initialization

6. Start your Minecraft 1.12 server.

7. Launch the bot:

   python BotScripts/mlmcbot.py

8. View the POV stream:
   - The bot will print a localhost URL (e.g., http://localhost:3000)
   - Open it in a web browser to see the bot's first-person view

9. Connect the target player:
   - Join the server with username rl_target
   - Position yourself in the bot's field of view
# Main event loop in mlmcbot.py
def on_physics_tick_handler():
    # Called every tick (~20 ms):
    # - Update arrow positions
    # - Track target location
    # - Calculate hit/miss
    # - Update RL state

Key operations:
- Tracking: Bot maintains real-time position of target player and arrows
- State representation: Captures POV screenshot as state for RL agent
- Action execution: Applies yaw/pitch adjustments to aim
- Reward calculation: Computes distance between arrow and target at impact
Currently implemented:
- Physics-based: Gravity constant determined empirically
- Arrow tracking: Real-time position updates from Minecraft entity data
Future enhancement:
- ML-based prediction: Train LSTM/Transformer on historical trajectories
- Lead calculation: Predict target's future position and adjust shot accordingly
State Space: 4-dimensional vector or 64×64 RGB screenshot
Action Space: {-10°, -7°, -5°, ..., 0°, ..., 7°, 10°} pitch changes (15 discrete actions)
Reward: -distance_to_target (negative; a smaller miss distance yields a higher reward)
Training: Experience replay + target network updates every 24 episodes
State Space: 256×128 RGB screenshot
Action Space: [Δyaw ∈ [-90, 90], Δpitch ∈ [-90, 90]]
Reward: -distance_to_target + bonus for hits
Training: Off-policy with critic guidance, soft target updates (TAU=0.005)
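The soft target update referenced above blends the target network's weights toward the online network's each step, rather than copying them outright. A minimal sketch over plain weight lists, using the TAU=0.005 value from this config:

```python
TAU = 0.005  # soft-update rate from the DDPG configuration above

def soft_update(target_weights, online_weights, tau=TAU):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

target = [0.0, 0.0]
online = [1.0, -1.0]
target = soft_update(target, online)   # moves 0.5% of the way toward online
```

In a real implementation the same blend is applied per-tensor across both the actor and critic target networks.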
State: First-person POV screenshot processed to 64×64 or 256×128 RGB
Actions (Discrete):
- Yaw changes: [0] (no horizontal adjustment in base version)
- Pitch changes: 15 discrete values from -10° to +10°
Actions (Continuous - DDPG):
- Yaw adjustment: [-90°, +90°] (full rotation possible)
- Pitch adjustment: [-90°, +90°] (look up/down range)
Reward Function:
reward = -euclidean_distance(arrow_pos, target_pos) + hit_bonus
# Incentivizes minimizing distance to target
# Bonus applied if arrow hits within tolerance

Discrete DQN:

python BotScripts/archer_mcbot.py
# Uses discrete_rl.py for action selection
# Trains policy network incrementally

Continuous DDPG:

python BotScripts/archer_bot_ddpg_rl.py
# Uses ddpg_rl.py for continuous control
# Actor-critic architecture
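The reward function above can be computed directly from positions. In this sketch the tolerance and bonus values are illustrative, not the project's actual constants:

```python
import math

HIT_TOLERANCE = 1.0   # blocks; illustrative hit radius
HIT_BONUS = 10.0      # illustrative bonus for a hit

def shot_reward(arrow_pos, target_pos):
    """reward = -euclidean_distance + bonus when within tolerance."""
    dist = math.dist(arrow_pos, target_pos)
    return -dist + (HIT_BONUS if dist <= HIT_TOLERANCE else 0.0)

near = shot_reward((10, 64, 5), (10, 64, 5.5))   # 0.5 blocks off -> -0.5 + 10 = 9.5
miss = shot_reward((10, 64, 5), (10, 64, 15))    # 10 blocks off  -> -10.0
```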
1. Initialization:
   - Bot spawns target player
   - RL agent initializes policy and target networks
   - Replay buffer created

2. Episode Loop:
   - Bot aims at target (random initialization)
   - Selects action from RL policy
   - Applies yaw/pitch adjustment
   - Records POV screenshot as state
   - Shoots arrow
   - Observes reward (distance to target)
   - Stores transition: (state, action, reward, next_state, done)

3. Learning:
   - Sample minibatch from replay buffer
   - Compute TD-error using target network
   - Update policy network via gradient descent
   - Soft update target network (TAU = 0.001 to 0.005)

4. Evaluation:
   - Track rewards over episodes
   - Save best model weights
   - Plot loss curves and accuracy metrics
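The episode loop and transition storage above can be condensed into a runnable skeleton. Here `DummyEnv` and the random action choice are stand-ins for the Mineflayer bot and the trained policy:

```python
import random
from collections import deque

class DummyEnv:
    """Stand-in for the bot environment: state = signed aim error in degrees."""
    def reset(self):
        self.error = random.uniform(-10.0, 10.0)
        return self.error

    def step(self, action):
        self.error -= action               # action nudges pitch by -1, 0, or +1
        reward = -abs(self.error)          # reward = -distance_to_target
        done = abs(self.error) < 0.5       # "hit" within tolerance
        return self.error, reward, done

buffer = deque(maxlen=10_000)              # replay buffer, capacity as in the config
env = DummyEnv()
for episode in range(5):
    state, done = env.reset(), False
    for _ in range(50):
        action = random.choice([-1, 0, 1])             # placeholder policy
        next_state, reward, done = env.step(action)
        buffer.append((state, action, reward, next_state, done))  # store transition
        state = next_state
        if done:
            break
    # a real learning step would sample a minibatch here, e.g. random.sample(buffer, 64)
```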
DQN:
- Batch size: 64
- Gamma (discount): 0.99
- Learning rate: 1e-4
- Epsilon start: 0.9, end: 0.05, decay: 1000 steps
- Buffer capacity: 10,000
- Target update frequency: Every 24 episodes
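With EPS_START=0.9, EPS_END=0.05, and a 1000-step decay constant, the usual exponential epsilon schedule looks like the following (the exact decay form in discrete_rl.py may differ):

```python
import math

EPS_START, EPS_END, EPS_DECAY = 0.9, 0.05, 1000

def epsilon(step):
    """Exploration rate after `step` environment steps (exponential decay)."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)

# Starts at 0.9, decays toward 0.05: at step 1000 it is 0.05 + 0.85/e ~ 0.363
eps0 = epsilon(0)
eps1000 = epsilon(1000)
```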
DDPG:
- Batch size: 64
- Gamma (discount): 0.99
- Learning rate: 1e-4
- TAU (soft update): 0.005
- Buffer capacity: 2,000
- Actor/Critic networks: Dense layers (configurable)
Models and data are saved to:
C:\Users\btm74\AdventuresInMinecraft-PC\BowRL\
├── policy_net.pth # Trained policy network
├── target_net.pth # Target network
├── memory.pth # Replay buffer
├── ddpg_rl_records/ # DDPG training records
│ ├── states.npy
│ ├── actions.npy
│ ├── rewards.npy
│ ├── next_states.npy
│ └── dones.npy
└── bow_rewards.csv # Training rewards log
To plot training progress:

import pandas as pd
import matplotlib.pyplot as plt

rewards = pd.read_csv('bow_rewards.csv')
plt.plot(rewards['episode'], rewards['reward'])
plt.xlabel('Episode')
plt.ylabel('Cumulative Reward')
plt.show()

The bot responds to in-game chat commands:
!aim # Start aiming at target player
!shoot # Fire arrow at current aim
!train # Begin RL training episode
!record # Record POV video
!detect # Run barrier detection
!gravity_test # Collect gravity data
!range_test # Test shooting range
!stop # Halt current operation
from BotScripts.mlmcbot import mlmcbot
bot = mlmcbot("ArcherBot")
# Manual control
bot.look_at_player(target_x, target_y, target_z)
bot.shoot_arrow()
# RL training
bot.run_rl_training(episodes=100)
# Data collection
bot.collect_gravity_data()
bot.collect_trajectory_data()

- Gravity measurements: Fall time vs. fall height
- Trajectory data: Arrow position, velocity, target distance
- Stored in: gravity-data.txt, trajectory-data.csv
- State: POV screenshots (256×128 or 64×64 RGB)
- Actions: Yaw/pitch adjustments
- Rewards: Distance to target
- Next states: POV after action
- Stored in: Replay buffer (in-memory + checkpoint files)
- Images: POV screenshots with barriers/obstacles
- Annotations: LabelMe JSON format
- Converted to: YOLO format for YOLOv8 training
- Path: Detection_Povs/ and Detection_Povs_Test/
- Hit distance: Minimum distance arrow reached to target
- Accuracy: Percentage of arrows within X blocks of target
- Correlation: NCC (Normalized Cross-Correlation) for template matching
- Exported to: CSV and text files for analysis
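The NCC metric listed above is the zero-mean normalized cross-correlation (what OpenCV's matchTemplate computes in TM_CCOEFF_NORMED mode). A self-contained NumPy equivalent for two equal-sized patches:

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation between equal-shaped grayscale patches.

    Returns a value in [-1, 1]: 1 for a perfect match, -1 for an inverted match.
    """
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0

template = np.array([[0.0, 1.0], [1.0, 0.0]])
same = ncc(template, template)          # 1.0, perfect match
inverted = ncc(1.0 - template, template)  # -1.0, inverted patch
```

For full-frame search, the same statistic is evaluated at every window position; in practice cv2.matchTemplate does that sweep efficiently.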
Raw POV Screenshot
↓
Image Normalization
↓
Grayscale Conversion (for correlation)
↓
Resize to 64×64 / 256×128
↓
Normalize to [0, 1] or [-1, 1]
↓
Feed to RL Agent / Detection Model
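The grayscale branch of this pipeline (used for correlation matching) can be sketched in NumPy. The nearest-neighbour resize here is a stand-in for cv2.resize, and the luminance weights are the standard ITU-R BT.601 coefficients; for the RL agent the RGB channels would be kept instead of converted to grayscale:

```python
import numpy as np

def preprocess(pov, size=(64, 64), signed=False):
    """RGB uint8 POV frame -> grayscale, resized, normalized float array."""
    # Luminance conversion (BT.601 weights)
    gray = pov.astype(np.float32) @ np.array([0.299, 0.587, 0.114], np.float32)
    # Nearest-neighbour resize (stand-in for cv2.resize)
    ys = np.linspace(0, gray.shape[0] - 1, size[0]).round().astype(int)
    xs = np.linspace(0, gray.shape[1] - 1, size[1]).round().astype(int)
    small = gray[np.ix_(ys, xs)]
    out = small / 255.0                          # normalize to [0, 1]
    return out * 2.0 - 1.0 if signed else out    # optionally rescale to [-1, 1]

frame = np.random.randint(0, 256, (128, 256, 3), dtype=np.uint8)  # fake POV frame
state = preprocess(frame)   # shape (64, 64), values in [0, 1]
```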
Goal: Enable the bot to predict player movement and calculate predictive leads.
Implementation:

1. Collect trajectory data:
   - Record player positions over time
   - Sample at 10-20 Hz frequency
   - Create dataset of movement patterns

2. Train movement predictor:
   - Architecture: LSTM / Transformer
   - Input: Player position history (last 10 positions)
   - Output: Predicted position at t+Δt
   - Loss: MSE on positional error

3. Calculate lead:

   predicted_pos = movement_model(position_history)
   time_to_arrow_impact = distance / arrow_velocity
   aim_vector = predicted_pos - current_pos
   aim_at(aim_vector)
Expected improvement: 30-50% increase in hit rate for moving targets
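As a runnable stand-in for the lead calculation above, constant-velocity extrapolation can replace the learned movement model: predict the target's velocity from its last two positions and aim where it will be after the arrow's flight time. ARROW_SPEED and the sample positions are illustrative, not measured values:

```python
import numpy as np

ARROW_SPEED = 60.0  # blocks/s at full draw; illustrative, not a measured constant

def lead_point(shooter_pos, target_history, dt):
    """Aim point: extrapolate target velocity over the estimated arrow flight time."""
    target_pos = target_history[-1]
    velocity = (target_history[-1] - target_history[-2]) / dt   # finite difference
    # One-shot approximation: flight time from the *current* distance
    flight_time = np.linalg.norm(target_pos - shooter_pos) / ARROW_SPEED
    return target_pos + velocity * flight_time

shooter = np.array([0.0, 64.0, 0.0])
history = [np.array([30.0, 64.0, 0.0]),
           np.array([30.0, 64.0, 1.0])]      # target moving in +z
aim = lead_point(shooter, history, dt=0.05)  # aim point nudged ahead along +z
```

The one-shot flight-time estimate could be refined by iterating (recompute distance to the predicted point), and the learned LSTM/Transformer would replace the finite-difference velocity estimate.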
Goal: Learn to adjust shot trajectory based on distance and environmental factors.
Approach:

1. Factor extraction:
   - Distance to target
   - Height difference
   - Wind/physics parameters (future, for modded servers)

2. Conditional policy:

   policy(state | distance, height, velocity) → (yaw, pitch)

3. Multi-agent RL:
   - Train separate policies for different distance ranges
   - Ensemble approach for robustness
Expected improvement: 40-60% accuracy on diverse ranges
Goal: Bot continuously adapts to server-specific physics and player behavior.
Features:
- Online learning during gameplay
- Few-shot adaptation to new scenarios
- Transfer learning from related games/tasks
- Curriculum learning (easy targets → moving targets → multiple targets)
Potential enhancements:
- 3D pose estimation of target player
- Obstacle detection and navigation
- Wind/environmental factor prediction
- Multi-modal learning (visual + physical sensors)
mlmcbot/
├── BotScripts/ # Main bot implementations
│ ├── mlmcbot.py # Base bot class with all features
│ ├── archer_mcbot.py # Discrete DQN trainer
│ ├── archer_bot_ddpg_rl.py # DDPG continuous trainer
│ ├── discrete_rl.py # DQN network + training logic
│ ├── ddpg_rl.py # DDPG network + training logic
│ ├── rl.py # Legacy RL utilities
│ ├── rlutil.py # RL helper functions
│ ├── utils.py # General utilities (image, file, math)
│ ├── mcyolov8.py # YOLOv8 object detection wrapper
│ ├── neuralnet.py # Custom neural network definitions
│ ├── discrete_rl_V2.py # Improved discrete RL variant
│ ├── archerV2.py # Alternative archer bot
│ └── prof_nn_def.py # Profiling utilities
├── New_BotScripts/ # Development/experimental code
│ ├── new_archer_bot/ # Refactored bot implementation
│ ├── new_rl.py # Experimental RL approach
│ └── utils.py # Updated utilities
├── README.md # This file
├── LICENSE # Project license
└── .git/ # Version control
Key Directories (External):
BowRL/ # Trained models directory
├── policy_net.pth
├── target_net.pth
├── memory.pth
└── bow_rewards.csv
MineflayerData/ # Collected data
├── gravity-data.txt
├── correlation-data.txt
├── detection_data.txt
└── Detection_Povs/
class mlmcbot:
    def __init__(self, botname, sentient=True, ip=HOST, port=PORT)

    # Control
    def shoot_arrow()
    def aim_at_player()
    def look_at(x, y, z)

    # Training
    def run_rl_training(episodes, trainer='dqn')
    def collect_trajectory_data()
    def collect_gravity_data()

    # Detection
    def detect_barriers()
    def run_barrier_detection()

    # Utilities
    def get_target_position() → (x, y, z)
    def get_arrow_position() → (x, y, z)
    def calculate_distance_to_target() → float

class DQN(nn.Module):
    def __init__(self, state_size=4, action_dim=15)
    def forward(x) → action_logits

# Training functions
def select_action(state) → action_index
def optimize_model() → loss

class ReplayBuffer:
    def add_record(state, action, reward, next_state, done)
    def sample(batch_size) → batch
    def save(folder_name)
    def load(folder_name)

- Check server: Ensure the Minecraft server is running on 127.0.0.1:25565
- Python environment: Verify mineflayer-js bridge is installed
- Firewall: Disable local firewall rules blocking port 25565
- Port conflict: Check if viewer port (3000+) is in use
- Browser cache: Clear cache or use incognito window
- Console errors: Check browser developer console (F12) for WebSocket errors
- Reward scaling: Ensure reward values are reasonable (not too large/small)
- State preprocessing: Verify POV screenshots are being captured correctly
- Action bounds: Confirm yaw/pitch adjustments are within valid ranges
- Learning rate: Try adjusting LR (1e-4 is typical starting point)
- Batch size: Reduce BATCH_SIZE if OOM errors occur
- Image resolution: Use 64×64 instead of 256×128
- Buffer capacity: Reduce BUFFER_CAPACITY if memory limited
- Mineflayer - Minecraft bot framework
- PyTorch - Deep learning framework
- TensorFlow/Keras - Alternative DL framework
- YOLOv8 - Object detection
- Prismarine - 3D world viewer
- Summer 2023 Research Project
- Focus: AI-driven gameplay and trajectory prediction
- Testbed: Minecraft 1.12 with custom world
Contributions are welcome! Areas of focus:
- Implement LSTM-based movement predictor
- Optimize RL training (better hyperparameters)
- Add multi-target support
- Implement curriculum learning
- Create comprehensive documentation
- Add unit tests
- Performance profiling and optimization
See LICENSE for details.
Author: Research Team | Last Updated: January 2026
For questions or issues, please refer to the repository's issue tracker or contact the project maintainers.