Mere-Solace/mlmcbot

mlmcbot - Machine Learning Archery Bot for Minecraft

My Summer 2023 machine learning research project for training autonomous agents to perform accurate bow-and-arrow shots in Minecraft using Mineflayer bots, computer vision, and reinforcement learning.

Project Type: Summer 2023 Research Project
Status: Paused Development
Key Features: POV Recording, Barrier Detection, Archery Data Collection, DQN/DDPG RL Training


Table of Contents

  1. Project Overview
  2. System Architecture
  3. Getting Started
  4. How It Works
  5. Training the Models
  6. Bot Commands and Operations
  7. Data Collection
  8. Future Work
  9. Project Structure
  10. Troubleshooting
  11. References
  12. Contributing
  13. License

Project Overview

Goal

The primary objective of this project is to develop an AI-driven Minecraft bot capable of accurately shooting moving targets with a bow and arrow. The bot learns to predict projectile trajectories and adjust aim parameters (yaw and pitch) in real-time using deep reinforcement learning.

Long-term Vision: Create a system where the bot can:

  • Predict target movement patterns
  • Calculate predictive lead (lead shots ahead of moving targets)
  • Dynamically adjust aim based on distance, height difference, and target velocity
  • Continuously improve accuracy through reinforcement learning

Current Capabilities

Implemented:

  • Bot spawning and server connection (Mineflayer)
  • First-person POV recording and streaming
  • Target player detection and tracking
  • Arrow trajectory tracking
  • Gravity and physics data collection
  • Discrete action DQN training
  • Continuous action DDPG training
  • YOLOv8-based object detection for barriers
  • Reward calculation based on shot accuracy
  • Model persistence (save/load training state)

In Development:

  • Movement prediction models
  • Predictive shot calculation
  • Multi-target engagement
  • Real-time trajectory optimization

System Architecture

High-Level Overview

┌─────────────────────────────────────────────────────┐
│          Minecraft Server (1.12)                    │
│  ├─ Bot Agent (Mineflayer)                          │
│  └─ Target Player                                   │
└────────────┬──────────────────────────────────────┬─┘
             │                                      │
             ▼                                      ▼
      ┌─────────────────┐              ┌──────────────────┐
      │ Prismarine      │              │ Physics Engine   │
      │ Viewer          │              │ (Arrow/Gravity)  │
      │ (POV Stream)    │              └──────────────────┘
      └────────┬────────┘
               │
      ┌────────▼──────────────────────┐
      │  RL Training Pipeline         │
      │  ├─ State: Screenshot         │
      │  ├─ Action: Yaw/Pitch Change  │
      │  ├─ Reward: Hit Distance      │
      │  └─ Training: DQN/DDPG        │
      └───────────────────────────────┘

Core Components

1. Bot Framework (Mineflayer)

  • JavaScript library for Minecraft bot control
  • Handles server connection, chat, movement, and item usage
  • Provides entity tracking and world state information

2. Reinforcement Learning Modules

Discrete RL (discrete_rl.py):

  • DQN architecture with fully-connected layers
  • Discrete action space (fixed yaw/pitch increments)
  • Action space: 15 pitch changes × 1 yaw change = 15 actions
  • Batch size: 64, Gamma: 0.99, Learning rate: 1e-4

DDPG RL (ddpg_rl.py):

  • Deep Deterministic Policy Gradient for continuous actions
  • Separate actor and critic networks using TensorFlow/Keras
  • Continuous yaw and pitch adjustments
  • Replay buffer with capacity 2000, batch size 64

3. Vision & Detection

  • OpenCV for image processing
  • YOLOv8 for barrier/obstacle detection
  • POV screenshots captured from Prismarine viewer
  • Image normalization and template matching for target tracking

4. Physics Simulation

  • Gravity calculation from empirical data
  • Arrow trajectory tracking in real-time
  • Hit detection and reward computation
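
The empirical gravity constant can be recovered by fitting h = ½gt² to collected fall samples. A minimal numpy sketch (the repo's actual fitting code isn't shown here, and Minecraft arrows also experience drag, which this ignores):

```python
import numpy as np

def estimate_gravity(fall_times, fall_heights):
    """Fit h = 0.5 * g * t^2 to empirical (time, height) samples
    via least squares, returning the gravity estimate g."""
    t2 = np.asarray(fall_times, dtype=float) ** 2
    h = np.asarray(fall_heights, dtype=float)
    # Least-squares slope of h against 0.5 * t^2
    g, *_ = np.linalg.lstsq(0.5 * t2[:, None], h, rcond=None)
    return float(g[0])

# Synthetic check: samples generated with g = 9.8 recover g = 9.8
times = [0.5, 1.0, 1.5, 2.0]
heights = [0.5 * 9.8 * t * t for t in times]
print(round(estimate_gravity(times, heights), 2))
```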

Getting Started

Prerequisites

Software Requirements:

  • Python 3.8+
  • Node.js 14+ (for Mineflayer)
  • Minecraft Java Edition 1.12 server

Python Libraries:

numpy
torch / tensorflow
opencv-python
pillow
scikit-learn
matplotlib
seaborn
pandas
ultralytics (YOLO)

Node.js Modules:

  • mineflayer
  • minecraft-data
  • prismarine-viewer

Installation

  1. Clone the repository:

    git clone <repo-url>
    cd mlmcbot
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Install Node.js dependencies:

    npm install mineflayer minecraft-data prismarine-viewer
  4. Set up the Minecraft server:

    • Launch a Minecraft 1.12 server
    • Ensure it's accessible at 127.0.0.1:25565 (configurable)
    • Create a flat/prepared testing world
  5. Configure paths: Update file paths in the bot scripts to match your system:

    • SAVE_PATH: Directory for saving trained models
    • RECORD_PATH: Directory for recording RL training data
    • Output paths in bot initialization

Quick Start

  1. Start the Minecraft server:

    [Start your 1.12 server]
    
  2. Launch the bot:

    python BotScripts/mlmcbot.py
  3. View POV stream:

    • The bot will print a localhost URL (e.g., http://localhost:3000)
    • Open in a web browser to see the bot's first-person view
  4. Connect target player:

    • Join the server with username rl_target
    • Position yourself in the bot's field of view

How It Works

1. Bot Control Loop

# Main event loop in mlmcbot.py (simplified)
def on_physics_tick_handler():
    # Called every physics tick (~50 ms at Minecraft's 20 ticks/second):
    # - update arrow positions
    # - track target location
    # - calculate hit/miss
    # - update RL state
    ...

Key operations:

  • Tracking: Bot maintains real-time position of target player and arrows
  • State representation: Captures POV screenshot as state for RL agent
  • Action execution: Applies yaw/pitch adjustments to aim
  • Reward calculation: Computes distance between arrow and target at impact

2. Trajectory Prediction

Currently implemented:

  • Physics-based: Gravity constant determined empirically
  • Arrow tracking: Real-time position updates from Minecraft entity data

Future enhancement:

  • ML-based prediction: Train LSTM/Transformer on historical trajectories
  • Lead calculation: Predict target's future position and adjust shot accordingly

3. Reinforcement Learning Training

DQN (Discrete Actions)

State Space:     4-dimensional vector or 64×64 RGB screenshot
Action Space:    {-10°, -7°, -5°, ..., 0°, ..., 7°, 10°} pitch changes (15 discrete actions)
Reward:          -distance_to_target (negative, lower is better)
Training:        Experience replay + target network updates every 24 episodes
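
The experience-replay loop pairs with an annealed exploration schedule. Given the epsilon parameters listed later under Hyperparameters (start 0.9, end 0.05, decay 1000), a common exponential-decay form looks like this; the repo's exact schedule is an assumption, not confirmed:

```python
import math
import random

EPS_START, EPS_END, EPS_DECAY = 0.9, 0.05, 1000  # values from the Hyperparameters section

def epsilon(step):
    """Exponentially annealed exploration rate: starts at 0.9, decays toward 0.05."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)

def select_action(q_values, step):
    """Epsilon-greedy: explore with probability epsilon(step), else pick argmax Q."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```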

DDPG (Continuous Actions)

State Space:     256×128 RGB screenshot
Action Space:    [Δyaw ∈ [-90, 90], Δpitch ∈ [-90, 90]]
Reward:          -distance_to_target + bonus for hits
Training:        Off-policy with critic guidance, soft target updates (TAU=0.005)
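
The soft target updates (TAU = 0.005) blend the target network toward the live network a little each step rather than copying it wholesale. A sketch of this Polyak averaging over per-layer weight arrays:

```python
import numpy as np

TAU = 0.005  # soft-update coefficient from the DDPG hyperparameters

def soft_update(target_weights, source_weights, tau=TAU):
    """Polyak averaging: target <- tau * source + (1 - tau) * target,
    applied layer by layer."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]
```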

4. State & Action Space

State: First-person POV screenshot processed to 64×64 or 256×128 RGB

Actions (Discrete):

  • Yaw changes: [0] (no horizontal adjustment in base version)
  • Pitch changes: 15 discrete values from -10° to +10°

Actions (Continuous - DDPG):

  • Yaw adjustment: [-90°, +90°] (a 180° horizontal sweep)
  • Pitch adjustment: [-90°, +90°] (look up/down range)
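
For illustration, the two action spaces could be materialized as below. Note the DQN spec lists a non-uniform pitch grid ({-10°, -7°, -5°, …}) only partially, so `np.linspace` is used here as an evenly spaced stand-in, not the repo's actual grid:

```python
import numpy as np

# 15 pitch adjustments spanning -10 deg to +10 deg (uniform stand-in for the
# partially listed non-uniform grid in the discrete DQN spec)
PITCH_ACTIONS = np.linspace(-10.0, 10.0, 15)

def clip_continuous(delta_yaw, delta_pitch, bound=90.0):
    """DDPG outputs are clipped to the [-90 deg, +90 deg] bounds above."""
    return (float(np.clip(delta_yaw, -bound, bound)),
            float(np.clip(delta_pitch, -bound, bound)))
```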

Reward Function:

reward = -euclidean_distance(arrow_pos, target_pos) + hit_bonus
# Incentivizes minimizing distance to target
# Bonus applied if arrow hits within tolerance

Training the Models

Starting Training

Discrete DQN:

python BotScripts/archer_mcbot.py
# Uses discrete_rl.py for action selection
# Trains policy network incrementally

Continuous DDPG:

python BotScripts/archer_bot_ddpg_rl.py
# Uses ddpg_rl.py for continuous control
# Actor-critic architecture

Training Process

  1. Initialization:

    • Bot spawns target player
    • RL agent initializes policy and target networks
    • Replay buffer created
  2. Episode Loop:

    • Bot aims at target (random initialization)
    • Selects action from RL policy
    • Applies yaw/pitch adjustment
    • Records POV screenshot as state
    • Shoots arrow
    • Observes reward (distance to target)
    • Stores transition: (state, action, reward, next_state, done)
  3. Learning:

    • Sample minibatch from replay buffer
    • Compute TD-error using target network
    • Update policy network via gradient descent
    • Soft update target network (TAU between 0.001 and 0.005, depending on the trainer)
  4. Evaluation:

    • Track rewards over episodes
    • Save best model weights
    • Plot loss curves and accuracy metrics
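
The TD-error computation in the learning step bootstraps from the target network, with the bootstrap term zeroed on terminal transitions. A plain-numpy sketch of the target values:

```python
import numpy as np

GAMMA = 0.99  # discount factor from the hyperparameters below

def td_targets(rewards, next_q_max, dones, gamma=GAMMA):
    """One-step TD targets: r + gamma * max_a' Q_target(s', a'),
    with bootstrapping disabled on terminal transitions."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_max = np.asarray(next_q_max, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    return rewards + gamma * next_q_max * ~dones
```

The policy network is then regressed toward these targets, and the TD error is the gap between its current Q estimate and `td_targets`.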

Hyperparameters

DQN:

  • Batch size: 64
  • Gamma (discount): 0.99
  • Learning rate: 1e-4
  • Epsilon start: 0.9, end: 0.05, decay: 1000 steps
  • Buffer capacity: 10,000
  • Target update frequency: Every 24 episodes

DDPG:

  • Batch size: 64
  • Gamma (discount): 0.99
  • Learning rate: 1e-4
  • TAU (soft update): 0.005
  • Buffer capacity: 2,000
  • Actor/Critic networks: Dense layers (configurable)

Monitoring Training

Models and data are saved to:

C:\Users\btm74\AdventuresInMinecraft-PC\BowRL\
├── policy_net.pth          # Trained policy network
├── target_net.pth          # Target network
├── memory.pth              # Replay buffer
├── ddpg_rl_records/        # DDPG training records
│   ├── states.npy
│   ├── actions.npy
│   ├── rewards.npy
│   ├── next_states.npy
│   └── dones.npy
└── bow_rewards.csv         # Training rewards log

To plot training progress:

import pandas as pd
import matplotlib.pyplot as plt

rewards = pd.read_csv('bow_rewards.csv')
plt.plot(rewards['episode'], rewards['reward'])
plt.xlabel('Episode')
plt.ylabel('Cumulative Reward')
plt.show()

Bot Commands and Operations

Chat Commands

The bot responds to in-game chat commands:

!aim                    # Start aiming at target player
!shoot                  # Fire arrow at current aim
!train                  # Begin RL training episode
!record                 # Record POV video
!detect                 # Run barrier detection
!gravity_test           # Collect gravity data
!range_test             # Test shooting range
!stop                   # Halt current operation

Programmatic Control

from BotScripts.mlmcbot import mlmcbot

bot = mlmcbot("ArcherBot")

# Manual control
bot.look_at_player(target_x, target_y, target_z)
bot.shoot_arrow()

# RL training
bot.run_rl_training(episodes=100)

# Data collection
bot.collect_gravity_data()
bot.collect_trajectory_data()

Data Collection

Types of Data Collected

1. Physics Data

  • Gravity measurements: Fall time vs. fall height
  • Trajectory data: Arrow position, velocity, target distance
  • Stored in: gravity-data.txt, trajectory-data.csv

2. Training Data

  • State: POV screenshots (256×128 or 64×64 RGB)
  • Actions: Yaw/pitch adjustments
  • Rewards: Distance to target
  • Next states: POV after action
  • Stored in: Replay buffer (in-memory + checkpoint files)

3. Detection Data

  • Images: POV screenshots with barriers/obstacles
  • Annotations: LabelMe JSON format
  • Converted to: YOLO format for YOLOv8 training
  • Path: Detection_Povs/ and Detection_Povs_Test/

4. Evaluation Metrics

  • Hit distance: Minimum distance arrow reached to target
  • Accuracy: Percentage of arrows within X blocks of target
  • Correlation: NCC (Normalized Cross-Correlation) for template matching
  • Exported to: CSV and text files for analysis
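
The NCC score used for template matching can be computed directly. OpenCV's `cv2.matchTemplate` with `TM_CCOEFF_NORMED` produces an equivalent per-offset score; this standalone numpy version scores two equal-size patches:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-shape image patches.
    Result lies in [-1, 1]; 1 means a perfect match up to brightness/contrast."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()  # mean-center so constant brightness shifts cancel out
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```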

Processing Pipeline

Raw POV Screenshot
    ↓
Image Normalization
    ↓
Grayscale Conversion (for correlation)
    ↓
Resize to 64×64 / 256×128
    ↓
Normalize to [0, 1] or [-1, 1]
    ↓
Feed to RL Agent / Detection Model
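
The pipeline above can be sketched end to end in a few lines. The repo resizes with OpenCV/Pillow; a dependency-free nearest-neighbour resize stands in here, and the grayscale step is omitted:

```python
import numpy as np

def preprocess(frame, out_hw=(64, 64), signed=False):
    """Nearest-neighbour resize + normalization matching the pipeline above.
    signed=False -> values in [0, 1]; signed=True -> values in [-1, 1]."""
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape[:2]
    rows = np.arange(out_hw[0]) * h // out_hw[0]  # source row for each output row
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    small = frame[rows][:, cols]
    small = small / 255.0                          # normalize to [0, 1]
    return small * 2.0 - 1.0 if signed else small  # optionally shift to [-1, 1]
```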

Future Work

Phase 1: Movement Prediction (Short-term)

Goal: Enable the bot to predict player movement and calculate predictive leads.

Implementation:

  1. Collect trajectory data:

    • Record player positions over time
    • Sample at 10-20 Hz frequency
    • Create dataset of movement patterns
  2. Train movement predictor:

    Architecture: LSTM / Transformer
    Input: Player position history (last 10 positions)
    Output: Predicted position at t+Δt
    Loss: MSE on positional error
    
  3. Calculate lead:

    predicted_pos = movement_model(position_history)
    time_to_arrow_impact = distance / arrow_velocity
    aim_vector = predicted_pos - current_pos
    aim_at(aim_vector)
    

Expected improvement: 30-50% increase in hit rate for moving targets
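
Under a constant-velocity assumption, the lead calculation in step 3 can be made concrete by iterating the time-of-flight estimate until it converges. The arrow speed constant below is an illustrative placeholder, not a measured value, and drag and gravity drop are ignored:

```python
import math

ARROW_SPEED = 60.0  # blocks/s; placeholder, not measured from the repo

def lead_point(target_pos, target_vel, shooter_pos, arrow_speed=ARROW_SPEED):
    """Constant-velocity lead: aim where the target will be when the arrow
    arrives. Iterates the time-of-flight estimate a few times to converge,
    since moving the aim point changes the flight distance."""
    aim = target_pos
    for _ in range(5):
        t = math.dist(shooter_pos, aim) / arrow_speed       # time of flight
        aim = tuple(p + v * t for p, v in zip(target_pos, target_vel))
    return aim
```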

Phase 2: Dynamic Trajectory Optimization (Medium-term)

Goal: Learn to adjust shot trajectory based on distance and environmental factors.

Approach:

  1. Factor extraction:

    • Distance to target
    • Height difference
    • Wind/physics parameters (future for modded servers)
  2. Conditional policy:

    policy(state | distance, height, velocity) → (yaw, pitch)
    
  3. Multi-agent RL:

    • Train separate policies for different distance ranges
    • Ensemble approach for robustness

Expected improvement: 40-60% accuracy on diverse ranges

Phase 3: Adaptive Learning (Long-term)

Goal: Bot continuously adapts to server-specific physics and player behavior.

Features:

  • Online learning during gameplay
  • Few-shot adaptation to new scenarios
  • Transfer learning from related games/tasks
  • Curriculum learning (easy targets → moving targets → multiple targets)

Phase 4: Advanced Perception (Speculative)

Potential enhancements:

  • 3D pose estimation of target player
  • Obstacle detection and navigation
  • Wind/environmental factor prediction
  • Multi-modal learning (visual + physical sensors)

Project Structure

mlmcbot/
├── BotScripts/                      # Main bot implementations
│   ├── mlmcbot.py                   # Base bot class with all features
│   ├── archer_mcbot.py              # Discrete DQN trainer
│   ├── archer_bot_ddpg_rl.py        # DDPG continuous trainer
│   ├── discrete_rl.py               # DQN network + training logic
│   ├── ddpg_rl.py                   # DDPG network + training logic
│   ├── rl.py                        # Legacy RL utilities
│   ├── rlutil.py                    # RL helper functions
│   ├── utils.py                     # General utilities (image, file, math)
│   ├── mcyolov8.py                  # YOLOv8 object detection wrapper
│   ├── neuralnet.py                 # Custom neural network definitions
│   ├── discrete_rl_V2.py            # Improved discrete RL variant
│   ├── archerV2.py                  # Alternative archer bot
│   └── prof_nn_def.py               # Profiling utilities
├── New_BotScripts/                  # Development/experimental code
│   ├── new_archer_bot/              # Refactored bot implementation
│   ├── new_rl.py                    # Experimental RL approach
│   └── utils.py                     # Updated utilities
├── README.md                        # This file
├── LICENSE                          # Project license
└── .git/                            # Version control

Key Directories (External):
BowRL/                               # Trained models directory
├── policy_net.pth
├── target_net.pth
├── memory.pth
└── bow_rewards.csv

MineflayerData/                      # Collected data
├── gravity-data.txt
├── correlation-data.txt
├── detection_data.txt
└── Detection_Povs/

Key Classes and Functions

mlmcbot.mlmcbot (Main Bot Class)

class mlmcbot:
    def __init__(self, botname, sentient=True, ip=HOST, port=PORT)
    
    # Control
    def shoot_arrow()
    def aim_at_player()
    def look_at(x, y, z)
    
    # Training
    def run_rl_training(episodes, trainer='dqn')
    def collect_trajectory_data()
    def collect_gravity_data()
    
    # Detection
    def detect_barriers()
    def run_barrier_detection()
    
    # Utilities
    def get_target_position() → (x, y, z)
    def get_arrow_position() → (x, y, z)
    def calculate_distance_to_target() → float

discrete_rl.DQN (Discrete Action Network)

class DQN(nn.Module):
    def __init__(self, state_size=4, action_dim=15)
    def forward(x) → action_logits
    
# Training functions
def select_action(state) → action_index
def optimize_model() → loss

ddpg_rl.ReplayBuffer (DDPG Experience Storage)

class ReplayBuffer:
    def add_record(state, action, reward, next_state, done)
    def sample(batch_size) → batch
    def save(folder_name)
    def load(folder_name)
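
A minimal implementation matching this interface might look as follows; the capacity default is taken from the DDPG hyperparameters, and `save`/`load` (numpy checkpointing in the repo) are omitted for brevity:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal sketch of the DDPG experience store."""

    def __init__(self, capacity=2000):
        # deque(maxlen=...) evicts the oldest transitions automatically
        self.buffer = deque(maxlen=capacity)

    def add_record(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```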

Troubleshooting

Bot Won't Connect

  • Check server: Ensure Minecraft server is running on 127.0.0.1:25565
  • Python environment: Verify mineflayer-js bridge is installed
  • Firewall: Disable local firewall rules blocking port 25565

POV Stream Not Loading

  • Port conflict: Check if viewer port (3000+) is in use
  • Browser cache: Clear cache or use incognito window
  • Console errors: Check browser developer console (F12) for WebSocket errors

Training Not Improving

  • Reward scaling: Ensure reward values are reasonable (not too large/small)
  • State preprocessing: Verify POV screenshots are being captured correctly
  • Action bounds: Confirm yaw/pitch adjustments are within valid ranges
  • Learning rate: Try adjusting LR (1e-4 is typical starting point)

Memory Issues

  • Batch size: Reduce BATCH_SIZE if OOM errors occur
  • Image resolution: Use 64×64 instead of 256×128
  • Buffer capacity: Reduce BUFFER_CAPACITY if memory limited

References

Project Context

  • Summer 2023 Research Project
  • Focus: AI-driven gameplay and trajectory prediction
  • Testbed: Minecraft 1.12 with custom world

Contributing

Contributions are welcome! Areas of focus:

  • Implement LSTM-based movement predictor
  • Optimize RL training (better hyperparameters)
  • Add multi-target support
  • Implement curriculum learning
  • Create comprehensive documentation
  • Add unit tests
  • Performance profiling and optimization

License

See LICENSE for details.


Author: Research Team | Last Updated: January 2026

For questions or issues, please refer to the repository's issue tracker or contact the project maintainers.

About

Machine Learning in Minecraft with Mineflayer | Summer 2023 Research Project | POV Recording, Simple Barrier Detection, and Archery Data Collection with a DQN for Accuracy Training
