Clash Royale Reinforcement Learning Agent

An experimental reinforcement learning project to train an autonomous agent to play Clash Royale via computer vision and automated input (no private game APIs). The agent observes the game through screen capture of a BlueStacks emulator window, converts pixels into a structured state, and executes actions using ADB shell commands.

Current Status (Tooling & Prototype Phase)

We are in an early tooling phase, validating the capture, preprocessing, and input pipelines before implementing the full Gymnasium environment and PPO training loop.

Key Decisions (So Far)

Screen Capture (Windows): Using windows-capture (event-driven) for the BlueStacks window. DXcam retained as fallback.
Standard Frame Resolution: All frames are cropped, then downscaled to 480x854 (width x height) for deterministic, lower‑compute downstream processing.
ADB Control: Pure Python adb-shell library (no external adb binary) for launching the game and issuing shell/input commands.
Template Matching: OpenCV template matching prototype for card detection proof-of-concept.
Environment Config: .env driven configuration for crop offsets, window name, paths, and device connection.
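
The crop-then-downscale step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a 1920x1080 capture (not stated in this README), uses the `CROP_LEFT`/`CROP_RIGHT` values from the `.env` excerpt, and substitutes a NumPy nearest-neighbour resize for whatever resize call the helper scripts use (`cv2.resize` would be the likely choice with OpenCV). The function name `crop_and_downscale` is hypothetical.

```python
import numpy as np

def crop_and_downscale(frame, crop_left, crop_right, target_w=480, target_h=854):
    """Drop columns from both sides, then nearest-neighbour resize to the target size."""
    height, width = frame.shape[:2]
    cropped = frame[:, crop_left:width - crop_right]
    crop_h, crop_w = cropped.shape[:2]
    if crop_w <= 0:
        raise ValueError("crop offsets remove the whole frame")
    # Map every output row/column back to its nearest source row/column.
    rows = np.arange(target_h) * crop_h // target_h
    cols = np.arange(target_w) * crop_w // target_w
    return cropped[rows[:, None], cols]

# Assumed capture size: 1920x1080, stored as (height, width, channels).
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
out = crop_and_downscale(frame, crop_left=657, crop_right=657)
print(out.shape)  # (854, 480, 3)
```

Under that assumed capture size, cropping 657 columns from each side leaves a 606x1080 region, whose aspect ratio is very close to 480x854, so the resize introduces almost no distortion.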

Neural Network Architecture

The agent uses a Structured MLP Policy with a shared card encoder to efficiently process the 53-dimensional state vector. This architecture is designed to handle the distinct types of information present in the game state.

Architecture Breakdown:

Global Processor: Processes the 13-dimensional global state, which includes elixir, time, tower health, and game phase indicators.
Shared Card Encoder: Processes the 10-dimensional features for each of the four cards in the player's hand. This shared encoder allows the model to learn a single representation for cards that can be applied to any card in any slot.
Fusion Layer: Combines the representations from the global processor and the card encoder into a single, fused representation.
Final Decision Layer: Produces action logits for the MultiDiscrete([4, 32, 18]) action space, which corresponds to the card slot, x-coordinate, and y-coordinate of the deployment.

This structured approach allows the model to learn more efficiently by processing the global and card-specific features separately before combining them to make a final decision.
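
To make the data flow concrete, here is a minimal forward-pass sketch in NumPy. It is an illustration under assumptions, not the project's network: the layout of the state vector (13 global values first, then four blocks of 10 card values), the hidden width of 32, the single dense layer per stage, and the random weights are all hypothetical. A real policy would be written in the training framework and use learned weights.

```python
import numpy as np

GLOBAL_DIM, CARD_DIM, NUM_CARDS = 13, 10, 4   # 13 + 4 * 10 = 53
ACTION_DIMS = (4, 32, 18)                      # card slot, x, y
HIDDEN = 32                                    # assumed hidden width

rng = np.random.default_rng(0)

def init_linear(in_dim, out_dim):
    """Random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def dense_relu(x, layer):
    """Dense layer followed by ReLU."""
    weights, bias = layer
    return np.maximum(x @ weights + bias, 0.0)

params = {
    "global": init_linear(GLOBAL_DIM, HIDDEN),
    "card": init_linear(CARD_DIM, HIDDEN),   # one weight set shared by all 4 slots
    "fusion": init_linear(HIDDEN + NUM_CARDS * HIDDEN, HIDDEN),
    "heads": [init_linear(HIDDEN, n) for n in ACTION_DIMS],
}

def forward(state):
    """Return one logit vector per action dimension."""
    if state.shape != (GLOBAL_DIM + NUM_CARDS * CARD_DIM,):
        raise ValueError("expected a 53-dimensional state vector")
    global_feat = dense_relu(state[:GLOBAL_DIM], params["global"])
    cards = state[GLOBAL_DIM:].reshape(NUM_CARDS, CARD_DIM)
    card_feat = dense_relu(cards, params["card"])        # shape (4, HIDDEN)
    fused_in = np.concatenate([global_feat, card_feat.ravel()])
    fused = dense_relu(fused_in, params["fusion"])
    return [fused @ w + b for w, b in params["heads"]]

logits = forward(rng.normal(size=53))
print([l.shape for l in logits])  # [(4,), (32,), (18,)]
```

Because the same `card` weights encode every slot, a card produces the same embedding regardless of which slot it occupies; only the fusion layer sees slot position.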

Helper Scripts Overview (`HelperScripts/`)

| Script | Purpose |
| --- | --- |
| `windows-capture-testing.py` | One-shot frame grab for a specified BlueStacks window. |
| `windows-capture-with required resolution.py` | Capture → crop → resize → save normalized 480p frame. |
| `downscale.py` | Standalone image downscaling utility (defaults 480x854). |
| `Card_Template_matching_Example.py` | OpenCV template match demo for card detection latency check. |
| `py_adb.py` | Interactive adb-shell client + auto launch of Clash Royale intent. |
| `frame-extract.py` | Extract frames from recorded gameplay videos at intervals for dataset creation. |
| `.env.example` | Template environment variables (window name, crop, device IP/port, asset paths). |
| `requirements.txt` | Minimal dependency list for current tooling layer. |
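
For readers unfamiliar with the template-matching prototype listed above, the sketch below shows the underlying idea with a brute-force squared-difference search in NumPy. It is not the contents of `Card_Template_matching_Example.py`; the function name and the synthetic test image are hypothetical. With OpenCV, `cv2.matchTemplate` using the `cv2.TM_SQDIFF` method computes the same kind of score map far faster.

```python
import numpy as np

def match_template_sqdiff(image, template):
    """Return ((x, y), score) of the window with the smallest sum of squared differences."""
    img = image.astype(np.float64)
    tpl = template.astype(np.float64)
    img_h, img_w = img.shape
    tpl_h, tpl_w = tpl.shape
    best_xy, best_score = None, None
    for y in range(img_h - tpl_h + 1):
        for x in range(img_w - tpl_w + 1):
            diff = img[y:y + tpl_h, x:x + tpl_w] - tpl
            score = float(np.sum(diff * diff))
            if best_score is None or score < best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score

# Synthetic grayscale "screenshot" with a known 8x8 patch used as the card template.
rng = np.random.default_rng(0)
screen = rng.integers(0, 256, size=(40, 60), dtype=np.uint8)
card = screen[9:17, 17:25].copy()

location, score = match_template_sqdiff(screen, card)
print(location, score)
```

A score of zero means a pixel-exact match; on real frames the best score is nonzero, so a threshold has to be chosen empirically.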

Environment Variables (Excerpt)

Create a .env based on .env.example:

WINDOW_NAME="BlueStacks App Player 1"
CROP_LEFT=657
CROP_RIGHT=657
TARGET_WIDTH=480
TARGET_HEIGHT=854
ADB_DEVICE_IP=127.0.0.1
ADB_DEVICE_PORT=5555
CARD_IMAGE_PATH=path/to/card.png
GAME_STATE_IMAGE_PATH=path/to/state.png
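
One way to turn these values into typed settings is sketched below using only the standard library. How the helper scripts actually load `.env` is not stated here (a library such as python-dotenv is a common choice), so `parse_env`, `load_config`, and `CaptureConfig` are hypothetical names for illustration.

```python
from dataclasses import dataclass

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and comments, stripping surrounding quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

@dataclass(frozen=True)
class CaptureConfig:
    window_name: str
    crop_left: int
    crop_right: int
    target_width: int
    target_height: int
    adb_host: str
    adb_port: int

def load_config(values):
    """Convert raw strings to typed fields; a missing key raises KeyError."""
    return CaptureConfig(
        window_name=values["WINDOW_NAME"],
        crop_left=int(values["CROP_LEFT"]),
        crop_right=int(values["CROP_RIGHT"]),
        target_width=int(values["TARGET_WIDTH"]),
        target_height=int(values["TARGET_HEIGHT"]),
        adb_host=values["ADB_DEVICE_IP"],
        adb_port=int(values["ADB_DEVICE_PORT"]),
    )

sample = """
WINDOW_NAME="BlueStacks App Player 1"
CROP_LEFT=657
CROP_RIGHT=657
TARGET_WIDTH=480
TARGET_HEIGHT=854
ADB_DEVICE_IP=127.0.0.1
ADB_DEVICE_PORT=5555
"""
config = load_config(parse_env(sample))
print(config)
```

Failing fast on a missing or non-numeric value at startup is usually easier to debug than a bad crop discovered mid-capture.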

Planned Architecture (Roadmap Alignment)

| Layer | Description |
| --- | --- |
| Capture & Preprocess | Event-driven window capture → crop → standard 480p tensor. |
| CV Extraction | YOLO (troops), OCR (tower health), pixel counting (elixir), template match (cards). |
| State Assembly | Handcrafted feature vector (global + unit + tower + hand features). |
| Action Space | Hierarchical MultiDiscrete: [card_slot, x_tile, y_tile] with action masking. |
| RL Core | PPO (Stable-Baselines3) with composite reward (terminal + shaping). |
| Network | Multi-modal (CNN for spatial grid + MLP for scalars, dual actor/critic heads). |
| Scaling (Future) | Parallel rollouts & self-play (Ray RLlib) after single-instance stability. |
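
As an illustration of the Action Space row, the sketch below decodes a `[card_slot, x_tile, y_tile]` action into a screen coordinate and an Android `input tap` shell command, and builds a simple elixir-based card mask. The arena rectangle, the card costs, and all function names are assumptions made for this example; the bin counts follow the `MultiDiscrete([4, 32, 18])` space described earlier.

```python
X_BINS, Y_BINS = 32, 18          # from MultiDiscrete([4, 32, 18])
# Hypothetical arena rectangle inside the 480x854 frame: (left, top, right, bottom).
ARENA = (30, 100, 450, 700)

def tile_to_pixel(x_tile, y_tile, arena=ARENA):
    """Return the pixel at the centre of the given tile."""
    if not (0 <= x_tile < X_BINS and 0 <= y_tile < Y_BINS):
        raise ValueError("tile index out of range")
    left, top, right, bottom = arena
    tile_w = (right - left) / X_BINS
    tile_h = (bottom - top) / Y_BINS
    return (round(left + (x_tile + 0.5) * tile_w),
            round(top + (y_tile + 0.5) * tile_h))

def card_mask(card_costs, elixir):
    """A card slot is playable only if its cost does not exceed current elixir."""
    return [cost <= elixir for cost in card_costs]

def tap_command(x, y):
    """Android shell command for a single tap at pixel (x, y)."""
    return f"input tap {x} {y}"

action = (2, 16, 9)                           # card slot 2, tile (16, 9)
mask = card_mask([3, 5, 4, 2], elixir=4)      # assumed costs for the four hand slots
if mask[action[0]]:
    x, y = tile_to_pixel(action[1], action[2])
    print(tap_command(x, y))
```

A full deployment would also need a tap on the card slot itself before the arena tap, and the pixel coordinates must be rescaled if the device resolution differs from the 480x854 frame.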

Next Immediate Tasks

Integrate capture + downscale + template match into a single prototype pipeline script.
Add timing benchmarks (capture latency, preprocessing ms, template match ms).
Introduce structured logging & error handling wrappers.
Draft minimal ClashRoyaleEnv scaffold (reset/step placeholders) once pipeline stable.
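
For the timing-benchmark task, a small context manager built on `time.perf_counter` is often enough. The sketch below is one possible shape for it; the stage names mirror the task list, while the stage bodies are placeholders standing in for the real capture, preprocessing, and template-match calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean

timings_ms = defaultdict(list)

@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[stage].append((time.perf_counter() - start) * 1000.0)

def summarize():
    """Per-stage sample count, mean, and worst-case latency."""
    return {
        stage: {"n": len(samples), "mean_ms": mean(samples), "max_ms": max(samples)}
        for stage, samples in timings_ms.items()
    }

for _ in range(3):
    with timed("capture"):
        time.sleep(0.001)            # placeholder for the frame grab
    with timed("preprocess"):
        sum(range(10_000))           # placeholder for crop + resize
    with timed("template_match"):
        sorted(range(10_000))        # placeholder for the template match

for stage, stats in summarize().items():
    print(f"{stage}: n={stats['n']} mean={stats['mean_ms']:.2f}ms max={stats['max_ms']:.2f}ms")
```

Tracking the maximum alongside the mean helps expose occasional stalls that an average would hide.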

OCR Performance Optimization

The agent uses PaddleOCR 2.7.3 with ONNX Runtime for high-speed digit extraction from elixir and tower health regions.

Performance Benchmarks

Original PaddleOCR 3.x: ~800ms total (150ms elixir + 650ms towers)
PaddleOCR 2.7.3 Standard: ~437ms total (62.51ms × 7 calls)
ONNX Runtime Sequential: ~301ms total (43.01ms × 7 calls)
ONNX Runtime Parallel (7 workers): ~91ms total ⚡ (~3.3x faster than sequential ONNX Runtime)

Total speedup: 8.8x faster than original (800ms → 91ms)

ONNX Model Setup

Run the setup script to download and convert PaddleOCR models to ONNX format:

.\setup_paddleocr2_onnx.ps1

This script will:

Download PP-OCRv3 detection, PP-OCRv4 recognition, and v2.0 classification models
Extract and convert them to ONNX format using paddle2onnx
Save to inference/det_onnx/, inference/rec_onnx/, and inference/cls_onnx/

Requirements:

PaddleOCR 2.7.3 (not 3.x)
NumPy 1.26.4 (for imgaug compatibility)
paddle2onnx 2.0.2rc3 (install via paddlex --install paddle2onnx)

Architecture Details

The OCR system processes 7 ROIs in parallel:

1 Elixir counter: 1-2 digits (0-10)
6 Tower health displays: 3-4 digits (friendly + enemy king/princess towers)

Uses a ThreadPoolExecutor with 7 workers for parallel ONNX inference, achieving a substantial, though sub-linear, speedup (~301ms sequential to ~91ms parallel) thanks to ONNX Runtime's thread-safe C++ implementation.
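
The fan-out pattern described above can be sketched with the standard library alone. In this illustration `recognize_digits` is a placeholder for a single ONNX Runtime inference call, and the ROI names and dummy crops are hypothetical; only the use of `ThreadPoolExecutor` with 7 workers comes from this README.

```python
from concurrent.futures import ThreadPoolExecutor

ROI_NAMES = [
    "elixir",
    "friendly_king", "friendly_princess_left", "friendly_princess_right",
    "enemy_king", "enemy_princess_left", "enemy_princess_right",
]

def recognize_digits(item):
    """Placeholder for one OCR call; returns (roi_name, recognised_text)."""
    name, crop = item
    # A real implementation would run the recognition model on `crop` here.
    return name, f"<{len(crop)} bytes>"

def read_all_rois(rois, max_workers=7):
    """Run one recognition task per ROI concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(recognize_digits, rois.items()))

# Dummy crops standing in for the 7 image regions.
rois = {name: bytes(16) for name in ROI_NAMES}
results = read_all_rois(rois)
print(results["elixir"])
```

`pool.map` returns results in input order, so the mapping from ROI name to text stays stable even when individual calls finish out of order.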

Installing & Running (Tooling Phase)

python -m venv .venv
.\.venv\Scripts\activate  # Windows PowerShell
pip install -r requirements.txt
cp HelperScripts/.env.example .env
# Edit .env with correct window name & paths

# Set up ONNX models for fast OCR
.\setup_paddleocr2_onnx.ps1

# Test helper scripts
python HelperScripts/windows-capture-testing.py
python "HelperScripts/windows-capture-with required resolution.py"
python HelperScripts/py_adb.py

Safety & Anti-Detection Measures (Planned)

Coordinate jitter & variable tap delays.
Avoid deterministic frame pacing for action issuance.
Optional throttling & humanized randomization for non-critical actions.

This README will expand as the project transitions from tooling prototypes to the formal RL environment and training stack.
