A multi-agent system that automates mobile QA testing using LLMs, ADB, and vision-grounded execution on the Simular Agent S3 framework.

Surenbandari/MobileTestingAgent

Mobile QA Agent - Multi-Agent System

A Planner-Executor-Supervisor architecture for automated mobile app testing that combines LLMs, ADB commands, and vision grounding. It is built on the Simular Agent S3 framework for vision-grounded execution and reliable task verification.

Architecture

This system implements a Planner-Executor-Supervisor multi-agent architecture on top of Simular Agent S3's vision-grounded execution framework. Agent S3 provides the core grounding and verification capabilities, while the custom multi-agent system handles test case planning, step-by-step execution, and assertion verification.

Framework Foundation

  • Simular Agent S3: Provides vision-grounded execution using its LMMAgent class and the UI-TARS model for reliable element localization
  • Custom Multi-Agent System: Implements explicit Planner-Executor-Supervisor roles for structured test execution
  • See report.md for detailed framework selection rationale

Agents

  1. Planner Agent

    • Breaks down test cases into actionable steps
    • Uses Gemini LLM for planning
    • Outputs structured step-by-step plans
  2. Executor Agent

    • Executes planned actions using ADB commands
    • Uses a grounding model (UI-TARS, served via vLLM) for element localization
    • Converts natural language actions to ADB commands
  3. Supervisor Agent

    • Verifies state transitions
    • Checks assertions
    • Distinguishes between execution failures and assertion failures
    • Determines final test results (PASS/FAIL)
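The division of labour between the three agents can be sketched as a single control loop. This is a minimal, hypothetical sketch, not the repository's implementation: the agents are stand-in callables (`plan_fn`, `execute_fn`, `verify_fn`), and `StepOutcome`, the `passed` key, and the choice to report an execution failure as `FAIL` and an unexpected crash as `ERROR` are all assumptions. The returned dictionary reuses the keys listed under Test Results.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StepOutcome:
    step: str
    executed: bool   # False means the Executor could not perform the action
    detail: str = ""


def _result(status: str, reason: str, plan: List[str],
            history: List[StepOutcome],
            verification: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Same keys as the "Test Results" section; the values are illustrative
    return {
        "status": status,
        "result": "PASS" if status == "PASS" else "FAIL",
        "reason": reason,
        "plan": plan,
        "execution_history": [asdict(o) for o in history],
        "verification": verification,
    }


def run_test_case(
    test_case: str,
    plan_fn: Callable[[str], List[str]],                            # Planner
    execute_fn: Callable[[str], StepOutcome],                       # Executor
    verify_fn: Callable[[str, List[StepOutcome]], Dict[str, Any]],  # Supervisor
    max_steps: int = 20,
) -> Dict[str, Any]:
    plan: List[str] = []
    history: List[StepOutcome] = []
    try:
        plan = plan_fn(test_case)
        for step in plan[:max_steps]:  # steps beyond the budget are not run
            outcome = execute_fn(step)
            history.append(outcome)
            if not outcome.executed:
                # Execution failure: the app never reached the state under test
                return _result("FAIL", f"execution failure at {step!r}: {outcome.detail}",
                               plan, history, None)
        verification = verify_fn(test_case, history)
    except Exception as exc:  # crash inside any agent
        return _result("ERROR", f"{type(exc).__name__}: {exc}", plan, history, None)
    if verification.get("passed"):
        return _result("PASS", "all assertions held", plan, history, verification)
    # Assertion failure: every step ran, but the expected state was not observed
    return _result("FAIL", "assertion failure: " + str(verification.get("reason", "")),
                   plan, history, verification)
```

Keeping the execution check inside the loop and the assertion check after it is what lets the reason string tell the two failure kinds apart.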

Grounding Agent

The AndroidACI (Android Action Command Interface) class:

  • Handles ADB command execution
  • Uses Simular Agent S3's LMMAgent with UI-TARS model for vision-grounded coordinate generation
  • Provides action primitives: tap, swipe, type, press_key, etc.
  • Based on Agent S3's grounding infrastructure for reliable element localization
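The action primitives ultimately reduce to `adb shell input ...` invocations. The sketch below shows one way to build those argument lists; `AdbActionBuilder` and its method names are hypothetical and are not the AndroidACI API. The keycodes are the ones listed under ADB Commands Used, and the 300 ms swipe duration is an arbitrary assumption.

```python
import shlex
from typing import List, Optional

# Keycodes listed in the "ADB Commands Used" section
KEYCODE_HOME, KEYCODE_BACK, KEYCODE_ENTER = 3, 4, 66


class AdbActionBuilder:
    """Hypothetical helper that turns action primitives into adb argv lists."""

    def __init__(self, device_id: Optional[str] = None) -> None:
        self.device_id = device_id

    def _shell(self, *args: str) -> List[str]:
        # "-s <serial>" selects one device when several are attached
        prefix = ["adb", "-s", self.device_id] if self.device_id else ["adb"]
        return prefix + ["shell", *args]

    def tap(self, x: int, y: int) -> List[str]:
        return self._shell("input", "tap", str(x), str(y))

    def swipe(self, x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> List[str]:
        return self._shell("input", "swipe", str(x1), str(y1), str(x2), str(y2), str(duration_ms))

    def type_text(self, text: str) -> List[str]:
        # `input text` reads "%s" as a space; shlex.quote protects the device shell
        # from characters such as & ; ' ( ). A literal "%s" in `text` stays ambiguous.
        return self._shell("input", "text", shlex.quote(text.replace(" ", "%s")))

    def press_key(self, keycode: int) -> List[str]:
        return self._shell("input", "keyevent", str(keycode))
```

Each list can be executed with `subprocess.run(argv, check=True)`. Building lists rather than shell strings keeps the host side free of quoting issues; only the device shell needs escaping, which is why `type_text` quotes its argument.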

Installation

Install Python dependencies:

```bash
pip install -r requirements.txt
```

Requires Python 3.9 to 3.12 (inclusive).
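If you want an early, readable error instead of an obscure import failure, a version guard like the following can run at startup. It is a hypothetical helper, not part of the repository.

```python
import sys
from typing import Sequence


def python_version_supported(version: Sequence[int] = sys.version_info) -> bool:
    # Compare (major, minor) only, so 3.12.x passes and 3.13 does not
    return (3, 9) <= tuple(version[:2]) <= (3, 12)
```

For example: `if not python_version_supported(): sys.exit("Python 3.9 to 3.12 is required")`.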

Make sure the Agent-S directory is available (contains the gui_agents framework).

Set up environment variables in .env:

  • GEMINI_API_KEY - Gemini API key for LLM
  • HF_TOKEN - HuggingFace token (optional, for model access)
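Variables in `.env` are not placed in `os.environ` automatically; the python-dotenv package's `load_dotenv()` is one common way to load them, though whether this project uses it is an assumption. Once loaded, a fail-fast check such as this hypothetical `load_settings` helper catches a missing Gemini key before any agent starts.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("GEMINI_API_KEY",)
OPTIONAL_VARS = ("HF_TOKEN",)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Fail fast if a required key is missing; optional keys default to None."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variable(s): " + ", ".join(missing))
    return {name: env.get(name) for name in REQUIRED_VARS + OPTIONAL_VARS}
```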

Configuration

Edit config.py to configure:

  • LLM Config: Gemini API settings for planning and execution
  • Grounding Config: vLLM endpoint with UI-TARS model for element localization
  • Android Config: Device settings (screen size, device ID)
  • Prompts: Prompts for the Planner, Executor, and Supervisor agents
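The four groups above could be modelled as small dataclasses. This is a hypothetical shape for illustration only; the actual `config.py` may use plain dictionaries and different field names. The model names are placeholders, and `http://localhost:8000/v1` is simply vLLM's default OpenAI-compatible endpoint, assumed here to be where UI-TARS is served.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LLMConfig:
    model: str                                  # Gemini model name (placeholder below)
    api_key_env: str = "GEMINI_API_KEY"         # read from .env, never hard-coded


@dataclass
class GroundingConfig:
    base_url: str = "http://localhost:8000/v1"  # vLLM OpenAI-compatible server, default port
    model: str = "<ui-tars-checkpoint>"         # placeholder for the served UI-TARS model


@dataclass
class AndroidConfig:
    device_id: Optional[str] = None                  # None: let adb use the only attached device
    screen_size: Optional[Tuple[int, int]] = None    # None: query `adb shell wm size` at startup


@dataclass
class Prompts:
    planner: str = ""
    executor: str = ""
    supervisor: str = ""


DEFAULT_LLM_CONFIG = LLMConfig(model="<gemini-model-name>")
DEFAULT_GROUNDING_CONFIG = GroundingConfig()
DEFAULT_ANDROID_CONFIG = AndroidConfig(device_id="emulator-5554")
```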

Usage

Running Test Suite

```bash
# Run all test cases from test_cases.py
python run_mobile_qa.py

# Run with specific device
python run_mobile_qa.py --device-id emulator-5554

# Run single test case
python run_mobile_qa.py --test-case "Open Obsidian and create a vault"

# Limit steps per test
python run_mobile_qa.py --max-steps 15
```
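The three flags map naturally onto `argparse`. The sketch below is hypothetical: the flag names come from the commands above, but the default of 20 steps is borrowed from the programmatic example and `select_test_cases` is an illustrative helper, not the script's actual code.

```python
import argparse
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run mobile QA test cases")
    parser.add_argument("--device-id", default=None, help="ADB serial, e.g. emulator-5554")
    parser.add_argument("--test-case", default=None, help="run only this test case")
    parser.add_argument("--max-steps", type=int, default=20, help="step budget per test")
    return parser


def select_test_cases(test_case: Optional[str], all_cases: Sequence[str]) -> List[str]:
    # A single --test-case overrides the full list from test_cases.py
    return [test_case] if test_case else list(all_cases)
```

`argparse` converts `--device-id` into the attribute `args.device_id`, so the parsed values can be passed straight into the agent constructor.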

Programmatic Usage

```python
from config import DEFAULT_LLM_CONFIG, DEFAULT_GROUNDING_CONFIG, DEFAULT_ANDROID_CONFIG
from mobile_qa_agent.main import MobileQAAgent

# Initialize system
agent_system = MobileQAAgent(
    llm_config=DEFAULT_LLM_CONFIG,
    grounding_config=DEFAULT_GROUNDING_CONFIG,
    android_config=DEFAULT_ANDROID_CONFIG,
    max_steps=20
)

# Run a test case
result = agent_system.run_test_case("Open Obsidian and create a vault")
print(f"Result: {result['result']}")
```

ADB Commands Used

The system uses the following ADB commands:

  • adb exec-out screencap -p - Take screenshot
  • adb shell input tap <x> <y> - Tap at coordinates
  • adb shell input swipe <x1> <y1> <x2> <y2> <duration> - Swipe/drag
  • adb shell input text "<text>" - Type text
  • adb shell input keyevent <code> - Press key (66=Enter, 4=Back, 3=Home)
  • adb shell wm size - Get screen size
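Two of these commands return data the agents must parse. Below is a minimal sketch with hypothetical helper names: `adb shell wm size` prints a `Physical size: WxH` line, plus an `Override size: WxH` line when the resolution has been overridden, and `adb exec-out screencap -p` streams PNG bytes to stdout.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_wm_size(output: str) -> Tuple[int, int]:
    """Return (width, height) from `adb shell wm size` output."""
    # An "Override size" line, when present, is what the display actually uses
    match = (re.search(r"Override size:\s*(\d+)x(\d+)", output)
             or re.search(r"Physical size:\s*(\d+)x(\d+)", output))
    if match is None:
        raise ValueError(f"unrecognised `wm size` output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def take_screenshot(device_id: Optional[str] = None) -> bytes:
    """Capture the screen as PNG bytes via `adb exec-out screencap -p`."""
    cmd = ["adb"] + (["-s", device_id] if device_id else []) + ["exec-out", "screencap", "-p"]
    # exec-out returns raw binary; `adb shell` can mangle line endings on some setups
    return subprocess.run(cmd, check=True, capture_output=True).stdout
```

The screen size matters because coordinates from the grounding model must fall inside the device's resolution before they are passed to `input tap`.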

Test Results

Results include:

  • status: Overall status (PASS/FAIL/ERROR)
  • result: Test result
  • reason: Explanation of result
  • plan: Step-by-step plan from Planner
  • execution_history: History of all executed steps
  • verification: Supervisor verification results
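When running the whole suite, the per-test dictionaries can be rolled up into a report. This is a hypothetical sketch that relies only on the `status` and `reason` keys above; treating a missing `status` as `ERROR` is an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarize(results: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count results by their `status` field (PASS/FAIL/ERROR)."""
    counts = Counter(r.get("status", "ERROR") for r in results)
    return {status: counts.get(status, 0) for status in ("PASS", "FAIL", "ERROR")}


def failure_reasons(results: Iterable[Dict[str, Any]]) -> List[str]:
    # `reason` explains why a non-passing test did not pass
    return [r.get("reason", "") for r in results if r.get("status") != "PASS"]
```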

Prerequisites

  • Android device/emulator connected via ADB
  • Gemini API key configured in .env
  • vLLM server running with UI-TARS model (for grounding)
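To confirm the first prerequisite before a run, `adb devices` lists each attached serial with its state; only entries in the `device` state accept commands. The helpers below are a hypothetical sketch, not part of the repository.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials whose state is `device` (ready); skips offline/unauthorized."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # The header ("List of devices attached") and daemon notices do not have
        # "device" as their second field, so they are skipped
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    out = subprocess.run(["adb", "devices"], check=True, capture_output=True, text=True).stdout
    return parse_adb_devices(out)
```

If more than one serial is returned, pass the intended one with `--device-id`.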
