A Supervisor Planner-Executor architecture for automated mobile app testing using LLMs and ADB commands with vision grounding. Built on Simular Agent S3 framework for vision-grounded execution and reliable task verification.
This system implements a Planner-Executor-Supervisor multi-agent architecture on top of Simular Agent S3's vision-grounded execution framework. Agent S3 provides the core grounding and verification capabilities, while the custom multi-agent system handles test case planning, step-by-step execution, and assertion verification.
- Simular Agent S3: Provides vision-grounded execution using LMMAgent and UI-TARS model for reliable element localization
- Custom Multi-Agent System: Implements explicit Planner-Executor-Supervisor roles for structured test execution
- See `report.md` for detailed framework selection rationale
- **Planner Agent**
  - Breaks down test cases into actionable steps
  - Uses the Gemini LLM for planning
  - Outputs structured step-by-step plans
- **Executor Agent**
  - Executes planned actions using ADB commands
  - Uses a grounding model (vLLM/UI-TARS) for element localization
  - Converts natural-language actions into ADB commands
- **Supervisor Agent**
  - Verifies state transitions
  - Checks assertions
  - Distinguishes execution failures from assertion failures
  - Determines the final test result (PASS/FAIL)
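The three roles above form a simple control loop: plan once, execute step by step, then verify. A minimal sketch of that loop, using hypothetical stub agents rather than the project's actual classes:

```python
# Minimal sketch of the Planner-Executor-Supervisor loop.
# The planner/executor/supervisor objects here are hypothetical stand-ins;
# in the real system each role is backed by an LLM and ADB-driven execution.

def run_test(test_case, planner, executor, supervisor, max_steps=20):
    """Plan a test case, execute each step, then verify the outcome."""
    plan = planner.plan(test_case)             # list of natural-language steps
    history = []
    for step in plan[:max_steps]:
        outcome = executor.execute(step)       # e.g. tap/swipe/type via ADB
        history.append({"step": step, "outcome": outcome})
        if not outcome["ok"]:                  # execution failure, not assertion failure
            return {"result": "FAIL",
                    "reason": f"execution failed at: {step}",
                    "execution_history": history}
    # Only reached if every step executed; now check the test's assertions.
    verdict = supervisor.verify(test_case, history)
    return {"result": "PASS" if verdict["passed"] else "FAIL",
            "reason": verdict["reason"],
            "execution_history": history}
```

Note how the loop separates the two failure modes: a step that cannot be executed short-circuits with an execution failure, while assertion checking only happens after all steps complete.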
The AndroidACI (Android Action Command Interface) class:
- Handles ADB command execution
- Uses Simular Agent S3's LMMAgent with UI-TARS model for vision-grounded coordinate generation
- Provides action primitives: tap, swipe, type, press_key, etc.
- Based on Agent S3's grounding infrastructure for reliable element localization
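Under the hood, each action primitive boils down to composing an ADB command line. A minimal sketch of how such primitives might be built (a hypothetical helper for illustration, not the actual `AndroidACI` implementation):

```python
import subprocess

def adb_args(action, device_id=None, **kw):
    """Build the adb argv for a primitive action (tap, swipe, type, press_key)."""
    base = ["adb"] + (["-s", device_id] if device_id else [])
    if action == "tap":
        return base + ["shell", "input", "tap", str(kw["x"]), str(kw["y"])]
    if action == "swipe":
        return base + ["shell", "input", "swipe",
                       str(kw["x1"]), str(kw["y1"]),
                       str(kw["x2"]), str(kw["y2"]),
                       str(kw.get("duration_ms", 300))]
    if action == "type":
        return base + ["shell", "input", "text", kw["text"]]
    if action == "press_key":
        return base + ["shell", "input", "keyevent", str(kw["code"])]
    raise ValueError(f"unknown action: {action}")

def run_action(action, **kw):
    """Execute the primitive on the connected device (requires adb on PATH)."""
    subprocess.run(adb_args(action, **kw), check=True)
```

The grounding model supplies the `x`/`y` coordinates from a screenshot; the helper only turns them into a concrete `adb shell input ...` invocation.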
Install Python dependencies:
```bash
pip install -r requirements.txt
```

Requires 3.9 <= Python <= 3.12. Make sure the Agent-S directory is available (it contains the `gui_agents` framework).
Set up environment variables in `.env`:

- `GEMINI_API_KEY` - Gemini API key for the LLM
- `HF_TOKEN` - HuggingFace token (optional, for model access)
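For example, a minimal `.env` might look like (placeholder values only):

```
# HF_TOKEN is optional
GEMINI_API_KEY=your-gemini-api-key
HF_TOKEN=your-huggingface-token
```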
Edit config.py to configure:
- LLM Config: Gemini API settings for planning and execution
- Grounding Config: vLLM endpoint with UI-TARS model for element localization
- Android Config: Device settings (screen size, device ID)
- Prompts: Prompt templates for the Planner, Executor, and Supervisor
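The config objects imported elsewhere in this README (`DEFAULT_LLM_CONFIG`, `DEFAULT_GROUNDING_CONFIG`, `DEFAULT_ANDROID_CONFIG`) might be plain dicts along these lines; the field names and values below are illustrative assumptions, not the project's exact schema:

```python
# Illustrative sketch of config.py contents; actual field names may differ.
DEFAULT_LLM_CONFIG = {
    "provider": "gemini",
    "model": "gemini-1.5-pro",               # planning/execution LLM (assumed name)
    "api_key_env": "GEMINI_API_KEY",         # read from .env at startup
}

DEFAULT_GROUNDING_CONFIG = {
    "endpoint": "http://localhost:8000/v1",  # vLLM OpenAI-compatible server
    "model": "ui-tars",                      # grounding model served by vLLM
}

DEFAULT_ANDROID_CONFIG = {
    "device_id": "emulator-5554",
    "screen_size": (1080, 2400),             # width x height in pixels
}
```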
```bash
# Run all test cases from test_cases.py
python run_mobile_qa.py

# Run with a specific device
python run_mobile_qa.py --device-id emulator-5554

# Run a single test case
python run_mobile_qa.py --test-case "Open Obsidian and create a vault"

# Limit steps per test
python run_mobile_qa.py --max-steps 15
```

```python
from config import DEFAULT_LLM_CONFIG, DEFAULT_GROUNDING_CONFIG, DEFAULT_ANDROID_CONFIG
from mobile_qa_agent.main import MobileQAAgent

# Initialize the system
agent_system = MobileQAAgent(
    llm_config=DEFAULT_LLM_CONFIG,
    grounding_config=DEFAULT_GROUNDING_CONFIG,
    android_config=DEFAULT_ANDROID_CONFIG,
    max_steps=20,
)

# Run a test case
result = agent_system.run_test_case("Open Obsidian and create a vault")
print(f"Result: {result['result']}")
```

The system uses various ADB commands:
- `adb exec-out screencap -p` - Take a screenshot
- `adb shell input tap <x> <y>` - Tap at coordinates
- `adb shell input swipe <x1> <y1> <x2> <y2> <duration>` - Swipe/drag
- `adb shell input text "<text>"` - Type text
- `adb shell input keyevent <code>` - Press a key (66 = Enter, 4 = Back, 3 = Home)
- `adb shell wm size` - Get the screen size
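For instance, the screen size is recovered by parsing the text that `adb shell wm size` prints (typically `Physical size: 1080x2400`). A small sketch of such a parser (hypothetical helper):

```python
import re

def parse_wm_size(output):
    """Parse 'adb shell wm size' output, e.g. 'Physical size: 1080x2400'."""
    m = re.search(r"(\d+)x(\d+)", output)
    if not m:
        raise ValueError(f"unexpected wm size output: {output!r}")
    return int(m.group(1)), int(m.group(2))
```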
Results include:
- `status`: Overall status (PASS/FAIL/ERROR)
- `result`: Test result
- `reason`: Explanation of the result
- `plan`: Step-by-step plan from the Planner
- `execution_history`: History of all executed steps
- `verification`: Supervisor verification results
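Put together, a returned result has a shape like the following (all values below are made-up illustrations of the fields listed above, not real output):

```python
# Illustrative example of a test result dict; values are invented.
result = {
    "status": "PASS",
    "result": "PASS",
    "reason": "Vault was created and is visible on screen",
    "plan": ["Open Obsidian", "Tap 'Create new vault'", "Type the vault name"],
    "execution_history": [{"step": "Open Obsidian", "outcome": "ok"}],
    "verification": {"passed": True, "reason": "final screen matches expectation"},
}
```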
- Android device/emulator connected via ADB
- Gemini API key configured in `.env`
- vLLM server running with the UI-TARS model (for grounding)