# Claude-Claw

**Agent Orchestration Framework with GUI Automation**: bridging Claude Code's orchestration patterns with OpenClaw's computer-use tooling.

Quick Start · Architecture · Examples · API Reference · Contributing
Claude-Claw is an open-source Python framework for building autonomous GUI automation agents. It draws from two source architectures:
- Claude Code's agent orchestration model — AsyncGenerator-driven autonomous loops, multi-agent coordination (Tengu), smart tool partitioning, and 4-layer context compression
- OpenClaw's device control layer — screen capture, visual grounding, input simulation, and session lifecycle management
The framework enables AI agents to capture screen state, identify UI elements, and execute interactions — with built-in verification and support for parallel multi-agent workflows.
## Quick Start

```python
from claude_claw import Orchestrator
from claude_claw.agents import GUIAgent
from claude_claw.gui import get_all_gui_tools

agent = GUIAgent(llm_provider=my_llm, enable_verification=True)
result = await agent.run("Open Chrome, navigate to GitHub, and star this repo")
```

## Architecture

### AsyncGenerator agent loop

The agent loop is built on an AsyncGenerator — not a one-shot request, but an autonomous cycle that continues until the task is complete or a turn limit is reached:
```text
┌─────────────────────────────────────────────────────────────┐
│                     AsyncGenerator Loop                     │
│                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐   │
│  │ Assemble │──▶│ Call LLM │──▶│ Execute  │──▶│ Verify  │   │
│  │  Prompt  │   │  (API)   │   │  Tools   │   │ Results │   │
│  └──────────┘   └──────────┘   └──────────┘   └────┬────┘   │
│       ▲                                            │        │
│       └────────────────────────────────────────────┘        │
│                       Loop until done                       │
└─────────────────────────────────────────────────────────────┘
```
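A minimal, self-contained sketch of this cycle, with toy stand-ins for the LLM, tools, and verifier (none of these names are part of the claude_claw API):

```python
import asyncio

async def agent_loop(task, llm, tools, verify, max_turns=10):
    """Yield one event per turn: assemble prompt -> call LLM -> execute -> verify."""
    history = [("user", task)]
    for turn in range(max_turns):
        action = await llm(history)        # 1-2. assemble prompt and call the model
        result = await tools(action)       # 3. execute the requested tool
        done = await verify(result)        # 4. verify the outcome
        history.append(("tool", result))
        yield {"turn": turn, "done": done}
        if done:                           # loop until complete or turn limit
            return

# Toy stand-ins: the "task" succeeds on the fifth step.
async def fake_llm(history):
    return f"step-{len(history)}"

async def fake_tools(action):
    return f"ran {action}"

async def fake_verify(result):
    return result.endswith("step-5")

async def main():
    return [event async for event in agent_loop("demo", fake_llm, fake_tools, fake_verify)]

turns = asyncio.run(main())
print(turns[-1])  # → {'turn': 4, 'done': True}
```

Because the loop is a generator, callers can observe (and cancel) the agent between turns instead of waiting for a single opaque call to return.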
### Smart tool partitioning

Each tool is annotated with `is_read_only`. The orchestrator uses this flag to batch read-only calls concurrently and serialize state-modifying ones:

```text
Tool Calls: [screenshot, find_element, click, type_text, screenshot]
                  │           │          │        │          │
                  ▼           ▼          ▼        ▼          ▼
                ┌─ read-only ──┐  ┌── state-modifying ──┐  ┌─ read-only ─┐
                │   PARALLEL   │  │       SERIAL        │  │  PARALLEL   │
                │   Batch 1    │  │       Batch 2       │  │   Batch 3   │
                └──────────────┘  └─────────────────────┘  └─────────────┘
```
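The batching rule itself can be sketched as a small pure function (illustrative only; `READ_ONLY` and `partition` are not framework names):

```python
READ_ONLY = {"screenshot", "find_element", "wait_for"}  # per the tool table below

def partition(calls):
    """Group an ordered tool-call list into batches: consecutive read-only
    calls form one parallel batch, consecutive writes one serial batch."""
    batches = []
    for call in calls:
        kind = "parallel" if call in READ_ONLY else "serial"
        if batches and batches[-1][0] == kind:
            batches[-1][1].append(call)      # extend the current batch
        else:
            batches.append([kind, [call]])   # start a new batch
    return batches

batches = partition(["screenshot", "find_element", "click", "type_text", "screenshot"])
# → [['parallel', ['screenshot', 'find_element']],
#    ['serial',   ['click', 'type_text']],
#    ['parallel', ['screenshot']]]
```

Order is preserved across batches, so writes always see the reads that preceded them.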
### 4-layer context compression

Four escalating layers prevent token overflow for long-running agents:
| Layer | Trigger | Action |
|---|---|---|
| Layer 1 | Tool output exceeds threshold | Persist to disk, pass file path to model |
| Layer 2 | Old tool results in context | Replace with placeholder after timeout |
| Layer 3 | Context approaching limit | LLM summarizes history in-place |
| Layer 4 | Circuit breaker | Halt after 3 consecutive failed compressions |
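Layer 1, for instance, can be sketched as follows. The threshold value and placeholder format here are assumptions for illustration, not the framework's actual defaults:

```python
import os
import tempfile

THRESHOLD = 4_000  # assumed per-result character budget

def compress_tool_output(output: str) -> str:
    """Layer 1: keep small results inline; persist large ones to disk and
    hand the model only a short placeholder containing the file path."""
    if len(output) <= THRESHOLD:
        return output                      # small enough to stay in context
    fd, path = tempfile.mkstemp(prefix="tool-output-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(output)                    # full output survives on disk
    return f"[output persisted to {path}: {len(output)} chars]"

inline = compress_tool_output("short result")
pointer = compress_tool_output("x" * 10_000)
```

The model can still request the persisted file later via a read tool, so nothing is lost, only deferred.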
### Multi-agent coordination (Tengu)

Coordinator agents dispatch work to multiple specialized workers and synthesize their results:
```text
                ┌─────────────────────┐
                │  Coordinator Agent  │
                │  (goal + synthesis) │
                └──────────┬──────────┘
                           │
            ┌──────────────┼──────────────┐
            ▼              ▼              ▼
      ┌───────────┐  ┌───────────┐  ┌───────────┐
      │   Scout   │  │  Worker   │  │ Verifier  │
      │   Agent   │  │   Agent   │  │   Agent   │
      │ (explore) │  │ (execute) │  │ (verify)  │
      └─────┬─────┘  └─────┬─────┘  └───────────┘
            │              │
      ┌─────┴─────┐  ┌─────┴─────┐
      │ GUI Tools │  │ GUI Tools │
      │ read-only │  │ read+write│
      └───────────┘  └───────────┘
```
Workers can run in parallel on independent sub-tasks. The coordinator synthesizes results rather than passing them through verbatim.
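The dispatch-then-synthesize shape maps naturally onto `asyncio.gather`; a toy sketch with illustrative names (this is not the `CoordinatorAgent` API):

```python
import asyncio

async def worker(name: str, task: str) -> str:
    await asyncio.sleep(0)                 # stand-in for real GUI work
    return f"{name}: {task} done"

async def coordinate(goal: str, subtasks: dict) -> str:
    # Workers on independent sub-tasks run concurrently...
    findings = await asyncio.gather(
        *(worker(name, task) for name, task in subtasks.items())
    )
    # ...and the coordinator synthesizes the findings against the goal
    # (a real coordinator would use the LLM for this step, not a join).
    return f"{goal} -> " + "; ".join(findings)

summary = asyncio.run(coordinate(
    "star the repo",
    {"scout": "locate button", "worker": "click button"},
))
print(summary)
```

`asyncio.gather` returns results in argument order, which keeps the synthesis step deterministic even when workers finish out of order.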
## Installation

```bash
pip install claude-claw

# With LLM provider support
pip install "claude-claw[anthropic]"  # Claude
pip install "claude-claw[openai]"     # GPT-4
pip install "claude-claw[all]"        # All providers
```

From source:

```bash
git clone https://github.com/huyuelin/Claude-Claw.git
cd Claude-Claw
pip install -e ".[dev]"
```

## Examples

### Step 1 — Single tool (screenshot)
```python
from claude_claw.gui import create_screenshot_tool

screenshot = create_screenshot_tool()
result = await screenshot.execute({"max_width": 1280})
# result['data'] → base64 PNG, result['width'], result['height']
```

### Step 2 — Tool Registry
```python
from claude_claw.core import ToolRegistry
from claude_claw.gui import get_all_gui_tools

registry = ToolRegistry()
for tool in get_all_gui_tools():
    registry.register(tool)

# Inspect partitioning
read_only = registry.get_read_only()        # screenshot, find_element, wait_for
modifying = registry.get_state_modifying()  # click, type_text, scroll, drag, hotkey
```

### Step 3 — Orchestrator with agent isolation
```python
from claude_claw import Orchestrator
from claude_claw.agents import AgentType  # import path assumed
from claude_claw.gui import get_all_gui_tools

orchestrator = Orchestrator()
orchestrator.register_tools(get_all_gui_tools())

# Each agent runs in an isolated ContextVar scope
async with orchestrator.spawn_agent("scout", agent_type=AgentType.EXPLORER) as agent:
    worker_id = await orchestrator.spawn_worker(
        task="Locate the login button coordinates",
        name="button-finder",
    )
    result = orchestrator.get_result(agent.agent_id)
```

### Step 4 — Verification
```python
from claude_claw.verification import VerificationAgent, CheckResult, CheckStatus

verifier = VerificationAgent()
verifier.add_gui_checks()  # screen_responsive, no_error_dialog, element_visible, screen_changed

@verifier.check("login_success")
async def check_login(context):
    screenshot = await take_screenshot()
    text = await ocr(screenshot)
    return CheckResult(
        name="login_success",
        status=CheckStatus.PASSED if "Dashboard" in text else CheckStatus.FAILED,
        message="Expected dashboard after login",
    )

result = await verifier.verify({"action": "login"})
print(result.summary)  # "PASSED: 5 passed, 0 failed in 234ms"
```

### Step 5 — Multi-agent coordinator
```python
from claude_claw.agents import CoordinatorAgent

coordinator = CoordinatorAgent(llm_provider=my_llm)
result = await coordinator.execute(
    goal="Fill out the job application form and submit it",
    workers=[
        {"name": "scout", "task": "Screenshot the form, identify all fields",
         "tools": ["screenshot", "find_element"]},
        {"name": "filler", "task": "Fill each field with appropriate data",
         "tools": ["click", "type_text", "find_element"]},
        {"name": "submitter", "task": "Review, click Submit, verify success",
         "tools": ["screenshot", "click", "wait_for"]},
    ],
)
print(result['verification']['summary'])
```

## CLI

```bash
claude-claw tools --list
claude-claw run "Take a screenshot of my desktop"
claude-claw version
```

## Example Scripts

| Example | Description |
|---|---|
| basic_gui_automation.py | Single agent: screenshot, click, type, verify |
| multi_agent_coordinator.py | Coordinator dispatching parallel workers |
## API Reference

### Core Classes

| Class | Description |
|---|---|
| `Orchestrator` | Spawn, kill, steer, and monitor agents |
| `AgentContext` | Per-agent isolated execution scope (ContextVar) |
| `ToolRegistry` | Register, discover, and export tool schemas |
| `Tool` | Base tool: name, schema, execute, permissions |
### Tools

| Tool | Category | Read-Only | Description |
|---|---|---|---|
| `screenshot` | gui | ✅ | Capture full screen or region |
| `click` | gui | ❌ | Click at (x, y) coordinates |
| `type_text` | gui | ❌ | Send keyboard input |
| `scroll` | gui | ❌ | Scroll vertically |
| `hotkey` | gui | ❌ | Keyboard shortcuts (e.g. Ctrl+C) |
| `drag` | gui | ❌ | Drag from point to point |
| `find_element` | gui | ✅ | Locate UI element via OCR or vision LLM |
| `wait_for` | gui | ✅ | Poll until element appears |
| `screen_record` | media | ✅ | Record screen activity to GIF |
### Agents

| Agent | Description |
|---|---|
| `GUIAgent` | Autonomous loop: screenshot → analyze → act → verify |
| `CoordinatorAgent` | Orchestrates workers using the Tengu pattern |
| `VerificationAgent` | Runs structured check pipelines with custom rules |
## Design Highlights

- AsyncGenerator loop — autonomous until completion, not a single inference call
- ContextVar isolation — each agent context is scoped independently, preventing state leakage
- Tool partitioning — concurrent reads, serial writes, automatic batching
- 4-layer context compression — handles long-horizon tasks without hitting token limits
- Tengu coordinator — explicit synthesizing step, not pass-through aggregation
- Device/node tools — screen, input, media as first-class tool categories
- Session lifecycle — agents as long-lived sessions with explicit status transitions
- Subagent control — list, kill, steer running agents by ID
- Gateway architecture — tools operate through a gateway with scoped permission checks
- Visual grounding — hybrid element detection: OCR, vision LLM, template matching
- Verification pipeline — structured pass/fail checks with pre/post hooks, not just assertions
- Provider-agnostic LLM interface — compatible with Claude, GPT-4, and local inference servers
- Pure Python asyncio — no Node.js or browser runtime required
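The ContextVar isolation noted above rests on a standard-library guarantee: each asyncio Task runs in a copy of the current `Context`, so one agent's writes never leak into another's. A minimal demonstration (illustrative names, not the `AgentContext` API):

```python
import asyncio
from contextvars import ContextVar

current_agent: ContextVar = ContextVar("current_agent")

async def run_agent(agent_id: str) -> str:
    current_agent.set(agent_id)   # visible only inside this task's context
    await asyncio.sleep(0)        # yield so the agents interleave
    return current_agent.get()    # still our own id: no cross-agent leakage

async def main():
    # gather() wraps each coroutine in a Task with its own copied Context
    return await asyncio.gather(run_agent("scout"), run_agent("worker"))

ids = asyncio.run(main())
print(ids)  # → ['scout', 'worker']
```

This is why the framework can avoid passing agent state explicitly through every call: the interpreter scopes it per task.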
## Roadmap

- Native vision LLM grounding (Claude computer-use, GPT-4V)
- Browser automation via Playwright
- Mobile device control (Android ADB, iOS Shortcuts)
- Cross-session agent memory
- MCP server support
- Accessibility tree integration (pyatspi, pywinauto)
- Agent swarm monitoring dashboard
- AST-level command safety (tree-sitter)
## Contributing

Contributions welcome — bug fixes, new tools, documentation, tests, or architecture discussions. See CONTRIBUTING.md for guidelines.
## License

MIT — see LICENSE for details.

Not affiliated with Anthropic or OpenClaw. Built from independent analysis of their published engineering.