# Claude-Claw

**Agent Orchestration Framework with GUI Automation**: bridging Claude Code's orchestration patterns with OpenClaw's computer-use tooling.

Quick Start · Architecture · Examples · API Reference · Contributing
Claude-Claw is an open-source Python framework for building autonomous GUI automation agents. It draws from two source architectures:
- Claude Code's agent orchestration model — AsyncGenerator-driven autonomous loops, multi-agent coordination (Tengu), smart tool partitioning, and 4-layer context compression
- OpenClaw's device control layer — screen capture, visual grounding, input simulation, and session lifecycle management
The framework enables AI agents to capture screen state, identify UI elements, and execute interactions — with built-in verification and support for parallel multi-agent workflows.
## Quick Start

```python
from claude_claw import Orchestrator
from claude_claw.agents import GUIAgent
from claude_claw.gui import get_all_gui_tools

agent = GUIAgent(llm_provider=my_llm, enable_verification=True)
result = await agent.run("Open Chrome, navigate to GitHub, and star this repo")
```

## Architecture

### AsyncGenerator agent loop

The agent loop is built on an AsyncGenerator — not a one-shot request, but an autonomous cycle that continues until the task is complete or a turn limit is reached:
```text
┌─────────────────────────────────────────────────────────────┐
│                     AsyncGenerator Loop                     │
│                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐   │
│  │ Assemble │──▶│ Call LLM │──▶│ Execute  │──▶│ Verify  │   │
│  │  Prompt  │   │  (API)   │   │  Tools   │   │ Results │   │
│  └──────────┘   └──────────┘   └──────────┘   └────┬────┘   │
│       ▲                                            │        │
│       └────────────────────────────────────────────┘        │
│                       Loop until done                       │
└─────────────────────────────────────────────────────────────┘
```
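A minimal, self-contained sketch of this cycle, with toy stand-ins for the LLM, tools, and verifier (none of these names are part of the claude_claw API):

```python
import asyncio

async def agent_loop(task, llm, tools, verify, max_turns=10):
    """Yield one event per turn: assemble prompt -> call LLM -> execute -> verify."""
    history = [("user", task)]
    for turn in range(max_turns):
        action = await llm(history)        # 1-2. assemble prompt and call the model
        result = await tools(action)       # 3. execute the requested tool
        done = await verify(result)        # 4. verify the outcome
        history.append(("tool", result))
        yield {"turn": turn, "done": done}
        if done:                           # loop until complete or turn limit
            return

# Toy stand-ins: the "task" succeeds on the fifth step.
async def fake_llm(history):
    return f"step-{len(history)}"

async def fake_tools(action):
    return f"ran {action}"

async def fake_verify(result):
    return result.endswith("step-5")

async def main():
    return [event async for event in agent_loop("demo", fake_llm, fake_tools, fake_verify)]

turns = asyncio.run(main())
print(turns[-1])  # → {'turn': 4, 'done': True}
```

Because the loop is a generator, callers can observe (and cancel) the agent between turns instead of waiting for a single opaque call to return.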
### Smart tool partitioning

Each tool is annotated with `is_read_only`. The orchestrator uses this flag to batch read-only calls concurrently and serialize state-modifying ones:

```text
Tool Calls: [screenshot, find_element, click, type_text, screenshot]
                  │           │          │        │          │
                  ▼           ▼          ▼        ▼          ▼
                ┌─ read-only ──┐  ┌── state-modifying ──┐  ┌─ read-only ─┐
                │   PARALLEL   │  │       SERIAL        │  │  PARALLEL   │
                │   Batch 1    │  │       Batch 2       │  │   Batch 3   │
                └──────────────┘  └─────────────────────┘  └─────────────┘
```
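The batching rule itself can be sketched as a small pure function (illustrative only; `READ_ONLY` and `partition` are not framework names):

```python
READ_ONLY = {"screenshot", "find_element", "wait_for"}  # per the tool table below

def partition(calls):
    """Group an ordered tool-call list into batches: consecutive read-only
    calls form one parallel batch, consecutive writes one serial batch."""
    batches = []
    for call in calls:
        kind = "parallel" if call in READ_ONLY else "serial"
        if batches and batches[-1][0] == kind:
            batches[-1][1].append(call)      # extend the current batch
        else:
            batches.append([kind, [call]])   # start a new batch
    return batches

batches = partition(["screenshot", "find_element", "click", "type_text", "screenshot"])
# → [['parallel', ['screenshot', 'find_element']],
#    ['serial',   ['click', 'type_text']],
#    ['parallel', ['screenshot']]]
```

Order is preserved across batches, so writes always see the reads that preceded them.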
### 4-layer context compression

Four escalating layers prevent token overflow for long-running agents:
| Layer | Trigger | Action |
|---|---|---|
| Layer 1 | Tool output exceeds threshold | Persist to disk, pass file path to model |
| Layer 2 | Old tool results in context | Replace with placeholder after timeout |
| Layer 3 | Context approaching limit | LLM summarizes history in-place |
| Layer 4 | Circuit breaker | Halt after 3 consecutive failed compressions |
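Layer 1, for instance, can be sketched as follows. The threshold value and placeholder format here are assumptions for illustration, not the framework's actual defaults:

```python
import os
import tempfile

THRESHOLD = 4_000  # assumed per-result character budget

def compress_tool_output(output: str) -> str:
    """Layer 1: keep small results inline; persist large ones to disk and
    hand the model only a short placeholder containing the file path."""
    if len(output) <= THRESHOLD:
        return output                      # small enough to stay in context
    fd, path = tempfile.mkstemp(prefix="tool-output-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(output)                    # full output survives on disk
    return f"[output persisted to {path}: {len(output)} chars]"

inline = compress_tool_output("short result")
pointer = compress_tool_output("x" * 10_000)
```

The model can still request the persisted file later via a read tool, so nothing is lost, only deferred.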
### Multi-agent coordination (Tengu)

Coordinator agents dispatch work to multiple specialized workers and synthesize their results:
```text
                ┌─────────────────────┐
                │  Coordinator Agent  │
                │  (goal + synthesis) │
                └──────────┬──────────┘
                           │
            ┌──────────────┼──────────────┐
            ▼              ▼              ▼
      ┌───────────┐  ┌───────────┐  ┌───────────┐
      │   Scout   │  │  Worker   │  │ Verifier  │
      │   Agent   │  │   Agent   │  │   Agent   │
      │ (explore) │  │ (execute) │  │ (verify)  │
      └─────┬─────┘  └─────┬─────┘  └───────────┘
            │              │
      ┌─────┴─────┐  ┌─────┴─────┐
      │ GUI Tools │  │ GUI Tools │
      │ read-only │  │ read+write│
      └───────────┘  └───────────┘
```
Workers can run in parallel on independent sub-tasks. The coordinator synthesizes results rather than passing them through verbatim.
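The dispatch-then-synthesize shape maps naturally onto `asyncio.gather`; a toy sketch with illustrative names (this is not the `CoordinatorAgent` API):

```python
import asyncio

async def worker(name: str, task: str) -> str:
    await asyncio.sleep(0)                 # stand-in for real GUI work
    return f"{name}: {task} done"

async def coordinate(goal: str, subtasks: dict) -> str:
    # Workers on independent sub-tasks run concurrently...
    findings = await asyncio.gather(
        *(worker(name, task) for name, task in subtasks.items())
    )
    # ...and the coordinator synthesizes the findings against the goal
    # (a real coordinator would use the LLM for this step, not a join).
    return f"{goal} -> " + "; ".join(findings)

summary = asyncio.run(coordinate(
    "star the repo",
    {"scout": "locate button", "worker": "click button"},
))
print(summary)
```

`asyncio.gather` returns results in argument order, which keeps the synthesis step deterministic even when workers finish out of order.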
## Installation

```bash
pip install claude-claw

# With LLM provider support
pip install "claude-claw[anthropic]"  # Claude
pip install "claude-claw[openai]"     # GPT-4
pip install "claude-claw[all]"        # All providers
```

From source:

```bash
git clone https://github.com/huyuelin/Claude-Claw.git
cd Claude-Claw
pip install -e ".[dev]"
```

## Examples

### Step 1 — Single tool (screenshot)
```python
from claude_claw.gui import create_screenshot_tool

screenshot = create_screenshot_tool()
result = await screenshot.execute({"max_width": 1280})
# result['data'] → base64 PNG, result['width'], result['height']
```

### Step 2 — Tool Registry
```python
from claude_claw.core import ToolRegistry
from claude_claw.gui import get_all_gui_tools

registry = ToolRegistry()
for tool in get_all_gui_tools():
    registry.register(tool)

# Inspect partitioning
read_only = registry.get_read_only()        # screenshot, find_element, wait_for
modifying = registry.get_state_modifying()  # click, type_text, scroll, drag, hotkey
```

### Step 3 — Orchestrator with agent isolation
```python
from claude_claw import Orchestrator
from claude_claw.agents import AgentType  # import path assumed
from claude_claw.gui import get_all_gui_tools

orchestrator = Orchestrator()
orchestrator.register_tools(get_all_gui_tools())

# Each agent runs in an isolated ContextVar scope
async with orchestrator.spawn_agent("scout", agent_type=AgentType.EXPLORER) as agent:
    worker_id = await orchestrator.spawn_worker(
        task="Locate the login button coordinates",
        name="button-finder",
    )
    result = orchestrator.get_result(agent.agent_id)
```

### Step 4 — Verification
```python
from claude_claw.verification import VerificationAgent, CheckResult, CheckStatus

verifier = VerificationAgent()
verifier.add_gui_checks()  # screen_responsive, no_error_dialog, element_visible, screen_changed

@verifier.check("login_success")
async def check_login(context):
    screenshot = await take_screenshot()
    text = await ocr(screenshot)
    return CheckResult(
        name="login_success",
        status=CheckStatus.PASSED if "Dashboard" in text else CheckStatus.FAILED,
        message="Expected dashboard after login",
    )

result = await verifier.verify({"action": "login"})
print(result.summary)  # "PASSED: 5 passed, 0 failed in 234ms"
```

### Step 5 — Multi-agent coordinator
```python
from claude_claw.agents import CoordinatorAgent

coordinator = CoordinatorAgent(llm_provider=my_llm)
result = await coordinator.execute(
    goal="Fill out the job application form and submit it",
    workers=[
        {"name": "scout", "task": "Screenshot the form, identify all fields",
         "tools": ["screenshot", "find_element"]},
        {"name": "filler", "task": "Fill each field with appropriate data",
         "tools": ["click", "type_text", "find_element"]},
        {"name": "submitter", "task": "Review, click Submit, verify success",
         "tools": ["screenshot", "click", "wait_for"]},
    ],
)
print(result['verification']['summary'])
```

## CLI

```bash
claude-claw tools --list
claude-claw run "Take a screenshot of my desktop"
claude-claw version
```

## Example Scripts

| Example | Description |
|---|---|
| basic_gui_automation.py | Single agent: screenshot, click, type, verify |
| multi_agent_coordinator.py | Coordinator dispatching parallel workers |
## API Reference

### Core Classes

| Class | Description |
|---|---|
| `Orchestrator` | Spawn, kill, steer, and monitor agents |
| `AgentContext` | Per-agent isolated execution scope (ContextVar) |
| `ToolRegistry` | Register, discover, and export tool schemas |
| `Tool` | Base tool: name, schema, execute, permissions |
### Tools

| Tool | Category | Read-Only | Description |
|---|---|---|---|
| `screenshot` | gui | ✅ | Capture full screen or region |
| `click` | gui | ❌ | Click at (x, y) coordinates |
| `type_text` | gui | ❌ | Send keyboard input |
| `scroll` | gui | ❌ | Scroll vertically |
| `hotkey` | gui | ❌ | Keyboard shortcuts (e.g. Ctrl+C) |
| `drag` | gui | ❌ | Drag from point to point |
| `find_element` | gui | ✅ | Locate UI element via OCR or vision LLM |
| `wait_for` | gui | ✅ | Poll until element appears |
| `screen_record` | media | ✅ | Record screen activity to GIF |
### Agents

| Agent | Description |
|---|---|
| `GUIAgent` | Autonomous loop: screenshot → analyze → act → verify |
| `CoordinatorAgent` | Orchestrates workers using the Tengu pattern |
| `VerificationAgent` | Runs structured check pipelines with custom rules |
## Design Highlights

- AsyncGenerator loop — autonomous until completion, not a single inference call
- ContextVar isolation — each agent context is scoped independently, preventing state leakage
- Tool partitioning — concurrent reads, serial writes, automatic batching
- 4-layer context compression — handles long-horizon tasks without hitting token limits
- Tengu coordinator — explicit synthesizing step, not pass-through aggregation
- Device/node tools — screen, input, media as first-class tool categories
- Session lifecycle — agents as long-lived sessions with explicit status transitions
- Subagent control — list, kill, steer running agents by ID
- Gateway architecture — tools operate through a gateway with scoped permission checks
- Visual grounding — hybrid element detection: OCR, vision LLM, template matching
- Verification pipeline — structured pass/fail checks with pre/post hooks, not just assertions
- Provider-agnostic LLM interface — compatible with Claude, GPT-4, and local inference servers
- Pure Python asyncio — no Node.js or browser runtime required
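The ContextVar isolation noted above rests on a standard-library guarantee: each asyncio Task runs in a copy of the current `Context`, so one agent's writes never leak into another's. A minimal demonstration (illustrative names, not the `AgentContext` API):

```python
import asyncio
from contextvars import ContextVar

current_agent: ContextVar = ContextVar("current_agent")

async def run_agent(agent_id: str) -> str:
    current_agent.set(agent_id)   # visible only inside this task's context
    await asyncio.sleep(0)        # yield so the agents interleave
    return current_agent.get()    # still our own id: no cross-agent leakage

async def main():
    # gather() wraps each coroutine in a Task with its own copied Context
    return await asyncio.gather(run_agent("scout"), run_agent("worker"))

ids = asyncio.run(main())
print(ids)  # → ['scout', 'worker']
```

This is why the framework can avoid passing agent state explicitly through every call: the interpreter scopes it per task.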
## Roadmap

- Native vision LLM grounding (Claude computer-use, GPT-4V)
- Browser automation via Playwright
- Mobile device control (Android ADB, iOS Shortcuts)
- Cross-session agent memory
- MCP server support
- Accessibility tree integration (pyatspi, pywinauto)
- Agent swarm monitoring dashboard
- AST-level command safety (tree-sitter)
## Contributing

Contributions welcome — bug fixes, new tools, documentation, tests, or architecture discussions. See CONTRIBUTING.md for guidelines.
## License

MIT — see LICENSE for details.

Not affiliated with Anthropic or OpenClaw. Built from independent analysis of their published engineering.