
Claude-Claw

Agent Orchestration Framework with GUI Automation

Bridging Claude Code's orchestration patterns with OpenClaw's computer-use tooling

Python 3.10+ · MIT License

Quick Start · Architecture · Examples · API Reference · Contributing


What is Claude-Claw?

Claude-Claw is an open-source Python framework for building autonomous GUI automation agents. It draws from two source architectures:

  • Claude Code's agent orchestration model — AsyncGenerator-driven autonomous loops, multi-agent coordination (Tengu), smart tool partitioning, and 4-layer context compression
  • OpenClaw's device control layer — screen capture, visual grounding, input simulation, and session lifecycle management

The framework enables AI agents to capture screen state, identify UI elements, and execute interactions — with built-in verification and support for parallel multi-agent workflows.

from claude_claw.agents import GUIAgent

agent = GUIAgent(llm_provider=my_llm, enable_verification=True)
result = await agent.run("Open Chrome, navigate to GitHub, and star this repo")

Architecture

Core Loop

The agent loop is built on an AsyncGenerator — not a one-shot request, but an autonomous cycle that continues until the task is complete or a turn limit is reached:

┌─────────────────────────────────────────────────────────────┐
│                    AsyncGenerator Loop                       │
│                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐  │
│  │ Assemble │──▶│ Call LLM │──▶│ Execute  │──▶│ Verify  │  │
│  │ Prompt   │   │  (API)   │   │  Tools   │   │ Results │  │
│  └──────────┘   └──────────┘   └──────────┘   └────┬────┘  │
│       ▲                                             │       │
│       └─────────────────────────────────────────────┘       │
│                     Loop until done                          │
└─────────────────────────────────────────────────────────────┘
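A minimal sketch of that cycle in plain asyncio — the `call_llm` and `execute_tools` callables are hypothetical stand-ins for the framework's internals, not the real `claude_claw` API:

```python
from typing import AsyncGenerator

async def agent_loop(
    task: str,
    call_llm,          # async (prompt) -> {"done": bool, "tool_calls": [...]}
    execute_tools,     # async (tool_calls) -> list of results
    max_turns: int = 10,
) -> AsyncGenerator[dict, None]:
    """Yield one event per turn until the LLM signals completion
    or the turn limit is reached."""
    history: list[dict] = []
    for turn in range(max_turns):
        prompt = {"task": task, "history": history}           # assemble prompt
        reply = await call_llm(prompt)                        # call LLM
        if reply.get("done"):
            yield {"turn": turn, "status": "done", "reply": reply}
            return
        results = await execute_tools(reply["tool_calls"])    # execute tools
        history.append({"reply": reply, "results": results})  # record for verify
        yield {"turn": turn, "status": "continue", "results": results}
    yield {"status": "turn_limit"}
```

Because the loop is an AsyncGenerator, a caller can observe (or cancel) the agent between turns rather than waiting for a single opaque result.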

Smart Tool Partitioning

Each tool is annotated with is_read_only. The orchestrator uses this to batch read-only calls concurrently and serialize state-modifying ones:

Tool Calls: [screenshot, find_element, click, type_text, screenshot]
                │              │          │        │          │
                ▼              ▼          ▼        ▼          ▼
          ┌── read-only ──┐  ┌─── state-modifying ──┐  ┌─ read-only ─┐
          │   PARALLEL    │  │       SERIAL         │  │  PARALLEL   │
          │   Batch 1     │  │       Batch 2        │  │  Batch 3    │
          └───────────────┘  └──────────────────────┘  └─────────────┘
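Sketched with plain asyncio — the `tools` mapping and per-tool `is_read_only` flag mirror the description above, but the `execute()` interface is illustrative:

```python
import asyncio
from itertools import groupby

async def run_partitioned(tool_calls: list, tools: dict) -> list:
    """Run consecutive read-only calls concurrently and state-modifying
    calls one at a time, preserving overall call order.
    `tools` maps a tool name to an object with an `is_read_only` flag
    and an async `execute()` method (illustrative interface)."""
    results = []
    for read_only, group in groupby(tool_calls, key=lambda n: tools[n].is_read_only):
        names = list(group)
        if read_only:
            # A run of reads is safe to fan out with asyncio.gather.
            results += await asyncio.gather(*(tools[n].execute() for n in names))
        else:
            # Writes mutate GUI state, so keep them strictly serial.
            for name in names:
                results.append(await tools[name].execute())
    return results
```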

4-Layer Context Compression

Prevents token overflow for long-running agents:

| Layer | Trigger | Action |
|---|---|---|
| Layer 1 | Tool output exceeds threshold | Persist to disk, pass file path to model |
| Layer 2 | Old tool results in context | Replace with placeholder after timeout |
| Layer 3 | Context approaching limit | LLM summarizes history in-place |
| Layer 4 | Circuit breaker | Halt after 3 consecutive failed compressions |
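Layer 1 can be sketched like this; the threshold value is illustrative, not the framework's real limit:

```python
import os
import tempfile

THRESHOLD = 4_000  # chars; illustrative, not the framework's real limit

def compress_tool_output(output: str, threshold: int = THRESHOLD) -> str:
    """Layer 1: if a tool result would crowd the context window,
    persist it to disk and hand the model a file-path placeholder
    instead of the raw text."""
    if len(output) <= threshold:
        return output
    fd, path = tempfile.mkstemp(suffix=".txt", prefix="tool_output_")
    with os.fdopen(fd, "w") as f:
        f.write(output)
    return f"[output persisted to {path}; {len(output)} chars]"
```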

Multi-Agent Hierarchy (Tengu Pattern)

Coordinator agents dispatch sub-tasks to specialized workers and synthesize their results:

                    ┌─────────────────────┐
                    │   Coordinator Agent  │
                    │   (goal + synthesis) │
                    └────────┬────────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌───────────┐ ┌───────────┐ ┌───────────┐
        │  Scout    │ │  Worker   │ │ Verifier  │
        │  Agent    │ │  Agent    │ │  Agent    │
        │ (explore) │ │ (execute) │ │ (verify)  │
        └─────┬─────┘ └─────┬─────┘ └───────────┘
              │              │
        ┌─────┴─────┐ ┌─────┴─────┐
        │ GUI Tools │ │ GUI Tools │
        │ read-only │ │ read+write│
        └───────────┘ └───────────┘

Workers can run in parallel on independent sub-tasks. The coordinator synthesizes results rather than passing them through verbatim.
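The dispatch-then-synthesize shape, reduced to a sketch with injected `run_worker` and `synthesize` callables (hypothetical names, not the `CoordinatorAgent` API):

```python
import asyncio

async def coordinate(goal: str, workers: list, run_worker, synthesize):
    """Dispatch independent workers concurrently, then reduce their
    outputs through an explicit synthesis step rather than returning
    them verbatim."""
    outputs = await asyncio.gather(*(run_worker(goal, w) for w in workers))
    return synthesize(goal, outputs)
```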


Quick Start

Installation

pip install claude-claw

# With LLM provider support
pip install claude-claw[anthropic]   # Claude
pip install claude-claw[openai]      # GPT-4
pip install claude-claw[all]         # All providers

From Source

git clone https://github.com/huyuelin/Claude-Claw.git
cd Claude-Claw
pip install -e ".[dev]"

Basic Usage

Step 1 — Single tool (screenshot)

from claude_claw.gui import create_screenshot_tool

screenshot = create_screenshot_tool()
result = await screenshot.execute({"max_width": 1280})
# result['data'] → base64 PNG, result['width'], result['height']

Step 2 — Tool Registry

from claude_claw.core import ToolRegistry
from claude_claw.gui import get_all_gui_tools

registry = ToolRegistry()
for tool in get_all_gui_tools():
    registry.register(tool)

# Inspect partitioning
read_only = registry.get_read_only()       # screenshot, find_element, wait_for
modifying = registry.get_state_modifying() # click, type_text, scroll, drag, hotkey

Step 3 — Orchestrator with agent isolation

from claude_claw import Orchestrator, AgentType

orchestrator = Orchestrator()
orchestrator.register_tools(get_all_gui_tools())

# Each agent runs in an isolated ContextVar scope
async with orchestrator.spawn_agent("scout", agent_type=AgentType.EXPLORER) as agent:
    worker_id = await orchestrator.spawn_worker(
        task="Locate the login button coordinates",
        name="button-finder",
    )

result = orchestrator.get_result(agent.agent_id)

Step 4 — Verification

from claude_claw.verification import VerificationAgent, CheckResult, CheckStatus

verifier = VerificationAgent()
verifier.add_gui_checks()  # screen_responsive, no_error_dialog, element_visible, screen_changed

@verifier.check("login_success")
async def check_login(context):
    screenshot = await take_screenshot()  # your own capture helper
    text = await ocr(screenshot)          # your own OCR helper
    return CheckResult(
        name="login_success",
        status=CheckStatus.PASSED if "Dashboard" in text else CheckStatus.FAILED,
        message="Expected dashboard after login",
    )

result = await verifier.verify({"action": "login"})
print(result.summary)  # "PASSED: 5 passed, 0 failed in 234ms"

Step 5 — Multi-agent coordinator

from claude_claw.agents import CoordinatorAgent

coordinator = CoordinatorAgent(llm_provider=my_llm)

result = await coordinator.execute(
    goal="Fill out the job application form and submit it",
    workers=[
        {"name": "scout",     "task": "Screenshot the form, identify all fields",
         "tools": ["screenshot", "find_element"]},
        {"name": "filler",    "task": "Fill each field with appropriate data",
         "tools": ["click", "type_text", "find_element"]},
        {"name": "submitter", "task": "Review, click Submit, verify success",
         "tools": ["screenshot", "click", "wait_for"]},
    ],
)

print(result['verification']['summary'])

CLI

claude-claw tools --list
claude-claw run "Take a screenshot of my desktop"
claude-claw version

Examples

| Example | Description |
|---|---|
| basic_gui_automation.py | Single agent: screenshot, click, type, verify |
| multi_agent_coordinator.py | Coordinator dispatching parallel workers |

API Reference

Core

| Class | Description |
|---|---|
| Orchestrator | Spawn, kill, steer, and monitor agents |
| AgentContext | Per-agent isolated execution scope (ContextVar) |
| ToolRegistry | Register, discover, and export tool schemas |
| Tool | Base tool: name, schema, execute, permissions |

GUI Tools

| Tool | Category | Read-Only | Description |
|---|---|---|---|
| screenshot | gui | ✓ | Capture full screen or region |
| click | gui | ✗ | Click at (x, y) coordinates |
| type_text | gui | ✗ | Send keyboard input |
| scroll | gui | ✗ | Scroll vertically |
| hotkey | gui | ✗ | Keyboard shortcuts (e.g. Ctrl+C) |
| drag | gui | ✗ | Drag from point to point |
| find_element | gui | ✓ | Locate UI element via OCR or vision LLM |
| wait_for | gui | ✓ | Poll until element appears |
| screen_record | media | ✓ | Record screen activity to GIF |

Agents

| Agent | Description |
|---|---|
| GUIAgent | Autonomous loop: screenshot → analyze → act → verify |
| CoordinatorAgent | Orchestrates workers using Tengu pattern |
| VerificationAgent | Runs structured check pipelines with custom rules |

Design Decisions

From Claude Code

  • AsyncGenerator loop — autonomous until completion, not a single inference call
  • ContextVar isolation — each agent context is scoped independently, preventing state leakage
  • Tool partitioning — concurrent reads, serial writes, automatic batching
  • 4-layer context compression — handles long-horizon tasks without hitting token limits
  • Tengu coordinator — explicit synthesizing step, not pass-through aggregation
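The ContextVar isolation idea in miniature — each asyncio task runs against its own copy of the context, so concurrent agents never see each other's values (names here are illustrative):

```python
import asyncio
from contextvars import ContextVar

current_agent: ContextVar[str] = ContextVar("current_agent")

async def tool_call() -> str:
    # Any code running on this task sees only its own agent's id.
    return current_agent.get()

async def run_agent(agent_id: str) -> str:
    current_agent.set(agent_id)
    await asyncio.sleep(0)  # yield to other agents mid-run
    return await tool_call()

async def main() -> list:
    # Each coroutine is wrapped in a task with its own context copy,
    # so the two set() calls cannot clobber each other.
    return await asyncio.gather(run_agent("scout"), run_agent("worker"))
```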

From OpenClaw

  • Device/node tools — screen, input, media as first-class tool categories
  • Session lifecycle — agents as long-lived sessions with explicit status transitions
  • Subagent control — list, kill, steer running agents by ID
  • Gateway architecture — tools operate through a gateway with scoped permission checks

Framework additions

  • Visual grounding — hybrid element detection: OCR, vision LLM, template matching
  • Verification pipeline — structured pass/fail checks with pre/post hooks, not just assertions
  • Provider-agnostic LLM interface — compatible with Claude, GPT-4, and local inference servers
  • Pure Python asyncio — no Node.js or browser runtime required
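A provider-agnostic interface can be expressed as a structural `typing.Protocol`; this is an illustrative sketch, not the real `claude_claw` provider signature:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMProvider(Protocol):
    """Minimal provider surface assumed by this sketch; the real
    interface may expose different method names and richer types."""
    async def complete(self, messages: list, tools: list) -> dict: ...

class EchoProvider:
    """Toy provider showing that any object with a matching
    `complete` coroutine satisfies the protocol structurally."""
    async def complete(self, messages: list, tools: list) -> dict:
        return {"text": messages[-1]["content"]}
```

Structural typing means Claude, GPT-4, or a local inference server can each be wrapped without inheriting from a framework base class.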

Roadmap

  • Native vision LLM grounding (Claude computer-use, GPT-4V)
  • Browser automation via Playwright
  • Mobile device control (Android ADB, iOS Shortcuts)
  • Cross-session agent memory
  • MCP server support
  • Accessibility tree integration (pyatspi, pywinauto)
  • Agent swarm monitoring dashboard
  • AST-level command safety (tree-sitter)

Contributing

Contributions welcome — bug fixes, new tools, documentation, tests, or architecture discussions. See CONTRIBUTING.md for guidelines.


License

MIT — see LICENSE for details.


Not affiliated with Anthropic or OpenClaw. Built from an independent analysis of their publicly documented engineering.
