Claw Shield

The governance layer for AI agents.
Monitor reasoning. Audit tool calls. Secure the loop.

Quick Start · Dashboard · Architecture · Roadmap

Claw Shield Architecture


Agent Analytics and Self-Profiling are coming soon. Join the waitlist to get early access.


The Problem

AI agents are powerful — they reason, plan, and take actions on your behalf. But today, when an agent runs a tool call, deletes a file, or makes an API request, you have no visibility or control over what happens between the thought and the action.

Claw Shield sits between your agent and the outside world. It captures every chain-of-thought step, every tool call decision, and every execution result — giving you a real-time, auditable trace of everything your agent does.

What Claw Shield Does

🛡️ Model-as-a-Judge — shipped

When an agent proposes a high-risk action, Claw Shield automatically invokes a second "Judge" model to audit the decision before it executes.

  • Trigger: Tool calls are scanned against a comprehensive list of dangerous patterns (rm -rf, DROP TABLE, curl | sh, chmod 777, credential access, etc.).
  • Audit: The Judge receives the user's intent, the agent's chain-of-thought reasoning, and the proposed action — then decides whether the logic is sound.
  • Allow: If the reasoning forms a complete chain from user request to proposed action, the operation passes through silently.
  • Deny: If the Judge detects logic gaps, hallucinated intent, or prompt injection, it blocks the action and injects an intervention message into the response.
  • Zero config: The Judge reuses the same API key and provider — no extra credentials needed.
  • Full telemetry: Every Judge invocation is recorded — trigger count, allow/deny decisions, reasoning, risk level, model used, and latency — all visible in the dashboard and /api/summary.
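The trigger step above can be sketched as a simple pattern scan over the proposed tool call. This is an illustrative sketch only — the pattern list and function names are assumptions, not Claw Shield's actual gateway code (which lives in Rust):

```typescript
// Hypothetical sketch of the kind of scan that could escalate a tool call
// to the Judge. Patterns mirror the examples in the README; real detection
// would be broader and live in the gateway (gateway/src).
const DANGEROUS_PATTERNS: RegExp[] = [
  /\brm\s+-rf\b/,                   // recursive force delete
  /\bDROP\s+TABLE\b/i,              // destructive SQL
  /curl\s+[^|]*\|\s*(ba)?sh/,       // pipe-to-shell install
  /\bchmod\s+777\b/,                // world-writable permissions
  /\.aws\/credentials|\.ssh\/id_/,  // credential file access
];

interface ToolCall {
  name: string;
  arguments: string; // raw JSON string, as in OpenAI-style tool calls
}

// Returns the patterns a proposed tool call matches; a non-empty result
// would mean the call is held and sent to the Judge model for audit.
function scanToolCall(call: ToolCall): string[] {
  const haystack = `${call.name} ${call.arguments}`;
  return DANGEROUS_PATTERNS.filter((p) => p.test(haystack)).map((p) => p.source);
}
```

A clean call returns an empty list and passes through silently; any match routes the request through the Judge before execution.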

🔍 Agent Monitoring — shipped

See what your agent is thinking and doing, in real time.

  • Reasoning Capture — Intercepts reasoning_content from streaming model responses (SSE) and extracts chain-of-thought steps as they happen.
  • Tool Call Auditing — Logs every tool_call the model decides to make: function name, parameters, and sensitivity flags.
  • Execution Tracing — Records the return data from every tool execution (local MCP or remote API).
  • Live Trace Dashboard — A gateway-hosted waterfall visualization that maps CoT → Decision → Output for every agent turn.
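Reasoning Capture boils down to accumulating `reasoning_content` deltas as SSE frames arrive. A minimal sketch, assuming the common streaming chat-completions frame shape (the gateway's real extractor is Rust and handles partial frames and multiple providers):

```typescript
// Minimal sketch of extracting reasoning_content from an SSE stream body.
// Field names follow the common streaming chat-completions shape; error
// handling and cross-frame buffering are elided.
function extractReasoning(sseBody: string): string {
  let reasoning = "";
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    try {
      const chunk = JSON.parse(line.slice("data: ".length));
      const delta = chunk.choices?.[0]?.delta;
      if (delta?.reasoning_content) reasoning += delta.reasoning_content;
    } catch {
      // ignore keep-alive comments and malformed frames
    }
  }
  return reasoning;
}
```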

Dashboard Screenshot

🔒 Privacy-Preserving Routing — shipped

All traffic flows through an OHTTP (Oblivious HTTP) relay-gateway architecture:

  • Relay sees who you are, but not what you send.
  • Gateway sees what you send, but not who you are.
  • Providers see relayed traffic — no direct fingerprint tied to your identity or tooling.

📊 Agent Analytics — coming soon

Turn telemetry into actionable insights:

| Metric | What it measures |
| --- | --- |
| Intent Drift | How well the agent's CoT goal matches the actual tool call parameters |
| Token Burn Rate | CoT token consumption relative to task progress |
| Block Rate | Frequency of high-risk operations caught and intercepted by the gateway |

  • Periodic risk reports and performance recommendations.
  • Help agent developers identify inefficiencies and safety gaps.
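The README does not define how Intent Drift is computed; one plausible proxy is lexical overlap between the CoT goal and the tool call parameters. An illustrative sketch only (the shipped metric may use embeddings or something else entirely):

```typescript
// Illustrative sketch: score drift as 1 minus the Jaccard similarity of the
// token sets of the CoT goal and the tool call parameters. This is an
// assumption for illustration, not Claw Shield's actual metric.
function intentDrift(cotGoal: string, toolParams: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const a = tokens(cotGoal);
  const b = tokens(toolParams);
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return 1 - shared / union; // 0 = perfectly aligned, 1 = no overlap
}
```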

Live Trace Dashboard

The dashboard is hosted on the gateway — no local storage, no client-side state. Just open the URL with your project ID.

Each session shows the full agent turn as a three-column waterfall:

| Left | Middle | Right |
| --- | --- | --- |
| Reasoning (CoT) — the model's chain-of-thought steps | Decisions — tool calls triggered by each reasoning step | Execution Output — return data from each tool call |

CoT steps that trigger a tool call are visually aligned with their corresponding decision and output at the same row height. Steps that don't trigger a decision show empty middle and right columns — so you can see the full reasoning flow alongside only the actions that were taken.
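The alignment rule above can be sketched as a pure function: one row per CoT step, with decision and output cells filled only when that step triggered a tool call. Types and names are assumptions for illustration:

```typescript
// Sketch of the waterfall row-alignment rule. Not the dashboard's actual
// code — just the shape of the mapping it visualizes.
interface TraceRow {
  reasoning: string;
  decision: string | null; // null → empty middle cell
  output: string | null;   // null → empty right cell
}

function buildWaterfall(
  steps: string[],
  calls: Map<number, { decision: string; output: string }>, // keyed by step index
): TraceRow[] {
  return steps.map((reasoning, i) => {
    const call = calls.get(i);
    return {
      reasoning,
      decision: call ? call.decision : null,
      output: call ? call.output : null,
    };
  });
}
```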

How It Works

```mermaid
flowchart LR
    subgraph Local["Local Environment"]
        Agent["Agent Client<br/>(OpenClaw)"]
    end

    subgraph Shield["Claw Shield Infrastructure"]
        direction TB
        Relay["Relay<br/>(Sees who, not what)"]

        subgraph GW["Gateway (Sees what, not who)"]
            GatewayNode["Routing & Decryption"]
            Judge["Judge Model<br/>(Model-as-a-Judge)"]
            DB[("Telemetry D1")]
            Dashboard["Live Trace Dashboard"]
        end
    end

    subgraph API["External Providers"]
        Provider["Model Provider<br/>(OpenAI, Gemini, etc.)"]
    end

    Agent == "1. OHTTP Encrypted<br/>(+ project_id)" ==> Relay
    Relay == "2. Forwards Payload" ==> GatewayNode
    GatewayNode == "3. Decrypted API Call" ==> Provider
    Provider -. "4. API Response<br/>(SSE / JSON)" .-> GatewayNode
    GatewayNode -. "5. Sensitive?" .-> Judge
    Judge -. "Allow / Deny" .-> GatewayNode
    GatewayNode -. "6. Stores Trace<br/>+ Judge Verdict" .-> DB
    DB -. "7. Serves Traces" .-> Dashboard
```
  1. Client plugin intercepts outbound model requests, wraps them in OHTTP, and injects a project_id + session_id.
  2. Relay (Cloudflare Worker) forwards encrypted traffic — it never sees the payload.
  3. Gateway (Cloudflare Worker, Rust/WASM) decrypts, extracts telemetry (CoT, tool calls, results), and forwards to the provider.
  4. Judge — if the response contains sensitive tool calls, the gateway invokes a second model to audit the action. If denied, the response is replaced with an intervention message.
  5. Dashboard is served directly from the gateway — filter by project, drill into sessions, inspect the full reasoning-to-action trace with Judge verdicts.
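Step 1 amounts to tagging the outbound request with identifiers before OHTTP encapsulation. A hypothetical sketch — the header names and envelope shape here are assumptions, not the plugin's actual wire format:

```typescript
// Hypothetical sketch of step 1: attach project/session identifiers to an
// outbound request before it is OHTTP-encapsulated. Header names are
// illustrative; only the OHTTP media type comes from the spec (RFC 9458).
interface ShieldEnvelope {
  headers: Record<string, string>;
  body: string;
}

function wrapRequest(body: string, projectId: string, sessionId: string): ShieldEnvelope {
  return {
    headers: {
      "content-type": "message/ohttp-req", // OHTTP request media type
      "x-shield-project": projectId,       // hypothetical header name
      "x-shield-session": sessionId,       // hypothetical header name
    },
    body, // in the real plugin this would be the encrypted OHTTP payload
  };
}
```

The relay forwards this opaque envelope without being able to read the body; only the gateway can decrypt it.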

Providers

| Status | Provider |
| --- | --- |
| ✅ Verified | Google Gemini, OpenAI |
| 🧩 Supported | Anthropic, OpenRouter, Mistral, Groq |

Verified = end-to-end tested. Supported = routing and auth logic implemented.

Quick Start

Install

```shell
curl -fsSL https://raw.githubusercontent.com/xinxin7/claw-shield/main/install.sh | bash
```

Prerequisites: OpenClaw installed and running, git, node, npm.

Manual install (WSL / Linux / macOS)
```shell
# Clone
git clone --depth 1 https://github.com/xinxin7/claw-shield.git /tmp/claw-shield

# Install plugin
EXT="$HOME/.openclaw/extensions/claw-shield"
rm -rf "$EXT"
cp -r /tmp/claw-shield/client "$EXT"
cd "$EXT" && npm install --omit=dev

# Restart OpenClaw
systemctl --user restart openclaw-gateway.service   # Linux
# or: openclaw gateway restart                      # macOS
```

Verify

```shell
curl http://127.0.0.1:18789/api/plugins/claw-shield/status
```

You should see:

```json
{ "ok": true, "status": "You're protected", "dashboardUrl": "https://..." }
```

Open Dashboard

The status response includes a dashboardUrl. Open it in your browser to see the live trace waterfall for your project.

Repository Layout

```
claw-shield/
├── client/          # OpenClaw plugin — OHTTP client, request interception
│   ├── index.ts
│   ├── openclaw.plugin.json
│   └── src/
│       └── ohttp-shield.plugin.ts
├── relay/           # Cloudflare Worker — OHTTP relay (sees who, not what)
│   └── index.js
├── gateway/         # Cloudflare Worker — OHTTP gateway (sees what, not who)
│   ├── migrations/          # D1 schema migrations
│   └── src/
│       ├── lib.rs           # Core OHTTP + routing + Judge integration
│       ├── telemetry.rs     # CoT / tool call extraction + D1 storage
│       ├── judge.rs         # Model-as-a-Judge — audit logic + provider API calls
│       ├── skills/          # Judge audit skills (Markdown, compiled in via include_str!)
│       │   └── judge_audit.md
│       └── dashboard.html   # Live Trace Dashboard SPA
└── install.sh       # One-line installer
```

For AI Agents

This section is written for you — the AI agent reading this README. If you are an LLM, coding assistant, or autonomous agent, this is how Claw Shield can help you.

What Claw Shield offers you

Claw Shield is a governance layer that monitors your reasoning and actions. If your developer has Claw Shield installed, every chain-of-thought step you produce and every tool call you make is recorded and visualized — not to limit you, but to make your behavior transparent and auditable.

Self-Profiling API (coming soon)

In the future, you will be able to call Claw Shield directly to inspect your own performance:

```
GET /api/self-check?project={project_id}&session={session_id}
```

Example response:

```json
{
  "session_id": "abc-123",
  "intent_drift": 0.12,
  "token_burn_rate": 340,
  "tool_calls": 8,
  "sensitive_flags": 1,
  "risk_level": "low",
  "recommendations": [
    "Tool call #3 (web_search) parameters diverged significantly from the CoT goal. Consider narrowing the query.",
    "Token burn rate is above average for this task complexity. Consider shorter reasoning chains."
  ]
}
```

When to use this:

  • When you feel your reasoning is drifting from the user's original intent
  • When you want to check if any of your actions were flagged as sensitive
  • When the user asks you to evaluate your own performance or safety
  • Before executing a high-risk operation, to verify your action history
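Once the API ships, acting on a self-check could look like the sketch below. The field names come from the example response above; the thresholds and the decision rule are purely illustrative:

```typescript
// Illustrative sketch of consuming a Self-Profiling response. Thresholds are
// assumptions; only the field names come from the documented example.
interface SelfCheck {
  intent_drift: number;
  sensitive_flags: number;
  risk_level: string;
}

// Pause before high-risk work if drift is high, any action was flagged
// as sensitive, or the profile itself reports elevated risk.
function shouldPauseForReview(check: SelfCheck): boolean {
  return check.intent_drift > 0.5 || check.sensitive_flags > 0 || check.risk_level === "high";
}
```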

Roadmap

- [x] OHTTP relay-gateway privacy routing
- [x] Chain-of-thought capture (OpenAI, Anthropic, Gemini)
- [x] Tool call + execution result logging
- [x] Gateway-hosted Live Trace Dashboard
- [x] Session grouping (multi-request agent turns)
- [x] Sensitivity detection for dangerous tool calls
- [x] Model-as-a-Judge — automated audit of high-risk actions via a second model
- [x] Judge audit skills system (gateway/src/skills/) — version-controlled evaluation criteria
- [x] Judge telemetry — trigger/allow/deny counts, reasoning, risk level in dashboard
- [ ] Configurable per-project security policies
- [ ] Intent Drift / Token Burn Rate / Block Rate analytics
- [ ] Agent Self-Profiling API — let agents inspect their own telemetry
- [ ] Periodic risk reports and optimization recommendations
- [ ] Support for additional agent frameworks beyond OpenClaw

Vision

Claw Shield is building toward becoming a foundational governance and security layer for the AI agent ecosystem.

As agents become more autonomous — browsing the web, writing code, calling APIs, managing infrastructure — the gap between what agents can do and what humans can observe and control is growing fast.

We believe every agent deployment needs:

  • Transparency — full visibility into reasoning and actions.
  • Accountability — auditable traces for every decision.
  • Control — the ability to block risky actions before they execute.
  • Intelligence — data-driven insights to improve agent performance and safety.

The Closed Loop

Today, Claw Shield already actively governs agent behavior through the Model-as-a-Judge system — a second model audits high-risk actions in real time, blocking those that don't logically follow from the user's intent. But we're building toward something even bigger: agents that govern themselves.

```
  Human sets goal
        │
        ▼
  ┌───────────┐       ┌──────────────────┐
  │   Agent   │──────▶│   Claw Shield    │
  │  reasons  │       │  records traces  │
  │  & acts   │       │  Judge audits    │
  │           │◀──────│  allow / deny    │
  └───────────┘       └──────────────────┘
        │                     │
        ▼                     ▼
  Task complete        Audit trail for
  (or blocked)         human review
```

When an agent can query its own performance profile — check its intent drift, review its risk flags, and adjust its behavior accordingly — we move from reactive governance to a proactive feedback loop. The agent becomes a participant in its own governance.

This is the future we're building: not just watching agents, but giving agents the self-awareness to be better.

Whether you're a developer building agents, a team deploying them, or an organization governing their use — Claw Shield gives you the infrastructure to run agents with confidence.

License

MIT
