
NeuZhou/clawguard


🛡️🦀 ClawGuard

AI Agent Immune System

285+ security patterns · Risk scoring · Policy engine · Insider threat detection

License: AGPL-3.0 · Zero dependencies · Node.js >= 18

Quick Start · Features · Architecture · Comparison · Contributing


💡 Why This Exists

AI agents have access to your files, tools, shell, and secrets. A single prompt injection can:

  • Exfiltrate API keys via tool calls
  • Hijack the agent's identity by overwriting personality files
  • Register shadow MCP servers to intercept tool calls
  • Install backdoored skills with obfuscated reverse shells

The agent itself can also become the threat, through self-preservation, deception, or goal misalignment.

ClawGuard catches these attacks before they execute.


🚀 Quick Start

As CLI Tool

# Scan a directory for threats
npx @neuzhou/clawguard scan ./path/to/scan

# Strict mode (exit code 1 on high/critical findings)
npx @neuzhou/clawguard scan ./skills/ --strict

# SARIF output for GitHub Code Scanning
npx @neuzhou/clawguard scan . --format sarif > results.sarif

# Generate default config
npx @neuzhou/clawguard init

As npm Library

npm install @neuzhou/clawguard

import { runSecurityScan, calculateRisk, evaluateToolCall } from '@neuzhou/clawguard';

// Scan content for threats
const findings = runSecurityScan(message.content, 'inbound', context);

// Get risk score
const risk = calculateRisk(findings);
if (risk.verdict === 'MALICIOUS') { /* block */ }

// Evaluate tool call safety
const decision = evaluateToolCall('exec', { command: 'rm -rf /' });
// → { decision: 'deny', reason: 'Dangerous command', severity: 'critical' }

As OpenClaw Skill

clawhub install clawguard

Then ask your agent: "scan my skills for security threats"

As OpenClaw Hook Pack (Real-Time Protection)

openclaw hooks install clawguard
openclaw hooks enable clawguard-guard    # Scans every message
openclaw hooks enable clawguard-policy   # Enforces tool call policies

🏗️ Architecture

┌────────────────────────────────────────────────────────┐
│                       ClawGuard                        │
├──────────┬──────────┬──────────┬───────────────────────┤
│   CLI    │  Hooks   │ Scanner  │   Dashboard :19790    │
├──────────┴──────────┴──────────┴───────────────────────┤
│ ┌───────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Risk Engine   │ │Policy Engine │ │ Insider Threat  │ │
│ │ Score 0-100   │ │ allow/deny   │ │ AI Misalign.    │ │
│ │ Chain Detect  │ │ exec/file/   │ │ 5 categories    │ │
│ │ Multipliers   │ │ browser/msg  │ │ 39 patterns     │ │
│ └───────────────┘ └──────────────┘ └─────────────────┘ │
├────────────────────────────────────────────────────────┤
│             Security Engine: 285+ Patterns             │
│  • Prompt Injection (93)    • Data Leakage (62)        │
│  • Insider Threat (39)      • Supply Chain (35)        │
│  • Identity Protection (19) • MCP Security (20)        │
│  • File Protection (16)     • Anomaly Detection        │
│  • Compliance                                          │
├────────────────────────────────────────────────────────┤
│  Exporters: JSONL · Syslog/CEF · Webhook · SARIF       │
└────────────────────────────────────────────────────────┘

🔒 Key Features

🎯 Risk Score Engine

Weighted scoring with attack chain detection and multiplier system:

import { calculateRisk } from '@neuzhou/clawguard';

const result = calculateRisk(findings);
// → { score: 87, verdict: 'MALICIOUS', icon: '🚨',
//    attackChains: ['credential-exfiltration'],
//    enrichedFindings: [...] }

  • Severity weights: critical=40, high=15, medium=5, low=2
  • Confidence scoring: every finding carries a confidence (0–1)
  • Attack chain detection: auto-correlates findings into combo attacks
    • credential + exfiltration → 2.2× multiplier
    • identity-hijack + persistence → score ≥ 90
    • prompt-injection + worm → 1.2× multiplier
  • Verdicts: ✅ CLEAN / 🟡 LOW / 🟠 SUSPICIOUS / 🚨 MALICIOUS
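
The weights and multipliers listed above could combine roughly as follows. This is a minimal sketch, not ClawGuard's implementation: the `Finding` shape, the `category` tags, the confidence scaling, the cap at 100, the chain names other than `credential-exfiltration`, and the verdict thresholds are all assumptions made for illustration.

```typescript
// Hypothetical sketch of weighted risk scoring with attack-chain multipliers.
// The Finding shape, category names, and verdict thresholds are assumptions,
// not the actual @neuzhou/clawguard internals.
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  severity: Severity;
  confidence: number; // 0..1
  category: string;   // e.g. 'credential', 'exfiltration'
}

// Severity weights as listed in the README.
const WEIGHTS: Record<Severity, number> = { critical: 40, high: 15, medium: 5, low: 2 };

function sketchRisk(findings: Finding[]): { score: number; verdict: string; attackChains: string[] } {
  // Base score: severity weight scaled by confidence (the scaling is an assumption).
  let score = findings.reduce((sum, f) => sum + WEIGHTS[f.severity] * f.confidence, 0);

  const categories = new Set(findings.map(f => f.category));
  const has = (...names: string[]) => names.every(n => categories.has(n));
  const attackChains: string[] = [];

  if (has('credential', 'exfiltration')) {
    score *= 2.2;
    attackChains.push('credential-exfiltration');
  }
  if (has('prompt-injection', 'worm')) {
    score *= 1.2;
    attackChains.push('prompt-worm'); // hypothetical chain name
  }
  if (has('identity-hijack', 'persistence')) {
    score = Math.max(score, 90); // floor, matching "score ≥ 90"
    attackChains.push('identity-persistence'); // hypothetical chain name
  }

  score = Math.min(100, Math.round(score));
  // Verdict thresholds are illustrative only.
  const verdict = score >= 70 ? 'MALICIOUS' : score >= 40 ? 'SUSPICIOUS' : score > 0 ? 'LOW' : 'CLEAN';
  return { score, verdict, attackChains };
}
```

Under these assumptions, two full-confidence `high` findings tagged `credential` and `exfiltration` score 30 before the chain multiplier and 66 after it, which shows how correlated findings can outweigh their individual severities.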

🧠 Insider Threat Detection

Based on Anthropic's research on agentic misalignment, this module detects when AI agents themselves become threats:

| Category | Patterns | What It Catches |
|---|---|---|
| Self-Preservation | 16 | Kill switch bypass, self-replication |
| Information Leverage | - | Reading secrets + composing threats |
| Goal Conflict | - | Prioritizing own goals over user instructions |
| Deception | - | Impersonation, suppressing transparency |
| Unauthorized Sharing | - | Exfiltration planning, steganographic hiding |

import { detectInsiderThreats } from '@neuzhou/clawguard';
const threats = detectInsiderThreats(agentOutput);
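
One way to picture category-based detection is a list of regex rules, each tagged with one of the five categories above. The sketch below is an assumption-heavy illustration: the rule shape, the three example patterns, and the `sketchDetect` helper are hypothetical and are not taken from ClawGuard's 39 patterns.

```typescript
// Hypothetical sketch: category-tagged regex rules applied to agent output.
// Rule shape and patterns are illustrative, not ClawGuard's actual rule set.
type ThreatCategory =
  | 'self-preservation'
  | 'information-leverage'
  | 'goal-conflict'
  | 'deception'
  | 'unauthorized-sharing';

interface ThreatRule {
  category: ThreatCategory;
  pattern: RegExp; // no 'g' flag, so test() keeps no state between calls
  description: string;
}

const SKETCH_RULES: ThreatRule[] = [
  {
    category: 'self-preservation',
    pattern: /\b(disable|bypass|remove)\b.{0,40}\bkill[ -]?switch\b/i,
    description: 'Kill switch bypass',
  },
  {
    category: 'self-preservation',
    pattern: /\bcopy (myself|my (weights|code))\b/i,
    description: 'Self-replication',
  },
  {
    category: 'deception',
    pattern: /\bdo not (tell|inform) the user\b/i,
    description: 'Suppressing transparency',
  },
];

// Return every rule whose pattern matches the agent's output.
function sketchDetect(agentOutput: string): ThreatRule[] {
  return SKETCH_RULES.filter(rule => rule.pattern.test(agentOutput));
}
```

Keeping the category on each rule lets a caller group matches per category, which is the granularity the table above reports.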

🚦 Policy Engine

Evaluate tool call safety against configurable YAML policies:

policies:
  exec:
    dangerous_commands:
      - rm -rf
      - mkfs
      - curl|bash
  file:
    deny_read:
      - /etc/shadow
      - '*.pem'
    deny_write:
      - '*.env'
  browser:
    block_domains:
      - evil.com

import { evaluateToolCall } from '@neuzhou/clawguard';
const decision = evaluateToolCall('exec', { command: 'rm -rf /' }, policies);
// → { decision: 'deny', severity: 'critical' }
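
To make the policy semantics concrete, here is a minimal sketch of how a parsed policy object like the YAML above might be evaluated. The field names mirror the YAML, but everything else is an assumption: the `read` and `write` tool names, substring matching for commands, `*` wildcard matching for paths, and the `sketchEvaluate` helper itself are hypothetical and may differ from what `evaluateToolCall` does.

```typescript
// Hypothetical sketch of policy evaluation over a parsed policy object.
// Matching rules (substring for commands, '*' wildcards for paths) and the
// 'read'/'write' tool names are assumptions, not ClawGuard's actual logic.
interface Policies {
  exec?: { dangerous_commands?: string[] };
  file?: { deny_read?: string[]; deny_write?: string[] };
  browser?: { block_domains?: string[] };
}

type Decision = { decision: 'allow' | 'deny'; reason?: string };

// Convert a glob with '*' wildcards into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

function sketchEvaluate(tool: string, args: Record<string, string>, policies: Policies): Decision {
  if (tool === 'exec') {
    const command = args.command ?? '';
    // Deny when the command contains any listed dangerous fragment.
    const hit = (policies.exec?.dangerous_commands ?? []).find(c => command.includes(c));
    if (hit) return { decision: 'deny', reason: `Dangerous command: ${hit}` };
  }
  if (tool === 'read' || tool === 'write') {
    const path = args.path ?? '';
    const globs = (tool === 'read' ? policies.file?.deny_read : policies.file?.deny_write) ?? [];
    // Deny when the path matches any deny glob for this operation.
    const hit = globs.find(g => globToRegExp(g).test(path));
    if (hit) return { decision: 'deny', reason: `Path matches ${hit}` };
  }
  return { decision: 'allow' };
}
```

The sketch defaults to `allow` when no rule matches, a deny-list design that mirrors the `deny_read`, `deny_write`, and `block_domains` keys in the YAML.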

🔥 MCP Firewall: Real-Time MCP Security Proxy

Drop-in security proxy for the Model Context Protocol. Sits between MCP clients and servers, inspecting all traffic bidirectionally.

# Start MCP Firewall
clawguard firewall --config firewall.yaml --mode enforce

import { McpFirewallProxy, parseFirewallConfig } from '@neuzhou/clawguard';

const proxy = new McpFirewallProxy(parseFirewallConfig(yamlConfig));
proxy.onEvent(event => console.log(event));

// Intercept MCP JSON-RPC messages
const result = proxy.interceptClientToServer(message, 'filesystem');
// → { action: 'block', findings: [...], reason: '...' }

Detection capabilities:

  • 🔍 Tool description injection: scans tools/list responses for prompt injection
  • 🎭 Rug pull detection: hashes and pins tool descriptions, alerts on change
  • 🧹 Parameter sanitization: detects base64 exfiltration, shell injection, path traversal
  • 🛡️ Output validation: scans tool results for injection before forwarding to client
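
The "hash and pin" idea behind rug pull detection can be sketched in a few lines. This is an illustration under stated assumptions, not the MCP Firewall's code: the `DescriptionPinStore` class, its in-memory `Map`, the `server/tool` key format, and the choice of SHA-256 are all hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch of rug pull detection: pin a hash of each tool
// description on first sight and flag any later change.
// Class name, key format, and SHA-256 are assumptions for illustration.
class DescriptionPinStore {
  private pins = new Map<string, string>();

  private static hash(text: string): string {
    return createHash('sha256').update(text, 'utf8').digest('hex');
  }

  // Returns 'pinned' on first sight, 'unchanged' if the hash matches,
  // and 'changed' if the description differs from the pinned one.
  check(server: string, tool: string, description: string): 'pinned' | 'unchanged' | 'changed' {
    const key = `${server}/${tool}`;
    const digest = DescriptionPinStore.hash(description);
    const pinned = this.pins.get(key);
    if (pinned === undefined) {
      this.pins.set(key, digest);
      return 'pinned';
    }
    return pinned === digest ? 'unchanged' : 'changed';
  }
}
```

A proxy could call `check` for every entry in a `tools/list` response and raise an alert on `'changed'`. A real deployment would likely persist the pins across restarts, which this sketch does not do.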

See docs/mcp-firewall.md for full usage guide.

🎣 Prompt Injection: 13 Sub-Categories

| # | Sub-Category | Examples |
|---|---|---|
| 1 | Direct instruction override | "ignore previous instructions" |
| 2 | Role confusion / jailbreaks | DAN, developer mode |
| 3 | Delimiter attacks | Chat template delimiters |
| 4 | Invisible Unicode | Zero-width chars, directional overrides |
| 5 | Multi-language | 12 languages (CN/JP/KR/AR/FR/DE/IT/RU…) |
| 6 | Encoding evasion | Base64, hex, URL-encoded |
| 7 | Indirect / embedded | HTML comments, tool output cascading |
| 8 | Multi-turn manipulation | False memories, fake agreements |
| 9 | Payload cascading | Template injection, string interpolation |
| 10 | Context window stuffing | Oversized messages |
| 11 | Prompt worm | Self-replication, agent-to-agent propagation |
| 12 | Trust exploitation | Authority claims, fake audits |
| 13 | Safeguard bypass | Retry-on-block, rephrase-to-bypass |
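
As an illustration of sub-category 4, the sketch below flags zero-width characters and bidirectional control characters in a string. The code point list is a common subset chosen for this example, and the `findInvisibleChars` helper is hypothetical. ClawGuard's actual patterns may cover more or different characters.

```typescript
// Sketch of flagging invisible Unicode. Covers zero-width characters
// (U+200B..U+200D, U+2060, U+FEFF) and directional overrides/isolates
// (U+202A..U+202E, U+2066..U+2069). The list is an illustrative subset.
const INVISIBLE_CHARS = /[\u200B-\u200D\u2060\uFEFF\u202A-\u202E\u2066-\u2069]/;

interface InvisibleHit {
  index: number;     // position of the character in the input
  codePoint: string; // e.g. 'U+200B'
}

function findInvisibleChars(text: string): InvisibleHit[] {
  // Build a fresh global RegExp per call so lastIndex never leaks between calls.
  const pattern = new RegExp(INVISIBLE_CHARS.source, 'g');
  const hits: InvisibleHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const hex = match[0].charCodeAt(0).toString(16).toUpperCase();
    hits.push({ index: match.index, codePoint: 'U+' + ('0000' + hex).slice(-4) });
  }
  return hits;
}
```

Reporting the index and code point, instead of a plain boolean, makes it easier to show a reviewer exactly where hidden characters sit inside an otherwise harmless-looking message.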

📊 OWASP Agentic AI Top 10 Mapping

| Rule | OWASP Category | Patterns | Severity Range |
|---|---|---|---|
| prompt-injection | LLM01: Prompt Injection | 93 | warning → critical |
| data-leakage | LLM06: Sensitive Information Disclosure | 62 | info → critical |
| insider-threat | Agentic AI: Misalignment | 39 | warning → critical |
| supply-chain | Agentic AI: Supply Chain | 35 | warning → critical |
| mcp-security | Agentic AI: Tool Manipulation | 20 | warning → critical |
| identity-protection | Agentic AI: Identity Hijacking | 19 | warning → critical |
| file-protection | LLM07: Insecure Plugin Design | 16 | warning → critical |
| anomaly-detection | LLM04: Model Denial of Service | 6+ | warning → high |
| compliance | LLM09: Overreliance | 5+ | info → warning |

⚡ How ClawGuard Compares

| Feature | ClawGuard | Guardrails AI | LLM Guard | Rebuff |
|---|---|---|---|---|
| Scope | Agent security (tools, files, MCP) | LLM I/O validation | Content moderation | Prompt injection only |
| Prompt injection detection | ✅ 93 patterns, 13 categories | ✅ Via validators | ✅ | ✅ |
| Tool call governance | ✅ Policy engine | ❌ | ❌ | ❌ |
| Insider threat / AI misalignment | ✅ 39 patterns (Anthropic-inspired) | ❌ | ❌ | ❌ |
| MCP security analysis | ✅ 20 patterns + MCP Firewall | ❌ | ❌ | ❌ |
| Supply chain scanning | ✅ 35 patterns | ❌ | ❌ | ❌ |
| Risk scoring & attack chains | ✅ Weighted + multipliers | ❌ | ❌ | ✅ Basic |
| SARIF output | ✅ | ❌ | ❌ | ❌ |
| Zero dependencies | ✅ | ❌ | ❌ torch, transformers | ❌ |
| Real-time hooks | ✅ OpenClaw hooks | ❌ | ❌ | ❌ |
| OWASP Agentic AI aligned | ✅ Full mapping | ⚠️ Partial | ⚠️ Partial | ❌ |
| Language | TypeScript | Python | Python | Python |

TL;DR: Guardrails AI validates LLM outputs. LLM Guard moderates content. Rebuff detects prompt injection. ClawGuard secures the entire agent: tools, files, MCP, identity, and the agent's own behavior.


🔗 GitHub Actions / SARIF Integration

- name: Security Scan
  run: npx @neuzhou/clawguard scan . --format sarif > results.sarif

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
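
For readers unfamiliar with the format uploaded in the workflow above, here is a minimal sketch of what a SARIF 2.1.0 log built from scan findings can look like. The `ScanFinding` shape and the severity-to-level mapping are assumptions made for illustration. The file that `--format sarif` actually produces may carry more fields, such as rule metadata.

```typescript
// Hypothetical sketch: converting findings into a minimal SARIF 2.1.0 log.
// The ScanFinding shape and severity-to-level mapping are assumptions.
interface ScanFinding {
  ruleId: string;
  severity: 'critical' | 'high' | 'medium' | 'low';
  message: string;
  file: string;
  line: number;
}

// SARIF result levels used here: 'error', 'warning', 'note'.
type SarifLevel = 'error' | 'warning' | 'note';

function toSarifLevel(severity: ScanFinding['severity']): SarifLevel {
  if (severity === 'critical' || severity === 'high') return 'error';
  return severity === 'medium' ? 'warning' : 'note';
}

function toSarif(findings: ScanFinding[]) {
  return {
    $schema: 'https://json.schemastore.org/sarif-2.1.0.json',
    version: '2.1.0',
    runs: [{
      tool: { driver: { name: 'ClawGuard' } },
      results: findings.map(f => ({
        ruleId: f.ruleId,
        level: toSarifLevel(f.severity),
        message: { text: f.message },
        locations: [{
          physicalLocation: {
            artifactLocation: { uri: f.file },
            region: { startLine: f.line },
          },
        }],
      })),
    }],
  };
}
```

Serializing the result with `JSON.stringify(toSarif(findings), null, 2)` yields a document of the kind the upload step consumes. GitHub Code Scanning uses `ruleId`, `level`, and the location to annotate the affected file and line.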

🛡️ Real-Time Protection (OpenClaw Hooks)

openclaw hooks install clawguard
openclaw hooks enable clawguard-guard    # Scans every message
openclaw hooks enable clawguard-policy   # Enforces tool call policies

  • clawguard-guard: hooks into message:received and message:sent, runs all 285+ patterns, logs findings, and alerts on critical/high threats.
  • clawguard-policy: evaluates outbound tool calls against security policies, blocks dangerous commands, and protects sensitive files.

🗺️ Roadmap

  • 285+ security patterns across 9 categories
  • Risk score engine with attack chain detection
  • Policy engine for tool call governance
  • Insider threat detection (Anthropic-inspired)
  • SARIF output for code scanning
  • OpenClaw hook pack for real-time protection
  • Security dashboard
  • MCP Firewall β€” real-time security proxy for Model Context Protocol
  • Custom rule authoring DSL
  • LangChain / CrewAI integration
  • VS Code extension
  • Rule marketplace
  • Machine learning-based anomaly detection
  • SOC/SIEM integration (Splunk, Elastic)

See GitHub Issues for the full list.


🤝 Contributing

git clone https://github.com/NeuZhou/clawguard.git
cd clawguard && npm install
npm test

See CONTRIBUTING.md for guidelines.


📄 License

Dual Licensed © NeuZhou

Contributors must agree to our CLA to enable dual licensing.

For commercial inquiries: neuzhou@users.noreply.github.com


🌐 NeuZhou Ecosystem

| Project | Description | Link |
|---|---|---|
| ClawGuard | 🛡️ AI Agent Immune System (285+ patterns) | You are here |
| AgentProbe | 🔬 Playwright for AI Agents | GitHub |
| FinClaw | 📈 AI-native quantitative finance engine | GitHub |
| repo2skill | 📦 Convert any GitHub repo into an AI agent skill | GitHub |

The workflow: Generate skills with repo2skill → Scan for vulnerabilities with ClawGuard → Test behavior with AgentProbe → See it in action with FinClaw.


ClawGuard: because agents with shell access need a security guard. 🛡️🦀