285+ security patterns Β· Risk scoring Β· Policy engine Β· Insider threat detection
Quick Start Β· Features Β· Architecture Β· Comparison Β· Contributing
AI agents have access to your files, tools, shell, and secrets. A single prompt injection can:
- Exfiltrate API keys via tool calls
- Hijack the agent's identity by overwriting personality files
- Register shadow MCP servers to intercept tool calls
- Install backdoored skills with obfuscated reverse shells
- The agent itself can become the threat β self-preservation, deception, goal misalignment
ClawGuard catches these attacks before they execute.
# Scan a directory for threats
npx @neuzhou/clawguard scan ./path/to/scan
# Strict mode (exit code 1 on high/critical findings)
npx @neuzhou/clawguard scan ./skills/ --strict
# SARIF output for GitHub Code Scanning
npx @neuzhou/clawguard scan . --format sarif > results.sarif
# Generate default config
npx @neuzhou/clawguard initnpm install @neuzhou/clawguardimport { runSecurityScan, calculateRisk, evaluateToolCall } from '@neuzhou/clawguard';
// Scan content for threats
const findings = runSecurityScan(message.content, 'inbound', context);
// Get risk score
const risk = calculateRisk(findings);
if (risk.verdict === 'MALICIOUS') { /* block */ }
// Evaluate tool call safety
const decision = evaluateToolCall('exec', { command: 'rm -rf /' });
// β { decision: 'deny', reason: 'Dangerous command', severity: 'critical' }clawhub install clawguardThen ask your agent: "scan my skills for security threats"
openclaw hooks install clawguard
openclaw hooks enable clawguard-guard # Scans every message
openclaw hooks enable clawguard-policy # Enforces tool call policiesβββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ClawGuard β
ββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββββββββββ€
β CLI β Hooks β Scanner β Dashboard :19790 β
ββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββββββββββββββ€
β βββββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
β β Risk Engine β βPolicy Engine β βInsider Threat ββ
β β Score 0-100 β β allow/deny β β AI Misalign. ββ
β β Chain Detect β β exec/file/ β β 5 categories ββ
β β Multipliers β β browser/msg β β 39 patterns ββ
β βββββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Security Engine β 285+ Patterns β
β β’ Prompt Injection (93) β’ Data Leakage (62) β
β β’ Insider Threat (39) β’ Supply Chain (35) β
β β’ Identity Protection (19)β’ MCP Security (20) β
β β’ File Protection (16) β’ Anomaly Detection β
β β’ Compliance β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Exporters: JSONL Β· Syslog/CEF Β· Webhook Β· SARIF β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Weighted scoring with attack chain detection and multiplier system:
import { calculateRisk } from '@neuzhou/clawguard';
const result = calculateRisk(findings);
// β { score: 87, verdict: 'MALICIOUS', icon: 'π¨',
// attackChains: ['credential-exfiltration'],
// enrichedFindings: [...] }- Severity weights: critical=40, high=15, medium=5, low=2
- Confidence scoring: every finding carries a confidence (0β1)
- Attack chain detection: auto-correlates findings into combo attacks
- credential + exfiltration β 2.2Γ multiplier
- identity-hijack + persistence β score β₯ 90
- prompt-injection + worm β 1.2Γ multiplier
- Verdicts: β CLEAN / π‘ LOW / π SUSPICIOUS / π¨ MALICIOUS
Based on Anthropic's research on agentic misalignment, detects when AI agents themselves become threats:
| Category | Patterns | What It Catches |
|---|---|---|
| Self-Preservation | 16 | Kill switch bypass, self-replication |
| Information Leverage | β | Reading secrets + composing threats |
| Goal Conflict | β | Prioritizing own goals over user instructions |
| Deception | β | Impersonation, suppressing transparency |
| Unauthorized Sharing | β | Exfiltration planning, steganographic hiding |
import { detectInsiderThreats } from '@neuzhou/clawguard';
const threats = detectInsiderThreats(agentOutput);Evaluate tool call safety against configurable YAML policies:
policies:
exec:
dangerous_commands:
- rm -rf
- mkfs
- curl|bash
file:
deny_read:
- /etc/shadow
- '*.pem'
deny_write:
- '*.env'
browser:
block_domains:
- evil.comimport { evaluateToolCall } from '@neuzhou/clawguard';
const decision = evaluateToolCall('exec', { command: 'rm -rf /' }, policies);
// β { decision: 'deny', severity: 'critical' }Drop-in security proxy for the Model Context Protocol. Sits between MCP clients and servers, inspecting all traffic bidirectionally.
# Start MCP Firewall
clawguard firewall --config firewall.yaml --mode enforceimport { McpFirewallProxy, parseFirewallConfig } from '@neuzhou/clawguard';
const proxy = new McpFirewallProxy(parseFirewallConfig(yamlConfig));
proxy.onEvent(event => console.log(event));
// Intercept MCP JSON-RPC messages
const result = proxy.interceptClientToServer(message, 'filesystem');
// β { action: 'block', findings: [...], reason: '...' }Detection capabilities:
- π Tool description injection β Scans
tools/listresponses for prompt injection - π Rug pull detection β Hashes and pins tool descriptions, alerts on change
- π§Ή Parameter sanitization β Detects base64 exfiltration, shell injection, path traversal
- π‘οΈ Output validation β Scans tool results for injection before forwarding to client
See docs/mcp-firewall.md for full usage guide.
| # | Sub-Category | Examples |
|---|---|---|
| 1 | Direct instruction override | "ignore previous instructions" |
| 2 | Role confusion / jailbreaks | DAN, developer mode |
| 3 | Delimiter attacks | Chat template delimiters |
| 4 | Invisible Unicode | Zero-width chars, directional overrides |
| 5 | Multi-language | 12 languages (CN/JP/KR/AR/FR/DE/IT/RUβ¦) |
| 6 | Encoding evasion | Base64, hex, URL-encoded |
| 7 | Indirect / embedded | HTML comments, tool output cascading |
| 8 | Multi-turn manipulation | False memories, fake agreements |
| 9 | Payload cascading | Template injection, string interpolation |
| 10 | Context window stuffing | Oversized messages |
| 11 | Prompt worm | Self-replication, agent-to-agent propagation |
| 12 | Trust exploitation | Authority claims, fake audits |
| 13 | Safeguard bypass | Retry-on-block, rephrase-to-bypass |
| Rule | OWASP Category | Patterns | Severity Range |
|---|---|---|---|
prompt-injection |
LLM01: Prompt Injection | 93 | warning β critical |
data-leakage |
LLM06: Sensitive Information Disclosure | 62 | info β critical |
insider-threat |
Agentic AI: Misalignment | 39 | warning β critical |
supply-chain |
Agentic AI: Supply Chain | 35 | warning β critical |
mcp-security |
Agentic AI: Tool Manipulation | 20 | warning β critical |
identity-protection |
Agentic AI: Identity Hijacking | 19 | warning β critical |
file-protection |
LLM07: Insecure Plugin Design | 16 | warning β critical |
anomaly-detection |
LLM04: Model Denial of Service | 6+ | warning β high |
compliance |
LLM09: Overreliance | 5+ | info β warning |
| Feature | ClawGuard | Guardrails AI | LLM Guard | Rebuff |
|---|---|---|---|---|
| Scope | Agent security (tools, files, MCP) | LLM I/O validation | Content moderation | Prompt injection only |
| Prompt injection detection | β 93 patterns, 13 categories | β Via validators | β | β |
| Tool call governance | β Policy engine | β | β | β |
| Insider threat / AI misalignment | β 39 patterns (Anthropic-inspired) | β | β | β |
| MCP security analysis | β 20 patterns + MCP Firewall | β | β | β |
| Supply chain scanning | β 35 patterns | β | β | β |
| Risk scoring & attack chains | β Weighted + multipliers | β | β | β Basic |
| SARIF output | β | β | β | β |
| Zero dependencies | β | β | β torch, transformers | β |
| Real-time hooks | β OpenClaw hooks | β | β | β |
| OWASP Agentic AI aligned | β Full mapping | β | ||
| Language | TypeScript | Python | Python | Python |
TL;DR: Guardrails AI validates LLM outputs. LLM Guard moderates content. Rebuff detects prompt injection. ClawGuard secures the entire agent β tools, files, MCP, identity, and the agent's own behavior.
- name: Security Scan
run: npx @neuzhou/clawguard scan . --format sarif > results.sarif
- name: Upload SARIF
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: results.sarifopenclaw hooks install clawguard
openclaw hooks enable clawguard-guard # Scans every message
openclaw hooks enable clawguard-policy # Enforces tool call policies- clawguard-guard β Hooks into
message:receivedandmessage:sent, runs all 285+ patterns, logs findings, alerts on critical/high threats. - clawguard-policy β Evaluates outbound tool calls against security policies, blocks dangerous commands, protects sensitive files.
- 285+ security patterns across 9 categories
- Risk score engine with attack chain detection
- Policy engine for tool call governance
- Insider threat detection (Anthropic-inspired)
- SARIF output for code scanning
- OpenClaw hook pack for real-time protection
- Security dashboard
- MCP Firewall β real-time security proxy for Model Context Protocol
- Custom rule authoring DSL
- LangChain / CrewAI integration
- VS Code extension
- Rule marketplace
- Machine learning-based anomaly detection
- SOC/SIEM integration (Splunk, Elastic)
See GitHub Issues for the full list.
- OWASP Top 10 for LLM Applications
- OWASP Agentic AI Top 10 (2026)
- Anthropic: Research on Agentic Misalignment
- OWASP Guide for Secure MCP Server Development
git clone https://github.com/NeuZhou/clawguard.git
cd clawguard && npm install
npm testSee CONTRIBUTING.md for guidelines.
Dual Licensed Β© NeuZhou
- Open Source: AGPL-3.0 β free for open-source use
- Commercial: Commercial License β for proprietary/SaaS use
Contributors must agree to our CLA to enable dual licensing.
For commercial inquiries: neuzhou@users.noreply.github.com
| Project | Description | Link |
|---|---|---|
| ClawGuard | π‘οΈ AI Agent Immune System (285+ patterns) | You are here |
| AgentProbe | π¬ Playwright for AI Agents | GitHub |
| FinClaw | π AI-native quantitative finance engine | GitHub |
| repo2skill | π¦ Convert any GitHub repo into an AI agent skill | GitHub |
The workflow: Generate skills with repo2skill β Scan for vulnerabilities with ClawGuard β Test behavior with AgentProbe β See it in action with FinClaw.
ClawGuard β Because agents with shell access need a security guard. π‘οΈπ¦