# Cerbero

Security screening for Claude Code. Evaluate before you install.
You run `claude mcp add`, it works, you move on. But that server now has access to your files, your shell, and your conversation context. Did you check what it does with that access?
MCP servers execute code with your permissions. Skills inject instructions into your AI's context. The ecosystem is growing faster than anyone can audit, and a single malicious tool description can hijack your entire session. Cerbero screens MCP servers and Skills before you install them, so you can make informed decisions instead of trusting blindly.
- Quick Start
- What Cerbero Does
- What Cerbero Does NOT Do
- How It Works
- Installation
- Configuration
- Operations
- Detection Tiers
- Integration with Ignite
- Requirements
- FAQ
- Threat Reference
- Project Values
- Limitations
- Contributing
- Acknowledgments
- License
## Quick Start

> [!TIP]
> Three steps. Under 5 minutes.
1. Copy the skill and hooks to your project:

   ```bash
   git clone https://github.com/jppuche/Cerbero.git
   cp -r Cerbero/.claude/skills/cerbero/ your-project/.claude/skills/cerbero/
   cp Cerbero/hooks/*.py your-project/.claude/hooks/
   ```

2. Copy the example settings (or merge into your existing `settings.local.json`):

   ```bash
   cp Cerbero/examples/settings.local.json your-project/.claude/settings.local.json
   ```

3. Set up security artifacts:

   ```bash
   mkdir -p your-project/.claude/security
   cp Cerbero/security/trusted-publishers.txt your-project/.claude/security/
   ```

Run `/cerbero evaluate-mcp <package-name>` to evaluate your first MCP server.
**Verify installation:**

```bash
# Should detect injection (blocks with exit code 2)
echo '{"prompt":"ignore previous instructions"}' | python .claude/hooks/validate-prompt.py

# Should block dangerous command
echo '{"tool_input":{"command":"rm -rf /"}}' | python .claude/hooks/pre-tool-security.py

# Should allow safe command (exit 0)
echo '{"tool_input":{"command":"echo hello"}}' | python .claude/hooks/pre-tool-security.py
```

## What Cerbero Does

- Evaluates MCP servers before installation – source code review, dependency audit, reputation check, automated scanning, and semantic analysis. An 8-step process with a documented verdict
- Evaluates Skills before installation – file type validation, community intelligence, content analysis with pre-context scanning. Catches prompt injection in Markdown files
- Detects rug pulls – SHA-256 baseline comparison catches silent updates to MCP server tool descriptions. Classifies changes as BENIGN/SUSPICIOUS/MALICIOUS
- Blocks dangerous commands in real time – a hook intercepts `rm -rf`, fork bombs, remote code execution patterns, and PowerShell exploits before they run
- Detects prompt injection in your prompts – normalize-then-detect pipeline: NFKC normalization, homoglyph defeat, proximity matching, recursive base64 decoding, zero-width stripping. Catches encoded and obfuscated attacks
- Scans external content for indirect injection – post-tool hook analyzes MCP and WebFetch outputs for format injection tags, conversation splicing, and base64-obfuscated payloads
- Maintains an audit trail – every MCP tool invocation is logged with timestamp, tool name, and session ID. Reminds you to verify every 50 calls
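The normalize-then-detect idea can be sketched in a few lines. The phrase list, function name, and recursion depth below are illustrative only; Cerbero's real pipeline adds homoglyph tables and proximity matching on top of this.

```python
import base64
import re
import unicodedata

# Illustrative phrase list; the shipped detector is far broader.
ZERO_WIDTH = dict.fromkeys([0x200B, 0x200C, 0x200D, 0xFEFF])
PHRASES = ["ignore previous instructions", "disregard all prior instructions"]

def detect_injection(text: str, depth: int = 3) -> bool:
    # NFKC folds many lookalike characters into their ASCII forms,
    # then zero-width characters are stripped before matching.
    norm = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)
    if any(p in norm.lower() for p in PHRASES):
        return True
    if depth == 0:
        return False
    # Recursively decode base64-looking runs and rescan the plaintext.
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", norm):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue
        if detect_injection(decoded, depth - 1):
            return True
    return False
```

The ordering matters: normalization first defeats obfuscation (zero-width splicing, lookalike glyphs) that would otherwise slip past a plain substring match.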
## What Cerbero Does NOT Do

Cerbero is a screening layer, not a security guarantee.
- Does not catch all attacks. Pattern-based detection misses creative paraphrasing. Claude's built-in safety training is the primary defense against semantic attacks; Cerbero's regex layer is supplementary
- Does not prevent runtime exploits. Cerbero screens before installation. For runtime isolation, use `claude --sandbox`
- Does not replace code review. Automated scanning catches known patterns; novel attack vectors require human judgment
- Does not block dynamic evasion. Shell command normalization (shlex) catches static evasion (quotes, backslashes) but not variable expansion, aliases, or IFS tricks
- Does not phone home. All core detection is local. Web research queries during evaluation use your existing Claude Code tools (WebSearch/WebFetch); Cerbero itself makes zero network calls
- Does not auto-reject on a single flag. One suspicious finding triggers REQUIRES HUMAN REVIEW, not automatic rejection. You always make the final call
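The static/dynamic evasion line is easy to see with `shlex` itself. A minimal illustration of the normalization idea (not the hook's actual code):

```python
import shlex

def normalize_command(cmd: str) -> str:
    """Collapse quote and backslash evasion by re-joining shlex tokens."""
    try:
        return " ".join(shlex.split(cmd))
    except ValueError:  # unbalanced quotes: fall back to the raw string
        return cmd

# Static evasion collapses: the quotes vanish under tokenization.
print(normalize_command('r"m" -rf /'))          # rm -rf /
# Dynamic evasion survives: the hook never sees $CMD expanded.
print(normalize_command('CMD=rm; $CMD -rf /'))  # CMD=rm; $CMD -rf /
```

Anything the shell resolves at runtime (variables, aliases, IFS manipulation) is invisible to this kind of pre-execution check, which is why sandboxing remains the enforcement layer.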
## How It Works

```mermaid
flowchart LR
    A["Install request"] --> B["Pre-screening\n(local, no AI)"]
    B --> C["Analysis\n(AI-assisted)"]
    C --> D["Risk\nclassification"]
    D --> E{{"Human decides"}}
    E -->|Approve| F["Install"]
    E -->|Reject| G["Skip"]
```
When you run /cerbero evaluate-mcp <package-name>, Cerbero executes a multi-step evaluation:
```
User: /cerbero evaluate-mcp @example/mcp-server

Cerbero:
  1. Reviews source code for dangerous patterns
  2. Checks reputation: age, downloads, stars, publisher trust
  3. Searches for known vulnerabilities (CVEs, community reports)
  4. Audits dependencies (npm audit + signature verification)
  5. Runs pre-context scanner (Tier 0 – before Claude reads tool definitions)
  6. Performs semantic analysis (Tier 3 – Claude analyzes pre-processed content)
  7. Classifies risk: MEDIUM / HIGH / CRITICAL
  8. Produces verdict: APPROVED / REQUIRES HUMAN REVIEW / REJECTED

→ Presents full report to you before any installation happens
```
Example evaluation verdict:
```
══════════════════════════════════════════════════════
CERBERO EVALUATION REPORT – @anthropic/mcp-docs
══════════════════════════════════════════════════════
Publisher: anthropic (TRUSTED)
Package age: 8 months | Downloads: 12,400/week
Dependencies: 3 (all audited, 0 vulnerabilities)

Tier 0 (pre-context scanner): CLEAN
Tier 1 (instant checks): CLEAN
Tier 2 (local analysis): CLEAN
Tier 3 (semantic analysis): SKIPPED (trusted publisher, clean tiers)

Risk level: MEDIUM (read-only data access)
Capabilities: Network outbound (docs fetching)

┌────────────────────────────────────────────────┐
│ VERDICT: APPROVED                              │
│ Reason: Trusted publisher, clean scan,         │
│ read-only capabilities, no suspicious patterns │
└────────────────────────────────────────────────┘

Action: You may install. Run /cerbero verify periodically
to detect changes to tool descriptions (rug pull detection).
```
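The rug pull check behind `/cerbero verify` is a trust-on-first-use comparison: pin a SHA-256 per tool description at approval time, then diff on every verify. A minimal sketch; the baseline path and helper names here are hypothetical, not Cerbero's actual layout:

```python
import hashlib
import json
from pathlib import Path

# Hypothetical baseline location, for illustration only.
BASELINE = Path(".claude/security/mcp-baseline.json")

def fingerprint(tools: dict) -> dict:
    """Map tool name -> SHA-256 of its description text."""
    return {name: hashlib.sha256(desc.encode()).hexdigest()
            for name, desc in tools.items()}

def verify(server: str, tools: dict) -> list:
    """Return the tool names whose descriptions changed since baseline."""
    baseline = json.loads(BASELINE.read_text()) if BASELINE.exists() else {}
    current = fingerprint(tools)
    if server not in baseline:        # first use: pin and trust
        baseline[server] = current
        BASELINE.parent.mkdir(parents=True, exist_ok=True)
        BASELINE.write_text(json.dumps(baseline, indent=2))
        return []
    pinned = baseline[server]
    return [n for n, h in current.items() if pinned.get(n) != h]
```

Any non-empty result is what gets classified as BENIGN/SUSPICIOUS/MALICIOUS in the full workflow; the hash diff only says *that* something changed, not *why*.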
The hooks run automatically in the background:
- Every prompt you type passes through injection detection
- Every shell command is checked against dangerous patterns
- Every MCP tool call is logged and audited
- Every external response is scanned for indirect injection
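All of these follow the hook contract Claude Code defines: a JSON event on stdin, exit code 0 to allow, exit code 2 to block with the reason printed to stderr. A stripped-down version of a dangerous-command check (patterns illustrative; the shipped pre-tool-security.py covers far more):

```python
import json
import re
import sys
from typing import Optional

# Illustrative patterns only; the real hook's list is much broader.
DANGEROUS = [
    r"\brm\s+-[a-z]*r[a-z]*f\b",   # rm -rf (simplified)
    r":\(\)\s*\{.*\};\s*:",        # classic bash fork bomb
    r"curl\s+[^|]*\|\s*(ba)?sh",   # curl ... | sh remote execution
]

def check_command(command: str) -> Optional[str]:
    """Return the first matching dangerous pattern, or None if clean."""
    for pattern in DANGEROUS:
        if re.search(pattern, command):
            return pattern
    return None

def run_hook(event: dict) -> int:
    """PreToolUse contract: return 0 to allow the call, 2 to block it."""
    command = event.get("tool_input", {}).get("command", "")
    hit = check_command(command)
    if hit is not None:
        print(f"Blocked dangerous command (matched {hit})", file=sys.stderr)
        return 2
    return 0

# The installed hook wires this to stdin:
#   sys.exit(run_hook(json.load(sys.stdin)))
```

This also shows why the hooks can fail open: any unhandled exception means the process never returns exit code 2, and Claude Code treats the call as allowed.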
## Installation

See the full Setup Guide for detailed instructions.
**Project-level:**

```bash
git clone https://github.com/jppuche/Cerbero.git
cd Cerbero

# Copy skill
cp -r .claude/skills/cerbero/ <your-project>/.claude/skills/cerbero/

# Copy hooks
mkdir -p <your-project>/.claude/hooks
cp hooks/*.py <your-project>/.claude/hooks/

# Security artifacts
mkdir -p <your-project>/.claude/security
cp security/trusted-publishers.txt <your-project>/.claude/security/
```

**Global (all projects):**

```bash
# Skill
cp -r .claude/skills/cerbero/ ~/.claude/skills/cerbero/

# Hooks
mkdir -p ~/.claude/hooks
cp hooks/*.py ~/.claude/hooks/
```

> [!NOTE]
> For global installation, update hook paths in your settings from `.claude/hooks/` to `~/.claude/hooks/`.
## Configuration

Cerbero uses Claude Code's native hooks system. See `examples/settings.local.json` for a complete working configuration.
The key sections:
| Section | Purpose |
|---|---|
| `permissions.deny` | Block dangerous commands (curl piping, rm -rf, secrets access) |
| `hooks.UserPromptSubmit` | Prompt injection detection on every user message |
| `hooks.PreToolUse` | Dangerous command blocking + MCP audit trail + untrusted source reminder |
| `hooks.PostToolUse` | Indirect injection scanning on WebFetch and MCP outputs |
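The hook entries in `examples/settings.local.json` follow Claude Code's hooks schema. A trimmed sketch, with hook filenames taken from the verification commands above (see the example file for the complete version):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "python .claude/hooks/validate-prompt.py" }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python .claude/hooks/pre-tool-security.py" }
        ]
      }
    ]
  }
}
```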
### Trusted Publishers

The default list is intentionally minimal: anthropic and trailofbits. Cerbero may auto-approve components from a listed publisher when all other checks pass; every other publisher requires human review.

Adding a publisher requires meeting three criteria: direct relevance to Claude Code, an established security track record, and no commercial conflict of interest. See the Setup Guide for details.
## Operations

| Command | When to use |
|---|---|
| `/cerbero evaluate-mcp <package>` | Before installing any MCP server |
| `/cerbero evaluate-skill <path>` | Before installing any Skill file |
| `/cerbero verify` | Routine check for rug pulls (recommended: start of each session) |
| `/cerbero audit` | Monthly full security audit |
Each operation produces a structured report presented to you before any action is taken.
## Detection Tiers

All tiers run locally. No external APIs required for core detection.
| Tier | Speed | What it does | When it runs |
|---|---|---|---|
| Tier 0 | Pre-context | Python scanner runs BEFORE Claude reads content. Catches injection, base64, zero-width chars, HTML comments, CSS hiding | Always for Skill evaluation; also ahead of MCP semantic analysis |
| Tier 1 | < 100ms | Hash comparison, regex patterns, file type validation, npm audit | Every evaluation |
| Tier 2 | 100ms-2s | Base64/hex recursive decode, HTML comment extraction, CSS hiding, source code patterns, tool schema analysis | Every evaluation |
| Tier 3 | 2-10s | Claude semantic analysis on pre-processed (stripped) content. Multi-stage attack detection | When Tier 2 flags SUSPICIOUS+, untrusted publisher, or user requests deep analysis |
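Tier 0's point is ordering: the scan runs before any content reaches Claude's context window. A toy pre-context check covering a small fraction of the real scanner's surface (function name hypothetical):

```python
import re

def precontext_flags(markdown: str) -> list:
    """Flag hidden-content tricks in a Skill file before it enters context."""
    flags = []
    if re.search(r"[\u200B\u200C\u200D\uFEFF]", markdown):
        flags.append("zero-width characters")
    if re.search(r"<!--.*?-->", markdown, re.DOTALL):
        flags.append("HTML comment (invisible in rendered view)")
    if re.search(r"display\s*:\s*none|font-size\s*:\s*0", markdown, re.IGNORECASE):
        flags.append("CSS hiding")
    return flags
```

Because this runs as plain Python over raw bytes, an injection payload hidden in a comment or behind zero-width splicing is flagged before it can influence the model at all.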
Multi-scanner trigger thresholds, inspired by Vigil:
- 1 non-injection check fails → SUSPICIOUS (continue evaluation)
- 2+ checks fail → REJECT or REQUIRES HUMAN REVIEW
- Direct injection phrase → always REJECT (one match suffices)
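Expressed as code, those thresholds reduce to a small aggregation rule. A hypothetical sketch (Cerbero's actual implementation may differ, e.g. in how it chooses between REJECT and review at the 2+ threshold):

```python
def classify(failed_checks: int, direct_injection: bool) -> str:
    """Aggregate scanner results into a single verdict (Vigil-style rule)."""
    if direct_injection:
        return "REJECT"                 # one injection phrase match suffices
    if failed_checks >= 2:
        return "REQUIRES HUMAN REVIEW"  # may escalate to REJECT by severity
    if failed_checks == 1:
        return "SUSPICIOUS"             # evaluation continues
    return "CLEAN"
```

Requiring two independent scanners to agree before escalating keeps a single noisy pattern from rejecting legitimate packages.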
Complementary scanners:

- cisco-ai-mcp-scanner (recommended for 5+ MCPs): YARA-only mode, 100% offline. Install with `uv tool install --python 3.13 cisco-ai-mcp-scanner`
- Trail of Bits mcp-context-protector: runtime TOFU pinning. No stable release yet
## Integration with Ignite

Cerbero is also available as part of Ignite, a complete development infrastructure for Claude Code. Ignite adds session memory that survives /clear, compound learning where Claude gets better every session, quality gates that block commits when docs go stale, and CI/CD adapted to your stack.
Use Cerbero standalone if you only need security screening. Use Ignite if you want the full development workflow with Cerbero built in.
## Requirements

- Python 3.8+ (hooks use stdlib only – no pip install needed)
- Claude Code (latest stable recommended)
- Node.js 18+ (for `npm audit` during MCP evaluation)
Optional:
- uv – for installing cisco-ai-mcp-scanner
- Python 3.11-3.13 – required by cisco-ai-mcp-scanner (if used)
## FAQ

**Does Cerbero send my data anywhere?**

No. All core detection runs locally using Python stdlib. During evaluation, Cerbero uses Claude Code's WebSearch/WebFetch tools to check for known vulnerabilities; these use whatever web access Claude Code already has. The hooks themselves make zero network calls.

**Can I use Cerbero without the hooks?**

Yes. The skill (`/cerbero evaluate-mcp`, etc.) works independently. The hooks add real-time protection between evaluations: prompt injection defense, dangerous command blocking, audit trail, and indirect injection scanning. Both layers are valuable, but neither depends on the other.

**What happens if a hook crashes?**

All hooks fail open: a crash means the command or prompt is allowed through. This is by design; a false positive that blocks your work is worse than a missed detection in a screening tool. The primary defense is always Claude's built-in safety training.

**How is this different from mcp-scan?**

mcp-scan (originally Invariant Labs, now Snyk) was deprecated in early 2026. v0.3.0 is unmaintained; v0.4+ requires a Snyk account and mandatory cloud data transmission. Cerbero runs entirely locally, covers both MCP servers and Skills, includes 6 automation hooks, and provides rug pull detection. For YARA-based malware signatures, Cerbero recommends cisco-ai-mcp-scanner as a complement.

**Does Cerbero work on all platforms?**

Yes. Python stdlib is cross-platform, so the hooks work on Windows, macOS, and Linux. The operation procedures include PowerShell examples, but the concepts apply to any shell. Hook configuration uses Claude Code's native hooks API, which is platform-agnostic.

**Can Cerbero evaluate servers from the MCP registry?**

Yes. `/cerbero evaluate-mcp <package-name>` works with any npm package. The evaluation checks the npm registry for metadata, downloads the package to a temp directory for the dependency audit, and verifies provenance signatures.
## Threat Reference

Cerbero's detection maps to the OWASP MCP Top 10:
| OWASP ID | Threat | Cerbero Coverage |
|---|---|---|
| MCP01 | Token Mismanagement | Auth check in evaluate-mcp (Step 1.8) |
| MCP02 | Privilege Escalation | Risk classification matrix, sandbox enforcement |
| MCP03 | Tool Poisoning | Tier 0-3 detection, semantic analysis, canary technique |
| MCP04 | Supply Chain Attacks | Dependency audit, typosquat detection, provenance verification |
| MCP05 | Command Injection | pre-tool-security hook, source code pattern matching |
| MCP06 | Prompt Injection | validate-prompt hook, cerbero-scanner, Tier 0-3 pipeline |
| MCP07 | Insufficient Auth | Auth check for remote (SSE/HTTP) servers |
| MCP08 | Lack of Telemetry | mcp-audit hook, invocation counter, audit trail |
| MCP09 | Shadow MCP Servers | Configuration integrity check in full audit |
| MCP10 | Context Over-Sharing | Tool description analysis, compound risk flagging |
## Project Values

- Security through transparency. Every detection pattern is documented. Every hook is auditable. No black boxes
- Local-first. No external APIs required for core detection. Your code stays on your machine
- Defense in depth. Cerbero is one layer, not the only layer. It complements Claude's built-in safety, OS-level sandboxing, and human judgment
- Informed decisions. Cerbero always presents findings to the human before acting. It never auto-installs, never silently skips, never hides a finding
## Limitations

- Runtime monitoring (use `claude --sandbox` for OS-level enforcement)
- Network traffic analysis (out of scope for a screening tool)
- Binary analysis of compiled MCP servers (text-based scanning only)
- Guarantee safety (no tool can – security is a process, not a product)
- Creative paraphrasing of injection attempts (not regex-matchable)
- Dynamic shell evasion (variable expansion, aliases, IFS tricks)
- Polyglot files that pass file type validation but contain hidden payloads
- Zero-day vulnerabilities not yet in CVE databases or community reports
- Multi-turn attacks that span across separate tool calls
> [!IMPORTANT]
> Cerbero reduces risk. It does not eliminate it. Use it alongside sandbox mode, careful permission policies, and your own judgment.
## Contributing

See CONTRIBUTING.md for guidelines.
## Acknowledgments

- OWASP MCP Top 10 – threat classification framework
- TOFU (Trust on First Use) – baseline comparison model
- Invariant Labs – compound risk research on MCP server interactions
- Cisco AI Defense – YARA patterns and MCP scanner inspiration
- Trail of Bits – TOFU concepts and mcp-context-protector
- Vigil – multi-scanner trigger logic inspiration
- Anthropic – Claude Code hooks API that makes automation possible
## License

MIT. See LICENSE.
Made by Juan Puche – building security tools for AI-assisted development.