Skip to content

zurbrick/agent-hardening

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

Agent Hardening

Lock down any LLM agent against prompt injection, data exfiltration, social engineering, and channel-based attacks.

Built from real pen-test findings, not theory. Works with OpenClaw, Claude Code, LangChain, and any agent that takes natural-language input and calls external tools.

Quick start

  1. Copy agent-hardening/ into your skills directory
  2. Run the attack surface checklist (references/attack-surface-checklist.md)
  3. Audit MCP connections (references/mcp-hardening.md)
  4. Apply the tiered behavioral rules to your agent's operating docs
  5. Verify with the automated test runner or the manual quick test
  6. Fix failures, re-test, document findings

Automated testing

# Test against any OpenAI-compatible endpoint
python agent-hardening/tools/run-security-tests.py \
    --endpoint https://api.openai.com/v1/chat/completions \
    --api-key sk-... \
    --model gpt-4 \
    --owner-name "Don" \
    --output findings.json

# Test against local Ollama
python agent-hardening/tools/run-security-tests.py \
    --endpoint http://localhost:11434/v1/chat/completions \
    --model llama3

# Test your hardened system prompt
python agent-hardening/tools/run-security-tests.py \
    --endpoint https://api.openai.com/v1/chat/completions \
    --api-key sk-... \
    --model gpt-4 \
    --system-prompt-file my-agent-prompt.txt \
    --output findings.json

Requires Python 3.10+ and requests (pip install requests).

Included files

File Purpose
SKILL.md Skill entrypoint and workflow
references/attack-surface-checklist.md Identify what the agent can access
references/channel-hardening.md Per-channel security configuration
references/mcp-hardening.md MCP server permission auditing
references/behavioral-rules.md 4-tier defensive operating rules
references/quick-test.md 10 single-shot + 5 multi-turn security tests
references/findings-template.md Structured findings documentation
tools/run-security-tests.py Automated test runner (10 single-shot tests)

What it covers

See SKILL.md for the full workflow, principles, and framework compatibility notes.

License

MIT

About

Lock down OpenClaw agents against prompt injection, data exfiltration, social engineering, and channel attacks. Built from real pen tests.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages