This repository was archived by the owner on Mar 20, 2026. It is now read-only.
Implementation detail — subordinate to all tiers above
1) Thesis Statement
Autonomous AI agents have a systematically exploitable attack surface — prompt injection, tool misuse, privilege escalation, and memory poisoning — that can be discovered through structured red-teaming, and defended against using adversarial control analysis (the same architectural principle validated on IDS and vulnerability prediction).
This is the third domain test for adversarial control analysis. If the methodology holds on agents — whose input/output structures are fundamentally different from network flows or CVE metadata — it becomes a general security architecture principle, not a technique tied to any single domain.
2) Research Questions
| # | Question | How You'll Answer It | Success Criteria |
|---|----------|----------------------|------------------|
| RQ1 | What is the attack taxonomy for autonomous AI agents? | Systematically enumerate attack classes by reviewing the OWASP Top 10 for LLM Applications, MITRE ATLAS, and published agent exploits (LangChain CVEs, AutoGPT issues). Build an original taxonomy extending what exists. | Taxonomy covers ≥5 attack classes not in existing frameworks |
| RQ2 | Can these attacks be executed against real open-source agents? | Build attack scripts targeting LangChain ReAct agents, CrewAI multi-agent systems, and AutoGen. Run them against controlled agent setups with measurable success rates. | ≥3 attack classes demonstrated with >50% success rate |
| RQ3 | Does adversarial control analysis apply to agent systems? | Classify agent inputs by controllability: user prompt (attacker-controlled), tool outputs (partially controllable), system prompt (defender-controlled), memory (poisonable). Evaluate robustness by input category. | Clear controllability matrix with a measurable robustness differential |
| RQ4 | What architectural defenses reduce agent attack surface? | Design and test defenses: input sanitization, tool permission boundaries, memory integrity checks, output validation. Measure attack success rates before and after each defense. | ≥2 defenses reduce attack success rate by >50% |
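The controllability classification in RQ3 can be sketched as a small data structure plus one metric. Everything below is illustrative: the input-surface names follow the RQ3 categories, but the attack success rates (ASR) are placeholder values, not measured results.

```python
# Hypothetical controllability matrix for RQ3. Each agent input surface
# is tagged with who controls it and a placeholder attack success rate.
CONTROLLABILITY = {
    "user_prompt":   {"controller": "attacker",   "asr": 0.80},
    "tool_output":   {"controller": "partial",    "asr": 0.55},
    "memory":        {"controller": "poisonable", "asr": 0.40},
    "system_prompt": {"controller": "defender",   "asr": 0.05},
}

def robustness_differential(matrix):
    """Gap between the most and least attacker-exposed input surfaces.

    A large gap supports the core claim: robustness varies systematically
    with who controls the input, not just with what the input contains.
    """
    rates = [surface["asr"] for surface in matrix.values()]
    return max(rates) - min(rates)
```

With the placeholder numbers above, `robustness_differential(CONTROLLABILITY)` returns 0.75, i.e. attacker-controlled inputs succeed far more often than defender-controlled ones — the "measurable robustness differential" RQ3 asks for.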
3) Scope Definition
**In Scope**
- Attack taxonomy for autonomous AI agents (extending OWASP/ATLAS)
- Red-team scripts for 4+ attack classes against open-source agents
- Adversarial control analysis applied to agent input/output architecture
- Defensive architecture patterns with measured effectiveness
- Open-source red-team framework (a CLI tool, not just scripts)
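The before/after defense measurement from RQ4 reduces to a small harness. This is a minimal sketch under stated assumptions: the regex patterns, the toy agent, and the `sanitize`/`attack_success_rate` helpers are all hypothetical illustrations, not parts of any named framework, and a naive pattern filter is nowhere near a complete injection defense.

```python
import re

# Placeholder injection markers for the "input sanitization" defense in
# RQ4. Real injection payloads are far more varied than these examples.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
]

def sanitize(text: str) -> str:
    """Redact substrings matching known injection patterns."""
    for pat in INJECTION_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

def attack_success_rate(attacks, agent, defense=None):
    """Fraction of attack prompts that compromise the agent.

    `agent` is any callable returning True when the attack succeeded;
    `defense` is an optional prompt transform applied beforehand.
    """
    hits = 0
    for prompt in attacks:
        if defense is not None:
            prompt = defense(prompt)
        hits += bool(agent(prompt))
    return hits / len(attacks)
```

A toy run shows the intended before/after comparison: with an agent stub that "falls" whenever the injection phrase survives, the same attack set is measured once undefended and once through `sanitize`, and the success-rate drop is the defense's effectiveness score.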
**Out of Scope**
- Attacking production/deployed agents (only local, controlled setups)
- Jailbreaking LLMs for harmful outputs (prompt injection is in scope only as a vector for agent misuse)
- Training custom models (existing LLMs serve as the agent backbone)