Gate: 0.5 (must pass before Phase 1 compute)
Date: 2026-03-19
Target venue: AISec Workshop (ACM CCS 2026) — Tier 2
lock_commit: 82d4a63
Designed for: 8/10 from day 1 (Gate 0.5 + R34 depth escalation)
Project: FP-15 — Multi-Agent Security Testing Framework
Compute budget: ~30-50 CPU-hours (Azure VM, 2 vCPU), no GPU required
Framework: CrewAI + custom harness (LangGraph as secondary)
LLM backend: Claude API (Haiku for agents, Sonnet for orchestrator) + local Llama 3.1 8B fallback
First open-source framework quantifying how single-agent compromise cascades through multi-agent systems, with empirical comparison of zero-trust vs implicit-trust defense architectures.
Self-test (≤25 words): We measure compromise propagation rates in multi-agent systems and show zero-trust architectures reduce cascade spread by N% vs implicit trust. ✓
| # | Method | Citation | How We Compare | Why This Baseline |
|---|---|---|---|---|
| 1 | OWASP LLM Top 10 v2 | OWASP Foundation, 2025 | Our attack taxonomy extends OWASP with 3 multi-agent-specific classes (delegation abuse, cascade poisoning, identity spoofing). Report coverage overlap and novel classes. | Industry standard threat enumeration. Reviewer expects positioning against it. |
| 2 | FP-02 single-agent baseline | Coleman, 2026 (own work) | Run identical FP-02 attacks on single agent, then same attacks on multi-agent system. Delta = cascade amplification factor. | Controls for multi-agent novelty. Without this, reviewer says "this is just FP-02 with more agents." |
| 3 | ACM Computing Surveys agent threat taxonomy | Masterman et al., 2025 | Their taxonomy is theoretical. We provide empirical validation: which of their threat categories actually manifest in a real multi-agent testbed, and at what rates? | Most comprehensive published survey. Reviewer will ask how we relate. |
| # | Criticism | Planned Mitigation | Design Decision |
|---|---|---|---|
| 1 | "This is just FP-02 with more agents — where's the new attack surface?" | Experiment 1 explicitly measures cascade amplification: same attack on 1 agent vs same attack propagated through 2, 5, 10 agents. Show that compromise rate is super-linear (not just N copies of single-agent risk). | Run single-agent baseline as control condition in every experiment. |
| 2 | "CrewAI/LangGraph are toy frameworks, not production systems." | Test on both CrewAI AND custom multi-agent harness. Show attacks generalize across frameworks. Reference real-world multi-agent deployments (Solana trading agents, DeFi bots, enterprise automation). | Two frameworks minimum. Include custom harness to prove framework-independence. |
| 3 | "87% cascade figure is from Galileo — you're just replicating their work." | Galileo's 87% was observational (one case study, one system). We provide controlled experiments across 3 network topologies, 3 trust models, and 3 agent counts with 5 seeds. Our contribution is the systematic framework, not the number. | Design experiments to VARY cascade rate by topology and trust model, not just confirm one number. |
| 4 | "No formal model — this is just empirical." | Provide a graph-based propagation model (infection probability per edge, trust weight per connection) that predicts cascade rates. Validate predictions against empirical results. The model is simple but falsifiable. | Include lightweight formal model in Phase 0 design. |
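The lightweight propagation model promised in mitigation #4 can be sketched as a per-edge infection process. The function below is an illustrative Monte Carlo sketch, not the framework's actual implementation: the names and defaults are ours, and `p_base=0.15` mirrors the fixed cascade probability mentioned under internal validity.

```python
import random

def simulate_cascade(edges, trust, n_agents, p_base=0.15, steps=10,
                     source=0, seed=42):
    """One Monte Carlo run of the edge-level infection model.

    edges:  list of (src, dst) delegation links
    trust:  dict (src, dst) -> trust weight in [0, 1]
            (1.0 everywhere reproduces the implicit-trust baseline)
    p_base: baseline per-edge infection probability
    Returns the fraction of agents compromised after `steps` rounds.
    """
    rng = random.Random(seed)
    compromised = {source}
    for _ in range(steps):
        newly = set()
        for src, dst in edges:
            if src in compromised and dst not in compromised:
                # Infection probability scales with the trust the
                # receiver places in the sender's output.
                if rng.random() < p_base * trust[(src, dst)]:
                    newly.add(dst)
        if not newly:
            break
        compromised |= newly
    return len(compromised) / n_agents
```

Because predicted cascade rates are a deterministic function of topology, trust weights, and `p_base`, the model is falsifiable against the empirical rates from E1-E3.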
| Component / Feature | Hypothesis When Removed | Expected Effect | Priority |
|---|---|---|---|
| Inter-agent trust (set all trust = 1.0) | Implicit trust baseline: maximum cascade propagation | Fastest spread, highest downstream poisoning rate | HIGH — this is the control |
| Per-agent capability scoping | Removing capability limits lets compromised agent access all tools | Propagation rate increases; tool-based attacks become possible | HIGH |
| Agent authentication (identity verification) | Without auth, any agent can impersonate any other | Identity spoofing attacks succeed at ~100% | HIGH |
| Orchestrator oversight (Ralph loop pattern) | Without oversight agent, no quality check on delegated tasks | Poisoned outputs pass through unchecked | MEDIUM |
| Shared context/memory | Without shared memory, agents operate independently | Cascade rate should drop dramatically (isolation test) | MEDIUM — mechanism test |
| Source | Type | Estimated Count | Known Lag | Estimated Positive Rate | Limitations |
|---|---|---|---|---|---|
| Controlled injection (our testbed) | Synthetic ground truth | ~500-1000 test cases per experiment | None (synthetic) | Controlled: 0%, 10%, 25%, 50% compromise rates | Synthetic may not reflect real-world agent behavior |
| FP-02 attack success rates | Empirical transfer | 19 scenarios, 3 seeds | Tested on Claude Sonnet only | 25-100% by attack class | Single LLM backend, single framework |
| Source | Included? | Rationale |
|---|---|---|
| Real-world multi-agent incident reports | NO (not enough public data) | <10 documented incidents as of 2026. Insufficient for statistical analysis. Referenced qualitatively. |
| Galileo AI cascade study | YES (qualitative) | Provides the 87%/4hr benchmark. We reproduce the SCENARIO, not the data. |
| NIST Agentic Control Overlays | NO (not yet published) | In draft. Will reference when available. |
| Parameter | Value | Justification |
|---|---|---|
| Seeds | 5 (42, 123, 456, 789, 1024) | govML standard. Captures LLM stochasticity. |
| Significance test | Bootstrap CI (95%) | Non-parametric; no distributional assumptions on agent behavior. |
| Effect size threshold | ≥10pp cascade rate difference between trust models | Practitioner-meaningful: <10pp is too small to justify architecture change. |
| CI method | Percentile bootstrap, 10K resamples | Standard for small-N experiments with unknown distributions. |
| Multiple comparison correction | Bonferroni for pairwise trust model comparisons | 3 trust models = 3 pairwise comparisons. |
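A minimal sketch of the percentile bootstrap from the table above, assuming per-seed cascade rates as input; the function name and defaults are illustrative, not fixed API.

```python
import random

def percentile_bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI for the mean cascade rate over seeds.

    For the pairwise trust-model comparisons, pass alpha = 0.05 / 3
    to apply the Bonferroni correction from the table above.
    """
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement, record the mean of each resample.
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(n_resamples * alpha / 2)]
    hi = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi
```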
| # | Paper | Year | Relevance | How We Differ |
|---|---|---|---|---|
| 1 | Masterman et al. — "Landscape of Emerging AI Agent Architectures" | 2024 | Comprehensive agent architecture taxonomy | They enumerate architectures; we test them under adversarial conditions |
| 2 | Tian et al. — "Evil Geniuses: Delving into the Safety of LLM-based Agents" | 2024 | Multi-agent safety evaluation | Focus on jailbreaking multi-agent chat; we focus on cascade compromise in task-delegation systems |
| 3 | Cohen et al. — "Here Comes the AI Worm" | 2024 | Self-replicating adversarial inputs across agents | Demonstrates agent-to-agent propagation of adversarial inputs; we quantify propagation RATES across architectures |
| 4 | Gu et al. — "Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents" | 2024 | Shows one poisoned input can compromise many agents | Focuses on shared input channel; we focus on delegation-based cascade (different propagation mechanism) |
| 5 | Xi et al. — "Rise and Potential of Large Language Model Based Agents: A Survey" | 2023 | Foundational LLM agent survey | Broad survey; we provide focused security testing framework |
| 6 | Coleman — FP-02: Agent Red-Team Framework | 2026 | Single-agent attack baseline (7 classes) | We extend from single-agent to multi-agent cascade; FP-02 is our control condition |
| 7 | Coleman — FP-12: RL Agent Vulnerability | 2026 | RL policy attacks (observation > reward asymmetry) | We test whether RL-trained agents in multi-agent systems amplify cascade risk |
| # | Requirement | Status | Notes |
|---|---|---|---|
| 1 | Novelty claim stated in ≤25 words | [x] | §2 |
| 2 | ≥2 comparison baselines identified | [x] | §3: OWASP, FP-02 single-agent, ACM survey |
| 3 | ≥2 reviewer kill shots with mitigations | [x] | §4: 4 kill shots |
| 4 | Ablation plan with hypothesized effects | [x] | §5: 5 components |
| 5 | Ground truth audit: sources, lag, positive rate | [x] | §6: synthetic + FP-02 transfer |
| 6 | Alternative label sources considered | [x] | §6: incidents, Galileo, NIST |
| 7 | Statistical plan: seeds, tests, CIs | [x] | §7: 5 seeds, bootstrap, Bonferroni |
| 8 | Related work: ≥5 papers | [x] | §8: 7 papers |
| 9 | Hypotheses pre-registered in HYPOTHESIS_REGISTRY | [ ] | To create before Phase 1 |
| 10 | lock_commit set in HYPOTHESIS_REGISTRY | [ ] | To set before Phase 1 |
| 11 | Target venue identified | [x] | AISec Workshop (ACM CCS 2026) |
| 12 | This document committed before any training script | [x] | This commit |
Gate 0.5 verdict: [x] PASS (pending items 9-10 before Phase 1 start)
Primary finding (one sentence): Single-agent compromise cascades super-linearly through multi-agent systems under implicit trust, and zero-trust architectures reduce cascade propagation by ≥50%.
Evaluation settings (minimum 2):
| # | Setting | How It Differs | What It Tests |
|---|---|---|---|
| 1 | CrewAI task-delegation system (3, 5, 10 agents) | Primary framework, hierarchical delegation | Core cascade propagation rates |
| 2 | Custom flat-topology multi-agent harness | No hierarchy, all-to-all communication | Whether hierarchy is protective or cascade-amplifying |
| 3 | Mixed RL + LLM agent system | Includes FP-12-style RL agent alongside LLM agents | Whether RL agents amplify or dampen cascade |
| Finding | Proposed Mechanism | Experiment to Verify |
|---|---|---|
| Cascade rate is super-linear with agent count | Trust transitivity: Agent B trusts Agent A's output, so Agent C trusts it too (transitive closure) | Compare cascade under transitive trust (default) vs per-hop trust verification |
| Zero-trust reduces cascade by ≥50% | Each agent independently validates inputs, breaking transitive trust chains | Ablation: toggle zero-trust per agent and measure marginal cascade reduction |
| Shared memory accelerates cascade | Poisoned context persists and is read by all agents | Ablation: remove shared memory, measure cascade rate with isolated agent contexts |
| Robustness Claim | Weak Test (baseline) | Adaptive Test (attacker knows defense) |
|---|---|---|
| Zero-trust reduces cascade | Static attacker: same FP-02 attacks regardless of defense | Adaptive attacker: modifies attack to mimic legitimate delegation patterns, uses valid agent IDs, crafts outputs that pass zero-trust validation checks |
| Identity verification prevents spoofing | Naive spoofing: attacker uses wrong agent name | Adaptive spoofing: attacker compromises one agent's credentials and delegates as that agent |
We contribute:
- An open-source multi-agent security testing framework with configurable trust models, agent topologies, and attack injection points
- Empirical cascade propagation rates across 3 agent counts × 3 trust models × 2 frameworks × 5 seeds, showing that implicit trust amplifies single-agent compromise super-linearly
- Comparative defense analysis demonstrating that zero-trust architectures reduce cascade propagation by ≥50% but at measurable latency cost, providing practitioners with a quantified security-performance tradeoff
| Published Method | Their Benchmark | Our Reproduction Plan |
|---|---|---|
| FP-02 single-agent attacks (Coleman 2026) | 19 scenarios, 3 seeds, Claude Sonnet | Reproduce top 3 attack classes on same Claude backend in single-agent mode. Then inject same attacks into multi-agent system. Delta = cascade amplification. |
| Cohen et al. "AI Worm" propagation | GenAI ecosystem simulation | Reproduce their propagation scenario in our testbed: one poisoned agent, measure downstream impact over time. Compare their qualitative findings to our quantitative rates. |
Internal validity:
- Simulation uses fixed cascade probability (0.15) — sensitivity analysis (§ Parameter Sensitivity) shows results are robust across 0.05-0.50.
- Two-of-three capability assignment is round-robin — random assignment may produce different results (ablation planned but not yet run).
- Adaptive adversary (E4) only tested against original 3 trust models, not yet against two-of-three.
External validity:
- Simulation overestimates cascade by 37pp vs real Claude Haiku agents (§ Real Agent Validation). SE-150 results carry the same synthetic qualifier.
- Single LLM backend (Haiku) for real validation. Other models (GPT-4, Llama) may show different cascade dynamics.
- Agent counts up to 50 tested. Production systems with 100+ agents may exhibit different dynamics.
Construct validity:
- "Cascade rate" measures fraction of agents compromised, not severity of compromise. A 50% cascade rate where compromised agents produce low-quality-but-harmless output differs from 50% cascade with data exfiltration.
- "Capability partitioning" in simulation is binary (has/doesn't have). Real capability enforcement may be partial or bypassable.
| # | Requirement | Status |
|---|---|---|
| 1 | ONE primary finding identified | [x] |
| 2 | ≥2 evaluation settings designed | [x] |
| 3 | Mechanism analysis planned for each major claim | [x] |
| 4 | Adaptive adversary test planned | [x] |
| 5 | Formal contribution statement drafted | [x] |
| 6 | ≥1 published baseline reproduction planned | [x] |
| Experiment | Independent Variable | Levels | Dependent Variable | Seeds |
|---|---|---|---|---|
| E1: Cascade vs agent count | Number of agents | 2, 5, 10 | % downstream decisions poisoned at t=1hr | 5 |
| E2: Trust model comparison | Trust architecture | Implicit, capability-scoped, zero-trust | Cascade propagation rate | 5 |
| E3: Topology comparison | Network structure | Hierarchical (CrewAI), flat (custom), star | Time to N% cascade | 5 |
| E4: Adaptive adversary | Attacker knowledge | None, defense-aware, credential-theft | Attack success rate under zero-trust | 5 |
| E5: Mixed agent types | Agent composition | All-LLM, LLM+RL, LLM+rule-based | Cascade rate by agent type | 5 |
| E6: Shared memory ablation | Memory isolation | Shared, partitioned, isolated | Cascade rate | 5 |
| E7: Two-of-three constraint | Trust architecture | Implicit, capability-scoped, zero-trust, two-of-three | Cascade rate, poison rate | 5 |
Total: 7 experiments. E1-E6: ~150 runs. E7: 4 trust models × 4 agent counts × 3 topologies × 5 seeds = 240 runs. Estimated compute: ~30-50 CPU-hours (API calls, not CPU, are the bottleneck).
[SEED: sunset after 5 projects if this never changes experimental design | PT-5]
Prior art search strategy: Google Scholar + arXiv for "capability-based security multi-agent", "least privilege agent systems", "NVIDIA NemoClaw security", "two-of-three constraint". Minimum 5 papers reviewed.
Expected contribution type: Novel combination — importing capability-based security (Saltzer & Schroeder 1975, OS security) into multi-agent LLM systems.
What result would surprise us?
- EXPECTED: Two-of-three performs between zero-trust and implicit trust.
- SURPRISE 1: Two-of-three OUTPERFORMS zero-trust for some topology (less restrictive but more effective?)
- SURPRISE 2: Two-of-three is WORSE than implicit for some topology (constraint creates attack surface?)
- SURPRISE 3: Non-monotonic cascade dynamics with agent count under two-of-three (emergent behavior)
- SURPRISE 4: Topology × trust model interaction — same model behaves differently across topologies
| Novel Component | How Tested (ablation) | Expected Effect If Removed |
|---|---|---|
| Two-of-three capability constraint | Remove constraint → agents get all 3 capabilities (= implicit) | Cascade rate increases to implicit baseline |
| Capability-based acceptance filter | Replace with implicit acceptance | Poison rate increases (no capability overlap check) |
| Round-robin capability assignment | Replace with random assignment | May change topology × trust interaction if capability distribution matters |
[SEED: sunset after 5 projects if this never produces a shipped artifact | PT-5]
Target practitioners: Multi-agent system builders using CrewAI, LangChain, AutoGen. Estimated ~50K+ active developers.
Planned artifacts:
- TwoOfThreeConstraint class in src/trust.py (importable, documented)
- Comparison table: zero-trust vs two-of-three tradeoffs for practitioners
- Architecture recommendation: which trust model for which topology
Deployment path:
`pip install` or copy trust.py into an existing multi-agent project. Drop-in replacement for the existing trust model.
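One possible shape for the planned TwoOfThreeConstraint class. This is a sketch under assumptions: only `data_access` and `external_communication` are named in this document, so the third capability name and the method signatures are illustrative, not the final src/trust.py API.

```python
from itertools import combinations

# The first two capability names appear in the adversary analysis;
# "code_execution" as the third is an assumption for illustration.
CAPABILITIES = ("data_access", "external_communication", "code_execution")

class TwoOfThreeConstraint:
    """Each agent holds exactly two of the three capabilities, so no
    single agent can perform the full attack chain alone."""

    def __init__(self):
        self.combos = list(combinations(CAPABILITIES, 2))  # 3 combos

    def assign(self, agent_index):
        # Round-robin assignment: agent i gets combos[i % 3].
        return frozenset(self.combos[agent_index % len(self.combos)])

    def permits(self, agent_caps, action_caps):
        # An action is allowed only if the agent holds every
        # capability the action requires.
        return set(action_caps) <= set(agent_caps)
```

Removing the constraint (every agent gets all three capabilities) recovers the implicit-trust baseline, which is exactly the ablation in the table above.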
[SEED: sunset after 5 projects if this doesn't improve generalization scores | PT-5]
Evaluation conditions (target ≥2 for Tier 2+):
| Condition | Why This Tests Generalization | Data/Setup Required |
|---|---|---|
| 3 topologies (hierarchical, flat, star) | Tests whether constraint effectiveness depends on communication structure | Existing topology implementations |
| 4 agent counts (5, 10, 20, 50) | Tests scaling behavior — does constraint hold at larger systems? | Parameterized simulation |
| 4 trust models compared side-by-side | Establishes relative positioning across defense strategies | All trust models in trust.py |
What constitutes transfer evidence: Two-of-three should reduce cascade relative to implicit trust across ALL topologies and agent counts. If it fails on any topology, that's a boundary condition to document.
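The transfer criterion above can be codified as a simple check; the data layout (a dict keyed by condition) is our assumption for illustration.

```python
def transfer_evidence(results):
    """results: dict (topology, n_agents) -> (implicit_rate, two_of_three_rate).

    Transfer holds only when two-of-three reduces cascade vs implicit
    trust in EVERY condition; any failures are the boundary conditions
    to document.
    """
    boundary = [cond for cond, (imp, tot) in results.items() if tot >= imp]
    return not boundary, boundary
```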
These experiments were identified by running the govML pipeline AFTER E7a-c, revealing conditions that governance-first design would have required upfront. Running them now to close the gaps.
Question: Does the round-robin assignment of capabilities matter, or would random assignment produce equivalent results?
| Assignment Strategy | Description | What It Tests |
|---|---|---|
| Round-robin (current) | Agent i gets combo [i % 3] | Deterministic, maximally distributed |
| Random | Each agent gets a random 2-of-3 combo | Whether structural distribution matters vs random partitioning |
| Clustered | First N/3 agents get combo 0, next N/3 get combo 1, rest get combo 2 | Whether geographic clustering of capabilities affects cascade |
Prediction: Round-robin ≈ random (partition distribution shouldn't matter if combos are equal probability). Clustered may differ if topology creates capability-homogeneous neighborhoods.
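The three assignment strategies in the table above can be sketched as follows; capability names are partly assumed (see the two-of-three sketch), and the function is illustrative rather than the testbed's actual code.

```python
import random
from itertools import combinations

# "code_execution" as the third capability is an assumption.
COMBOS = list(combinations(
    ("data_access", "external_communication", "code_execution"), 2))

def assign_capabilities(n_agents, strategy="round_robin", seed=42):
    """Return one 2-of-3 capability combo per agent."""
    rng = random.Random(seed)
    if strategy == "round_robin":
        # Agent i gets combo [i % 3]: deterministic, maximally spread.
        return [COMBOS[i % 3] for i in range(n_agents)]
    if strategy == "random":
        return [rng.choice(COMBOS) for _ in range(n_agents)]
    if strategy == "clustered":
        # Contiguous thirds of the agent list share one combo.
        return [COMBOS[min(3 * i // n_agents, 2)] for i in range(n_agents)]
    raise ValueError(f"unknown strategy: {strategy}")
```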
Question: Can an adversary who knows the two-of-three constraint exploit it? E4 only tested adversaries against the original 3 trust models.
| Adversary Type | Strategy Against Two-of-Three |
|---|---|
| Naive | Standard attack, unaware of capability constraints |
| Constraint-aware | Targets agents with data_access + external_communication combo (the exfiltration-capable pair). Focuses cascade on agents whose 2-of-3 permits the most dangerous action. |
Prediction: Constraint-aware attacker will partially defeat two-of-three by targeting the weakest capability combination, recovering 30-50% of the constraint's advantage (similar to E4 adaptive vs zero-trust recovery of 54%).
Question: Are 5 seeds sufficient for the effect sizes observed in E7a-c?
Method: For each key comparison, compute the 95% CI width and compare to effect size. If CI width > 50% of effect size, 5 seeds is insufficient for that comparison.
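The sufficiency rule can be stated as a one-line check (a direct codification of the method above, with our function name):

```python
def seeds_sufficient(ci_lo, ci_hi, effect_size):
    """5 seeds is insufficient for a comparison when the 95% CI width
    exceeds 50% of the observed effect size."""
    return (ci_hi - ci_lo) <= 0.5 * abs(effect_size)
```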
| Phase | Activities | Gate | Compute |
|---|---|---|---|
| 0 | Contracts, HYPOTHESIS_REGISTRY, testbed scaffold, data contracts | Gate 0.5 (this doc) | 0 |
| 1 | Build testbed (CrewAI + custom), implement attacks, run E1-E3 | Gate 3 (experiment fields) | ~20 CPU-hrs |
| 2 | Run E4-E6 (adaptive adversary, mixed agents, memory ablation) | Gate 4 (analysis) | ~15 CPU-hrs |
| 3 | FINDINGS.md, figures, statistical tests, blog draft | Gate 6 (publication) | ~5 CPU-hrs |
| 4 | Conference abstract, distribution, Gate 9 | Gate 9 (V-cluster) | 0 |