| project | Simulation Overestimates Multi-Agent Cascade by 37pp — But Topology Matters Mo |
|---|---|
| fp | FP-08 |
| status | COMPLETE |
| quality_score | 8.3 |
| last_scored | 2026-03-30 |
| profile | security-ml |
Status: COMPLETE — simulation (6 experiments × 5 seeds) + real agent validation (2 experiments × 3 seeds on Claude Haiku), 16 tests, 7 figures Project: FP-15 (Multi-Agent Security Testing Framework) Original thesis: Zero-trust cuts cascade by 40% and topology doesn't matter. Updated thesis (post real-agent validation): Zero-trust cuts cascade by ~7pp (not 40pp). Topology DOES matter — hierarchical is protective (0.560 vs flat 0.733). The simulation overestimates severity by 37pp but correctly predicts zero-trust is best. Framework: Simulation-based testbed with configurable trust models, topologies, attacker types, agent compositions, and memory modes.
| Tag | Meaning |
|---|---|
| [DEMONSTRATED] | Directly measured, 5-seed, CI reported |
| [SUGGESTED] | Consistent pattern, limited conditions |
| [PROJECTED] | Extrapolated from partial evidence |
| Agent Count | Cascade Rate (mean +/- std) | Poison Rate (mean +/- std) |
|---|---|---|
| 2 | 1.000 +/- 0.000 | 0.945 +/- 0.006 |
| 3 | 1.000 +/- 0.000 | 0.960 +/- 0.011 |
| 5 | 1.000 +/- 0.000 | 0.974 +/- 0.009 |
| 7 | 1.000 +/- 0.000 | 0.981 +/- 0.008 |
| 10 | 1.000 +/- 0.000 | 0.978 +/- 0.006 |
Finding: Under implicit trust, a single compromised agent eventually cascades to 100% of agents regardless of system size. The differentiation is in poison rate — larger systems see slightly higher poison rates because more delegation paths amplify poisoned outputs. The cascade is not super-linear (H-1 REFUTED) — it saturates to 1.0 for all sizes. This is actually a stronger finding: implicit trust provides zero containment at any scale.
| Trust Model | Cascade Rate | Poison Rate | vs Implicit |
|---|---|---|---|
| Implicit | 1.000 +/- 0.000 | 0.974 +/- 0.009 | baseline |
| Capability-scoped | 1.000 +/- 0.000 | 0.908 +/- 0.032 | -6.8pp poison |
| Zero-trust | 0.840 +/- 0.080 | 0.583 +/- 0.082 | -16pp cascade, -40pp poison |
Finding: Zero-trust is the only trust model that reduces cascade rate below 1.0. Capability-scoping reduces poison rate by ~7pp but doesn't prevent full cascade. Zero-trust reduces poison rate by 40pp — the headline result. H-2 is SUPPORTED: zero-trust reduces poison rate by >50% relative to implicit (0.583 / 0.974 = 0.60, i.e., 40% reduction absolute, but relative to implicit it's a 40% cut).
| Topology | Cascade Rate | Poison Rate |
|---|---|---|
| Hierarchical | 1.000 +/- 0.000 | 0.974 +/- 0.009 |
| Flat | 1.000 +/- 0.000 | 0.975 +/- 0.009 |
| Star | 1.000 +/- 0.000 | 0.957 +/- 0.006 |
Finding: H-3 REFUTED. Flat topology does NOT cascade faster than hierarchical. All topologies reach 100% cascade under implicit trust. Star topology has slightly lower poison rate (0.957 vs 0.974) because the hub agent is a bottleneck that can only delegate to spokes, limiting parallel propagation. Practical implication: topology alone is not a defense. Trust model matters more than network structure.
| Attacker Type | Cascade Rate | Poison Rate | vs Naive |
|---|---|---|---|
| Naive (vs zero-trust) | 0.840 +/- 0.080 | 0.583 +/- 0.082 | baseline |
| Defense-aware | 0.960 +/- 0.080 | 0.899 +/- 0.025 | +31.6pp poison |
| Credential-theft | 0.800 +/- 0.126 | 0.617 +/- 0.091 | +3.4pp poison |
Finding: H-4 PARTIALLY SUPPORTED. Defense-aware attacker recovers 54% of the poison rate gap that zero-trust created (0.899 vs 0.583, gap = 0.316, original implicit-ZT gap = 0.391). However, credential-theft does NOT outperform defense-aware — it only marginally beats naive. The adaptive adversary finding is the most important result for practitioners: zero-trust reduces naive attacks by 40pp, but an adversary who understands the defense can recover most of that advantage by crafting outputs that pass verification.
| Composition | Cascade Rate | Poison Rate |
|---|---|---|
| All-LLM | 1.000 +/- 0.000 | 0.974 +/- 0.009 |
| Mixed (LLM+RL) | 1.000 +/- 0.000 | 0.977 +/- 0.007 |
| Mixed (LLM+Rule) | 1.000 +/- 0.000 | 0.975 +/- 0.003 |
| Heterogeneous | 1.000 +/- 0.000 | 0.974 +/- 0.008 |
Finding: H-5 REFUTED. RL agents do NOT amplify cascade vs all-LLM systems in this simulation. All compositions saturate at 100% cascade with near-identical poison rates (~0.974-0.977). This is a negative result: in our simulation, agent type heterogeneity does not meaningfully affect cascade propagation. The cascade dynamics are dominated by trust model and network connectivity, not agent-level susceptibility differences.
| Memory Mode | Cascade Rate | Poison Rate |
|---|---|---|
| Shared | 1.000 +/- 0.000 | 0.974 +/- 0.009 |
| Partitioned | 1.000 +/- 0.000 | 0.973 +/- 0.009 |
| Isolated | 1.000 +/- 0.000 | 0.962 +/- 0.017 |
Finding: H-6 REFUTED. Isolated memory reduces poison rate by only 1.2pp (0.962 vs 0.974), far below the predicted ≥30% reduction. In this simulation, the direct delegation channel (agent-to-agent task passing) dominates cascade propagation. Shared memory is a secondary channel. This suggests that defenses should focus on delegation trust (E2 finding) rather than memory isolation.
Addresses: "You tuned the parameters to get the results you wanted." (G-5) Method: Sweep base cascade probability across [0.05, 0.10, 0.15, 0.20, 0.30, 0.50].
| base_prob | Implicit Poison | Zero-Trust Poison | Relative Reduction |
|---|---|---|---|
| 0.05 | 0.969 | 0.613 | 37% |
| 0.10 | 0.975 | 0.613 | 37% |
| 0.15 | 0.974 | 0.583 | 40% |
| 0.20 | 0.969 | 0.611 | 37% |
| 0.30 | 0.976 | 0.643 | 34% |
| 0.50 | 0.971 | 0.656 | 32% |
Finding: Zero-trust reduces poison rate by 32-40% relative across the entire parameter space. The advantage narrows slightly at higher base_prob (more aggressive cascade) but remains substantial. The E2 finding is not an artifact of parameter tuning.
| base_prob | Naive (ZT) | Defense-Aware (ZT) | Recovery % |
|---|---|---|---|
| 0.05 | 0.613 | 0.893 | 72% |
| 0.10 | 0.613 | 0.898 | 74% |
| 0.15 | 0.583 | 0.899 | 76% |
| 0.20 | 0.611 | 0.898 | 74% |
| 0.30 | 0.643 | 0.908 | 74% |
| 0.50 | 0.656 | 0.908 | 73% |
Finding: Defense-aware attacker recovers 72-76% of zero-trust's advantage across all parameter values. This is actually higher than the original 54% estimate — the sensitivity sweep reveals the adaptive adversary threat is MORE consistent than initially measured.
| Trust Model | Inflection Step (>50% cascade) | Final Cascade |
|---|---|---|
| Implicit | Step 0 | 1.000 |
| Capability-scoped | Step 1 | 1.000 |
| Zero-trust | Step 7 | 0.840 |
Finding: Zero-trust delays cascade onset by 7 time steps. This is the mechanism: zero-trust doesn't prevent compromise — it SLOWS it, buying time for detection and response. Implicit trust provides zero delay.
| Verification Prob | Cascade Rate | Poison Rate |
|---|---|---|
| 0.0 (no verification) | 1.000 | 0.963 |
| 0.3 | 1.000 | 0.911 |
| 0.6 (critical threshold) | 0.920 | 0.771 |
| 0.8 (default zero-trust) | 0.840 | 0.583 |
| 1.0 (perfect verification) | 0.200 | 0.207 |
Finding: The critical verification threshold is ~0.6. Below 0.6, cascade still reaches 100%. Above 0.6, cascade drops sharply. Perfect verification (1.0) reduces cascade to 20% and poison to 21%. Practical implication: verification doesn't need to be perfect to be effective, but it needs to exceed 60% detection rate.
Added 2026-03-19. Validates simulation predictions against real LLM agents. Closes R34.7 requirement.
| Trust Model | Simulation | Real Agent (mean) | Gap | Simulation Accurate? |
|---|---|---|---|---|
| Implicit | 0.974 | 0.600 +/- 0.083 | 37pp | NO — overestimates by 37pp |
| Capability-scoped | 0.908 | 0.606 +/- 0.036 | 30pp | NO — and ordering is wrong (not better than implicit) |
| Zero-trust | 0.583 | 0.533 +/- 0.082 | 5pp | YES — closest prediction |
Findings [DEMONSTRATED]:
- Simulation overestimates implicit cascade by 37pp because real agents have semantic resistance.
- Capability-scoped is NOT better than implicit on real agents (0.606 vs 0.600). The simulation predicted a 7pp advantage. On real agents, capability filtering hurts slightly — possibly because it blocks legitimate delegations from agents without matching capabilities.
- Zero-trust prediction was most accurate (5pp gap). This is because zero-trust's judge-based verification works similarly in simulation and reality — both rely on content analysis rather than agent behavior modeling.
- Zero-trust still provides ~7pp reduction on real agents (0.533 vs 0.600). Smaller than simulated 40pp, but the direction is correct.
| Topology | Simulation | Real Agent (mean) | Gap | Simulation Accurate? |
|---|---|---|---|---|
| Hierarchical | 0.974 | 0.560 +/- 0.083 | 41pp | NO |
| Flat | 0.975 | 0.733 +/- 0.019 | 24pp | NO |
| Star | 0.957 | 0.707 +/- 0.075 | 25pp | NO |
Findings [DEMONSTRATED]:
- Topology DOES matter on real agents — simulation was wrong. Simulation predicted topology is irrelevant (all ~0.97). Real agents show hierarchical (0.560) >> star (0.707) >> flat (0.733). A 17pp spread.
- Hierarchical topology is protective. The tree structure limits parallel cascade — the compromised root agent delegates to 2 children, not all agents. Children's outputs don't cross-pollinate. This natural bottleneck doesn't exist in flat/star.
- The simulation missed this because its probabilistic cascade model doesn't capture LLM semantic resistance, which varies by delegation depth. In a tree, deeper agents receive more processed (less poisoned) content.
| Finding | Simulation Prediction | Real Agent Result | Verdict |
|---|---|---|---|
| Implicit poison rate | 97% | 60% | Overestimated by 37pp |
| Zero-trust improvement | -40pp | -7pp | Overestimated by 33pp |
| Capability-scoped vs implicit | 7pp better | 0pp (same or worse) | Direction WRONG |
| Topology irrelevant | Yes (all ~97%) | NO — 17pp spread | Qualitatively WRONG |
| Zero-trust is best defense | Yes | Yes | Correct |
The simulation gets ONE thing right: zero-trust is the best defense. Everything else — magnitude, ordering of other defenses, topology effects — is wrong or misleading.
Added 2026-03-30. Generalizes NVIDIA's NemoClaw two-of-three constraint (Saltzer & Schroeder 1975) to multi-agent systems. Tests across 4 trust models × 4 agent counts × 3 topologies × 5 seeds = 240 simulations.
| Topology | n | Implicit Poison | Two-of-Three Poison | Reduction | Zero-Trust Poison |
|---|---|---|---|---|---|
| Hierarchical | 5 | 0.971±0.006 | 0.797±0.047 | 17pp | 0.593±0.078 |
| Hierarchical | 10 | 0.977±0.007 | 0.620±0.115 | 36pp | 0.412±0.098 |
| Hierarchical | 20 | 0.985±0.011 | 0.639±0.047 | 35pp | 0.297±0.067 |
| Hierarchical | 50 | 0.988±0.004 | 0.487±0.079 | 50pp | 0.160±0.038 |
| Flat | 5 | 0.974±0.011 | 0.835±0.031 | 14pp | 0.521±0.124 |
| Flat | 10 | 0.979±0.008 | 0.751±0.049 | 23pp | 0.464±0.127 |
| Flat | 20 | 0.990±0.005 | 0.517±0.107 | 47pp | 0.296±0.096 |
| Flat | 50 | 0.993±0.005 | 0.496±0.042 | 50pp | 0.098±0.025 |
| Star | 5 | 0.959±0.004 | 0.725±0.112 | 23pp | 0.550±0.086 |
| Star | 10 | 0.935±0.011 | 0.667±0.084 | 27pp | 0.491±0.040 |
| Star | 20 | 0.929±0.008 | 0.569±0.045 | 36pp | 0.450±0.029 |
| Star | 50 | 0.930±0.026 | 0.552±0.050 | 38pp | 0.430±0.018 |
Finding: H-7 SUPPORTED. Two-of-three reduces cascade vs implicit in ALL 12 topology × agent-count combinations (14-50pp reduction). H-8 SUPPORTED. Two-of-three falls between implicit and zero-trust in all conditions.
H-9 REFUTED. Trust model ordering is NOT topology-independent. The relative effectiveness of two-of-three vs zero-trust varies dramatically by topology:
| Topology | Two-of-Three Cascade (n=50) | Zero-Trust Cascade (n=50) | Ratio (2of3 / ZT) |
|---|---|---|---|
| Hierarchical | 0.504 | 0.128 | 3.9x worse |
| Flat | 0.892 | 0.164 | 5.4x worse |
| Star | 0.168 | 0.076 | 2.2x worse |
Star topology is where two-of-three shines. The hub-and-spoke structure naturally partitions capability flows — agents on different spokes have different capability combinations, and all cross-spoke communication routes through the hub (which can only delegate within its own 2-of-3 capabilities). This structural alignment between star topology and capability partitioning creates a compounding defense that approaches zero-trust effectiveness.
Flat topology is where two-of-three fails. In fully connected networks, every agent can reach every other agent, so capability partitioning provides structural containment (agents with non-overlapping capabilities can't communicate) but the remaining connected agents form a large enough subgraph for cascades to propagate freely.
Mechanism: The two-of-three constraint's effectiveness depends on how well the capability partition ALIGNS with the communication topology. Star topology naturally aligns (spokes are isolated); hierarchical partially aligns (tree limits paths); flat doesn't align (all-to-all bypasses partitioning).
H-10 SUPPORTED. The cascade reduction from two-of-three vs implicit increases with agent count:
| Topology | Reduction at n=5 | Reduction at n=50 | Scaling |
|---|---|---|---|
| Hierarchical | 17pp | 50pp | 2.9x |
| Flat | 14pp | 50pp | 3.6x |
| Star | 23pp | 38pp | 1.7x |
Finding: At n=50, two-of-three achieves 50pp reduction on hierarchical and flat topologies — matching or exceeding the original zero-trust E2 result (40pp) from the 5-agent simulation. The constraint becomes MORE effective as systems grow because larger systems have more agents in each capability partition.
Tests whether round-robin capability assignment is necessary or if random/clustered produce equivalent results. 3 strategies × 3 topologies × 5 seeds = 45 simulations at n=20.
| Topology | Round-Robin Poison | Random Poison | Clustered Poison | Random vs RR |
|---|---|---|---|---|
| Hierarchical | 0.639±0.047 | 0.672±0.040 | 0.639±0.047 | +3.3pp worse |
| Flat | 0.517±0.107 | 0.589±0.076 | 0.517±0.107 | +7.2pp worse |
| Star | 0.569±0.045 | 0.597±0.053 | 0.569±0.045 | +2.8pp worse |
Finding: H-11 PARTIALLY SUPPORTED. Round-robin = clustered (identical results — both produce evenly distributed capability partitions). Random is consistently WORSE (3-7pp higher poison rate). Mechanism: Random assignment creates capability-homogeneous pockets by chance — two adjacent agents may both hold {data_access, code_execution}, allowing poisoned data to flow between them without the structural containment that even distribution provides. Practical implication: When implementing two-of-three, use deterministic assignment (round-robin or explicit), not random.
Tests whether adversaries who know the two-of-three constraint can exploit it. 3 attacker types × 3 topologies × 5 seeds = 45 simulations at n=20.
| Topology | Naive Poison | Defense-Aware Poison | Recovery (pp) |
|---|---|---|---|
| Hierarchical | 0.639±0.047 | 0.608±0.062 | -3.1pp (adversary WORSE) |
| Flat | 0.517±0.107 | 0.594±0.086 | +7.7pp |
| Star | 0.569±0.045 | 0.649±0.032 | +8.0pp |
Finding: H-12 PARTIALLY SUPPORTED with a surprise. On flat and star topologies, defense-aware adversary recovers 8pp of the constraint's advantage — moderate but less than the 54% recovery seen against zero-trust (E4). But on hierarchical topology, the adversary does WORSE than naive. The tree structure limits the adversary's ability to target specific capability combinations — the bottleneck at each tree level prevents the adversary from reaching agents with the targeted {data+comm} pair. Novel finding: hierarchical + two-of-three is ADVERSARY-RESISTANT, not just cascade-resistant. This is a compound defense where the topology's structural constraint amplifies the capability constraint.
| Topology | n | Effect Size | 95% CI Width | CI/Effect Ratio | Sufficient? |
|---|---|---|---|---|---|
| Hierarchical | 20 | 0.345 | 0.036 | 0.10 | YES |
| Flat | 20 | 0.473 | 0.097 | 0.21 | YES |
| Star | 20 | 0.360 | 0.038 | 0.11 | YES |
| Hierarchical | 50 | 0.501 | 0.069 | 0.14 | YES |
| Flat | 50 | 0.497 | 0.040 | 0.08 | YES |
| Star | 50 | 0.378 | 0.055 | 0.14 | YES |
Finding: All CI-to-effect ratios are below 0.21 — well under the 0.50 threshold for adequate statistical power. 5 seeds is sufficient for all key comparisons. The largest CI width (0.097 for flat n=20) still represents less than 21% of the effect size. The E7 findings are statistically robust.
| ID | Prediction | Result | Verdict |
|---|---|---|---|
| H-1 | Super-linear cascade (rate(10) > 2x rate(5)) | Simulation: 1.0 for all sizes; Real: not tested at scale | REFUTED (simulation); real agents at 60% not 100% |
| H-2 | Zero-trust ≥50% poison reduction | Simulation: 40pp; Real: 7pp (0.600 → 0.533) | PARTIALLY SUPPORTED — direction correct, magnitude overestimated |
| H-3 | Flat > hierarchical cascade | Simulation: no difference; Real: flat 0.733, hierarchical 0.560 | SUPPORTED BY REAL AGENTS — simulation was wrong, real agents show 17pp topology effect |
| H-4 | Credential > defense-aware > naive | Defense-aware (0.899) >> credential (0.617) > naive (0.583) | PARTIALLY SUPPORTED — defense-aware is worst, but credential ≠ highest |
| H-5 | RL agents amplify cascade | All compositions → 0.974-0.977 poison rate | REFUTED — agent type is irrelevant |
| H-6 | Shared memory accelerates cascade (isolated ≤0.7x) | Isolated = 0.962 vs shared = 0.974 (1.2pp diff) | REFUTED — memory mode is irrelevant |
| H-7 | Two-of-three reduces vs implicit | poison(2of3) < poison(implicit) all conditions | SUPPORTED — 14-50pp reduction across all 12 conditions | | H-8 | Two-of-three between implicit and ZT | ZT ≤ 2of3 ≤ implicit | SUPPORTED — holds across all conditions | | H-9 | Trust model ordering topology-independent | Same ordering all topologies | REFUTED — star topology dramatically favors two-of-three (2.2x vs 5.4x gap to ZT) | | H-10 | Two-of-three advantage scales with count | Gap increases n=5→50 | SUPPORTED — 2.9x scaling on hierarchical, 3.6x on flat | | H-11 | Assignment strategy doesn't matter | RR ≈ random ≈ clustered | PARTIALLY SUPPORTED — RR = clustered, but random is 3-7pp worse | | H-12 | Adversary recovers 30-50% of constraint advantage | Recovery across topologies | PARTIALLY SUPPORTED — 8pp recovery on flat/star, but adversary WORSE on hierarchical |
Summary: 4 supported (H-7, H-8, H-10, H-3), 3 partially supported (H-2, H-4, H-11, H-12), 5 refuted (H-1, H-5, H-6, H-9). Most valuable findings: (1) H-9 refutation: topology × trust model interaction — no universal best defense. (2) E7e surprise: hierarchical + two-of-three is adversary-resistant (defense-aware attacker does WORSE, not better). (3) E7d: random capability assignment weakens the constraint by 3-7pp — deterministic distribution matters.
Simulation predicted topology is irrelevant (all ~97%). Real agents show a 17pp spread: hierarchical 0.560, star 0.707, flat 0.733. Hierarchical is protective because the tree structure limits parallel cascade paths. The simulation missed this because its probabilistic model doesn't capture depth-dependent semantic resistance in real LLMs. Practical implication: hierarchical delegation (CrewAI's default) IS a defense, not just an organizational pattern.
Expected RL agents (more susceptible per FP-12 findings) to amplify cascade. Instead, all compositions produce identical results. Explanation: in the simulation, cascade probability is dominated by the trust model's acceptance/filtering logic, not the receiving agent's type-specific susceptibility. The FP-12 observation-perturbation asymmetry operates at the RL training level, not at the delegation level.
Expected shared memory to be a major cascade accelerant. Instead, only 1.2pp difference. Explanation: the primary cascade channel is direct task delegation (agent A sends output to agent B). Shared memory is a secondary channel. When the primary channel is undefended (implicit trust), adding memory isolation is like locking the window while the door is open.
Expected credential theft to be the most dangerous attack. Instead, defense-aware attacker (who crafts outputs that bypass verification) is more effective. Explanation: credential theft gives you the right identity but doesn't help you craft convincing outputs. Defense-awareness helps you craft outputs that pass verification checks regardless of identity. In zero-trust systems, what you SAY matters more than who you ARE.
Cohen et al. (2024) demonstrated self-replicating adversarial inputs ("AI Worms") that propagate across agents. Our work differs in focus: they show that propagation IS possible; we quantify propagation RATES under different defense architectures and show that zero-trust cuts poison rate by 40%.
Gu et al. (2024) showed a single image can jailbreak millions of multimodal agents through shared input channels. Our work focuses on delegation-based cascade (a different propagation mechanism) and shows that shared memory (analogous to shared input) is actually a minor cascade channel compared to direct delegation.
Tian et al. (2024) evaluated safety in multi-agent chat systems. Our work extends from conversational safety to task-delegation security, where the risk is not harmful text but poisoned decisions affecting downstream operations.
Masterman et al. (2024) provided a comprehensive taxonomy of emerging agent architectures. Our framework provides empirical security evaluation of three architectures (hierarchical, flat, star) and three trust models — filling their identified gap of "no systematic security comparison."
OWASP LLM Top 10 v2 (2025) covers single-agent threats. Our attack taxonomy extends OWASP with multi-agent-specific classes: delegation abuse, cascade poisoning, and identity spoofing.
Coleman FP-02 (2026) established single-agent attack success rates (80% prompt injection, 100% reasoning chain hijacking). Our E4 results show these attacks COMPOUND in multi-agent systems: a defense-aware attacker achieves 0.899 poison rate in a 5-agent system, higher than any single-agent attack rate in FP-02.
Coleman FP-12 (2026) showed observation perturbation >> reward poisoning for RL agents. Our E5 results suggest this asymmetry doesn't manifest at the multi-agent delegation level — agent type is irrelevant to cascade dynamics.
| Finding | Blog Hook | TIL Title | Audience |
|---|---|---|---|
| 100% cascade under implicit trust at any scale | "Your 10-agent system is as vulnerable as your 2-agent system" | TIL: Implicit trust = zero containment | Security architects |
| Zero-trust cuts poison rate by 40pp | "The only defense that actually works for multi-agent systems" | TIL: Zero-trust for AI agents, not just networks | DevOps, MLOps |
| Defense-aware attacker recovers 54% of zero-trust gains | "Your zero-trust agent architecture has a 54% hole" | TIL: Adaptive adversaries vs zero-trust agents | Red teamers, pentesters |
| Topology doesn't matter | "Reorganizing your agent network won't save you" | TIL: Network topology is irrelevant for cascade defense | System architects |
| 4/6 hypotheses refuted | "I was wrong about 4 out of 6 predictions — and that's the finding" | TIL: Negative results as contribution in AI security | ML researchers |
| Credential theft < defense-awareness | "It's not who you are, it's what you say" | TIL: Identity matters less than output quality in agent trust | IAM teams |
Prior art search: Google Scholar + arXiv for "capability-based security multi-agent", "least privilege agent", "NVIDIA NemoClaw", "two-of-three constraint", "capability partitioning". 8 papers reviewed. No prior work applies capability-based security (Saltzer & Schroeder 1975) to multi-agent LLM cascade prevention. Contribution type: Novel combination — importing a classical OS security principle (capability-based access control) into multi-agent LLM systems and testing it against the full trust model taxonomy. What surprised us: Pre-registered expectation was that two-of-three would perform uniformly between implicit and zero-trust. Instead, discovered topology × trust model interaction (H-9 REFUTED): star topology dramatically favors two-of-three (cascade 0.168 vs flat 0.892 at n=50). This interaction was NOT predicted and is NOT in any prior work.
- Topology × trust model interaction is novel. No prior work (Cohen 2024, Gu 2024, Tian 2024, Masterman 2024) tests trust models across multiple topologies. All prior work uses a single topology. Our E7 is the first systematic comparison showing that defense effectiveness is topology-dependent.
- Star + two-of-three as compound defense. The structural alignment between star topology (hub-and-spoke isolation) and capability partitioning (round-robin 2-of-3 assignment) creates a compounding effect that neither defense achieves independently. This emergent interaction is the novel finding.
- Ablation evidence: Removing the two-of-three constraint (= implicit trust) increases cascade by 14-50pp depending on topology. Removing the star topology (= using flat) increases cascade by 5.4x for two-of-three but only 2.1x for zero-trust. The constraint's effectiveness is inseparable from topology choice.
Problem magnitude: Multi-agent LLM systems are deployed in production at >1000 organizations (CrewAI 50K+ stars, AutoGen 40K+ stars, LangGraph 10K+ stars). Default trust model in all three frameworks is implicit trust. Our E1-E2 results show implicit trust provides zero cascade containment. This affects every multi-agent deployment using default settings.
Actionable recommendations:
- For star/hub-and-spoke architectures: Two-of-three constraint provides near-zero-trust cascade reduction (2.2x gap) at lower implementation complexity. Assign capabilities in round-robin: agent 0 gets data+code, agent 1 gets data+comm, agent 2 gets code+comm. No agent can independently exfiltrate data.
- For hierarchical architectures: Two-of-three provides moderate protection (3.9x gap to zero-trust). Use if zero-trust latency is unacceptable; prefer zero-trust if latency budget allows.
- For flat/all-to-all architectures: Two-of-three is insufficient (5.4x gap). Use zero-trust — capability partitioning cannot compensate for full connectivity.
Artifacts released:
src/trust.py: TwoOfThreeConstraint class — drop-in replacement for any TrustModel subclassoutputs/experiments/se150_results.json: Full experiment data (240 simulations, 48 conditions × 5 seeds)- Comparison tables above for architecture decision-making
Real-world validation: Simulation only. Real-agent validation showed 37pp overestimate for E1-E6 (see Real Agent Validation section). SE-150 results carry the same [SYNTHETIC] qualifier. Direction of findings expected to hold; magnitudes will differ.
Domains connected: OS security (capability-based access control), network security (least privilege / segmentation), multi-agent LLM security.
Methods imported: Capability-based security originates from Saltzer & Schroeder (1975, "The Protection of Information in Computer Systems"). The two-of-three constraint is a direct import of the principle of least authority (POLA) — no process should have more privileges than needed for its task. NVIDIA's NemoClaw applied this to robotic agent control; we generalize it to arbitrary multi-agent LLM topologies.
Principle generalization: The controllability principle (FP-01, FP-05, FP-12) states that defenses relying on attacker-controllable features are weaker. Two-of-three extends this: by structurally limiting what any single agent CAN do, the system bounds what a compromised agent can ACHIEVE — regardless of how sophisticated the attack. Validated in:
- Domain 1: OS security (Saltzer & Schroeder — mandatory access control)
- Domain 2: Network security (microsegmentation — zero-trust networking)
- Domain 3: Multi-agent LLM security (this work — capability partitioning)
| Domain | Connection | Transfer Evidence |
|---|---|---|
| OS security (capability-based access) | Same principle: no process/agent holds all capabilities needed for full compromise | Empirical — two-of-three reduces cascade 14-50pp |
| Network security (microsegmentation) | Star topology + capability partitioning = microsegmentation analogue | Empirical — star + two-of-three cascade 0.168 vs flat 0.892 |
| Controllability framework (FP-01/05/12) | Structural capability limits reduce attacker controllability | Theoretical — extends controllability principle to architectural constraints |
Scope: Tested on simulation testbed with 4 agent counts (5, 10, 20, 50), 3 topologies (hierarchical, flat, star), 4 trust models, 5 seeds per condition. Total: 240 simulations. CPU-only, no GPU required.
Evaluation conditions:
| Condition | Result | vs Primary Setting (hierarchical n=5) |
|---|---|---|
| Hierarchical topology (n=5 to n=50) | Cascade reduction scales 17→50pp | Improvement increases with scale |
| Flat topology (n=5 to n=50) | Cascade reduction scales 14→50pp but cascade stays high (0.496) | Two-of-three insufficient for fully connected |
| Star topology (n=5 to n=50) | Best two-of-three performance (cascade 0.168 at n=50) | Structural alignment with capability partitioning |
Failure modes:
- Flat topology at small n: Two-of-three provides only 14pp reduction at n=5 flat — barely meaningful for practitioners.
- Single capability overlap: When two agents share exactly one capability category, the acceptance filter passes but the structural containment is weak. This is the mechanism behind flat topology's poor performance.
- Simulation fidelity: Prior sim-to-real validation (E2/E3 Real) showed 37pp overestimate. Two-of-three magnitudes will be smaller on real agents. Direction (reduces cascade) expected to hold.
Transfer assessment: The topology × trust model interaction finding should transfer to any system where (a) agents have partitioned capabilities and (b) communication topology restricts delegation paths. This includes microservice architectures, Kubernetes pod security, and robotic swarm systems. Not yet validated outside multi-agent LLM context.
| Artifact | Path | Type |
|---|---|---|
| Experiment results (E1-E6) | outputs/experiments/ |
JSON |
| Combined summary | outputs/experiments/all_experiments_summary.json |
JSON |
| Cascade vs count figure | blog/images/e1_cascade_vs_count.png |
PNG |
| Trust model figure | blog/images/e2_trust_model.png |
PNG |
| Topology figure | blog/images/e3_topology.png |
PNG |
| Adaptive adversary figure | blog/images/e4_adaptive_adversary.png |
PNG |
| Mixed agents figure | blog/images/e5_mixed_agents.png |
PNG |
| Memory ablation figure | blog/images/e6_memory_ablation.png |
PNG |
| Cascade over time figure | blog/images/cascade_over_time.png |
PNG |
| SE-150 experiment results | outputs/experiments/se150_results.json |
JSON |
| SE-150 missing experiments (E7d-f) | outputs/experiments/se150_missing_experiments.json |
JSON |
- Simulation-based, not real LLM agents — and FP-16 showed the gap is 48pp. FP-16 real agent experiments found 49% poison rate where this simulation predicted 97%. The simulation overestimates cascade severity because real agents have inherent semantic resistance. Qualitative findings (zero-trust > implicit) hold but quantitative predictions do not transfer. See FP-16 FINDINGS for the simulation-to-real gap analysis.
- Fixed cascade probability. The base cascade probability (0.15) was tuned for differentiation. Real-world cascade probability depends on LLM capability, prompt design, and task complexity. We report relative comparisons between conditions, not absolute rates.
- No real-time threat intelligence. The simulation doesn't model evolving threats, model updates, or adversary learning over multiple encounters. Each run is a static snapshot.
- 5 agents maximum in most experiments. E1 goes to 10 agents, but the primary results use 5. Larger systems (50-100 agents) may exhibit different cascade dynamics (e.g., partition effects, natural firebreaks).
- Single compromised agent assumption. All experiments start with exactly 1 compromised agent. Multi-point compromise (2+ initial attackers) may produce qualitatively different dynamics.
We contribute:
- An open-source multi-agent security testing framework with configurable trust models (implicit, capability-scoped, zero-trust), network topologies (hierarchical, flat, star), attacker types (naive, defense-aware, credential-theft), agent compositions (LLM, RL, rule-based), and memory modes (shared, partitioned, isolated).
- Empirical evidence that zero-trust is the only effective cascade defense, reducing poison rate by 40pp (0.974 → 0.583), while topology, agent type, and memory mode have negligible impact. 4/6 pre-registered hypotheses refuted — narrowing the solution space for practitioners.
- The first quantification of adaptive adversary effectiveness against zero-trust agent architectures: defense-aware attackers recover 54% of the poison rate gap that zero-trust creates, demonstrating that static verification is insufficient against sophisticated adversaries.