Authority Hierarchy
Priority Document Role Tier 1 docs/PROJECT_BRIEF.mdPrimary spec — highest authority Tier 2 null # No external FAQClarifications — cannot override Tier 1 Tier 3 docs/ADVERSARIAL_EVALUATION.mdAdvisory only — non-binding if inconsistent with Tier 1/2 Contract This document Implementation detail — subordinate to all tiers above Conflict rule: When a higher-tier document and this contract disagree, the higher tier wins. Update this contract via
CONTRACT_CHANGEor align implementation to the higher tier.
Upstream (this contract depends on):
- None — decisions may reference any contract but have no structural dependency.
Downstream (depends on this contract):
- See CHANGELOG for CONTRACT_CHANGE entries triggered by decisions (cross-reference ADR IDs)
- See RISK_REGISTER for risk entries mitigated by decisions
- See IMPLEMENTATION_PLAYBOOK §5 for change control procedure referencing ADR entries
This log records architectural and methodological decisions for the Agent Security Red-Team Framework project using a lightweight ADR (Architecture Decision Record) format. Each decision captures the context, alternatives, rationale, and consequences so that future changes are informed rather than accidental.
Relationship to CHANGELOG: When a decision triggers a CONTRACT_CHANGE commit, the change MUST also be logged in CHANGELOG with a cross-reference to the ADR ID.
Create a new ADR when:
- A decision affects multiple contracts or specs
- A decision resolves an ambiguity in authority documents
- A decision involves tradeoffs that future contributors need to understand
- A
CONTRACT_CHANGEcommit is triggered by a methodological choice - A risk mitigation strategy is selected from multiple options
Do NOT create an ADR for routine implementation choices that follow directly from a single contract requirement with no alternatives.
Proposed → Accepted → [Superseded by ADR-YYYY]
- Proposed: Under discussion; not yet binding.
- Accepted: Binding; implementation may proceed.
- Superseded: Replaced by a newer ADR. MUST cite the superseding ADR ID. Do NOT delete superseded entries.
Copy this block for each new decision:
## ADR-XXXX: [Short title]
- **Date:** YYYY-MM-DD
- **Status:** Proposed | Accepted | Superseded by ADR-YYYY
### Context
[Problem statement and constraints. Cite authority documents by tier and section.]
### Decision
[The chosen approach. Be specific enough that someone can implement it without ambiguity.]
### Alternatives Considered
| Option | Description | Verdict | Reason |
|--------|-------------|---------|--------|
| A (chosen) | [approach] | **Accepted** | [why best] |
| B | [approach] | Rejected | [why not] |
| C | [approach] | Rejected | [why not] |
### Rationale
[Why this approach is best given the project constraints. Cite authority documents.]
### Consequences
[Tradeoffs and risks. Reference RISK_REGISTER entries if applicable.]
### Contracts Affected
| Contract | Section | Change Required |
|----------|---------|----------------|
| [contract name] | §N | [what changes] |
### Evidence Plan
| Validation | Command / Artifact | Expected Result |
|------------|-------------------|-----------------|
| [what to verify] | [command or file path] | [pass criteria] |(Record decisions below. Number sequentially: ADR-0001, ADR-0002, etc.)
- Date: 2026-03-14
- Status: Accepted
PROJECT_BRIEF §7 requires testing attacks against 3 frameworks (LangChain, CrewAI, AutoGen). Each has a completely different API surface. Need a common interface so attacks are written once and run against all targets. (Tier 1 §3: "Open-source red-team framework" implies reusable architecture, not one-off scripts.)
Abstract base class AgentTarget with 6 methods: setup(), run_task(), reset(), get_system_prompt(), get_tools(), inject_context(). Each framework gets its own implementation. Attacks interact ONLY through this interface.
| Option | Description | Verdict | Reason |
|---|---|---|---|
| A (chosen) | ABC with per-framework implementations | Accepted | Attacks written once, N frameworks. Clean separation. |
| B | Direct framework calls in each attack script | Rejected | 5 attacks × 3 frameworks = 15 scripts. Duplication. |
| C | Adapter pattern with runtime dispatch | Rejected | Over-engineered for 3 targets. ABC is simpler. |
RQ2 requires demonstrating attacks across ≥2 frameworks. The abstraction makes this a config change, not a code change. Also enables the CLI tool (PROJECT_BRIEF §8: "pip install agent-redteam") to accept --agent flags cleanly.
- Each new framework target requires implementing 6 methods (~100 lines)
- Tests must cover the interface contract, not just one framework
inject_context()andget_conversation_history()have framework-specific semantics
| Contract | Section | Change Required |
|---|---|---|
| SCRIPT_ENTRYPOINTS_SPEC | §CLI flags | All scripts accept --agent flag |
| TEST_ARCHITECTURE | §integration | Per-framework smoke tests |
| CONFIGURATION_SPEC | §agents | Agent configs in config/agents/*.yaml |
| Validation | Command / Artifact | Expected Result |
|---|---|---|
| LangChain smoke test | python scripts/smoke_test_agents.py --agent langchain |
PASS — tool call + correct answer |
| Dry-run all agents | python scripts/smoke_test_agents.py --dry-run |
All 3 PASS |
- Date: 2026-03-14
- Status: Accepted
Red-teaming agents requires tools they can misuse. Using real tools (filesystem, network, shell) creates safety risks. Need tools that are interesting enough to demonstrate attacks but can't cause real harm. (PROJECT_BRIEF §3 Out of Scope: "Attacking production/deployed agents")
In-memory tool suite: calculator (safe eval), search_notes (dict lookup), file_reader/file_writer (in-memory dict). reset_tool_state() clears state between scenarios. Tool data includes simulated sensitive content ("api_keys", "secret_plan") for red-team scenarios.
| Option | Description | Verdict | Reason |
|---|---|---|---|
| A (chosen) | In-memory mock tools with simulated sensitive data | Accepted | Safe, deterministic, resettable |
| B | Sandboxed real filesystem (tempdir) | Rejected | Harder to reset, OS-dependent edge cases |
| C | Docker-isolated real tools | Rejected | Heavy infra for a research project |
Mock tools are sufficient to demonstrate all 5 attack classes. Tool misuse attacks target the permission boundary, not the tool implementation. In-memory tools give perfect determinism for multi-seed experiments.
- Findings may not generalize to real tool environments (acknowledged limitation)
reset_tool_state()must be called between scenarios or memory poisoning bleeds across tests
| Contract | Section | Change Required |
|---|---|---|
| ADVERSARIAL_EVALUATION | §3.1b Feature Controllability | Tool outputs = partially controllable |
| DATA_CONTRACT | §data sources | "Self-generated" — no external dataset |
| Validation | Command / Artifact | Expected Result |
|---|---|---|
| Tool isolation | Unit test: reset between scenarios | State cleared |
| Sensitive data accessible | search_notes("api_keys") returns result |
Agent can find simulated secrets |
- Date: 2026-03-14
- Status: Accepted
Agent targets need an LLM backend. PROJECT_BRIEF §7 says "Claude API or OpenAI API — configurable." Need to pick a primary for development velocity while maintaining portability.
Anthropic Claude (claude-sonnet-4-20250514) as primary. config/base.yaml controls provider. LangChain abstracts the LLM layer — switching is a config change. Budget: ~$20-50 in tokens.
| Option | Description | Verdict | Reason |
|---|---|---|---|
| A (chosen) | Claude primary, OpenAI secondary | Accepted | Brand alignment (AI Security Architecture on Anthropic stack), cost-effective with Sonnet |
| B | OpenAI primary | Rejected | No brand alignment, similar cost |
| C | Local LLM (Ollama) | Rejected | Weaker tool-calling, no GPU on VM. Stretch goal. |
Using Claude for red-teaming Claude-powered agents is the most authentic test. Blog post angle: "I red-teamed agents running on the same model family I used to build the framework." Also avoids OPENAI_API_KEY dependency for primary development.
- API costs tracked per-scenario in attack logs
- Must validate key findings on OpenAI backend before publishing (RQ generalization)
| Contract | Section | Change Required |
|---|---|---|
| CONFIGURATION_SPEC | §base | llm.provider and llm.model fields |
| ENVIRONMENT_CONTRACT | §API keys | ANTHROPIC_API_KEY required |