ACR Framework™ — Implementation Guide

Version: 1.0
Status: Draft
Last Updated: March 2026

This guide provides reference architectures, deployment patterns, and integration guidance for implementing the ACR runtime control plane in production environments.

Overview
Prerequisites
Architecture Decision Framework
Reference Architecture: Kubernetes + OPA
Reference Architecture: AWS Serverless + Cedar
Reference Architecture: API Gateway + Custom
Pillar-by-Pillar Implementation
Telemetry & Observability Setup
Policy Authoring Guide
Testing & Validation
Migration Path
Operational Runbook

Overview

The ACR control plane sits between autonomous AI systems and enterprise resources. It intercepts agent actions at runtime and applies identity verification, policy enforcement, drift detection, observability, containment, and human authority controls before those actions reach downstream systems.

This guide covers three reference deployment patterns, with step-by-step integration guidance for each of the six ACR pillars.

What You're Building

┌─────────────────────────────────────────────────────────┐
│                    AI Agent Runtime                       │
│  (LangGraph, CrewAI, AutoGen, custom orchestrator, etc.) │
└──────────────────────┬──────────────────────────────────┘
                       │  action request
                       ▼
┌─────────────────────────────────────────────────────────┐
│               ACR CONTROL PLANE                          │
│                                                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐  │
│  │ Identity │ │  Policy  │ │  Drift   │ │Observabil.│  │
│  │ Binding  │ │ Enforce. │ │ Detect.  │ │  Logging  │  │
│  └──────────┘ └──────────┘ └──────────┘ └───────────┘  │
│  ┌──────────────────┐ ┌────────────────────────────┐    │
│  │   Containment    │ │     Human Authority        │    │
│  └──────────────────┘ └────────────────────────────┘    │
└──────────────────────┬──────────────────────────────────┘
                       │  approved action
                       ▼
┌─────────────────────────────────────────────────────────┐
│              Enterprise Resources                        │
│  (APIs, databases, tools, external services, workflows)  │
└─────────────────────────────────────────────────────────┘

Design Principles

Fail-secure by default. If the control plane is unreachable, agent actions are blocked — not allowed.
Latency budget: <200ms total. The control plane must not make agents unusable. Target 50ms for identity verification, 100ms for policy evaluation, and 20ms for per-action authorization, leaving about 30ms of headroom within the 200ms total.
Policy-as-code. All enforcement rules are versioned, testable, and deployable through CI/CD — not configured through UI clicks.
Agent-agnostic. The control plane works with any agent framework. It intercepts actions, not prompts.
Observable by default. Every control plane decision is logged with correlation IDs for end-to-end trace reconstruction.
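
The fail-secure and latency-budget principles above can be combined in a small caller-side wrapper. The following is a minimal sketch, not part of the ACR specification: the names `authorize_fail_secure`, `slow_check`, and `TOTAL_BUDGET_SECONDS` are hypothetical, and the `check` callable stands in for whatever client your agents use to reach the control plane.

```python
import concurrent.futures
import time
from typing import Callable

# Hypothetical constant taken from the "<200ms total" design principle.
TOTAL_BUDGET_SECONDS = 0.2

# One shared worker pool. A per-call pool would block on shutdown until the
# slow call finished, which would defeat the timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def authorize_fail_secure(check: Callable[[], bool],
                          budget_seconds: float = TOTAL_BUDGET_SECONDS) -> dict:
    """Run a control-plane check; deny on error, timeout, or a non-True answer."""
    future = _POOL.submit(check)
    try:
        allowed = future.result(timeout=budget_seconds)
    except concurrent.futures.TimeoutError:
        return {"decision": "deny", "reason": "control plane timeout"}
    except Exception as exc:  # unreachable, malformed response, etc.
        return {"decision": "deny", "reason": f"control plane error: {exc}"}
    if allowed is True:
        return {"decision": "allow", "reason": "policy passed"}
    return {"decision": "deny", "reason": "policy denied"}


def slow_check() -> bool:
    """Stand-in for a control plane that answers after the budget has passed."""
    time.sleep(0.5)
    return True
```

Only an explicit `True` produces an allow; every other outcome, including a late answer, is treated as a deny. That is the property the fail-secure principle asks for.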

Prerequisites

Before implementing ACR, ensure your organization has:

Prerequisite	Why It Matters	Minimum Viable
Agent inventory	You cannot control what you haven't catalogued	Spreadsheet of all deployed AI agents with owners, purposes, and tools
Identity infrastructure	Agents need machine identities	Any IAM system — cloud IAM, Vault, SPIFFE, or even API keys with rotation
Policy ownership	Someone must define what agents can and cannot do	Designated owner per agent (engineering lead or product owner)
Logging infrastructure	Observability requires a destination	Any structured log pipeline — ELK, Datadog, CloudWatch, or even structured files
Incident response process	Containment requires a playbook	Basic runbook: who gets paged, how to kill an agent, post-incident review process

Maturity Levels

Level 0 — No Control: Agents call tools directly. No identity, no policy, no logging.

Level 1 — Catalogued: Agent registry exists. Owners assigned. No runtime enforcement.

Level 2 — Observed: Actions are logged. Basic dashboards exist. No enforcement.

Level 3 — Enforced: Policy-as-code governs agent actions. Identity verified at runtime. Kill switches tested.

Level 4 — Adaptive: Drift detection active. Automated containment triggers. Continuous control monitoring.

Most organizations start at Level 0 or 1. This guide takes you to Level 3 with a clear path to Level 4.
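
As a rough self-assessment aid, the levels above can be expressed as a cumulative checklist. This sketch assumes each level requires everything below it, which the level descriptions imply but do not state outright. The `ControlPosture` field names are illustrative, not ACR terminology.

```python
from dataclasses import dataclass


@dataclass
class ControlPosture:
    """Which ACR capabilities are in place (field names are illustrative)."""
    has_registry: bool = False     # Level 1: agents catalogued, owners assigned
    logs_actions: bool = False     # Level 2: actions logged, dashboards exist
    enforces_policy: bool = False  # Level 3: policy-as-code, identity, kill switch
    detects_drift: bool = False    # Level 4: drift detection, automated containment


def maturity_level(posture: ControlPosture) -> int:
    """Return the highest level whose requirement, and all below it, are met."""
    level = 0
    for met in (posture.has_registry, posture.logs_actions,
                posture.enforces_policy, posture.detects_drift):
        if not met:
            break  # a gap at this level caps the result
        level += 1
    return level
```

Under this reading, an organization that enforces policy but has no action logging still scores Level 1, because the gap at Level 2 caps the result.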

Architecture Decision Framework

Choose your deployment pattern based on your existing infrastructure:

If You Have...	Use This Pattern	Why
Kubernetes + service mesh	K8s + OPA	Native admission control, SPIFFE identity, sidecar enforcement
AWS-heavy with Lambda/Step Functions	AWS Serverless + Cedar	IAM-native identity, Cedar for fine-grained authz, CloudWatch observability
API gateway (Kong, Envoy, custom)	API Gateway + Custom	Reverse proxy enforcement, works with any backend, enterprise IAM integration
Multiple clouds or hybrid	API Gateway + Custom	Cloud-agnostic, portable policy engine
Early stage / small team	API Gateway + Custom	Simplest to start, fewest dependencies

Reference Architecture: Kubernetes + OPA

Components

Component	Technology	ACR Pillar
Agent Identity	SPIFFE/SPIRE workload identity	Pillar 1: Identity & Purpose Binding
Policy Engine	Open Policy Agent (OPA) with Rego	Pillar 2: Behavioral Policy Enforcement
Drift Detection	Prometheus metrics + custom anomaly detector	Pillar 3: Autonomy Drift Detection
Observability	OpenTelemetry Collector → backend	Pillar 4: Execution Observability
Containment	K8s NetworkPolicy + external kill switch	Pillar 5: Self-Healing & Containment
Approval Queue	Custom service + Slack/PagerDuty integration	Pillar 6: Human Authority

Deployment Topology

┌─────────────────── Kubernetes Cluster ───────────────────┐
│                                                           │
│  ┌─────────────┐     ┌──────────────────────────────┐    │
│  │  AI Agent    │────▶│  ACR Sidecar (Envoy + OPA)   │    │
│  │  (Pod)       │     │  - Token validation           │    │
│  │             │     │  - Policy evaluation           │    │
│  │  SPIFFE ID: │     │  - Action logging              │    │
│  │  spiffe://   │     │  - Drift signal collection     │    │
│  │  acr/agent/  │     └──────────┬───────────────────┘    │
│  │  support-01  │                │                         │
│  └─────────────┘                ▼                         │
│                      ┌──────────────────┐                 │
│                      │  OPA Bundle      │                 │
│                      │  Server          │                 │
│                      │  (policy repo)   │                 │
│                      └──────────────────┘                 │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  ACR Control Plane Services                         │  │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────────┐  │  │
│  │  │ Agent      │ │ Drift      │ │ Approval       │  │  │
│  │  │ Registry   │ │ Detector   │ │ Queue          │  │  │
│  │  └────────────┘ └────────────┘ └────────────────┘  │  │
│  │  ┌────────────┐ ┌────────────┐                     │  │
│  │  │ Kill Switch│ │ OTel       │                     │  │
│  │  │ Controller │ │ Collector  │                     │  │
│  │  └────────────┘ └────────────┘                     │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘

Step 1: Agent Identity (SPIFFE/SPIRE)

# spire-agent-entry.yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: customer-support-agent
spec:
  spiffeIDTemplate: "spiffe://acr.example.com/agent/customer-support-01"
  podSelector:
    matchLabels:
      acr.io/agent-id: customer-support-01
  namespaceSelector:
    matchLabels:
      acr.io/environment: production

Create an ACR agent manifest that binds identity to purpose:

# acr-agent-manifest.yaml
apiVersion: acr.io/v1
kind: AgentManifest
metadata:
  name: customer-support-01
spec:
  identity:
    spiffeId: "spiffe://acr.example.com/agent/customer-support-01"
    owner: "support-engineering@example.com"
  purpose:
    description: "Handle customer support tickets and issue resolutions"
    riskTier: medium
  capabilities:
    allowedTools:
      - query_customer_db
      - send_email
      - create_ticket
      - search_knowledge_base
    forbiddenTools:
      - delete_customer
      - issue_refund_above_100
      - modify_billing
    dataAccess:
      - resource: customer_db
        permission: READ
      - resource: ticket_db
        permission: READ_WRITE
      - resource: billing_db
        permission: NONE
  boundaries:
    maxActionsPerMinute: 30
    maxCostPerHourUsd: 5.00
    allowedRegions: ["us-east-1", "us-west-2"]
    credentialRotationDays: 90
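
Because the manifest is the source of truth for later policy checks, it may be worth linting it in CI before deployment. The sketch below assumes the `spec` section has already been parsed into a Python dict. The function name `validate_manifest` and the specific checks are hypothetical, chosen to match the fields shown above.

```python
# Permission values that appear in the manifest above.
VALID_PERMISSIONS = {"READ", "READ_WRITE", "NONE"}


def validate_manifest(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks consistent."""
    problems = []
    caps = spec.get("capabilities", {})
    overlap = set(caps.get("allowedTools", [])) & set(caps.get("forbiddenTools", []))
    if overlap:
        problems.append(f"tools both allowed and forbidden: {sorted(overlap)}")
    for entry in caps.get("dataAccess", []):
        if entry.get("permission") not in VALID_PERMISSIONS:
            problems.append(
                f"unknown permission for {entry.get('resource')}: {entry.get('permission')}")
    identity = spec.get("identity", {})
    if not identity.get("spiffeId", "").startswith("spiffe://"):
        problems.append("identity.spiffeId must be a spiffe:// URI")
    if not identity.get("owner"):
        problems.append("identity.owner is required")
    bounds = spec.get("boundaries", {})
    for key in ("maxActionsPerMinute", "maxCostPerHourUsd"):
        value = bounds.get(key)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"boundaries.{key} must be a positive number")
    return problems


# Trimmed copy of the manifest spec above, as a parsed dict.
SPEC = {
    "identity": {"spiffeId": "spiffe://acr.example.com/agent/customer-support-01",
                 "owner": "support-engineering@example.com"},
    "capabilities": {
        "allowedTools": ["query_customer_db", "send_email"],
        "forbiddenTools": ["delete_customer", "modify_billing"],
        "dataAccess": [{"resource": "billing_db", "permission": "NONE"}],
    },
    "boundaries": {"maxActionsPerMinute": 30, "maxCostPerHourUsd": 5.00},
}
```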

Step 2: Policy Enforcement (OPA/Rego)

Create Rego policies for the agent:

# policies/customer_support.rego
package acr.customer_support

import future.keywords.in

# Default deny
default allow := false

# Allow actions only for verified agents with a valid purpose,
# and only when no deny rule fires
allow {
    input.agent.spiffe_id != ""
    input.agent.purpose == "customer_support"
    tool_allowed
    count(deny) == 0
    not rate_exceeded
}

# Tool allowlist enforcement
tool_allowed {
    input.action.tool_name in data.agent_manifest.capabilities.allowedTools
}

# Block forbidden tools
deny["Forbidden tool invocation"] {
    input.action.tool_name in data.agent_manifest.capabilities.forbiddenTools
}

# PII egress block: reject emails whose body contains an SSN-like pattern
deny["PII detected in outbound payload"] {
    input.action.tool_name == "send_email"
    regex.match(`\d{3}-\d{2}-\d{4}`, input.action.parameters.body)
}

# Spend limit enforcement
deny["Hourly spend limit exceeded"] {
    input.context.hourly_spend_usd > data.agent_manifest.boundaries.maxCostPerHourUsd
}

# Rate limiting
rate_exceeded {
    input.context.actions_this_minute > data.agent_manifest.boundaries.maxActionsPerMinute
}
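
The SSN pattern in the PII rule is easy to test in isolation before it ships. The sketch below reuses the same regular expression in Python. `re.search` is used because OPA's `regex.match` is unanchored and matches anywhere in the string. The helper name `contains_ssn_like` is hypothetical.

```python
import re

# Same pattern as the Rego rule; re.search mirrors its unanchored matching.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def contains_ssn_like(body: str) -> bool:
    """Return True when the text contains a hyphenated 3-2-4 digit group."""
    return SSN_PATTERN.search(body) is not None


# Sample inputs and what the pattern does with each.
SAMPLES = {
    "Your SSN 123-45-6789 is on file": True,   # caught
    "Your SSN 123 45 6789 is on file": False,  # missed: spaces, not hyphens
    "Your SSN 123456789 is on file": False,    # missed: no separators
    "Order ref 5551-23-45678": True,           # false positive inside a longer ID
}
```

The samples suggest that a single pattern both misses reformatted identifiers and flags unrelated ones, so treat this rule as a starting point rather than full PII coverage.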

Deploy policies via OPA Bundle Server:

# Build and push policy bundle
opa build -b policies/ -o bundle.tar.gz
aws s3 cp bundle.tar.gz s3://acr-policy-bundles/customer-support/bundle.tar.gz

# OPA sidecar config
opa run --server \
  --config-file=/etc/opa/config.yaml \
  --addr=localhost:8181
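
On the calling side, the sidecar can be queried through OPA's Data API (`POST /v1/data/<path>` with an `input` document). The sketch below is one possible client, assuming the sidecar address from the command above and the `acr.customer_support` package path. The helper names and the 100ms timeout are hypothetical choices.

```python
import json
import urllib.request

# Assumed: sidecar address matches --addr above; path follows the package name.
OPA_URL = "http://localhost:8181/v1/data/acr/customer_support/allow"


def build_opa_input(spiffe_id: str, purpose: str, tool_name: str, parameters: dict,
                    hourly_spend_usd: float, actions_this_minute: int) -> dict:
    """Shape the input document using the fields the Rego rules reference."""
    return {"input": {
        "agent": {"spiffe_id": spiffe_id, "purpose": purpose},
        "action": {"tool_name": tool_name, "parameters": parameters},
        "context": {"hourly_spend_usd": hourly_spend_usd,
                    "actions_this_minute": actions_this_minute},
    }}


def decision_from_response(body: dict) -> bool:
    """OPA omits "result" when the rule is undefined; treat that as deny."""
    return body.get("result") is True


def query_opa(payload: dict, timeout_seconds: float = 0.1) -> bool:
    """Ask the sidecar for a decision; any failure is a deny (fail-secure)."""
    request = urllib.request.Request(
        OPA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return decision_from_response(json.load(response))
    except Exception:
        return False  # unreachable sidecar or malformed body means deny
```

A call such as `query_opa(build_opa_input("spiffe://acr.example.com/agent/customer-support-01", "customer_support", "create_ticket", {}, 1.25, 4))` returns `False` whenever the sidecar cannot be reached, in line with the fail-secure principle.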

Step 3: Kill Switch (Independent Controller)

Deploy a kill switch controller that operates outside the agent runtime:

# acr-kill-switch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: acr-kill-switch
  namespace: acr-system
spec:
  replicas: 3  # HA deployment
  selector:
    matchLabels:
      app: acr-kill-switch
  template:
    metadata:
      labels:
        app: acr-kill-switch  # Must match spec.selector.matchLabels
    spec:
      containers:
      - name: kill-switch
        image: acr/kill-switch-controller:1.0
        env:
        - name: KILL_SWITCH_MODE
          value: "external"  # Independent of agent runtime
        - name: NOTIFICATION_WEBHOOK
          value: "https://hooks.slack.com/services/..."
        ports:
        - containerPort: 8443

Kill switch activation via API:

# Manual kill switch activation
curl -X POST https://acr-kill-switch.acr-system.svc:8443/kill \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "customer-support-01",
    "reason": "Anomalous refund pattern detected",
    "operator": "oncall@example.com",
    "action": "isolate",
    "duration_minutes": 60
  }'

# Automated kill via drift detector webhook
# Configured in drift-detector → kill-switch integration

NetworkPolicy for immediate isolation:

# acr-isolation-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: acr-isolate-agent
spec:
  podSelector:
    matchLabels:
      acr.io/isolated: "true"
  policyTypes:
  - Egress
  egress: []  # Block all outbound traffic
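
The policy above only takes effect once a pod carries the `acr.io/isolated: "true"` label, so the kill switch controller needs a way to set and clear it. The following sketch builds the patch body. The function names are hypothetical, and sending the patch (for example with the official Kubernetes Python client's `patch_namespaced_pod`) is left out and should be checked against your client version.

```python
# Label selected by the acr-isolate-agent NetworkPolicy above.
ISOLATION_LABEL = "acr.io/isolated"


def isolation_patch(isolate: bool) -> dict:
    """Build a merge-patch body that adds or removes the isolation label."""
    # In a merge patch, a null value removes the key from the object.
    value = "true" if isolate else None
    return {"metadata": {"labels": {ISOLATION_LABEL: value}}}


def is_isolated(pod_labels: dict) -> bool:
    """Return True when a pod's labels match the NetworkPolicy podSelector."""
    return pod_labels.get(ISOLATION_LABEL) == "true"
```

Note that NetworkPolicy is enforced by the cluster's network plugin, so this isolation path should be verified in a test environment where the installed CNI is known to support egress policies.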

Reference Architecture: AWS Serverless + Cedar

Components

Component	Technology	ACR Pillar
Agent Identity	AWS IAM Roles + Cognito machine credentials	Pillar 1
Policy Engine	Amazon Verified Permissions (Cedar)	Pillar 2
Drift Detection	CloudWatch Anomaly Detection + Lambda	Pillar 3
Observability	CloudWatch Logs + X-Ray traces	Pillar 4
Containment	Lambda kill switch + IAM policy revocation	Pillar 5
Approval Queue	Step Functions + SNS/SQS	Pillar 6

Deployment Topology

┌─────── AWS Account ────────────────────────────────────┐
│                                                         │
│  ┌──────────────┐    ┌────────────────────────────┐    │
│  │  AI Agent     │───▶│  ACR Authorizer Lambda     │    │
│  │  (Lambda /    │    │  - Validate IAM identity   │    │
│  │   ECS Task)   │    │  - Cedar policy evaluation │    │
│  │              │    │  - Action logging (X-Ray)   │    │
│  │  IAM Role:   │    └───────────┬────────────────┘    │
│  │  acr-agent-  │                │                      │
│  │  support-01  │                ▼                      │
│  └──────────────┘    ┌────────────────────────────┐    │
│                      │  Verified Permissions       │    │
│                      │  (Cedar Policy Store)       │    │
│                      └────────────────────────────┘    │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │  ACR Control Services                            │   │
│  │  ┌──────────┐ ┌──────────┐ ┌────────────────┐  │   │
│  │  │ Registry │ │ Drift    │ │ Step Functions │  │   │
│  │  │ (DynamoDB)│ │ Detector │ │ (Approvals)    │  │   │
│  │  └──────────┘ └──────────┘ └────────────────┘  │   │
│  │  ┌──────────┐ ┌──────────────────────────────┐  │   │
│  │  │ Kill Sw. │ │ CloudWatch + X-Ray           │  │   │
│  │  │ (Lambda) │ │ (Observability)              │  │   │
│  │  └──────────┘ └──────────────────────────────┘  │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Cedar Policy Example

// ACR Policy: Customer Support Agent Permissions

permit(
  principal == Agent::"customer-support-01",
  action == Action::"invoke_tool",
  resource == Tool::"query_customer_db"
) when {
  principal.riskTier == "medium" &&
  principal.purpose == "customer_support" &&
  // Cedar has no float literals; compare using the decimal extension type
  context.hourlySpendUsd.lessThan(decimal("5.00"))
};

permit(
  principal == Agent::"customer-support-01",
  action == Action::"invoke_tool",
  resource == Tool::"send_email"
) when {
  principal.purpose == "customer_support" &&
  !context.payload.contains_pii
};

// Explicit deny for high-risk actions
forbid(
  principal == Agent::"customer-support-01",
  action == Action::"invoke_tool",
  resource
) when {
  // The policy scope accepts a single entity after "in"; sets go in a condition
  resource in [Tool::"delete_customer", Tool::"modify_billing"]
};
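
Inside the authorizer Lambda, each tool call has to be translated into an authorization request for the policy store. The sketch below builds such a request for the `send_email` policy as a plain dict. The field layout follows my understanding of the Verified Permissions `IsAuthorized` request shape and should be checked against the current API reference before use. The helper names and the policy store ID are hypothetical.

```python
def build_is_authorized_request(policy_store_id: str, agent_id: str, tool_name: str,
                                purpose: str, contains_pii: bool) -> dict:
    """Map one tool invocation onto the entities used in the Cedar policies."""
    agent = {"entityType": "Agent", "entityId": agent_id}
    return {
        "policyStoreId": policy_store_id,
        "principal": agent,
        "action": {"actionType": "Action", "actionId": "invoke_tool"},
        "resource": {"entityType": "Tool", "entityId": tool_name},
        # context.payload.contains_pii, as referenced by the send_email policy
        "context": {"contextMap": {
            "payload": {"record": {"contains_pii": {"boolean": contains_pii}}},
        }},
        # principal.purpose, supplied with the request (assumed not pre-stored)
        "entities": {"entityList": [{
            "identifier": agent,
            "attributes": {"purpose": {"string": purpose}},
        }]},
    }


def is_allowed(response: dict) -> bool:
    """Treat anything other than an explicit ALLOW decision as a deny."""
    return response.get("decision") == "ALLOW"
```

With boto3, a request built this way would typically be passed as keyword arguments to the `verifiedpermissions` client's `is_authorized` call. Wrapping the response in `is_allowed` keeps errors and missing fields on the deny side.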

Reference Architecture: API Gateway + Custom

Best for teams wanting a cloud-agnostic, portable control plane.

Components

Component	Technology	ACR Pillar
Agent Identity	API keys with rotation + enterprise SSO	Pillar 1
Policy Engine	OPA (embedded) or custom rules engine	Pillar 2
Drift Detection	Custom metrics + threshold alerting	Pillar 3
Observability	Structured JSON logs → SIEM	Pillar 4
Containment	API key revocation + rate limiting	Pillar 5
Approval Queue	Webhook → ticketing system	Pillar 6

How It Works

The control plane deploys as a reverse proxy (Envoy, Kong, or custom) that all agent-to-resource traffic passes through:

Agent → ACR Proxy → [Identity Check] → [Policy Check] → [Log] → Resource
                         │                    │              │
                         ▼                    ▼              ▼
                    Agent Registry       Policy Engine    Log Pipeline

Minimal Viable Implementation (Python)

For teams starting small, here's a minimal ACR control plane in Python:

# acr_control_plane.py — Minimal ACR proxy

from fastapi import FastAPI, Request, HTTPException
from datetime import datetime, timezone
import json
import uuid

app = FastAPI(title="ACR Control Plane")

# ─── Agent Registry (Pillar 1) ─────────────────────────
AGENT_REGISTRY = {
    "customer-support-01": {
        "owner": "support-engineering@example.com",
        "purpose": "customer_support",
        "risk_tier": "medium",
        "allowed_tools": ["query_customer_db", "send_email", "create_ticket"],
        "forbidden_tools": ["delete_customer", "issue_refund_above_100"],
        "max_actions_per_minute": 30,
    }
}

# ─── Policy Engine (Pillar 2) ──────────────────────────
def evaluate_policy(agent_id: str, action: dict) -> dict:
    agent = AGENT_REGISTRY.get(agent_id)
    if not agent:
        return {"decision": "deny", "reason": "Unknown agent"}
    
    tool = action.get("tool_name")
    if tool in agent["forbidden_tools"]:
        return {"decision": "deny", "reason": f"Forbidden tool: {tool}"}
    if tool not in agent["allowed_tools"]:
        return {"decision": "deny", "reason": f"Unauthorized tool: {tool}"}
    
    return {"decision": "allow", "reason": "Policy passed"}

# ─── Observability (Pillar 4) ──────────────────────────
def log_event(agent_id: str, action: dict, decision: dict):
    event = {
        "acr_version": "1.0",
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "decision": decision,
    }
    # In production: send to your log pipeline
    print(json.dumps(event))

# ─── Control Plane Endpoint ────────────────────────────
@app.post("/acr/evaluate")
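async def evaluate_action(request: Request):
    body = await request.json()
    agent_id = body.get("agent_id")
    action = body.get("action")
    # One ID ties the log entry to the response for trace reconstruction
    correlation_id = str(uuid.uuid4())

    # Reject malformed requests before any policy logic runs
    if not isinstance(action, dict):
        raise HTTPException(status_code=400, detail="Missing or invalid action")

    if agent_id not in AGENT_REGISTRY:
        # Pillar 1: Identity check
        decision = {"decision": "deny", "reason": "Agent not registered"}
    elif agent_id in KILLED_AGENTS:
        # Pillar 5: Killed agents stay blocked
        decision = {"decision": "deny", "reason": "Agent killed"}
    else:
        # Pillar 2: Policy evaluation
        decision = evaluate_policy(agent_id, action)

    # Pillar 4: Log everything
    log_event(agent_id, action, {**decision, "correlation_id": correlation_id})

    if decision["decision"] == "deny":
        raise HTTPException(status_code=403, detail=decision["reason"])

    return {"status": "approved", "correlation_id": correlation_id}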

# ─── Kill Switch Endpoint (Pillar 5) ──────────────────
KILLED_AGENTS = set()

@app.post("/acr/kill")
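async def kill_agent(request: Request):
    # In production: authenticate the operator before honoring this request
    body = await request.json()
    agent_id = body.get("agent_id")
    if not agent_id:
        raise HTTPException(status_code=400, detail="agent_id is required")
    KILLED_AGENTS.add(agent_id)
    log_event(agent_id, {"action": "kill_switch"},
              {"decision": "killed", "reason": body.get("reason")})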
    return {"status": "killed", "agent_id": agent_id}
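
The registry above declares `max_actions_per_minute`, but the minimal policy engine does not enforce it. One way to add enforcement is a per-agent sliding window, sketched below. `SlidingWindowLimiter` is a hypothetical helper, and it keeps state in process memory, so it would only hold for a single proxy instance.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Count actions per agent over a trailing time window (60s by default)."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._events = defaultdict(deque)  # agent_id -> action timestamps

    def allow(self, agent_id: str, max_actions: int) -> bool:
        """Record the action and return True if the agent is under its limit."""
        now = self.clock()
        events = self._events[agent_id]
        # Drop timestamps that have aged out of the window
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= max_actions:
            return False  # denied actions are not recorded
        events.append(now)
        return True
```

Inside `evaluate_policy`, a call such as `limiter.allow(agent_id, agent["max_actions_per_minute"])` could return a deny decision when it yields `False`. A multi-instance deployment would need a shared store instead of the in-memory dict.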

Pillar-by-Pillar Implementation