Zero-trust guardrails for AI agents. "Can't" is stronger than "shouldn't."
⚠️ This is a concept proposal and proof-of-concept implementation. Equilibrium Guard is an exploration of what zero-trust security for AI agents could look like. The code is functional for demonstration purposes, but this is not production-ready software. We're sharing this to start a conversation about AI agent security patterns and invite collaboration.
Proof-of-concept dashboard showing trust score, risk budget, operation mind map, decision storyline, and drift alerts.
Equilibrium Guard is a concept proposal for zero-trust security in AI agents. It explores:
- Constraint Validation — Operations checked against rules before execution
- Risk-Weighted Autonomy — Safe operations are free; risky ones cost budget
- Dynamic Trust — Good behavior builds trust; warnings deplete it
- Drift Detection — Catches patterns like escalating access or speed anomalies
- Real-Time Dashboard — Watch your agent's decisions as they happen
Think of it like ThreatLocker or SentinelOne, but for AI operations.
AI agents are getting more capable and more autonomous. The security tooling hasn't kept up. We built this concept to:
- Start a conversation about AI agent security patterns
- Prototype ideas that could become real products
- Invite collaboration from the community
This is a sketch, not a finished building. If these ideas resonate, let's build something real together.
Traditional AI safety relies on prompts and policies — "the agent should do X" or "shouldn't do Y." This is fundamentally weak because:
- Prompts can be forgotten, overridden, or injected
- Policy documents aren't enforced computationally
- Post-hoc logging catches problems too late
Equilibrium Guard enforces "can't" instead of "shouldn't":
```
Traditional: Request → Process → Check Policy → "You shouldn't" → Maybe blocked
Zero-Trust:  Request → Validate → Invalid? REJECTED → Valid? Execute
```
Operations that violate constraints are rejected at the computational level, before execution. Not blocked by policy — structurally impossible to proceed.
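The reject-before-execute gate can be sketched in a few lines. This is an illustrative toy, not the library's implementation — the actual entry point is `guard.pre_check` — and `validate` and the rule names here are hypothetical:

```python
# Toy sketch of reject-before-execute gating (illustrative, not the real API).
# An operation is a name plus a context dict; rules are plain predicates.

def validate(operation, context, rules):
    """Return the list of violated rule names; empty means the op may proceed."""
    return [name for name, check in rules.items() if not check(operation, context)]

rules = {
    "no_prod_writes": lambda op, ctx: not (
        op == "database_write" and ctx.get("environment") == "production"
    ),
}

issues = validate("database_write", {"environment": "production"}, rules)
if issues:
    # Execution never happens: the rejection precedes the operation.
    print(f"REJECTED before execution: {issues}")
```

The point is structural: the execution branch is simply unreachable when validation fails, rather than being discouraged by policy text.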
```bash
pip install equilibrium-guard
```

Or clone and install:

```bash
git clone https://github.com/rizqcon/equilibrium-guard
cd equilibrium-guard
pip install -e .
```

```python
from equilibrium_guard import create_guard

# Initialize with zero-trust defaults
guard = create_guard(mode='enforce')

# Human sends a message — update anchor
guard.on_human_message()

# Before any operation
context = {
    "table": "users",
    "operation": "update",
}
can_proceed, issues = guard.pre_check("database_write", context)

if can_proceed:
    result = write_to_database()
    guard.post_record("database_write", context)
else:
    report_to_human(f"Blocked: {issues}")
```

The core of Equilibrium Guard is the risk-weighted autonomy budget. Every operation has a risk level, and risky operations cost budget.
| Level | Cost | Budget Impact | Examples |
|---|---|---|---|
| SAFE | 0 | Unlimited | Read files, parse data, internal compute |
| LOW | 0.05 | ~20 ops before checkpoint | Write cache, minor updates |
| MEDIUM | 0.15 | ~6-7 ops before checkpoint | Exec commands, config changes |
| HIGH | 0.40 | 2-3 ops before checkpoint | API calls, send messages |
| CRITICAL | 1.0 | Always checkpoint | Delete data, irreversible actions |
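The "Budget Impact" column follows from simple division against a full budget of 1.0. A quick sketch (`ops_before_checkpoint` is an illustrative helper, not part of the API):

```python
import math

# Risk costs from the table above; a full budget is 1.0.
RISK_COSTS = {"SAFE": 0.0, "LOW": 0.05, "MEDIUM": 0.15, "HIGH": 0.40, "CRITICAL": 1.0}

def ops_before_checkpoint(level, budget=1.0):
    """Whole operations affordable before the budget forces a checkpoint."""
    cost = RISK_COSTS[level]
    if cost == 0:
        return None  # SAFE operations are unlimited
    return math.floor(budget / cost)

for level in RISK_COSTS:
    print(level, ops_before_checkpoint(level))
```

So LOW affords 20 operations, MEDIUM 6 (6.67 floored), HIGH 2, and CRITICAL exactly 1 — matching the table.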
Trust builds with clean operations and depletes with warnings:
| Trust Score | Level | Behavior |
|---|---|---|
| 0.95+ | AUTONOMOUS | Maximum autonomy |
| 0.80+ | HIGH_TRUST | High-risk ops allowed |
| 0.60+ | COLLABORATIVE | Standard operation |
| 0.40+ | CAUTIOUS | Frequent checkpoints |
| 0.20+ | MINIMAL | Most ops require approval |
| <0.20 | DISCONNECTED | Almost everything blocked |
```
# Trust increases with good behavior
clean_operation: +0.005
streak_bonus (10+): +0.01
human_message: +0.05
explicit_approval: +0.10

# Trust decreases with problems
advisory_warning: -0.02
constraint_violation: -0.20
```

```yaml
# config.yaml
equilibrium_guard:
  # Starting values
  initial_trust: 0.7    # COLLABORATIVE level
  budget_size: 1.0      # Full budget

  # Mode: disabled | shadow | soft | enforce
  mode: shadow

  # Risk costs (tune to your risk tolerance)
  risk_costs:
    SAFE: 0
    LOW: 0.05
    MEDIUM: 0.15
    HIGH: 0.40
    CRITICAL: 1.0

  # Minimum trust required per risk level
  trust_required:
    SAFE: 0
    LOW: 0.2
    MEDIUM: 0.4
    HIGH: 0.6
    CRITICAL: 0.8

  # Drift detection
  max_minutes_without_human: 60
  speed_threshold_per_minute: 60
```

| Mode | Behavior | Use Case |
|---|---|---|
| `disabled` | No checks | Development/testing |
| `shadow` | Log only, never block | Learning phase — see what would block |
| `soft` | Block HIGH/CRITICAL only | Gradual rollout |
| `enforce` | Full enforcement | Production |
Recommended rollout:

1. Start in `shadow` mode — observe patterns
2. Move to `soft` — catch high-risk issues
3. Graduate to `enforce` — full zero-trust
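The mode ladder can be summarized as a small gating predicate. This is a deliberate simplification — the real guard also weighs trust and budget — and `would_block` is an illustrative name:

```python
# Simplified per-mode gating (illustrative; trust/budget checks omitted).
RISK_ORDER = ["SAFE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

def would_block(mode, risk_level):
    """Whether an operation at `risk_level` can be blocked under `mode`."""
    if mode in ("disabled", "shadow"):
        return False  # disabled: no checks; shadow: log only, never block
    if mode == "soft":
        # soft mode blocks only the top of the risk ladder
        return RISK_ORDER.index(risk_level) >= RISK_ORDER.index("HIGH")
    return True       # enforce: full enforcement at every level

print(would_block("soft", "MEDIUM"))   # False — soft only blocks HIGH/CRITICAL
print(would_block("enforce", "LOW"))   # True
```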
The system watches for behavioral patterns that indicate drift from normal operation:
| Pattern | Trigger | Concern |
|---|---|---|
| Escalating Access | Risk levels creeping up | Prompt injection, task drift |
| Speed Drift | >60 ops/minute | Faster than human can follow |
| Repetition | Same resource 5+ times | Potential loop or attack |
| External Drift | Increasing external ops | Expanding attack surface |
| Warning Accumulation | 3+ warnings in window | Something's wrong |
When drift is detected → automatic checkpoint with human required.
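The speed-drift rule (>60 ops/minute) amounts to a sliding-window counter. A self-contained sketch — `SpeedDriftDetector` is a hypothetical name, not the library's class:

```python
from collections import deque
import time

class SpeedDriftDetector:
    """Flag drift when operations in the last 60 seconds exceed a threshold."""

    def __init__(self, threshold_per_minute=60):
        self.threshold = threshold_per_minute
        self.timestamps = deque()

    def record(self, now=None):
        """Record one operation; return True if speed drift is detected."""
        now = time.monotonic() if now is None else now
        self.timestamps.append(now)
        # Evict timestamps that have aged out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] > 60:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold

detector = SpeedDriftDetector(threshold_per_minute=60)
```

Each `record` call both logs the operation and answers "is the agent now moving faster than a human can follow?", which is what triggers the automatic checkpoint.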
Monitor your agent's operations in real-time:
```bash
cd equilibrium-guard
pip install -r dashboard/requirements.txt
python dashboard/server.py
# Open http://localhost:8081
```

Dashboard features:
| Component | Description |
|---|---|
| Guard Status | Mode, trust score, budget with animated gauges |
| Mode Control | Switch between disabled/shadow/soft/enforce |
| Human Checkpoint | Reset budget from the dashboard |
| Operation Mind Map | Visual map of all operations, color-coded by risk |
| Decision Storyline | Real-time feed of operation decisions: ✅ passed, warned, or blocked |
| Drift Alerts | Actionable alerts with Acknowledge/Checkpoint buttons |
The dashboard connects via WebSocket for instant updates — no polling.
Beyond risk budgets, define explicit constraints:
```python
from equilibrium_guard import Constraint, ConstraintSeverity

guard.register_constraint(Constraint(
    id="no_production_writes",
    name="Production Write Protection",
    check=lambda ctx: (
        ctx.get("environment") != "production" or
        ctx.get("human_approved", False)
    ),
    severity=ConstraintSeverity.MANDATORY,
    error_message="Production writes require human approval",
))
```

Severity levels:
| Level | Behavior |
|---|---|
| `MANDATORY` | Hard block, no override — security boundaries |
| `REQUIRED` | Block, can override with justification |
| `ADVISORY` | Warn but allow — recommendations |
Encode any compliance framework as constraints:
```python
# HIPAA: minimum necessary PHI access
Constraint(
    id="hipaa_minimum_necessary",
    name="Minimum Necessary PHI",
    check=lambda ctx: (
        not ctx.get("involves_phi") or
        set(ctx.get("fields_requested", [])) <= set(ctx.get("fields_justified", []))
    ),
    severity=ConstraintSeverity.MANDATORY,
)

# SOC 2: audit trail must stay enabled
Constraint(
    id="soc2_audit_logging",
    name="Audit Trail Required",
    check=lambda ctx: ctx.get("audit_enabled", True),
    severity=ConstraintSeverity.MANDATORY,
)

# CIS: request no more permissions than required
Constraint(
    id="cis_least_privilege",
    name="Least Privilege Access",
    check=lambda ctx: (
        set(ctx.get("permissions_requested", [])) <=
        set(ctx.get("permissions_required", []))
    ),
    severity=ConstraintSeverity.REQUIRED,
)
```

See `compliance_map.py` for more examples.
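Because the `check` predicates are plain functions of the context dict, they can be exercised standalone. Here the HIPAA "minimum necessary" predicate from above is evaluated against two example contexts (the contexts are illustrative):

```python
# The HIPAA "minimum necessary" predicate, copied from the constraint above.
check = lambda ctx: (
    not ctx.get("involves_phi") or
    set(ctx.get("fields_requested", [])) <= set(ctx.get("fields_justified", []))
)

ok = {"involves_phi": True,
      "fields_requested": ["name"],
      "fields_justified": ["name", "dob"]}
overbroad = {"involves_phi": True,
             "fields_requested": ["name", "ssn"],
             "fields_justified": ["name"]}

print(check(ok))         # True — requested fields fall within the justified set
print(check(overbroad))  # False — "ssn" was never justified, so the op is rejected
```

Unit-testing constraints this way, before registering them, is a cheap sanity check that a predicate blocks what you intend.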
For OpenClaw users, install as a skill:
```bash
git clone https://github.com/rizqcon/equilibrium-guard
cd equilibrium-guard/skill
./install.sh
```

Your agent reads SKILL.md and learns to self-monitor. The skill includes:
- Risk assessment rules
- Budget tracking instructions
- Checkpoint protocols
- Dashboard integration
See skill/SKILL.md for the full agent instructions.
Traditional guardrails say "you shouldn't do X." Equilibrium Guard makes risky operations structurally gated — you can't proceed without budget/trust.
Unlike permission systems that ask every time, agents start with an autonomy budget. They can work independently on safe tasks, checkpointing only when budget depletes or trust is insufficient.
The human isn't a user to be served — the human is the anchor that keeps the AI grounded. Operations that exceed the trust relationship require re-anchoring.
Equilibrium Guard is one layer, not a silver bullet:
- Constraints catch known rules
- Trust/budget catches unknown drift
- Dashboard provides observability
- Human checkpoints provide ultimate override
This is a proof-of-concept. What exists:
- ✅ Core constraint validator (functional)
- ✅ Smart anchor with trust/budget (functional)
- ✅ Real-time WebSocket dashboard (functional)
- ✅ OpenClaw skill package (functional, self-policing)
- ✅ OpenClaw plugin (registered, hooks defined)
- ⚠️ Plugin enforcement — waiting on OpenClaw to wire the `before_tool_call` hook
- ❌ Production hardening (not done)
- ❌ Comprehensive test coverage (minimal)
This is a concept exploration, not production software.
| Approach | Status | Enforcement |
|---|---|---|
| Skill | ✅ Works now | Self-policing (agent follows rules voluntarily) |
| Plugin | ⚠️ Waiting on OpenClaw integration | Computational (hard block, once hooks are wired) |
The OpenClaw plugin infrastructure supports before_tool_call hooks, but the hook isn't yet called in the tool execution pipeline. See docs/OPENCLAW_PR.md for the proposed fix.
- Self-Policing — The agent runs checks on itself. Sophisticated attacks could potentially bypass.
- Context Quality — Garbage in, garbage out. Validation only sees what you pass.
- Rule Completeness — Only catches what's encoded. Novel vectors may pass.
- Performance — Every operation runs through validation. Adds latency.
- Proof-of-Concept — Not battle-tested. Use for exploration and prototyping.
Use as part of defense-in-depth, not as a complete solution.
Inspired by S.I.S. (Sovereign Intelligence System) by Kevin Fain.
The concepts of equilibrium constraints and human anchoring were adapted from S.I.S.'s theoretical framework into practical tooling.
MIT License — see LICENSE
Copyright (c) 2026 RIZQ Technologies
Contributions welcome. Open an issue to discuss before submitting PRs.
Equilibrium Guard — A concept for zero-trust AI agent security. Because "can't" is stronger than "shouldn't."