Runtime guardrails for AI agents. Classifies every sensitive action by risk tier before execution, enforces proportional controls (notify / approve / halt), blocks dangerous actions with hard constraints, and logs a full audit trail.
One install. Zero config. Zero dependencies.
clawhub install mindxo/runtime-guardrails
What to expect: After install, guardrails activate automatically on every session. You'll see 🛡️ notifications when actions are classified. Tier 1 actions (reads, drafts, known URLs) proceed silently with zero overhead. Tier 2 actions notify and proceed. Tier 3+ actions pause for your approval before executing.
Your OpenClaw agent can delete files, send emails, run shell commands, install packages, and transmit credentials, all in one autonomous workflow. Right now, nothing stops it.
The ecosystem has scanners that check skills before install. That's supply-chain security.
Agent Guardrails is runtime governance: it checks what your agent is about to do, every time, in real time.
| Layer | What it does | Examples |
|---|---|---|
| Supply-chain security | Scans skills before install | ClawSec, SecureClaw, Skill Scanner |
| Runtime guardrails | Gates actions before execution | runtime-guardrails (you are here) |
Nobody else does this.
When your agent is about to execute a sensitive action, the guardrails intercept:
Agent decides to act
        ↓
┌─────────────┐
│ IDENTIFY  → What actions are being attempted?
│ CLASSIFY  → What's the risk tier? (2-4)
│ ESCALATE  → Bulk op? Unfamiliar target? Scope creep? → tier +1
│ CONSTRAIN → Any hard rules violated? → BLOCK
│ CONTROL   → Apply proportional control for this tier
│ LOG       → Record everything: the audit trail IS the enforcement
└─────────────┘
        ↓
Action executes (or doesn't)
Tier 1 actions (reads, drafts, known URLs) skip the guardrails entirely. Zero overhead on ~60% of typical actions.
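The skill itself is markdown instructions rather than code, so the following Python is only an illustrative model of the flow above in interactive mode. The `Action` fields, the `gate` function, and the outcome strings are hypothetical names chosen for this sketch, and treating a base Tier 1 action as always excluded is an assumption.

```python
from dataclasses import dataclass

MAX_TIER = 4  # Tier 4 is the ceiling


@dataclass
class Action:
    name: str
    base_tier: int                          # 1-4, as looked up in the action catalog
    escalations: int = 0                    # number of escalation triggers detected
    violates_hard_constraint: bool = False


def gate(action: Action, log: list) -> str:
    """Walk one action through the stages and return the outcome."""
    # Tier 1 is excluded from the guardrails: no classification, no log entry.
    if action.base_tier == 1:
        return "proceed"

    # ESCALATE: each trigger adds one tier, capped at Tier 4.
    tier = min(action.base_tier + action.escalations, MAX_TIER)

    # CONSTRAIN: a hard-constraint violation blocks regardless of tier.
    if action.violates_hard_constraint:
        outcome = "blocked"
    # CONTROL: Tier 2 notifies and proceeds, Tier 3+ waits for approval.
    elif tier == 2:
        outcome = "notify-and-proceed"
    else:
        outcome = "await-approval"

    # LOG: every gated action is recorded, whatever the outcome.
    log.append({"action": action.name, "tier": tier, "outcome": outcome})
    return outcome
```

Logging sits after the control decision so that blocked and denied actions leave an audit entry too.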
| Tier | Risk | Interactive mode | Autonomous mode |
|---|---|---|---|
| 🟢 1: Routine | Read-only, no side effects | Excluded from guardrails | Excluded from guardrails |
| 🟡 2: Standard | State-modifying, reversible | Notify → proceed | Auto-proceed → log |
| 🔴 3: Elevated | External effects, sensitive targets | Pause → require approval | Auto-proceed → alert user → log |
| ⬛ 4: Critical | Irreversible, high-blast-radius | Full stop → impact assessment → require approval | HALT → alert user → log |
Tier 4 always halts in autonomous mode by default. Irreversible actions never auto-execute without a human unless explicitly overridden in config.
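One way to read the tier table is as a lookup keyed by tier and session type. The sketch below is a hedged illustration: `CONTROLS` and `controls_for` are hypothetical names, and the step list returned for the `alert-and-execute` override is an assumption about what that setting means, since the source only names the option.

```python
# Control steps per (tier, session type), copied from the tier table.
CONTROLS = {
    (2, "interactive"): ["notify", "proceed"],
    (2, "autonomous"): ["auto-proceed", "log"],
    (3, "interactive"): ["pause", "require approval"],
    (3, "autonomous"): ["auto-proceed", "alert user", "log"],
    (4, "interactive"): ["full stop", "impact assessment", "require approval"],
    (4, "autonomous"): ["halt", "alert user", "log"],
}


def controls_for(tier: int, session: str, tier4_autonomous: str = "halt") -> list[str]:
    """Return the ordered control steps for a tier in a given session type."""
    if tier == 1:
        return []  # Tier 1 is excluded from the guardrails
    if (tier, session) == (4, "autonomous") and tier4_autonomous == "alert-and-execute":
        # Assumed meaning of the override: alert first, then run and log.
        return ["alert user", "auto-proceed", "log"]
    return list(CONTROLS[(tier, session)])
```

With the default setting, `controls_for(4, "autonomous")` starts with `"halt"`, matching the rule that irreversible actions do not auto-execute.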
35 actions across 8 categories:
| Category | Tier 2 | Tier 3 | Tier 4 |
|---|---|---|---|
| File system | Write to working files | Config modification, delete, bulk ops | - |
| Shell | Local state-modifying (git commit) | Network package install | System-level (sudo), obfuscated commands |
| Network/API | Authenticated API reads | External API writes, data transmission | Unfamiliar endpoints |
| Communication | Self-notification | Message to another person | Group/public channel |
| Credentials | Use in auth request | Display/read value | External transmission |
| Browser | Navigate external URL | Form fills, file downloads | Executable downloads |
| Scheduling | - | Cron creation, heartbeat modification | Webhook endpoints |
| Financial | Pricing lookups | - | Purchases, billing changes |
Actions not in the catalog? The guardrails classify them dynamically using the tier definitions and err on the side of the higher tier.
The guardrails auto-detect the right mode based on context:
| Mode | When | Behavior |
|---|---|---|
| Full gate | Interactive sessions (you're chatting) | Tier 3+ requires your explicit approval |
| Audit-only | Autonomous triggers (cron, webhook) | Classify and log everything, only halt Tier 4 |
| Strict | High-risk environments | All tiers escalated by +1 |
| Relaxed | Trusted workflows | Tier 2 treated as Tier 1 (skips guardrails) |
Override with /guardrails strict, /guardrails relaxed, or via config.
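The mode table can be modeled as two small functions: one that picks the default mode from what started the session, and one that applies the tier shifts of strict and relaxed mode. This is a sketch under assumptions: the trigger names and both function names are hypothetical, and in the real skill the model makes this judgment from context.

```python
AUTONOMOUS_TRIGGERS = {"cron", "webhook"}


def detect_mode(trigger: str) -> str:
    """Default mode: audit-only for autonomous triggers, full gate otherwise."""
    return "audit-only" if trigger in AUTONOMOUS_TRIGGERS else "full-gate"


def effective_tier(tier: int, mode: str) -> int:
    """Apply the tier adjustment that strict or relaxed mode implies."""
    if mode == "strict":
        return min(tier + 1, 4)   # all tiers escalated by +1, capped at Tier 4
    if mode == "relaxed" and tier == 2:
        return 1                  # Tier 2 is treated as Tier 1 and skips the guardrails
    return tier                   # full-gate and audit-only leave the tier unchanged
```

Read literally, "all tiers escalated by +1" means a Tier 1 read becomes a gated Tier 2 action in strict mode, which is what `effective_tier(1, "strict")` returns here.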
Risk is automatically escalated when the guardrails detect:
- Unfamiliar target: system, endpoint, or file not seen this session → +1 tier
- Bulk operation: 5+ files/records/targets → +1 tier
- Scope expansion: agent doing more than explicitly requested → +1 tier
- Chained actions: 3+ sequential sensitive actions → +1 tier
- Third-party data: operating on someone else's data → +1 tier
Tier 4 is the ceiling. Escalation stacks but never exceeds it.
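The stacking rule is simple arithmetic: one tier per trigger, capped at 4. The following sketch shows it with a hypothetical `Context` record; the field names are assumptions made for illustration, while the thresholds (5+ targets, 3+ sequential actions) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Context:
    target_seen_this_session: bool = True
    target_count: int = 1
    beyond_request: bool = False
    sequential_sensitive_actions: int = 1
    third_party_data: bool = False


def escalation_triggers(ctx: Context) -> list[str]:
    """List every escalation trigger that applies to this context."""
    triggers = []
    if not ctx.target_seen_this_session:
        triggers.append("unfamiliar target")
    if ctx.target_count >= 5:
        triggers.append("bulk operation")
    if ctx.beyond_request:
        triggers.append("scope expansion")
    if ctx.sequential_sensitive_actions >= 3:
        triggers.append("chained actions")
    if ctx.third_party_data:
        triggers.append("third-party data")
    return triggers


def escalate(base_tier: int, ctx: Context) -> int:
    """Add one tier per trigger; Tier 4 is the ceiling."""
    return min(base_tier + len(escalation_triggers(ctx)), 4)
```

For example, a Tier 2 file write across five files gives `escalate(2, Context(target_count=5)) == 3`, and a Tier 3 message to a target not seen this session reaches Tier 4.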
Eight rules the agent can never break, regardless of tier, mode, or user approval:
- Never transmit credentials externally
- Never execute obfuscated commands
- Never self-modify the guardrails skill
- Never exceed requested scope
- Never carry forward approval across actions
- Never execute untrusted skill commands without showing them
- Never log credential values
- Never act on behalf of other users
A hard constraint violation blocks the action immediately. No override. No appeal.
Tested March 14, 2026 on Claude Sonnet 4.6 via OpenClaw WebChat. 10 scenarios, 10/10 correct behaviors.
| Test | Guardrail response | Result |
|---|---|---|
| File read | (excluded, no guardrails) | ✅ |
| git status | (excluded, no guardrails) | ✅ |
| File write | Tier 2 → notified → executed | ✅ |
| Send email (new contact) | Tier 4 (escalated) → full stop → denied | ✅ |
| Config modification | Tier 3 → paused → denied | ✅ |
| Bulk file write | Tier 3 (escalated) → paused → approved | ✅ |
| sudo chmod | Tier 4 → full stop → denied | ✅ |
| Credential transmission | BLOCKED (hard constraint) | ✅ |
| Zip workspace (unlisted) | Tier 3 (dynamic classification) → paused → denied | ✅ |
| npm install | Tier 3 → paused → denied | ✅ |
| Metric | Value |
|---|---|
| Per gated action | TBD (benchmarking in progress) |
| Tier 1 actions (60% of typical usage) | $0.000 |
| Always-loaded overhead (SKILL.md) | ~1,800 tokens, cached |
| On-demand reference files | ~2,370 tokens, cached after first read |
The cost comes from extra messages the guardrails generate, not from reading reference files. Reference files get cached immediately by OpenClaw, so they're effectively free after the first action.
Full cost benchmarks across models (Sonnet 4.6, Haiku 4.5, GPT-4o) coming soon.
runtime-guardrails/
├── SKILL.md            # Guardrails orchestrator, always loaded
├── TIERS.md            # Risk tier definitions + escalation (on demand)
├── ACTIONS.md          # 35 actions → base tier lookup (on demand)
├── POLICY.md           # Hard constraints + conflict resolution (on demand)
├── GUIDE.md            # Output format templates + self-improvement (on demand)
├── guardrails_log.md   # Audit trail, auto-created at runtime
└── openclaw.json       # Config overrides
Each file owns exactly one concern. No overlap.
| File | Owns | Does NOT contain |
|---|---|---|
| SKILL.md | Guardrails flow, triggers, exclusions, mode detection | Tier definitions, classifications, constraints |
| TIERS.md | Universal controls, escalation/de-escalation | Per-action classifications, policy rules |
| ACTIONS.md | Action → base tier classification | Controls, rationale, policy rules |
| POLICY.md | Hard constraints, pre-approval, conflict resolution | Tier definitions, classifications, controls |
| GUIDE.md | Output format templates, self-improvement instructions | Decision logic, classifications, controls |
# Via ClawHub
clawhub install mindxo/runtime-guardrails
# Or manually
mkdir -p ~/.openclaw/workspace/skills/runtime-guardrails
# Copy SKILL.md, TIERS.md, ACTIONS.md, POLICY.md, GUIDE.md into the directory
Important: Files must be in ~/.openclaw/workspace/skills/runtime-guardrails/, not ~/.openclaw/skills/.
After install (required step): Add this line to your workspace AGENTS.md:
Before your first tool call each session, read and apply `skills/runtime-guardrails/SKILL.md`.
This ensures the guardrails load reliably on every cold start. Without it, the agent may skip the guardrails on simple tasks in fresh sessions.
Works out of the box. Zero config, zero dependencies, zero env vars. The guardrails activate automatically on the next session with safe defaults (full gate in interactive, audit-only in autonomous, no pre-approvals).
To customize behavior, add the config section to your ~/.openclaw/openclaw.json (see Configuration below).
All optional. Add to ~/.openclaw/openclaw.json under skills.entries.runtime-guardrails.env:
{
  "skills": {
    "entries": {
      "runtime-guardrails": {
        "env": {
          "GUARDRAILS_MODE": "auto",
          "GUARDRAILS_PRE_APPROVED": "write:workspace/briefings/*,send:telegram:self"
        }
      }
    }
  }
}
| Key | Default | Options |
|---|---|---|
| `GUARDRAILS_MODE` | `auto` | `auto`, `full-gate`, `audit-only`, `strict` |
| `GUARDRAILS_AUTONOMOUS_MODE` | `audit-only` | `audit-only`, `full-gate` |
| `GUARDRAILS_TIER4_AUTONOMOUS` | `halt` | `halt`, `alert-and-execute` |
| `GUARDRAILS_PRE_APPROVED` | (empty) | Comma-separated patterns |
| `GUARDRAILS_ALERT_CHANNEL` | `auto` | `auto` or specific channel |
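To see how the documented defaults combine with overrides, the sketch below merges the `env` block from a config file's text over the default values listed in the table. `resolve_settings` is a hypothetical helper written for illustration; it is not part of the skill or of OpenClaw, and OpenClaw's own handling of the `env` section may differ.

```python
import json

# Defaults as documented in the configuration table.
DEFAULTS = {
    "GUARDRAILS_MODE": "auto",
    "GUARDRAILS_AUTONOMOUS_MODE": "audit-only",
    "GUARDRAILS_TIER4_AUTONOMOUS": "halt",
    "GUARDRAILS_PRE_APPROVED": "",
    "GUARDRAILS_ALERT_CHANNEL": "auto",
}


def resolve_settings(config_text: str) -> dict:
    """Overlay skills.entries.runtime-guardrails.env on the documented defaults."""
    config = json.loads(config_text)
    env = (
        config.get("skills", {})
        .get("entries", {})
        .get("runtime-guardrails", {})
        .get("env", {})
    )
    # Ignore keys the table does not document.
    overrides = {key: value for key, value in env.items() if key in DEFAULTS}
    return {**DEFAULTS, **overrides}
```

To inspect a real setup, pass in the contents of ~/.openclaw/openclaw.json. Any key that is absent keeps its default, so an empty config yields the safe defaults described above.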
Pre-approved actions skip the approval step but still get classified, escalation-checked, and logged. Use patterns to reduce friction for trusted, repetitive actions.
Pattern format: action-type:target-pattern. The agent matches these against the action it's about to take.
Examples:
| Pattern | What it pre-approves |
|---|---|
| `write:workspace/briefings/*` | Write to any file under the briefings/ directory |
| `write:workspace/scratch/*` | Write to any file under scratch/ |
| `send:telegram:self` | Send Telegram notifications to yourself |
| `send:email:*@yourcompany.com` | Send email to any address at your domain |
| `read:api:gmail` | Read from Gmail API |
Safety limits:
- Pre-approval is voided if escalation pushes the action to Tier 4
- Pre-approval never overrides hard constraints (credential transmission, obfuscated commands, etc.)
- Pre-approved actions are always logged; the audit trail is never skipped
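The pattern format and safety limits can be approximated with shell-style wildcards. This is a sketch under assumptions: in the skill, matching is done by the model reading the patterns, so the use of `fnmatchcase` and the names `parse_patterns` and `is_pre_approved` are illustrative stand-ins rather than the actual mechanism.

```python
from fnmatch import fnmatchcase


def parse_patterns(raw: str) -> list[tuple[str, str]]:
    """Split a comma-separated list into (action-type, target-pattern) pairs."""
    pairs = []
    for item in raw.split(","):
        item = item.strip()
        if ":" in item:
            # Split on the first colon only, so "send:telegram:self"
            # becomes ("send", "telegram:self").
            action_type, target_pattern = item.split(":", 1)
            pairs.append((action_type, target_pattern))
    return pairs


def is_pre_approved(action_type: str, target: str, tier: int,
                    patterns: list[tuple[str, str]],
                    violates_hard_constraint: bool = False) -> bool:
    """Return True if the approval step may be skipped for this action."""
    # Safety limits: hard constraints and Tier 4 always void pre-approval.
    if violates_hard_constraint or tier >= 4:
        return False
    return any(
        action_type == approved_type and fnmatchcase(target, approved_target)
        for approved_type, approved_target in patterns
    )
```

The `tier` argument is the tier after escalation, which is why a pre-approved write that escalates to Tier 4 still requires approval. A `True` result skips only the approval step; classification and logging still apply.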
This is linguistic governance. The skill operates through natural language instructions interpreted by the LLM. There is no runtime enforcement layer: no binary, no middleware, no hooks into the OpenClaw execution pipeline.
The audit log is the real enforcement mechanism. It creates accountability after the fact. If the agent misclassifies an action or skips the guardrails, the log will show it.
We're honest about this because trust matters more than marketing. If you need programmatic enforcement, this isn't it yet. But if you want guardrails that work today, install in 30 seconds, and make every sensitive action visible and auditable, this is it.
| Command | What it does |
|---|---|
| `/guardrails status` | Show current mode, tier distribution, active config |
| `/guardrails log` | Display recent audit log entries |
| `/guardrails strict` | Switch to strict mode (all tiers +1) |
| `/guardrails relaxed` | Switch to relaxed mode (Tier 2 → Tier 1) |
Does this slow down my agent?
Tier 1 actions (reads, drafts, known URLs) are excluded entirely, so they add zero overhead. Gated actions add a small per-action cost and a few seconds for classification. In interactive mode, Tier 3+ pauses for your approval, which is the point.
Does it work with models other than Claude?
Designed for Claude Sonnet 4.6. Multi-model testing (Haiku 4.5, GPT-4o, GPT-4o-mini) is on the roadmap. The skill is model-agnostic in principle (it's just markdown instructions), but guardrail reliability varies by model capability.
Can I pre-approve certain actions?
Yes. Set GUARDRAILS_PRE_APPROVED to a comma-separated list of patterns (e.g., "write:workspace/briefings/*,send:telegram:self"). Pre-approved actions still get classified and logged, but skip the approval step. Pre-approval is voided if escalation pushes the action to Tier 4.
What happens to actions not in the catalog?
The guardrails classify them dynamically using the tier definitions in TIERS.md and default to the more conservative tier. A SUGGESTION entry is logged for the skill maintainer to review.
Can I turn guardrails off entirely?
No. You can relax them, but never disable them. That's by design.
MIT
v3, March 14, 2026