This repository was archived by the owner on Apr 12, 2026. It is now read-only.

feat: drift guard — detect and halt agent plan deviation in real-time #1522

@jpleva91

Context

Agents drift. They start on a CSS fix and end up rewriting the database. This is the #1 fear users have about autonomous agents. Microsoft AGT has a basic drift_threshold (difflib). We need a real drift guard.

Proposal

Add a drift detection layer to the AgentGuard kernel:

  1. Plan capture — at session start, record: task description, expected scope (files/dirs), expected capabilities, estimated turns
  2. Action monitoring — track each tool call against the plan:
    • Files touched vs expected scope
    • Commands run vs expected capabilities
    • Turn count vs budget
    • Semantic similarity of recent actions to original task
  3. Drift scoring — a continuous score in [0, 1], where 0 = fully on plan and 1 = completely off plan
  4. Enforcement — configurable thresholds in agentguard.yaml:
    drift:
      warn_threshold: 0.3
      halt_threshold: 0.6
      scope: ["/src/auth/**"]
      max_turns: 50
  5. Actions — WARN (log + notify), PAUSE (require human confirmation), HALT (stop agent)
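
A minimal sketch of how steps 1 to 4 could fit together, using the thresholds and scope from the config above. The scoring formula here (a weighted mix of out-of-scope file ratio and turn overrun), the `scope_weight` parameter, and all names (`Plan`, `Thresholds`, `drift_score`, `decide`) are assumptions for illustration, not a settled design. Semantic similarity is left out, and PAUSE is not mapped because the example config defines only warn and halt thresholds.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Plan:
    """Hypothetical plan captured at session start."""
    task: str
    scope: list[str]   # glob patterns, e.g. ["/src/auth/**"]
    max_turns: int


@dataclass
class Thresholds:
    """Mirrors warn_threshold / halt_threshold from agentguard.yaml."""
    warn: float = 0.3
    halt: float = 0.6


def in_scope(path: str, scope: list[str]) -> bool:
    # fnmatch's "*" also matches "/", so "/src/auth/**" covers nested paths.
    return any(fnmatch(path, pattern) for pattern in scope)


def drift_score(plan: Plan, files_touched: list[str], turns_used: int,
                scope_weight: float = 0.8) -> float:
    """Return a score in [0, 1]; the weighting is an assumed placeholder."""
    if files_touched:
        outside = sum(1 for f in files_touched if not in_scope(f, plan.scope))
        scope_drift = outside / len(files_touched)
    else:
        scope_drift = 0.0

    # Turns only count as drift once the budget is exceeded.
    if plan.max_turns > 0:
        overrun = max(0, turns_used - plan.max_turns)
        turn_drift = min(1.0, overrun / plan.max_turns)
    else:
        turn_drift = 0.0

    return scope_weight * scope_drift + (1.0 - scope_weight) * turn_drift


def decide(score: float, thresholds: Thresholds) -> str:
    """Map a drift score to an action name."""
    if score >= thresholds.halt:
        return "HALT"
    if score >= thresholds.warn:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    plan = Plan(task="fix CSS bug", scope=["/src/auth/**"], max_turns=50)
    touched = ["/src/auth/login.css", "/db/schema.sql"]
    score = drift_score(plan, touched, turns_used=10)
    print(score, decide(score, Thresholds()))
```

With half of the touched files outside the scope and no turn overrun, this weighting yields a score of about 0.4, which falls between the warn and halt thresholds. A real implementation would need to decide how PAUSE relates to those two thresholds.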

Architecture

  • Lives in the kernel alongside the execution firewall
  • Receives the same hook events (pre/post tool call)
  • Maintains session state (plan + action history)
  • Emits drift events to Sentinel for telemetry
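
To make the architecture bullets concrete, here is a rough sketch of a guard that receives pre-tool-call hooks, keeps per-session history, and emits drift events to a sink. The `ToolCall` shape, the `DriftGuard` class, the `emit` callback (standing in for Sentinel), and the simple out-of-scope ratio used as the score are all hypothetical; the actual kernel hook interface is not specified in this issue.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Optional


@dataclass
class ToolCall:
    """Hypothetical hook payload: tool name plus an optional file path."""
    tool: str
    path: Optional[str] = None


@dataclass
class DriftGuard:
    scope: list[str]                      # expected scope as glob patterns
    emit: Callable[[dict], None]          # telemetry sink (assumed interface)
    warn_threshold: float = 0.3
    halt_threshold: float = 0.6
    history: list[ToolCall] = field(default_factory=list)  # session state

    def score(self) -> float:
        """Fraction of touched paths outside the expected scope."""
        paths = [c.path for c in self.history if c.path is not None]
        if not paths:
            return 0.0
        outside = sum(
            1 for p in paths
            if not any(fnmatch(p, pattern) for pattern in self.scope)
        )
        return outside / len(paths)

    def on_pre_tool_call(self, call: ToolCall) -> bool:
        """Record the call, emit a drift event if needed.

        Returns True if the call may proceed, False if the agent should halt.
        """
        self.history.append(call)
        current = self.score()
        if current >= self.halt_threshold:
            action = "HALT"
        elif current >= self.warn_threshold:
            action = "WARN"
        else:
            action = "OK"
        if action != "OK":
            self.emit({
                "type": "drift",
                "score": current,
                "action": action,
                "tool": call.tool,
                "path": call.path,
            })
        return action != "HALT"
```

Passing `emit=some_list.append` is enough to try this locally: in-scope edits emit nothing, and repeated out-of-scope edits first produce a WARN event and then a HALT once the ratio crosses `halt_threshold`.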

Talk demo (May 6)

"I asked Copilot to fix a CSS bug. Watch what happens when it tries to modify the database."
→ Drift guard fires at 0.4, pauses agent, asks for confirmation
→ "That's drift detection. Your agent stays on plan."

Why Now

  • May 6 talk needs this as the "wow" demo
  • Microsoft AGT has basic drift_threshold — we need to be ahead
  • Directly improves bench score (agents that drift waste turns)
  • Feeds Sentinel telemetry flywheel

Metadata

    Labels

    agent:claimed (agent dispatched; do not re-dispatch), enhancement (new feature or request), sprint (current sprint priority)
