Refactor policy system to support multiple policy evaluation with callback results #22

@anormang1992

Summary

Refactor the policy evaluation system so that all triggered policies are collected, evaluated (including per-policy callbacks), and surfaced to the integrator as a complete list — enabling per-policy confirmation, richer callback output, and a single orchestration point in VRE.check_policy().

Problem Statement

The current policy system has several structural issues:

  1. Single-policy bottleneck: PolicyGate.evaluate() returns only the first pending violation's message (pending[0].message). If a trace contains 3 policies requiring confirmation, 2 are silently swallowed.

  2. PolicyResult can only carry one message. The confirmation_message: str | None field has no room for multiple violations.

  3. Policy logic is split between vre_guard and VRE.check_policy(). The guard handles the PENDING→ask→BLOCK/PASS conversion, and so does claude_code.py, each with its own branching. This logic should live in check_policy().

  4. Callbacks are opaque. PolicyCallback returns bool with no message, so the integrator never sees why a callback suppressed or fired a violation.

  5. BLOCK is never produced by the gate. PolicyGate.evaluate() only returns PASS or PENDING; BLOCK is synthesized downstream by the guard. The three-state enum is misleading at the gate level.

Proposed Solution

1. Callback return type: PolicyCallbackResult

Replace the bare bool return with a structured result:

class PolicyCallbackResult(BaseModel):
    passed: bool
    message: str | None = None

Update the PolicyCallback protocol:

class PolicyCallback(Protocol):
    def __call__(self, context: PolicyCallContext) -> PolicyCallbackResult: ...
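A minimal sketch of what a callback could look like under the new protocol. It uses stdlib dataclasses as stand-ins for the Pydantic models so it runs without dependencies. The `arguments` field on PolicyCallContext and the `deny_recursive_delete` callback are hypothetical; the issue does not define the context's shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class PolicyCallbackResult:
    # Stand-in for the Pydantic model proposed above.
    passed: bool
    message: str | None = None


@dataclass
class PolicyCallContext:
    # Hypothetical shape: the issue does not specify this type's fields.
    arguments: dict[str, Any] = field(default_factory=dict)


class PolicyCallback(Protocol):
    def __call__(self, context: PolicyCallContext) -> PolicyCallbackResult: ...


def deny_recursive_delete(context: PolicyCallContext) -> PolicyCallbackResult:
    """Illustrative callback: fail, with a reason, when a recursive flag is set."""
    if context.arguments.get("recursive"):
        return PolicyCallbackResult(passed=False, message="recursive delete is not allowed")
    return PolicyCallbackResult(passed=True, message="non-recursive delete")


# Any plain function with the right signature satisfies the protocol.
callback: PolicyCallback = deny_recursive_delete
verdict = callback(PolicyCallContext(arguments={"recursive": True}))
```

The point of the structured return is that `verdict.message` travels with the verdict, so the integrator can show why the callback failed rather than only that it did.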

2. Enrich PolicyViolation

Carry the callback's verdict alongside the policy and its message:

class PolicyViolation(BaseModel):
    policy: Policy
    message: str                          # from confirmation_message template
    callback_result: PolicyCallbackResult | None  # None = no callback

3. Enrich PolicyResult

Drop the single confirmation_message field. With violations carrying every violation's message and callback result, a single summary string is redundant and raises the question of which violation's message goes there. Integrators iterate violations directly; __str__ can derive a summary if needed.

class PolicyResult(BaseModel):
    action: PolicyAction
    reason: str | None = None
    violations: list[PolicyViolation] = []
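As a rough illustration of how __str__ might derive a summary from the violations list, here is a sketch using dataclass stand-ins. The enum values, the simplified PolicyViolation (message only), and the summary wording are all assumptions, not part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PolicyAction(Enum):
    # Enum values are assumed; only the member names appear in the issue.
    PASS = "pass"
    BLOCK = "block"


@dataclass
class PolicyViolation:
    # Simplified stand-in: the proposed model also carries the Policy
    # and the callback result.
    message: str


@dataclass
class PolicyResult:
    action: PolicyAction
    reason: str | None = None
    violations: list[PolicyViolation] = field(default_factory=list)

    def __str__(self) -> str:
        # Derive the summary from the violations instead of storing one string.
        if not self.violations:
            return f"{self.action.name}: no policies triggered"
        lines = [f"{self.action.name}: {len(self.violations)} policy violation(s)"]
        lines += [f"  - {v.message}" for v in self.violations]
        return "\n".join(lines)


result = PolicyResult(
    action=PolicyAction.BLOCK,
    violations=[
        PolicyViolation("Deletes files"),
        PolicyViolation("Writes outside the workspace"),
    ],
)
summary = str(result)
```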

4. on_policy handler signature

The integrator receives all triggered violations (not just confirmation-required ones) and returns a bool per violation:

on_policy: Callable[[list[PolicyViolation]], list[bool]]

Each PolicyViolation has callback_result populated, so the integrator sees:

  • The policy definition, message, and metadata
  • Whether a callback already resolved it (and why)
  • Whether the policy requires human confirmation

The integrator has full flexibility: wizard through one-by-one, batch confirm, render a UI, etc.
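One possible batch handler, sketched with dataclass stand-ins. Two assumptions are baked in: requires_confirmation is flattened onto the violation (in the proposal it may live on the policy), and the returned list is positional, one bool per violation in input order. The handler name is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PolicyCallbackResult:
    passed: bool
    message: str | None = None


@dataclass
class PolicyViolation:
    # Stand-in: `requires_confirmation` is flattened onto the violation here.
    message: str
    requires_confirmation: bool
    callback_result: PolicyCallbackResult | None = None


def approve_when_callback_passed(violations: list[PolicyViolation]) -> list[bool]:
    """Batch handler: approve only violations whose callback ran and passed.

    Returns one bool per violation, in the same order as the input
    (positional mapping is an assumption of this sketch).
    """
    return [
        v.callback_result is not None and v.callback_result.passed
        for v in violations
    ]


violations = [
    PolicyViolation("Deletes files", True, PolicyCallbackResult(True, "path is inside tmp")),
    PolicyViolation("Sends a network request", True, None),
]
decisions = approve_when_callback_passed(violations)
```

An interactive integrator would replace the list comprehension with a prompt per violation; the signature stays the same.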

5. Evaluation flow per policy

  1. Does the policy trigger? (cardinality match)
  2. If callback exists → run it → get PolicyCallbackResult(passed, message)
  3. Build PolicyViolation with callback result attached
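The three steps above could be sketched as a single collection loop. The Policy fields used here (`trigger_cardinality`, `callback`), the equality-based cardinality match, and the dict used as call context are hypothetical simplifications; the real Policy model and matching rules are not shown in this issue.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class PolicyCallbackResult:
    passed: bool
    message: str | None = None


@dataclass
class Policy:
    # Hypothetical fields for illustration only.
    name: str
    trigger_cardinality: str | None  # None = triggers for any cardinality
    confirmation_message: str
    callback: Callable[[dict], PolicyCallbackResult] | None = None


@dataclass
class PolicyViolation:
    policy: Policy
    message: str
    callback_result: PolicyCallbackResult | None  # None = no callback


def collect_violations(
    policies: list[Policy], cardinality: str | None, context: dict
) -> list[PolicyViolation]:
    violations = []
    for policy in policies:
        # Step 1: does the policy trigger? (simplified cardinality match)
        if policy.trigger_cardinality not in (None, cardinality):
            continue
        # Step 2: run the callback if one exists.
        callback_result = policy.callback(context) if policy.callback else None
        # Step 3: build the violation with the callback result attached.
        violations.append(
            PolicyViolation(policy, policy.confirmation_message, callback_result)
        )
    # Every triggered policy is returned, not just the first.
    return violations


policies = [
    Policy("bulk-edit", "multiple", "Touches many files"),
    Policy("delete", None, "Deletes data",
           callback=lambda ctx: PolicyCallbackResult(False, "blocked in prod")),
    Policy("single-only", "single", "Applies to single targets"),
]
found = collect_violations(policies, "multiple", {})
```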

6. Orchestration in VRE.check_policy()

Move all policy resolution logic from vre_guard into VRE.check_policy():

def check_policy(
    self,
    concepts: list[str] | GroundingResult,
    cardinality: str | None = None,
    call_context: PolicyCallContext | None = None,
    on_policy: Callable[[list[PolicyViolation]], list[bool]] | None = None,
) -> PolicyResult:

Flow:

  1. Collect all triggered violations via PolicyGate (callbacks already executed)
  2. If no violations → PASS
  3. If any violation has requires_confirmation=True and on_policy is provided → call on_policy with all violations
  4. Map the on_policy responses back to the violations: if any are rejected → BLOCK; otherwise → PASS
  5. If any violation has requires_confirmation=True and on_policy is absent → BLOCK (fail safe, no handler)
  6. If no violations require confirmation, resolve from callback results alone: any callback failure → BLOCK; all pass → PASS
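A sketch of the resolution logic described in the flow, starting from an already collected list of violations. The stand-in types are simplified, `resolve` is a hypothetical helper name, and two behaviors are assumptions of this sketch rather than part of the proposal: a handler response of the wrong length fails safe to BLOCK, and a violation with neither a callback nor requires_confirmation does not block.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PolicyAction(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class PolicyCallbackResult:
    passed: bool
    message: str | None = None


@dataclass
class PolicyViolation:
    message: str
    requires_confirmation: bool
    callback_result: PolicyCallbackResult | None = None


OnPolicy = Callable[[list[PolicyViolation]], list[bool]]


def resolve(
    violations: list[PolicyViolation], on_policy: OnPolicy | None = None
) -> PolicyAction:
    if not violations:
        return PolicyAction.PASS
    if any(v.requires_confirmation for v in violations):
        if on_policy is None:
            return PolicyAction.BLOCK  # fail safe: no handler to ask
        decisions = on_policy(violations)  # handler sees ALL violations
        if len(decisions) != len(violations):
            return PolicyAction.BLOCK  # assumption: malformed response fails safe
        return PolicyAction.PASS if all(decisions) else PolicyAction.BLOCK
    # No confirmation required: callback verdicts alone decide.
    any_failed = any(
        v.callback_result is not None and not v.callback_result.passed
        for v in violations
    )
    return PolicyAction.BLOCK if any_failed else PolicyAction.PASS


needs_human = PolicyViolation("Deletes files", True)
auto_fail = PolicyViolation("Writes outside workspace", False,
                            PolicyCallbackResult(False, "path escapes root"))
auto_ok = PolicyViolation("Reads config", False, PolicyCallbackResult(True))
```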

7. requires_confirmation semantics

requires_confirmation and callback are orthogonal:

requires_confirmation | Callback result | Meaning
----------------------+-----------------+-----------------------------------------------
False                 | passed          | Callback resolved it; no human needed
False                 | failed          | Callback blocked it; no human needed
True                  | passed          | Callback says OK, but human must still confirm
True                  | failed          | Callback says no AND human must confirm
True                  | no callback     | Pure HITL; human decides

requires_confirmation is the escalation flag: "a machine opinion isn't sufficient here."

8. Simplify vre_guard

The guard becomes a thin consumer — it calls check_policy() (passing on_policy through) and acts on the final PolicyResult.action:

policy = vre.check_policy(grounding, resolved_cardinality, context, on_policy=on_policy)
match policy.action:
    case PolicyAction.BLOCK:
        result = policy
    case _:
        result = fn(*args, **kwargs)

No more PENDING handling in the guard.
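For context, a thin guard along these lines might be written as a decorator. This is only a sketch: the decorator shape is an assumption, and `check_policy` here is a zero-argument callable standing in for a VRE.check_policy() call with grounding, cardinality, context, and on_policy already bound.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class PolicyAction(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class PolicyResult:
    action: PolicyAction
    reason: str | None = None


def vre_guard(check_policy: Callable[[], PolicyResult]):
    """Hypothetical thin guard: no PENDING branch, only the final action."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            policy = check_policy()
            if policy.action is PolicyAction.BLOCK:
                return policy  # surface the PolicyResult, skip the call
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@vre_guard(lambda: PolicyResult(PolicyAction.BLOCK, reason="no handler"))
def delete_everything() -> str:
    return "deleted"


@vre_guard(lambda: PolicyResult(PolicyAction.PASS))
def read_file() -> str:
    return "contents"


blocked = delete_everything()
allowed = read_file()
```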

VRE Design Alignment

  • Agent–VRE contract preserved — the agent still submits queries and receives structured results. Policy evaluation remains internal to VRE.
  • No new node types or relation types — policies remain attributes on APPLIES_TO relata.
  • Epistemic honesty preserved — policies are a mechanical safety layer (Section 9.3 of CLAUDE.md). This refactor makes the layer more complete and transparent, not more permissive.
  • Integrator flexibility — VRE surfaces all policy information; the integrator decides resolution UX. VRE doesn't prescribe wizard vs batch vs UI.

Acceptance Criteria

  • PolicyCallback returns PolicyCallbackResult(passed, message) instead of bool
  • PolicyViolation carries callback_result: PolicyCallbackResult | None
  • PolicyResult carries violations: list[PolicyViolation] (no confirmation_message field)
  • All triggered violations are collected and surfaced (not just the first)
  • VRE.check_policy() accepts on_policy: Callable[[list[PolicyViolation]], list[bool]] | None
  • on_policy receives all triggered violations with callback results pre-populated
  • If requires_confirmation=True violations exist and on_policy is absent → BLOCK
  • If no requires_confirmation violations exist, result is derived from callback verdicts alone
  • vre_guard delegates all policy logic to check_policy() — no PENDING handling in the guard
  • claude_code.py integration updated accordingly
  • Existing policy tests updated; new tests cover multi-policy and callback result scenarios
  • Backward-incompatible changes documented (callback return type, on_policy signature, removal of confirmation_message)

Open Questions

  • Should on_policy return more than list[bool]? (e.g., reasons for rejection) — current proposal says this is the integrator's problem outside VRE's scope, but worth revisiting during implementation.
  • Should PolicyAction.PENDING still exist as a gate-level concept, or should the gate only produce PASS/BLOCK now that check_policy() handles confirmation? PENDING may still be useful as an intermediate state for the PolicyResult before on_policy is called.

Dependencies

  • None — this is a self-contained refactor of the existing policy subsystem.

Metadata

  • Assignees: No one assigned
  • Labels: enhancement
  • Status: Done
  • Milestone: No milestone