Description
Summary
Refactor the policy evaluation system so that all triggered policies are collected, evaluated (including per-policy callbacks), and surfaced to the integrator as a complete list — enabling per-policy confirmation, richer callback output, and a single orchestration point in VRE.check_policy().
Problem Statement
The current policy system has several structural issues:
- Single-policy bottleneck: PolicyGate.evaluate() returns only the first pending violation's message (pending[0].message). If a trace contains 3 policies requiring confirmation, 2 are silently swallowed.
- PolicyResult can only carry one message: its confirmation_message: str | None field has no room for multiple violations.
- Policy logic is split between vre_guard and VRE.check_policy(): the guard handles the PENDING→ask→BLOCK/PASS conversion, and so does claude_code.py, each with its own branching. This logic should live in check_policy().
- Callbacks are opaque: PolicyCallback returns bool with no message. The integrator never sees why a callback suppressed or fired a violation.
- BLOCK is never produced by the gate: PolicyGate.evaluate() only returns PASS or PENDING. BLOCK is synthesized downstream by the guard. The three-state enum is misleading at the gate level.
Proposed Solution
1. Callback return type: PolicyCallbackResult
Replace the bare bool return with a structured result:

    from typing import Protocol

    from pydantic import BaseModel  # assuming BaseModel is pydantic's


    class PolicyCallbackResult(BaseModel):
        passed: bool
        message: str | None = None

Update the PolicyCallback protocol:

    class PolicyCallback(Protocol):
        def __call__(self, context: PolicyCallContext) -> PolicyCallbackResult: ...

2. Enrich PolicyViolation
Carry the callback's verdict alongside the policy and its message:

    class PolicyViolation(BaseModel):
        policy: Policy
        message: str  # from confirmation_message template
        callback_result: PolicyCallbackResult | None  # None = no callback

3. Enrich PolicyResult
Drop the single confirmation_message field. With violations carrying every violation's message and callback result, a single summary string is redundant and raises the question of which violation's message goes there. Integrators iterate violations directly; __str__ can derive a summary if needed.

    class PolicyResult(BaseModel):
        action: PolicyAction
        reason: str | None = None
        violations: list[PolicyViolation] = []

4. on_policy handler signature
The integrator receives all triggered violations (not just confirmation-required ones) and returns a bool per violation:

    on_policy: Callable[[list[PolicyViolation]], list[bool]]

Each PolicyViolation carries its callback_result (None when the policy has no callback), so the integrator sees:
- The policy definition, message, and metadata
- Whether a callback already resolved it (and why)
- Whether the policy requires human confirmation
The integrator has full flexibility: step through the violations one by one, confirm them as a batch, render a UI, etc.
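
As a rough illustration of the handler contract, the sketch below shows a non-interactive on_policy handler that returns one bool per violation, in order. The dataclasses are simplified stand-ins for the proposed pydantic models, the Policy shape (name, requires_confirmation) is an assumption, and auto_approve_passed is a hypothetical name; a real integrator would more likely prompt a human here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyCallbackResult:  # stand-in for the proposed pydantic model
    passed: bool
    message: Optional[str] = None


@dataclass
class Policy:  # hypothetical minimal shape; the real Policy has more fields
    name: str
    requires_confirmation: bool = False


@dataclass
class PolicyViolation:  # stand-in for the proposed pydantic model
    policy: Policy
    message: str
    callback_result: Optional[PolicyCallbackResult] = None


def auto_approve_passed(violations: list[PolicyViolation]) -> list[bool]:
    """Hypothetical handler: returns one decision per violation, same order."""
    decisions = []
    for v in violations:
        callback_ok = v.callback_result is not None and v.callback_result.passed
        # Approve only when a callback passed and no human sign-off is required.
        decisions.append(callback_ok and not v.policy.requires_confirmation)
    return decisions
```

Because the handler sees every violation at once, the same signature supports batch confirmation, a one-by-one wizard, or a rendered UI without changes on the VRE side.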
5. Evaluation flow per policy
- Does the policy trigger? (cardinality match)
- If callback exists → run it → get PolicyCallbackResult(passed, message)
- Build PolicyViolation with callback result attached
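
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the actual PolicyGate: the Policy fields (trigger_cardinality, callback), the dict used in place of PolicyCallContext, the cardinality values, and the names collect_violations and deny_root_path are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PolicyCallbackResult:  # stand-in for the proposed pydantic model
    passed: bool
    message: Optional[str] = None


@dataclass
class Policy:  # hypothetical shape, for illustration only
    name: str
    trigger_cardinality: Optional[str]  # None = triggers for any cardinality (assumption)
    confirmation_message: str
    callback: Optional[Callable[[dict], PolicyCallbackResult]] = None


@dataclass
class PolicyViolation:  # stand-in for the proposed pydantic model
    policy: Policy
    message: str
    callback_result: Optional[PolicyCallbackResult]


def deny_root_path(context: dict) -> PolicyCallbackResult:
    """Hypothetical callback: explains its verdict instead of returning a bare bool."""
    if context.get("path") == "/":
        return PolicyCallbackResult(False, "refusing to operate on /")
    return PolicyCallbackResult(True)


def collect_violations(policies, cardinality, context) -> list[PolicyViolation]:
    violations = []
    for policy in policies:
        # Step 1: does the policy trigger for this cardinality?
        if policy.trigger_cardinality not in (None, cardinality):
            continue
        # Step 2: run the callback, if one is attached.
        result = policy.callback(context) if policy.callback else None
        # Step 3: record the violation with the callback verdict attached.
        violations.append(
            PolicyViolation(policy, policy.confirmation_message, result)
        )
    return violations
```

Note that every triggered policy produces a violation, whether its callback passed or failed; deciding what to do with the list is left to the orchestration step.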
6. Orchestration in VRE.check_policy()
Move all policy resolution logic from vre_guard into VRE.check_policy():

    def check_policy(
        self,
        concepts: list[str] | GroundingResult,
        cardinality: str | None = None,
        call_context: PolicyCallContext | None = None,
        on_policy: Callable[[list[PolicyViolation]], list[bool]] | None = None,
    ) -> PolicyResult:
        ...  # signature only; the resolution flow is described below

Flow:
- Collect all triggered violations via PolicyGate (callbacks already executed)
- If no violations → PASS
- If any violation has requires_confirmation=True and on_policy is provided → call on_policy with all violations
- If any violation has requires_confirmation=True and on_policy is absent → BLOCK (fail safe, no handler)
- Map responses back: if any are rejected → BLOCK; otherwise → PASS
- If no violations require confirmation, resolve from callback results alone: any callback failure → BLOCK; all pass → PASS
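
One possible reading of this flow, sketched as a standalone function rather than the real VRE.check_policy(). The dataclasses stand in for the proposed pydantic models, requires_confirmation is flattened onto the violation for brevity, and two behaviours are assumptions the proposal does not specify: a response list of the wrong length is treated as a BLOCK, and a violation with neither a callback nor requires_confirmation does not block.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class PolicyAction(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class PolicyCallbackResult:
    passed: bool
    message: Optional[str] = None


@dataclass
class PolicyViolation:
    message: str
    requires_confirmation: bool  # flattened from the policy (assumption)
    callback_result: Optional[PolicyCallbackResult] = None


@dataclass
class PolicyResult:
    action: PolicyAction
    reason: Optional[str] = None
    violations: list = field(default_factory=list)


OnPolicy = Callable[[list[PolicyViolation]], list[bool]]


def resolve(violations: list[PolicyViolation],
            on_policy: Optional[OnPolicy] = None) -> PolicyResult:
    if not violations:
        return PolicyResult(PolicyAction.PASS)

    if any(v.requires_confirmation for v in violations):
        if on_policy is None:
            # Fail safe: confirmation is required but nobody can give it.
            return PolicyResult(PolicyAction.BLOCK, "no on_policy handler", violations)
        decisions = on_policy(violations)
        if len(decisions) != len(violations):
            # Assumption: a malformed response is treated as a rejection.
            return PolicyResult(PolicyAction.BLOCK, "malformed on_policy response", violations)
        if not all(decisions):
            return PolicyResult(PolicyAction.BLOCK, "rejected by integrator", violations)
        return PolicyResult(PolicyAction.PASS, violations=violations)

    # No confirmation needed: callback verdicts decide on their own.
    if any(v.callback_result is not None and not v.callback_result.passed
           for v in violations):
        return PolicyResult(PolicyAction.BLOCK, "callback failed", violations)
    return PolicyResult(PolicyAction.PASS, violations=violations)
```

Returning the violations on both PASS and BLOCK keeps the result transparent: the caller can always inspect what was triggered and why.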
7. requires_confirmation semantics
requires_confirmation and callback are orthogonal:
| requires_confirmation | callback result | Meaning |
|---|---|---|
| False | passed | Callback resolved it; no human needed |
| False | failed | Callback blocked it; no human needed |
| True | passed | Callback says OK, but human must still confirm |
| True | failed | Callback says no AND human must confirm |
| True | no callback | Pure HITL; human decides |
requires_confirmation is the escalation flag: "a machine opinion isn't sufficient here."
8. Simplify vre_guard
The guard becomes a thin consumer — it calls check_policy() (passing on_policy through) and acts on the final PolicyResult.action:

    policy = vre.check_policy(grounding, resolved_cardinality, context, on_policy=on_policy)
    match policy.action:
        case PolicyAction.BLOCK:
            result = policy
        case _:
            result = fn(*args, **kwargs)

No more PENDING handling in the guard.
VRE Design Alignment
- Agent–VRE contract preserved — the agent still submits queries and receives structured results. Policy evaluation remains internal to VRE.
- No new node types or relation types — policies remain attributes on APPLIES_TO relata.
- Epistemic honesty preserved — policies are a mechanical safety layer (Section 9.3 of CLAUDE.md). This refactor makes the layer more complete and transparent, not more permissive.
- Integrator flexibility — VRE surfaces all policy information; the integrator decides resolution UX. VRE doesn't prescribe wizard vs batch vs UI.
Acceptance Criteria
- PolicyCallback returns PolicyCallbackResult(passed, message) instead of bool
- PolicyViolation carries callback_result: PolicyCallbackResult | None
- PolicyResult carries violations: list[PolicyViolation] (no confirmation_message field)
- All triggered violations are collected and surfaced (not just the first)
- VRE.check_policy() accepts on_policy: Callable[[list[PolicyViolation]], list[bool]] | None
- on_policy receives all triggered violations with callback results pre-populated
- If requires_confirmation=True violations exist and on_policy is absent → BLOCK
- If no requires_confirmation violations exist, result is derived from callback verdicts alone
- vre_guard delegates all policy logic to check_policy(); no PENDING handling in the guard
- claude_code.py integration updated accordingly
- Existing policy tests updated; new tests cover multi-policy and callback result scenarios
- Backward-incompatible changes documented (callback return type, on_policy signature, removal of confirmation_message)
Open Questions
- Should on_policy return more than list[bool] (e.g., reasons for rejection)? The current proposal treats this as the integrator's problem, outside VRE's scope, but it is worth revisiting during implementation.
- Should PolicyAction.PENDING still exist as a gate-level concept, or should the gate only produce PASS/BLOCK now that check_policy() handles confirmation? PENDING may still be useful as an intermediate state for the PolicyResult before on_policy is called.
Dependencies
- None — this is a self-contained refactor of the existing policy subsystem.