Refactor policy system to support multiple policy evaluation with callback results

## Summary

Refactor the policy evaluation system so that all triggered policies are collected, evaluated (including per-policy callbacks), and surfaced to the integrator as a complete list — enabling per-policy confirmation, richer callback output, and a single orchestration point in `VRE.check_policy()`.

## Problem Statement

The current policy system has several structural issues:

1. **Single-policy bottleneck** — `PolicyGate.evaluate()` returns only the first pending violation's message (`pending[0].message`). If a trace contains 3 policies requiring confirmation, 2 are silently swallowed.

2. **`PolicyResult` can only carry one message** — `confirmation_message: str | None` has no room for multiple violations.

3. **Policy logic is split between `vre_guard` and `VRE.check_policy()`**: the guard handles the PENDING→ask→BLOCK/PASS conversion, and `claude_code.py` repeats it with its own branching. This logic should live in `check_policy()`.

4. **Callbacks are opaque** — `PolicyCallback` returns `bool` with no message. The integrator never sees *why* a callback suppressed or fired a violation.

5. **BLOCK is never produced by the gate** — `PolicyGate.evaluate()` only returns PASS or PENDING. BLOCK is synthesized downstream by the guard. The three-state enum is misleading at the gate level.

## Proposed Solution

### 1. Callback return type: `PolicyCallbackResult`

Replace the bare `bool` return with a structured result:

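```python
from pydantic import BaseModel


class PolicyCallbackResult(BaseModel):
    passed: bool                 # True = the callback raised no objection
    message: str | None = None   # optional explanation surfaced to the integrator
```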

Update the `PolicyCallback` protocol:

```python
from typing import Protocol


class PolicyCallback(Protocol):
    def __call__(self, context: PolicyCallContext) -> PolicyCallbackResult: ...
```

### 2. Enrich `PolicyViolation`

Carry the callback's verdict alongside the policy and its message:

```python
class PolicyViolation(BaseModel):
    policy: Policy
    message: str                          # from confirmation_message template
    callback_result: PolicyCallbackResult | None  # None = no callback
```

### 3. Enrich `PolicyResult`

Drop the single `confirmation_message` field. With `violations` carrying every violation's message and callback result, a single summary string is redundant and raises the question of which violation's message goes there. Integrators iterate `violations` directly; `__str__` can derive a summary if needed.

```python
class PolicyResult(BaseModel):
    action: PolicyAction
    reason: str | None = None
    violations: list[PolicyViolation] = []
```

### 4. `on_policy` handler signature

The integrator receives **all** triggered violations (not just confirmation-required ones) and returns a `bool` per violation:

```python
on_policy: Callable[[list[PolicyViolation]], list[bool]]
```

Each `PolicyViolation` has `callback_result` populated, so the integrator sees:
- The policy definition, message, and metadata
- Whether a callback already resolved it (and why)
- Whether the policy requires human confirmation

The integrator has full flexibility: wizard through one-by-one, batch confirm, render a UI, etc.

### 5. Evaluation flow per policy

1. Does the policy trigger? (cardinality match)
2. If callback exists → run it → get `PolicyCallbackResult(passed, message)`
3. Build `PolicyViolation` with callback result attached

### 6. Orchestration in `VRE.check_policy()`

Move all policy resolution logic from `vre_guard` into `VRE.check_policy()`:

```python
def check_policy(
    self,
    concepts: list[str] | GroundingResult,
    cardinality: str | None = None,
    call_context: PolicyCallContext | None = None,
    on_policy: Callable[[list[PolicyViolation]], list[bool]] | None = None,
) -> PolicyResult:
    ...  # body follows the flow described below
```

Flow:
1. Collect all triggered violations via `PolicyGate` (callbacks have already run at this point).
2. If there are no violations → PASS.
3. If any violation has `requires_confirmation=True`:
   - `on_policy` is absent → BLOCK (fail safe: there is no handler to ask).
   - `on_policy` is provided → call it with all violations and map the responses back: if any is rejected → BLOCK; otherwise → PASS.
4. If no violation requires confirmation, resolve from callback results alone: any callback failure → BLOCK; all pass → PASS.

### 7. `requires_confirmation` semantics

`requires_confirmation` and `callback` are orthogonal:

| `requires_confirmation` | `callback` result | Meaning |
|---|---|---|
| `False` | passed | Callback resolved it — no human needed |
| `False` | failed | Callback blocked it — no human needed |
| `True` | passed | Callback says OK, but human must still confirm |
| `True` | failed | Callback says no AND human must confirm |
| `True` | no callback | Pure HITL — human decides |

`requires_confirmation` is the **escalation flag**: "a machine opinion isn't sufficient here."

### 8. Simplify `vre_guard`

The guard becomes a thin consumer — it calls `check_policy()` (passing `on_policy` through) and acts on the final `PolicyResult.action`:

```python
policy = vre.check_policy(grounding, resolved_cardinality, context, on_policy=on_policy)
match policy.action:
    case PolicyAction.BLOCK:
        result = policy
    case _:
        result = fn(*args, **kwargs)
```

No more PENDING handling in the guard.

## VRE Design Alignment

- **Agent–VRE contract preserved** — the agent still submits queries and receives structured results. Policy evaluation remains internal to VRE.
- **No new node types or relation types** — policies remain attributes on APPLIES_TO relata.
- **Epistemic honesty preserved** — policies are a mechanical safety layer (Section 9.3 of CLAUDE.md). This refactor makes the layer more complete and transparent, not more permissive.
- **Integrator flexibility** — VRE surfaces all policy information; the integrator decides resolution UX. VRE doesn't prescribe wizard vs batch vs UI.

## Acceptance Criteria

- [ ] `PolicyCallback` returns `PolicyCallbackResult(passed, message)` instead of `bool`
- [ ] `PolicyViolation` carries `callback_result: PolicyCallbackResult | None`
- [ ] `PolicyResult` carries `violations: list[PolicyViolation]` (no `confirmation_message` field)
- [ ] All triggered violations are collected and surfaced (not just the first)
- [ ] `VRE.check_policy()` accepts `on_policy: Callable[[list[PolicyViolation]], list[bool]] | None`
- [ ] `on_policy` receives all triggered violations with callback results pre-populated
- [ ] If `requires_confirmation=True` violations exist and `on_policy` is absent → BLOCK
- [ ] If no `requires_confirmation` violations exist, result is derived from callback verdicts alone
- [ ] `vre_guard` delegates all policy logic to `check_policy()` — no PENDING handling in the guard
- [ ] `claude_code.py` integration updated accordingly
- [ ] Existing policy tests updated; new tests cover multi-policy and callback result scenarios
- [ ] Backward-incompatible changes documented (callback return type, `on_policy` signature, removal of `confirmation_message`)

## Open Questions

- Should `on_policy` return more than `list[bool]`? (e.g., reasons for rejection) — current proposal says this is the integrator's problem outside VRE's scope, but worth revisiting during implementation.
- Should `PolicyAction.PENDING` still exist as a gate-level concept, or should the gate only produce PASS/BLOCK now that `check_policy()` handles confirmation? PENDING may still be useful as an intermediate state for the `PolicyResult` before `on_policy` is called.

## Dependencies

- None — this is a self-contained refactor of the existing policy subsystem.
