feat: sensei validation/resolution logging during PR review — wire reflections into orchestrator workflow #372

@cmbays

Problem

The reflections loop has a structural gap: agents can log a decision or prediction at bet start, but the validation and resolution counterparts happen during PR review — a phase owned by sensei (the orchestrator), not an agent.

Currently there is no enforced or even suggested path for sensei to write validation or resolution reflection records when reviewing a PR. This means:

  • reflections.jsonl files exist but contain zero validation or resolution entries
  • BeltCalculator.readRunMetrics() derives predictionOutcomePairs from type: 'validation' records, so the count is always 0
  • The calibration loop is broken: predictions are made, outcomes are observed, but nothing closes the loop
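
To make the gap concrete, here is a minimal sketch of how prediction-outcome pairs might be counted from a reflections.jsonl file. Only the `type: 'validation'` discriminator comes from this issue; the record fields (`runId`, `predictionId`) and the helper names `parseReflections` and `countPredictionOutcomePairs` are assumptions for illustration, not the actual BeltCalculator code.

```typescript
// Hypothetical record shape; only `type: 'validation'` is taken from the issue.
type ReflectionRecord = {
  type: "decision" | "prediction" | "validation" | "resolution";
  runId: string;
  predictionId?: string;
};

// Parse JSONL text: one JSON object per non-empty line.
function parseReflections(jsonl: string): ReflectionRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ReflectionRecord);
}

// Count validation records that point back at a known prediction.
function countPredictionOutcomePairs(records: ReflectionRecord[]): number {
  const predictionIds = new Set(
    records
      .filter((r) => r.type === "prediction" && r.predictionId)
      .map((r) => r.predictionId)
  );
  return records.filter(
    (r) => r.type === "validation" && predictionIds.has(r.predictionId)
  ).length;
}

// A file as described above: predictions and decisions only, no validations.
const today = [
  '{"type":"prediction","runId":"run-1","predictionId":"p-1"}',
  '{"type":"decision","runId":"run-1"}',
].join("\n");
const pairsToday = countPredictionOutcomePairs(parseReflections(today));
```

With no validation records in the file, `pairsToday` is 0 regardless of how many predictions were logged, which is the behavior described above.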

Example scenario

  1. Agent bets that "refactoring X will reduce test time by 20%" — logs a prediction observation
  2. Agent completes work, PR opened
  3. Sensei reviews PR, CI shows test time reduced by 18%
  4. Gap: no mechanism or reminder for sensei to log a validation reflection linking the outcome back to the prediction
  5. Belt calculator sees 0 prediction-outcome pairs; calibration accuracy stays at 0
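
The scenario above could close the loop with a record along these lines. This is a sketch under stated assumptions: the issue does not define how accuracy is computed, so the relative-error rule here is one possible choice, and `ValidationRecord`, `scorePrediction`, and `buildValidation` are hypothetical names.

```typescript
// Hypothetical shape of a validation record linking an outcome to a prediction.
interface ValidationRecord {
  type: "validation";
  runId: string;
  predictionId: string;
  predicted: number;
  observed: number;
  accuracy: number; // in [0, 1]; higher means the outcome was closer to the prediction
}

// Assumed scoring rule: 1 minus relative error, clamped to [0, 1].
function scorePrediction(predicted: number, observed: number): number {
  if (predicted === 0) return observed === 0 ? 1 : 0;
  const relativeError = Math.abs(predicted - observed) / Math.abs(predicted);
  return Math.max(0, Math.min(1, 1 - relativeError));
}

function buildValidation(
  runId: string,
  predictionId: string,
  predicted: number,
  observed: number
): ValidationRecord {
  return {
    type: "validation",
    runId,
    predictionId,
    predicted,
    observed,
    accuracy: scorePrediction(predicted, observed),
  };
}

// Predicted a 20% reduction in test time, observed 18%.
const example = buildValidation("run-1", "p-1", 20, 18);
```

Under this rule the 20% versus 18% case scores roughly 0.9, giving the belt calculator one prediction-outcome pair to work with.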

Desired behavior

  • When sensei runs kata kiai complete <run-id> --success (or --failure), the CLI should prompt (or accept flags) for:
    • Was there an active prediction for this run? Did it validate?
    • Were any frictions resolved? (resolution record)
  • Alternatively, kata kansatsu record validation --run-id <id> --prediction-id <id> --accuracy 0.9 should be surfaced in the kata kiai complete flow
  • Agent context should remind agents to log predictions at start AND remind sensei to close the loop at completion
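
One way the completion step could work is sketched below as a pure function that decides which records to append and which predictions still need attention. Everything here is hypothetical: `CompletionOptions`, `planCompletion`, and the field names are illustrations of the desired behavior, not existing CLI code or flags.

```typescript
// Hypothetical inputs gathered from prompts or flags at completion time.
type CompletionOptions = {
  runId: string;
  activePredictionIds: string[];
  validations: { predictionId: string; accuracy: number }[];
  resolvedFrictionIds: string[];
};

// Hypothetical records that sensei would append to reflections.jsonl.
type LoopRecord =
  | { type: "validation"; runId: string; predictionId: string; accuracy: number }
  | { type: "resolution"; runId: string; frictionId: string };

function planCompletion(opts: CompletionOptions): {
  records: LoopRecord[];
  unvalidated: string[];
} {
  const records: LoopRecord[] = [];
  for (const v of opts.validations) {
    records.push({
      type: "validation",
      runId: opts.runId,
      predictionId: v.predictionId,
      accuracy: v.accuracy,
    });
  }
  for (const frictionId of opts.resolvedFrictionIds) {
    records.push({ type: "resolution", runId: opts.runId, frictionId });
  }
  // Predictions with no validation supplied; the CLI could prompt for these.
  const validated = new Set(opts.validations.map((v) => v.predictionId));
  const unvalidated = opts.activePredictionIds.filter((id) => !validated.has(id));
  return { records, unvalidated };
}

const plan = planCompletion({
  runId: "run-1",
  activePredictionIds: ["p-1", "p-2"],
  validations: [{ predictionId: "p-1", accuracy: 0.9 }],
  resolvedFrictionIds: ["f-1"],
});
```

Keeping the decision logic separate from file writes and prompts would let the same function back both an interactive prompt and a flag-driven path.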

Notes

Acceptance criteria

  • kata kiai complete flow includes a validation prompt or --validate-prediction flag
  • Agent context section explicitly states which reflections sensei vs agent owns
  • At least one validation record is written for every completed run that had a prediction
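
The last criterion lends itself to an automated check. The sketch below assumes each record carries a `type` and a `runId`, which this issue does not confirm; `runsMissingValidation` is a hypothetical helper.

```typescript
// Assumed minimal record shape for the check.
type Reflection = { type: string; runId: string };

// Return completed runs that logged a prediction but have no validation record.
function runsMissingValidation(
  records: Reflection[],
  completedRunIds: string[]
): string[] {
  const predicted = new Set<string>();
  const validated = new Set<string>();
  for (const r of records) {
    if (r.type === "prediction") predicted.add(r.runId);
    if (r.type === "validation") validated.add(r.runId);
  }
  return completedRunIds.filter((id) => predicted.has(id) && !validated.has(id));
}

const missing = runsMissingValidation(
  [
    { type: "prediction", runId: "run-1" },
    { type: "validation", runId: "run-1" },
    { type: "prediction", runId: "run-2" },
    { type: "decision", runId: "run-3" },
  ],
  ["run-1", "run-2", "run-3"]
);
```

Here `missing` contains only `run-2`: `run-1` closed its loop and `run-3` never made a prediction, so neither violates the criterion.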

Metadata

    Labels

    enhancement: New feature or request
    needs-human: Requires human judgment or design input before implementation