[DEV-1440] M2: Daemon RunEvalCommand handler #15
Review Summary by Qodo
DEV-1440 M2: Daemon eval runner with SignalR progress relay
Walkthroughs
Description
• Extracted shared eval library from CLI into reusable EvalService and IEvalObserver interfaces
• Implemented daemon-side eval orchestration with EvalRunner and DaemonEvalObserver for SignalR progress relay
• Added wire types for eval dispatch (RunEvalCommand) and four progress events (EvalStarted, EvalQuestionCompleted, EvalFinished, EvalFailed)
• Registered eval command handlers in ServerConnection and DI wiring in DaemonRunner
Diagram
flowchart LR
Server["Server<br/>Dashboard"] -->|RunEvalCommand| Daemon["Daemon<br/>EvalRunner"]
Daemon -->|EvalService.RunAsync| EvalSvc["EvalService<br/>Orchestration"]
EvalSvc -->|IEvalObserver| Observer["DaemonEvalObserver"]
Observer -->|EvalStarted<br/>EvalQuestionCompleted<br/>EvalFinished<br/>EvalFailed| SignalR["SignalR<br/>Connection"]
SignalR -->|Progress Events| Server
File Changes
1. src/kapacitor/Commands/EvalCommand.cs
db791ed to 1e59042
Three findings on PR #15 (the other two, observer-throw guard and judge-fact cancellation propagation, were already addressed by the M1 follow-up in 1f655f4):

1. EvalRunId mismatch (Action required): the server dispatches RunEvalCommand with an EvalRunId, but EvalService generated its own GUID, leading to two different ids in one run's event stream (EvalStarted used the service-generated id; subsequent question / finished / failed events used the dispatched id captured in DaemonEvalObserver). Fixed by adding an optional `evalRunId` parameter to EvalService.RunAsync; the CLI passes null (mints a fresh id, current behaviour) and the daemon passes cmd.EvalRunId so the whole run, including the persisted SessionEvalCompleted aggregate, shares one correlation id end-to-end.

2. Out-of-order progress events (Recommended): DaemonEvalObserver's per-event Task.Run can interleave concurrent SignalR sends. Added a SemaphoreSlim(1,1) gate inside Relay so the background sends drain in their enqueue order: the dashboard sees EvalStarted before any question completion, and EvalFinished/EvalFailed last, deterministically.

3. Daemon evals not cancellable on shutdown (Recommended): EvalRunner spawned Task.Run with no link to the host lifecycle. It now injects IHostApplicationLifetime, captures ApplicationStopping, and passes it as ct to EvalService.RunAsync. M1's outer try/catch turns in-flight cancellation into a clean OnFailed("cancelled") relay so the dashboard learns the eval stopped instead of waiting forever.

Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
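For readers who want to see what "sends drain in their enqueue order" can look like in code, here is a minimal sketch of one alternative to a semaphore gate: chaining each send onto the previous one. It is not the PR's `Relay` implementation. The class name `OrderedRelay` and its `Enqueue` method are hypothetical, and the approach is chosen because it does not depend on the order in which semaphore waiters are released, which the .NET documentation does not describe as FIFO.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not the PR's DaemonEvalObserver): runs queued async
// sends one at a time, strictly in the order Enqueue was called.
public sealed class OrderedRelay
{
    private readonly object _gate = new();
    private Task _tail = Task.CompletedTask;

    // Queues `send` behind every previously queued send. The caller can ignore
    // the returned task (fire-and-forget) or await it in tests.
    public Task Enqueue(Func<Task> send)
    {
        lock (_gate)
        {
            // The lock fixes the position in the chain; the work itself runs later.
            _tail = RunAfter(_tail, send);
            return _tail;
        }
    }

    private static async Task RunAfter(Task previous, Func<Task> send)
    {
        // Leave the caller's thread (and the lock) before doing any work.
        await Task.Yield();

        // `previous` never faults because failures are swallowed below.
        await previous.ConfigureAwait(false);

        try
        {
            await send().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // A failed relay is logged and must not break the chain.
            Console.Error.WriteLine($"relay failed: {ex.Message}");
        }
    }
}
```

With this shape, an observer callback would call something like `relay.Enqueue(() => connection.EvalStartedAsync(evt))` and return immediately, assuming the send method returns a `Task`. Each send starts only after the one queued before it has completed.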
Daemon side of the dashboard-driven eval pipeline. Pairs with the server M3 endpoint in kurrent-io/Kurrent.Capacitor#477 and depends on the M1 shared eval library in #14.

- New SignalR wire types in Models.cs match the server's DaemonCommands.cs: RunEvalCommand (server -> daemon dispatch) plus the four daemon -> server progress events (EvalStarted, EvalQuestionCompleted, EvalFinished, EvalFailed). Registered in KapacitorJsonContext for source-gen serialization.
- ServerConnection registers a "RunEval" handler and exposes per-event send methods (EvalStartedAsync etc.) that mirror the existing AgentRegisteredAsync / LaunchFailedAsync pattern.
- New EvalRunner singleton subscribes to OnRunEval. Each incoming command spawns a fire-and-forget Task that builds an authenticated HttpClient, instantiates a DaemonEvalObserver bound to the run, and drives EvalService.RunAsync. Unhandled exceptions are caught and translated to an EvalFailed relay so the dashboard learns about daemon-side failures rather than waiting forever.
- DaemonEvalObserver maps the IEvalObserver surface to SignalR sends: OnStarted -> EvalStartedAsync, OnQuestionCompleted -> EvalQuestionCompletedAsync, OnFinished -> EvalFinishedAsync, OnFailed -> EvalFailedAsync. Info / per-question-start / per-question-failure / fact-retained callbacks just log locally; they're not interesting enough to justify SignalR chatter for every judge.
- Wired into DaemonRunner DI: AddSingleton<EvalRunner> + an explicit GetRequiredService at startup so the constructor's OnRunEval subscription happens before the host starts taking traffic.

Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ee5bc16 to 2866c00
Summary
Daemon side of the dashboard-driven eval pipeline (DEV-1440 milestone 2). Pairs with the server M3 endpoint in kurrent-io/Kurrent.Capacitor#477 and depends on the M1 shared eval library in #14.
Changes
Wire types in `Models.cs` match server's `DaemonCommands.cs`: `RunEvalCommand` (server → daemon dispatch) plus the four daemon → server progress events (`EvalStarted`, `EvalQuestionCompleted`, `EvalFinished`, `EvalFailed`). Registered in `KapacitorJsonContext`.
`ServerConnection` registers a `RunEval` handler and exposes per-event send methods (`EvalStartedAsync` etc.) mirroring the existing `AgentRegisteredAsync` / `LaunchFailedAsync` pattern.
`EvalRunner` singleton subscribes to `OnRunEval`. Each incoming command spawns a fire-and-forget Task that builds an authenticated HttpClient, instantiates a `DaemonEvalObserver` bound to the run, and drives `EvalService.RunAsync`. Unhandled exceptions are caught and translated to an `EvalFailed` relay so the dashboard learns about daemon-side failures rather than waiting forever.
`DaemonEvalObserver` maps the `IEvalObserver` surface from M1 to SignalR sends. Info / per-question-start / per-question-failure / fact-retained callbacks just log locally — not interesting enough to justify per-judge SignalR chatter.
DI wiring: `AddSingleton<EvalRunner>` plus an explicit `GetRequiredService` at startup so the constructor's `OnRunEval` subscription happens before the daemon starts taking traffic.
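The fire-and-forget handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR. `EvalRunnerSketch`, the single-field `RunEvalCommand` record, and the two delegates are hypothetical stand-ins for the real `EvalRunner`, wire type, `EvalService.RunAsync`, and `EvalFailedAsync`. The `stopping` token stands in for `IHostApplicationLifetime.ApplicationStopping`, and for brevity the sketch reports cancellation itself, whereas the PR relies on the M1 library's own try/catch for that.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, reduced wire type: the real RunEvalCommand carries more fields.
public sealed record RunEvalCommand(string EvalRunId);

public sealed class EvalRunnerSketch
{
    private readonly Func<RunEvalCommand, CancellationToken, Task> _runEval;
    private readonly Func<string, string, Task> _reportFailed;
    private readonly CancellationToken _stopping;

    public EvalRunnerSketch(
        Func<RunEvalCommand, CancellationToken, Task> runEval, // stand-in for EvalService.RunAsync
        Func<string, string, Task> reportFailed,               // stand-in for EvalFailedAsync
        CancellationToken stopping)                            // stand-in for ApplicationStopping
    {
        _runEval = runEval;
        _reportFailed = reportFailed;
        _stopping = stopping;
    }

    // Starts the eval on the thread pool and returns at once. A handler would
    // discard the returned task; it is exposed so tests can await completion.
    public Task Handle(RunEvalCommand cmd) => Task.Run(async () =>
    {
        try
        {
            await _runEval(cmd, _stopping).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            await TryReportAsync(cmd.EvalRunId, "cancelled").ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Any other failure becomes a failure event, so the dashboard
            // is told the run ended instead of waiting forever.
            await TryReportAsync(cmd.EvalRunId, ex.Message).ConfigureAwait(false);
        }
    });

    private async Task TryReportAsync(string evalRunId, string reason)
    {
        try
        {
            await _reportFailed(evalRunId, reason).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Reporting is best effort: a broken connection must not crash the daemon.
            Console.Error.WriteLine($"could not report failure for {evalRunId}: {ex.Message}");
        }
    }
}
```

Passing the dispatched `EvalRunId` through to both the eval and the failure report keeps every event for a run under one correlation id, which is the property the review asked for.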
Test plan
Branch base
This PR is based on the M1 branch (`alexeyzimarev/dev-1440-m1-shared-eval-library`) since it consumes the extracted `EvalService` and `IEvalObserver`. After M1 merges to main and its branch is deleted, GitHub will automatically retarget this PR to main.
🤖 Generated with Claude Code