[DEV-1440] M1: Extract shared eval library by alexeyzimarev · Pull Request #14 · kurrent-io/kapacitor

alexeyzimarev · 2026-04-13T14:42:27Z

Summary

Refactors `kapacitor.Commands.EvalCommand` into a reusable `kapacitor.Eval` library so the daemon (milestone 2) can reuse the same orchestration without duplicating it. No behaviour change: `kapacitor eval <id>` produces identical output and the server contracts are untouched.

First milestone of DEV-1440.

New namespace layout

- `kapacitor.Eval.EvalQuestions`: canonical 13-question / 4-category taxonomy and category-order helper. Single source of truth.
- `kapacitor.Eval.IEvalObserver`: observer surface for progress. The CLI supplies a stderr-logging implementation; M2 will add a SignalR-pushing implementation for the daemon. Callbacks are shaped so `OnStarted` / `OnQuestionCompleted` / `OnFinished` / `OnFailed` map 1:1 to the SignalR events documented in DEV-1440.
- `kapacitor.Eval.EvalService`: `RunAsync` drives the full pipeline (fetch context, fetch retained facts, run 13 judges sequentially, aggregate, persist, retain new facts) and reports every phase through `IEvalObserver`. Returns the aggregate on success, null on failure.
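
The split between orchestration and progress reporting described above can be sketched roughly as follows. This is not the code from the PR: `ISketchObserver`, `SketchEvalService`, `RecordingObserver`, the `judge` delegate, and the integer scores are all hypothetical stand-ins, and the real `IEvalObserver` has more callbacks with richer signatures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, trimmed-down observer surface (assumption: the real interface is larger).
interface ISketchObserver {
    void OnStarted(string evalRunId, int questionCount);
    void OnQuestionCompleted(string questionId, int score);
    void OnFinished(double averageScore);
    void OnFailed(string reason);
}

static class SketchEvalService {
    // Runs the judges sequentially and reports every phase through the observer.
    // Returns the aggregate on success, null on failure.
    public static async Task<double?> RunAsync(
        IReadOnlyList<string> questionIds,
        Func<string, Task<int>> judge,
        ISketchObserver observer
    ) {
        if (questionIds.Count == 0) {
            observer.OnFailed("no questions");
            return null;
        }

        observer.OnStarted(Guid.NewGuid().ToString(), questionIds.Count);

        var total = 0;
        foreach (var id in questionIds) {
            var score = await judge(id);
            observer.OnQuestionCompleted(id, score);
            total += score;
        }

        var average = (double)total / questionIds.Count;
        observer.OnFinished(average);
        return average;
    }
}

// Test double that records callbacks in the order they arrive.
sealed class RecordingObserver : ISketchObserver {
    public List<string> Events { get; } = new List<string>();

    public void OnStarted(string evalRunId, int questionCount) => Events.Add("started");
    public void OnQuestionCompleted(string questionId, int score) => Events.Add("completed:" + questionId);
    public void OnFinished(double averageScore) => Events.Add("finished");
    public void OnFailed(string reason) => Events.Add("failed:" + reason);
}
```

Because the service only talks to the interface, a CLI observer and a daemon observer can be swapped in without touching the orchestration, which is the point of the extraction.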

CLI adapter

`kapacitor.Commands.EvalCommand` shrinks to:

- Create the authenticated HTTP client
- Supply `ConsoleEvalObserver`, which maps each callback to a timestamped stderr log line (matches pre-refactor output exactly)
- Render the returned aggregate as the terminal report
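
A console observer of this kind might look like the sketch below. The class name, the timestamp format, and the constructor parameters are assumptions made for illustration; the review later in this thread only confirms that `OnStarted` is logged as an "Evaluating session …" line and that `OnFailed` writes the reason to stderr without a prefix.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical stand-in for ConsoleEvalObserver. The writer and clock are
// injected so the output can be checked without touching the real stderr.
sealed class SketchConsoleObserver {
    readonly TextWriter _err;
    readonly Func<DateTime> _clock;

    public SketchConsoleObserver(TextWriter err, Func<DateTime> clock) {
        _err = err;
        _clock = clock;
    }

    // Timestamped progress line (the exact wording after "Evaluating session" is assumed).
    public void OnStarted(string sessionId, string model, int questionCount) =>
        Log($"Evaluating session {sessionId} with {model} ({questionCount} questions)");

    // The reason is written as-is, without a timestamp prefix.
    public void OnFailed(string reason) => _err.WriteLine(reason);

    void Log(string message) {
        var stamp = _clock().ToString("HH:mm:ss", CultureInfo.InvariantCulture);
        _err.WriteLine($"[{stamp}] {message}");
    }
}
```

In the CLI this would be constructed as `new SketchConsoleObserver(Console.Error, () => DateTime.Now)`.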

Visibility

Types remain `internal` — the daemon lives in the same assembly, so `public` isn't needed yet. Revisit if/when the server repo consumes this library across assembly boundaries.

Test plan

- `dotnet build src/kapacitor/kapacitor.csproj`: clean
- `dotnet publish -c Release`: zero IL3050/IL2026 warnings (AOT-clean)
- Full unit suite: 205/205 pass
  - 21 existing eval tests (ParseVerdict, ExtractRetainFact, Aggregate, FormatKnownPatterns, BuildQuestionPrompt) migrated to target the new namespace; no assertion changes needed
  - `EvalCommandTests` renamed to `EvalServiceTests` to match the actual SUT
- CI
- Manual smoke of `kapacitor eval <id>` against a local server (behaviour-preserving refactor, but the argv → observer → stderr path is worth exercising once)

What's next

M2 will introduce a `RunEvalCommand` SignalR command on the daemon side, implementing `IEvalObserver` to push progress events back to the server. The server dispatch endpoint (M3) and UI tab (M5) depend on it.

🤖 Generated with Claude Code

Refactors kapacitor.Commands.EvalCommand into a reusable kapacitor.Eval library so the daemon (milestone 2) can reuse the same orchestration without duplicating it. No behaviour change: `kapacitor eval <id>` produces identical output and the server contracts are untouched.

New namespace layout:

- kapacitor.Eval.EvalQuestions: canonical 13-question / 4-category taxonomy and category-order helper. The single source of truth; both prompt building and aggregation reference it.
- kapacitor.Eval.IEvalObserver: observer surface for progress. The CLI supplies a stderr-logging implementation; milestone 2 will add a SignalR-pushing implementation for the daemon. Callbacks are shaped specifically so OnStarted / OnQuestionCompleted / OnFinished / OnFailed map 1:1 to the SignalR events documented in DEV-1440.
- kapacitor.Eval.EvalService: RunAsync drives the full pipeline (fetch context, fetch retained facts, run 13 judges sequentially, aggregate, persist, retain new facts) and reports every phase through IEvalObserver. Returns the aggregate on success, null on failure; OnFinished / OnFailed are fired either way so observers don't need to also inspect the return value.

kapacitor.Commands.EvalCommand shrinks to a thin adapter:

- Creates the authenticated HTTP client
- Provides a ConsoleEvalObserver that maps each callback to a timestamped stderr log line (matching the pre-refactor output exactly)
- Renders the returned aggregate as the terminal report

Types remain internal: the daemon lives in the same assembly, so public isn't needed yet; revisit when/if the server repo consumes the library across assembly boundaries.

Tests renamed EvalCommandTests -> EvalServiceTests and retargeted to the new namespace. All 21 existing eval tests continue to pass without changes to their assertions. Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

linear · 2026-04-13T14:42:32Z

DEV-1440 Dashboard-driven eval via daemon execution

qodo-code-review · 2026-04-13T14:42:50Z

Review Summary by Qodo

Extract shared eval library for daemon reuse (DEV-1440 M1)

✨ Enhancement

Walkthroughs

Description

• Extract eval orchestration into reusable kapacitor.Eval library
• Move 13-question taxonomy to EvalQuestions for single source of truth
• Introduce IEvalObserver interface for progress reporting across environments
• Refactor EvalCommand to thin CLI adapter over EvalService
• Rename test class to EvalServiceTests with retargeted assertions

Diagram

flowchart LR
  EvalCommand["EvalCommand<br/>(thin CLI adapter)"]
  EvalService["EvalService<br/>(core orchestration)"]
  EvalQuestions["EvalQuestions<br/>(taxonomy)"]
  IEvalObserver["IEvalObserver<br/>(progress surface)"]
  ConsoleObserver["ConsoleEvalObserver<br/>(stderr logging)"]
  
  EvalCommand -- "calls RunAsync" --> EvalService
  EvalCommand -- "implements" --> IEvalObserver
  EvalCommand -- "creates" --> ConsoleObserver
  EvalService -- "references" --> EvalQuestions
  EvalService -- "reports via" --> IEvalObserver
  ConsoleObserver -- "implements" --> IEvalObserver

File Changes

1. src/kapacitor/Commands/EvalCommand.cs Refactoring +47/-395

Thin CLI adapter over EvalService library

• Refactored from 410 lines to 29 lines, moving orchestration logic to EvalService
• Removed question taxonomy, verdict parsing, aggregation, and HTTP logic
• Added ConsoleEvalObserver class implementing IEvalObserver for stderr logging
• Simplified HandleEval to create HTTP client, call EvalService.RunAsync, and render results
• Updated Render method to use EvalService.VerdictForScore instead of local method

src/kapacitor/Commands/EvalCommand.cs

2. src/kapacitor/Eval/EvalQuestions.cs ✨ Enhancement +51/-0

Canonical question taxonomy and category ordering

• New file establishing canonical 13-question taxonomy across 4 categories
• Defines Question record with Category, Id, and Text fields
• Exports All array as single source of truth for question definitions
• Provides Categories array and CategoryOrder method for consistent ordering
• Replaces inline question definitions previously in EvalCommand

src/kapacitor/Eval/EvalQuestions.cs

3. src/kapacitor/Eval/EvalService.cs ✨ Enhancement +410/-0

Core eval orchestration with observer-based progress reporting

• New file containing core eval orchestration logic extracted from EvalCommand
• Implements RunAsync method driving full pipeline: fetch context, run judges, aggregate, persist
• Includes prompt construction, verdict parsing, fact extraction, and aggregation logic
• Reports all phases through IEvalObserver callbacks for progress tracking
• Provides public static methods for verdict parsing, prompt building, and aggregation
• Handles HTTP communication with server for context, judge facts, and result persistence

src/kapacitor/Eval/EvalService.cs

4. src/kapacitor/Eval/IEvalObserver.cs ✨ Enhancement +45/-0

Observer interface for progress reporting across environments

• New interface defining progress surface for eval runs
• Includes 9 callback methods: OnInfo, OnStarted, OnContextFetched, OnQuestionStarted,
 OnQuestionCompleted, OnQuestionFailed, OnFactRetained, OnFinished, OnFailed
• Callbacks shaped to map 1:1 to SignalR events for daemon milestone 2
• Allows different implementations (CLI stderr logging vs daemon SignalR pushing)
• Includes comprehensive XML documentation for each callback

src/kapacitor/Eval/IEvalObserver.cs

5. test/kapacitor.Tests.Unit/EvalServiceTests.cs 🧪 Tests +25/-25

Retarget eval tests to EvalService namespace

• Renamed from EvalCommandTests to EvalServiceTests to match new SUT
• Updated all 21 test method calls from EvalCommand.* to EvalService.*
• Changed question definition from EvalCommand.EvalQuestion to EvalQuestions.Question
• All assertions remain unchanged; tests continue to pass without modification
• Covers verdict parsing, aggregation, prompt building, pattern formatting, and fact extraction

test/kapacitor.Tests.Unit/EvalServiceTests.cs

qodo-code-review · 2026-04-13T14:42:52Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~Observer exceptions abort eval~~ ☑ 🐞 ☼

Description

IEvalObserver promises observer exceptions are caught and don’t abort the eval, but EvalService
invokes observer callbacks directly without any try/catch, so an observer throw will terminate
RunAsync and may skip OnFailed/OnFinished.

Code

src/kapacitor/Eval/EvalService.cs[R89-97]

+        observer.OnContextFetched(
+            context.Trace.Count,
+            traceJson.Length,
+            context.Compaction.ToolResultsTotal,
+            context.Compaction.ToolResultsTruncated,
+            context.Compaction.BytesSaved
+        );
+        observer.OnStarted(evalRunId, context.SessionId, model, EvalQuestions.All.Length);
+

Evidence

The IEvalObserver contract explicitly states the service catches observer exceptions, but
EvalService calls observer methods directly (e.g., OnContextFetched/OnStarted) with no guarding
wrapper; any exception will propagate out of RunAsync.

src/kapacitor/Eval/IEvalObserver.cs[11-16]
src/kapacitor/Eval/EvalService.cs[89-97]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`IEvalObserver` documents that observer exceptions are caught/logged and do not abort the eval, but `EvalService.RunAsync` calls observer methods directly. If an observer throws (e.g., SignalR push fails), the eval orchestration will crash and may not emit `OnFailed`/`OnFinished`.

### Issue Context
This library is intended for reuse by the daemon (M2). In that environment, observer callbacks are more likely to do I/O and fail transiently.

### Fix Focus Areas
- src/kapacitor/Eval/EvalService.cs[55-179]
- src/kapacitor/Eval/IEvalObserver.cs[11-16]

### Suggested fix
- Add a small helper in `EvalService` like `SafeNotify(Action notify, string context)` that wraps each `observer.*` call in try/catch.
- On catch, log to a safe sink (e.g., `Console.Error.WriteLine`) or a dedicated internal logger; avoid calling back into the observer in the catch path.
- Use the helper for *all* observer calls (`OnInfo`, `OnStarted`, `OnContextFetched`, `OnQuestion*`, `OnFinished`, `OnFailed`).
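
A minimal sketch of the suggested helper, assuming the `SafeNotify(Action notify, string context)` shape proposed above. The `ObserverGuard` class name, the log wording, and the `bool` return value are illustrative additions, not part of the PR.

```csharp
using System;

static class ObserverGuard {
    // Invokes an observer callback and swallows anything it throws so the
    // eval can continue. Returns true when the callback completed, false
    // when it threw (the return value is an assumption added for testing).
    public static bool SafeNotify(Action notify, string context) {
        try {
            notify();
            return true;
        } catch (Exception ex) {
            try {
                // Log to a safe sink; never call back into the observer here.
                Console.Error.WriteLine($"observer callback {context} threw: {ex.Message}");
            } catch {
                // stderr itself failed; there is nothing safe left to do.
            }
            return false;
        }
    }
}
```

A call site would then read `ObserverGuard.SafeNotify(() => observer.OnStarted(...), "OnStarted")`, with the same wrapping applied to every other callback.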

2. ~~Progress events reversed~~ ☑ 🐞 ≡

Description

EvalService emits OnContextFetched before OnStarted, but the CLI observer maps these to user-facing
lines (“Fetched …” and “Evaluating session …”), causing progress output ordering to contradict the
CLI’s stated “pre-refactor shape”.

Code

src/kapacitor/Eval/EvalService.cs[R89-97]

+        observer.OnContextFetched(
+            context.Trace.Count,
+            traceJson.Length,
+            context.Compaction.ToolResultsTotal,
+            context.Compaction.ToolResultsTruncated,
+            context.Compaction.BytesSaved
+        );
+        observer.OnStarted(evalRunId, context.SessionId, model, EvalQuestions.All.Length);
+

Evidence

EvalService calls OnContextFetched and only then OnStarted. ConsoleEvalObserver logs OnStarted
as “Evaluating session …” and OnContextFetched as “Fetched …”, while its comment claims it matches
the pre-refactor output shape.

src/kapacitor/Eval/EvalService.cs[89-97]
src/kapacitor/Commands/EvalCommand.cs[57-70]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`EvalService.RunAsync` currently calls `observer.OnContextFetched(...)` before `observer.OnStarted(...)`. The CLI observer logs these in a user-facing way, so normal runs will print “Fetched …” before “Evaluating session …”, contradicting the intent of preserving the CLI output shape.

### Issue Context
`ConsoleEvalObserver` is explicitly documented as matching the old stderr format.

### Fix Focus Areas
- src/kapacitor/Eval/EvalService.cs[89-97]
- src/kapacitor/Commands/EvalCommand.cs[62-70]

### Suggested fix
- Emit `OnStarted(...)` before `OnContextFetched(...)` (or adjust observer contract / CLI observer to preserve the intended log order).
- If `OnStarted` must remain “after context fetched” semantically, still reorder those two calls because both are already after the fetch.

3. ~~401 prints extra line~~ ☑ 🐞 ◔

Description

On a 401, HandleUnauthorizedAsync already writes the server message to stderr, but EvalService
additionally calls observer.OnFailed("unauthenticated"); with ConsoleEvalObserver this prints an
extra unprefixed line.

Code

src/kapacitor/Eval/EvalService.cs[R57-61]

+            if (await HttpClientExtensions.HandleUnauthorizedAsync(resp)) {
+                observer.OnFailed("unauthenticated");
+
+                return null;
+            }

Evidence

HandleUnauthorizedAsync prints the error message to Console.Error. EvalService then calls
observer.OnFailed("unauthenticated"), and the CLI observer writes the reason directly to stderr,
resulting in duplicated/changed error output for the same condition.

src/kapacitor/HttpClientExtensions.cs[111-130]
src/kapacitor/Eval/EvalService.cs[55-61]
src/kapacitor/Commands/EvalCommand.cs[86-88]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
When the eval-context call returns 401, the code path prints to stderr via `HandleUnauthorizedAsync`, then also emits `observer.OnFailed("unauthenticated")`. In the CLI this produces an extra line and changes the error shape.

### Issue Context
The extracted library should ideally not write directly to the console; instead, errors should flow through `IEvalObserver` so CLI/daemon can render appropriately.

### Fix Focus Areas
- src/kapacitor/Eval/EvalService.cs[55-61]
- src/kapacitor/HttpClientExtensions.cs[111-130]
- src/kapacitor/Commands/EvalCommand.cs[86-88]

### Suggested fix
- Prefer a single reporting mechanism:
 - Option A (minimal): if `HandleUnauthorizedAsync(resp)` returns true, return null without calling `observer.OnFailed(...)`.
 - Option B (better for library): refactor `HandleUnauthorizedAsync` to *return* the message (or provide a non-printing overload) and let `EvalService` call `observer.OnFailed(message)` without any direct console writes.
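
Option B could be approached along these lines: a check that inspects the response and hands back a reason instead of printing it. `UnauthorizedSketch` and `TryGetUnauthorizedReason` are hypothetical names, and the real `HandleUnauthorizedAsync` may read the server's message from the response body, which this sketch does not attempt.

```csharp
using System.Net;
using System.Net.Http;

static class UnauthorizedSketch {
    // Reports a 401 through the out parameter and never writes to the console,
    // so the caller can route the reason to observer.OnFailed(...) exactly once.
    public static bool TryGetUnauthorizedReason(HttpResponseMessage resp, out string reason) {
        if (resp.StatusCode == HttpStatusCode.Unauthorized) {
            reason = "unauthenticated";
            return true;
        }

        reason = string.Empty;
        return false;
    }
}
```

With this shape the observer becomes the single reporting channel for the 401 path, for both CLI and daemon callers.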

4. ~~Cancellation partly ignored~~ ☑ 🐞 ☼

Description

RunAsync accepts a CancellationToken, but judge-fact fetch/post requests don’t pass it to
GetWithRetryAsync/PostWithRetryAsync and cancellation via ThrowIfCancellationRequested can bypass
the method’s documented “OnFinished/OnFailed either way” behavior.

Code

src/kapacitor/Eval/EvalService.cs[R346-358]

+    static async Task<Dictionary<string, List<JudgeFact>>> FetchAllJudgeFactsAsync(
+            HttpClient    httpClient,
+            string        baseUrl,
+            string        encodedSessionId,
+            IEvalObserver observer
+        ) {
+        var result = new Dictionary<string, List<JudgeFact>>();
+
+        foreach (var category in EvalQuestions.Categories) {
+            try {
+                using var resp = await httpClient.GetWithRetryAsync(
+                    $"{baseUrl}/api/sessions/{encodedSessionId}/judge-facts?category={Uri.EscapeDataString(category)}"
+                );

Evidence

EvalService documents that observers receive a final OnFinished or OnFailed, but it throws on
cancellation inside the loop and doesn’t catch OperationCanceledException. Additionally, the
judge-facts HTTP helper methods omit passing ct even though the underlying extension methods
support it, so cancellation won’t be honored during those requests.

src/kapacitor/Eval/EvalService.cs[22-27]
src/kapacitor/Eval/EvalService.cs[106-110]
src/kapacitor/Eval/EvalService.cs[346-358]
src/kapacitor/Eval/EvalService.cs[377-407]
src/kapacitor/HttpClientExtensions.cs[80-91]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`EvalService.RunAsync` takes a `CancellationToken` but does not propagate it to judge-fact fetch/post HTTP calls, and it can throw `OperationCanceledException` without emitting `OnFailed`, contradicting the method’s contract comment that observers receive a final `OnFinished`/`OnFailed`.

### Issue Context
This matters more for the daemon use case where users may cancel long-running evals.

### Fix Focus Areas
- src/kapacitor/Eval/EvalService.cs[29-180]
- src/kapacitor/Eval/EvalService.cs[346-408]

### Suggested fix
- Thread `CancellationToken ct` into `FetchAllJudgeFactsAsync(...)` and `PostJudgeFactAsync(...)` and pass `ct: ct` to `GetWithRetryAsync`/`PostWithRetryAsync` and `ReadAsStringAsync(ct)`.
- Wrap the body of `RunAsync` (or at least the question loop) in a `try { ... } catch (OperationCanceledException) { observer.OnFailed("cancelled"); return null; }` (using the same safe-notify wrapper from the observer-exception fix).
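
The cancellation contract in the suggested fix can be illustrated with a stripped-down loop. Everything here is a hypothetical stand-in: `CancellationSketch`, the `steps` delegates (playing the role of the per-question work and HTTP calls), and the `onFinished` / `onFailed` delegates (playing the role of the observer callbacks).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class CancellationSketch {
    // Runs the steps in order, threading the token into each one. Exactly one
    // terminal callback fires: onFailed("cancelled") when the token is
    // cancelled, onFinished() otherwise.
    public static async Task<bool> RunAsync(
        IEnumerable<Func<CancellationToken, Task>> steps,
        Action onFinished,
        Action<string> onFailed,
        CancellationToken ct
    ) {
        try {
            foreach (var step in steps) {
                ct.ThrowIfCancellationRequested();
                await step(ct); // the token reaches the step, so in-flight work can stop too
            }
        } catch (OperationCanceledException) {
            onFailed("cancelled");
            return false;
        }

        onFinished();
        return true;
    }
}
```

`TaskCanceledException` derives from `OperationCanceledException`, so cancellation surfaced by awaited calls lands in the same catch block as `ThrowIfCancellationRequested`.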

Daemon side of the dashboard-driven eval pipeline. Pairs with the server M3 endpoint in kurrent-io/Kurrent.Capacitor#477 and depends on the M1 shared eval library in #14.

- New SignalR wire types in Models.cs match the server's DaemonCommands.cs: RunEvalCommand (server -> daemon dispatch) plus the four daemon -> server progress events (EvalStarted, EvalQuestionCompleted, EvalFinished, EvalFailed). Registered in KapacitorJsonContext for source-gen serialization.
- ServerConnection registers a "RunEval" handler and exposes per-event send methods (EvalStartedAsync etc.) that mirror the existing AgentRegisteredAsync / LaunchFailedAsync pattern.
- New EvalRunner singleton subscribes to OnRunEval. Each incoming command spawns a fire-and-forget Task that builds an authenticated HttpClient, instantiates a DaemonEvalObserver bound to the run, and drives EvalService.RunAsync. Unhandled exceptions are caught and translated to an EvalFailed relay so the dashboard learns about daemon-side failures rather than waiting forever.
- DaemonEvalObserver maps the IEvalObserver surface to SignalR sends: OnStarted -> EvalStartedAsync, OnQuestionCompleted -> EvalQuestionCompletedAsync, OnFinished -> EvalFinishedAsync, OnFailed -> EvalFailedAsync. Info / per-question-start / per-question-failure / fact-retained callbacks just log locally; they're not interesting enough to justify SignalR chatter for every judge.
- Wired into DaemonRunner DI: AddSingleton<EvalRunner> + an explicit GetRequiredService at startup so the constructor's OnRunEval subscription happens before the host starts taking traffic.

Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Four findings on PR #14:

1. Observer exceptions abort eval (Action required): IEvalObserver documented that observer throws are caught and don't abort the eval, but EvalService called callbacks directly. A SignalR push failure on the daemon would have crashed the run mid-flight, possibly skipping OnFailed. Fixed via a SafeObserver wrapper inside RunAsync that delegates to the caller's observer with a try/catch around each call; exceptions log to stderr (with a nested try/catch in case stderr itself fails) and the eval continues.
2. Progress events reversed (Recommended): OnContextFetched was emitted before OnStarted. The CLI observer maps these to "Fetched..." then "Evaluating session..." log lines, so the user-facing output order was the reverse of the pre-refactor shape. Swapped: OnStarted now fires first, then OnContextFetched.
3. 401 prints extra line (Recommended): HandleUnauthorizedAsync writes to stderr directly, then EvalService called observer.OnFailed with "unauthenticated", which the CLI observer also wrote to stderr, resulting in two lines for the same condition. Replaced the HandleUnauthorizedAsync call with a direct StatusCode == 401 check and a single observer.OnFailed("authentication failed — run 'kapacitor login' to re-authenticate"). The observer is now the single reporting channel; daemon callers also benefit (they get EvalFailed instead of nothing for 401s).
4. Cancellation partly ignored (Action required): RunAsync took a CancellationToken but didn't forward it to FetchAllJudgeFactsAsync / PostJudgeFactAsync, and ThrowIfCancellationRequested could escape without firing OnFailed. Now ct threads through both helpers (and their HTTP calls + ReadAsStringAsync), and the body of RunAsync is wrapped in a try/catch (OperationCanceledException) that fires observer.OnFailed("cancelled") before returning null, so observers always see exactly one terminal callback.

Doc updated to reflect that the SafeObserver guarantee + cancellation contract are now actually enforced. Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [DEV-1440] milestone 2: daemon RunEvalCommand handler

* [DEV-1440] address review feedback on daemon eval runner

Three findings on PR #15 (the other two, observer-throw guard and judge-fact cancellation propagation, were already addressed by the M1 follow-up in 1f655f4):

1. EvalRunId mismatch (Action required): server dispatches RunEvalCommand with an EvalRunId, but EvalService generated its own GUID, leading to two different ids in one run's event stream (EvalStarted used the service-generated id; subsequent question / finished / failed events used the dispatched id captured in DaemonEvalObserver). Fixed by adding an optional `evalRunId` parameter to EvalService.RunAsync; CLI passes null (mints a fresh id, current behaviour) and the daemon passes cmd.EvalRunId so the whole run, including the persisted SessionEvalCompleted aggregate, shares one correlation id end-to-end.
2. Out-of-order progress events (Recommended): DaemonEvalObserver's per-event Task.Run can interleave concurrent SignalR sends. Added a SemaphoreSlim(1,1) gate inside Relay so the background sends drain in their enqueue order; the dashboard sees EvalStarted before any question completion, and EvalFinished/EvalFailed last, deterministically.
3. Daemon evals not cancellable on shutdown (Recommended): EvalRunner spawned Task.Run with no link to the host lifecycle. Now injects IHostApplicationLifetime, captures ApplicationStopping, and passes it as ct to EvalService.RunAsync. M1's outer try/catch turns in-flight cancellation into a clean OnFailed("cancelled") relay so the dashboard learns the eval stopped instead of waiting forever.

Full suite 205/205, AOT publish clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
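
The optional `evalRunId` parameter described in the first finding amounts to a "use the caller's id, else mint one" rule. A minimal sketch, assuming a string id and a hypothetical `RunIdSketch.ResolveEvalRunId` helper (the actual change lives inside `EvalService.RunAsync` and may look different):

```csharp
#nullable enable
using System;

static class RunIdSketch {
    // CLI callers pass null and get a freshly minted GUID; the daemon passes
    // the id dispatched by the server so every event in the run shares it.
    public static string ResolveEvalRunId(string? evalRunId = null) =>
        string.IsNullOrEmpty(evalRunId) ? Guid.NewGuid().ToString() : evalRunId;
}
```

Resolving the id once at the top of the run, and passing that single value to every callback and to the persisted aggregate, is what keeps the correlation id consistent end-to-end.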

qodo-code-review bot reviewed Apr 13, 2026

Comment thread src/kapacitor/Eval/EvalService.cs

alexeyzimarev mentioned this pull request Apr 13, 2026

[DEV-1440] M2: Daemon RunEvalCommand handler #15

Merged

5 tasks

alexeyzimarev merged commit dec2102 into main Apr 13, 2026
3 checks passed

alexeyzimarev deleted the alexeyzimarev/dev-1440-m1-shared-eval-library branch April 13, 2026 15:05

alexeyzimarev merged 2 commits into main from alexeyzimarev/dev-1440-m1-shared-eval-library
