feat: investigation feedback mechanism + fix DP checklist auto-complete #101
jacoblee-io merged 3 commits into main from
Conversation
Add a low-friction feedback loop that lets SREs rate investigation accuracy. Feedback modifies retrieval weighting so correct diagnoses get boosted and wrong ones get suppressed in future investigations.

- Add `FeedbackStatus` type and `FEEDBACK_SIGNALS` constant (1.5/0.5/0.1)
- Add 3 column migrations: `feedback_signal`, `feedback_note`, `feedback_at`
- Add `getInvestigationById()` and `updateInvestigationFeedback()` to indexer
- Apply feedback weighting in pattern aggregation, keyword search, and validated hypothesis extraction
- Plumb `investigationId` through `InvestigationResult` → tool details
- Create `investigation_feedback` tool and register in agent-factory
- Add Phase 5 feedback guidance to deep-investigation SKILL.md
- Add WebUI feedback strip (Correct/Partial/Wrong) to InvestigationCard
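For concreteness, the signal constants and their use as a retrieval multiplier might look like the following sketch. The `FeedbackStatus` and `FEEDBACK_SIGNALS` names and the 1.5/0.5/0.1 values come from the bullet list above; everything else is illustrative, not the PR's actual code:

```typescript
// Hypothetical sketch based on the PR description; the actual shapes
// in the repo may differ.
type FeedbackStatus = "confirmed" | "partial" | "rejected";

const FEEDBACK_SIGNALS: Record<FeedbackStatus, number> = {
  confirmed: 1.5, // boost retrieval weight for correct diagnoses
  partial: 0.5,   // mildly suppress partially correct ones
  rejected: 0.1,  // strongly suppress wrong diagnoses
};

// Apply the signal as a multiplier on a retrieval score; investigations
// without feedback keep a neutral weight of 1.0.
function applyFeedbackWeight(baseScore: number, feedback?: FeedbackStatus): number {
  return baseScore * (feedback ? FEEDBACK_SIGNALS[feedback] : 1.0);
}
```

A multiplicative signal has the nice property that unrated investigations are unaffected, while rated ones shift relative to each other in all three retrieval paths at once.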
jacoblee-io
left a comment
Two distinct features in one PR — the checklist fix is clean, but the feedback mechanism has a few issues.
PR description only covers half the changes
The title/body describe the checklist auto-complete fix (~10 lines in usePilot.ts), but the bulk of the PR is the investigation feedback mechanism (~300 lines across 10 files: new tool, schema migration, indexer methods, engine weighting, UI component, skill doc). These are independent features and would be easier to review as separate PRs. At minimum, the PR description should cover the feedback mechanism.
Checklist auto-complete fix ✅
The guard condition change from `deep_search.status === 'done'` to `hypotheses.status !== 'done' && hypotheses.status !== 'skipped'` is correct. The old condition created a deadlock, since deep_search was the stuck item itself.
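A minimal sketch of the two guards, with checklist item and status names inferred from this review; the real usePilot.ts code will differ:

```typescript
// Hypothetical checklist shape inferred from the review comments.
type ItemStatus = "pending" | "in_progress" | "done" | "skipped";
interface Checklist {
  deep_search: { status: ItemStatus };
  hypotheses: { status: ItemStatus };
}

// Old guard: auto-complete only once deep_search was 'done'; but deep_search
// was itself the stuck item, so auto-complete could never fire (deadlock).
function blockedOld(c: Checklist): boolean {
  return c.deep_search.status !== "done";
}

// New guard: only block while waiting on user hypothesis confirmation.
function blockedNew(c: Checklist): boolean {
  return c.hypotheses.status !== "done" && c.hypotheses.status !== "skipped";
}
```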
Investigation feedback mechanism — issues
1. Bug: extractValidatedHypotheses count semantics
count changed from integer (number of occurrences) to float (weighted by feedbackSignal). The display label still says "validated N times":
`- "${displayText}" (validated ${Math.round(count)} time${...})`

Example: a hypothesis validated in 3 confirmed investigations:
- count = 1.5 + 1.5 + 1.5 = 4.5 → "validated 5 times" (was actually 3)
A rejected investigation's hypothesis:
- count = 0.1 → "validated 0 times"
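The inflation is easy to reproduce; a sketch using the signal values from the PR, with the label template mirroring the line quoted above:

```typescript
// Three confirmed investigations each contribute a 1.5 signal instead of 1.
const signals = [1.5, 1.5, 1.5];
const count = signals.reduce((a, b) => a + b, 0); // 4.5

// Display label as written in the PR: rounds the weighted float.
const label = `validated ${Math.round(count)} time${Math.round(count) === 1 ? "" : "s"}`;
// Math.round(4.5) rounds half up, so this reads "validated 5 times"
// even though only 3 investigations validated the hypothesis.

// A single rejected investigation (signal 0.1) rounds down to zero:
const rejectedLabel = `validated ${Math.round(0.1)} times`;
```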
Suggestion: keep count as a raw integer for display, apply feedbackSignal weighting separately in the sort key:

existing.count++;
existing.weightedScore = (existing.weightedScore ?? 0) + feedbackSignal;
// Sort by weightedScore, display raw count

2. Nit: feedback via chat message is indirect
The UI sends feedback as a user chat message ([investigation feedback: confirmed] investigationId=...) — the agent must interpret it and call investigation_feedback. If the agent misinterprets or responds conversationally, the DB update doesn't happen (only the UI metadata is saved via updateMessageMeta).
This follows the existing HypothesesCard pattern so it's architecturally consistent, but worth noting that the updateMessageMeta call creates a false sense of persistence — the real weight adjustment depends on the LLM cooperating.
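The indirection can be sketched as follows; `sendMessage` and `updateMessageMeta` here are stand-ins for the repo's actual APIs, and the message format is taken from the review above:

```typescript
// Hypothetical sketch of the two-step feedback path described in the review.
function submitFeedback(
  investigationId: string,
  status: "confirmed" | "partial" | "rejected",
  sendMessage: (text: string) => void,
  updateMessageMeta: (meta: object) => void,
): void {
  // 1. Optimistic UI persistence; this succeeds even if the agent never acts,
  //    which is the "false sense of persistence" noted above.
  updateMessageMeta({ feedback: status });
  // 2. The actual DB weight adjustment depends on the agent parsing this
  //    message and choosing to call the investigation_feedback tool.
  sendMessage(`[investigation feedback: ${status}] investigationId=${investigationId}`);
}
```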
3. Nit: duplicate submit logic in InvestigationCard
The Enter-key handler and Submit button onClick contain identical logic (~6 lines each). Consider extracting to a handleSubmit() function.
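One framework-agnostic way to do the extraction; names and shapes are illustrative, not the component's actual code:

```typescript
// Hypothetical sketch: one shared handleSubmit() instead of duplicated
// logic in the Enter-key handler and the Submit button onClick.
function makeFeedbackHandlers(
  getNote: () => string,
  submit: (note: string) => void,
) {
  const handleSubmit = () => {
    const note = getNote().trim();
    if (note) submit(note);
  };
  return {
    // Both event paths delegate to the same function.
    onKeyDown: (e: { key: string }) => { if (e.key === "Enter") handleSubmit(); },
    onClick: handleSubmit,
  };
}
```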
Everything else looks good — schema migration is safe (ALTER TABLE ADD COLUMN with existence checks), getInvestigationById/updateInvestigationFeedback use parameterized queries, FEEDBACK_SIGNALS constants are well-chosen, and the TypeBox tool auto-adapts for claude-sdk brain via adaptToolsForSdk.
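The existence-check pattern the migration presumably follows can be sketched as pure logic. The column names match the PR; the `ColumnInfo` shape is an assumption modeled on SQLite's `PRAGMA table_info`, and the table name is illustrative:

```typescript
// Hypothetical sketch of an idempotent "add column if missing" migration.
interface ColumnInfo { name: string }

const FEEDBACK_COLUMNS: Record<string, string> = {
  feedback_signal: "REAL",
  feedback_note: "TEXT",
  feedback_at: "TEXT",
};

// Given the table's existing columns, return only the ALTER TABLE
// statements still needed; re-running the migration is then a no-op.
function pendingFeedbackMigrations(existing: ColumnInfo[]): string[] {
  const present = new Set(existing.map((c) => c.name));
  return Object.entries(FEEDBACK_COLUMNS)
    .filter(([name]) => !present.has(name))
    .map(([name, type]) => `ALTER TABLE investigations ADD COLUMN ${name} ${type}`);
}
```

Nullable `ADD COLUMN` plus an existence check is safe for SQLite because existing rows simply read NULL for the new columns.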
chent1996
left a comment
Two clean changes:

- Investigation feedback: Schema migration safe (3 nullable columns with existence check). Feedback signals (1.5/0.5/0.1) applied consistently across all 3 retrieval paths (pattern aggregation, keyword search, validated hypothesis extraction). Tool validates investigation existence before update. `investigationId` plumbing from engine → tool result → details → frontend is complete.
- DP checklist auto-complete: Guard condition change from `deep_search` to `hypotheses` phase is logically correct: only block auto-complete when waiting for user hypothesis confirmation.
Minor observation: frontend feedback is sent via sendMessage() (chat message to agent), with optimistic UI update via updateMessageMeta. If the agent doesn't act on the message, the investigation DB won't be updated but UI shows success. Acceptable trade-off for conversational UX. Non-blocking.
LGTM.
8cdaa89 to 0ca1914
- extractValidatedHypotheses: separate rawCount (display) from weightedScore (sorting) to avoid inflated/deflated occurrence counts
- InvestigationCard: extract handleFeedbackSubmit() to deduplicate Enter-key and Submit button logic
The safety-net in prompt_done required the deep_search checklist item to be marked 'done' before auto-completing remaining items. But if the agent presented findings without explicitly calling manage_checklist to mark deep_search as done, the checklist card stayed stuck at 2/4 indefinitely. Relax the guard condition: only block auto-complete if the hypotheses phase isn't done yet (agent waiting for user confirmation). Once hypotheses are done, any remaining in_progress/pending items are auto-completed when the agent finishes responding.
0ca1914 to e5e2a66
jacoblee-io
left a comment
Previous issues resolved:

- Count semantics ✅: split into `rawCount` (integer, display) and `weightedScore` (float, sorting). "validated N times" now reflects actual occurrences. Sorting uses feedback-weighted score as intended.
- Duplicate submit logic ✅: extracted to `handleFeedbackSubmit()`.
LGTM.
Summary
`manage_checklist` for remaining phases.

Changes
Investigation Feedback (commit 1 + 2)
- `FeedbackStatus` type and `FEEDBACK_SIGNALS` constant (1.5 / 0.5 / 0.1)
- `feedback_signal`, `feedback_note`, `feedback_at` column migrations
- `getInvestigationById()` and `updateInvestigationFeedback()` to indexer
- `investigationId` through `InvestigationResult` → tool details
- `investigation_feedback` tool and register in agent-factory

Review fixes (commit 2)

- `extractValidatedHypotheses`: separate `rawCount` (display) from `weightedScore` (sorting) to avoid inflated/deflated occurrence counts
- `InvestigationCard`: extract `handleFeedbackSubmit()` to deduplicate Enter-key and Submit button logic

DP Checklist (commit 3)

- `prompt_done` safety-net guard: only block auto-complete when `hypotheses` phase isn't done (waiting for user confirmation), not when `deep_search` isn't done

Test plan

- `tsc --noEmit` passes
- `investigation_feedback` tool called → check SQLite `feedback_signal` updated
- `searchInvestigations` and `getInvestigationPatterns`