Commit 06163b5

Update analysis docs with status and fixes
Update two analysis docs (chat-to-proposal gap and manual testing findings) to reflect recent fixes and testing status. Key changes: add Last Updated and status notes; mark Tier 1 improvements shipped (intent classifier regex/stemming/negation fixes, substring ordering bug, PR #579), UX parse hints shipped (PR #582), unit/integration tests shipped (PR #580), and note PR range #578–#582. In manual testing findings mark OBS-2/OBS-3 resolved (PR #581) and BUG-M5 resolved (PR #578), update resolutions and remove duplicate checklist items. Minor editorial clarifications and test counts added.
1 parent 8037d5d commit 06163b5

2 files changed: +22 -20 lines changed

docs/analysis/2026-03-29_chat_nlp_proposal_gap.md

Lines changed: 15 additions & 15 deletions
@@ -1,8 +1,11 @@
 # Chat-to-Proposal NLP Gap Analysis

 **Date:** 2026-03-29
+**Last Updated:** 2026-03-29
 **Trigger:** Manual testing — user asked chat "can you create new onboarding tasks for people who aren't technical?" and received a parse failure error instead of a proposal.

+> **Status update (2026-03-29):** Tier 1 improvements and testing are now shipped via PRs #578–#582. See resolution notes inline below.
+
 ## Observed Behavior

 The chat assistant replied conversationally but appended:
@@ -15,14 +18,11 @@ The user's natural language request was not translatable into the regex-based in

 There is an architectural gap between three components in the chat-to-proposal pipeline:

-### 1. Intent Classifier (`LlmIntentClassifier.Classify`)
+### 1. Intent Classifier (`LlmIntentClassifier.Classify`) — ✓ IMPROVED

-A static keyword matcher that scans the raw user message for substrings like `"create task"`, `"new card"`, `"move card"`, etc. Returns `(IsActionable, ActionIntent)`.
+~~A static keyword matcher that scans the raw user message for substrings like `"create task"`, `"new card"`, `"move card"`, etc.~~ Now uses compiled regex patterns with word-distance matching (`(\w+\s+){0,5}` gaps), stemming/plurals (`tasks?`, `cards?`), broader verb coverage ("set up", "build", "make", "generate"), and negative context filtering for negations ("don't create") and other-tool questions ("how do I create a card in Jira?"). Returns `(IsActionable, ActionIntent)`.

-**Problem:** Brittle substring matching. The message "can you create new onboarding tasks for people who aren't technical?" does NOT match any pattern because:
-- `"create task"` requires those two words adjacent — `"create new oboarding tasks"` has words between them
-- `"new task"` fails for the same reason (plural `"tasks"` with preceding adjective)
-- No fuzzy matching, stemming, or word-distance tolerance
+**Resolved (PR #579):** The classifier now matches "can you create new onboarding tasks for people who aren't technical?" as `card.create`. Null input returns `(false, null)` instead of throwing. Substring ordering bug fixed ("remove card" correctly classified as `card.archive`). 86 unit tests cover all patterns. Redundant `Contains()` fallbacks removed.

 ### 2. Instruction Parser (`AutomationPlannerService.ParseInstructionAsync`)
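
To illustrate the classifier change described in section 1 of the hunk above, here is a minimal TypeScript sketch that combines a word-gap regex, plural-tolerant nouns, and a negative-context filter. It is not the code from PR #579. The function name `classify`, the verbs "create" and "add", the negation words, and the list of other tools are assumptions made for the example; only the `{0,5}` gap, the `tasks?`/`cards?` nouns, and the verbs "set up", "build", "make", and "generate" come from the doc.

```typescript
// Illustrative sketch only; names and pattern details are assumptions.
type Classification = { isActionable: boolean; intent: string | null };

// Verbs that can start a "create" request ("create" and "add" are assumed).
const VERB = /(?:create|add|make|build|generate|set\s+up)/.source;
// Up to five filler words may sit between the verb and the noun.
const GAP = /(?:\w+\s+){0,5}/.source;
// Singular or plural nouns.
const NOUN = /(?:tasks?|cards?)/.source;

const CREATE_CARD = new RegExp(`\\b${VERB}\\s+${GAP}${NOUN}\\b`, "i");

// Negative context 1: a negation shortly before the verb ("don't create ...").
const NEGATED = new RegExp(
  `\\b(?:don'?t|do\\s+not|never)\\s+(?:\\w+\\s+){0,2}${VERB}\\b`,
  "i",
);

// Negative context 2: questions about a different tool (tool list is assumed).
const OTHER_TOOL = /\bin\s+(?:jira|trello|asana)\b/i;

const NOT_ACTIONABLE: Classification = { isActionable: false, intent: null };

function classify(message: string | null | undefined): Classification {
  // Null or empty input is simply not actionable; nothing is thrown.
  if (!message) return NOT_ACTIONABLE;
  if (NEGATED.test(message) || OTHER_TOOL.test(message)) return NOT_ACTIONABLE;
  if (CREATE_CARD.test(message)) {
    return { isActionable: true, intent: "card.create" };
  }
  return NOT_ACTIONABLE;
}
```

Under these assumed patterns, the trigger message from the doc ("can you create new onboarding tasks for people who aren't technical?") is classified as `card.create`, while the two false-positive examples ("don't create task yet, just tell me about it" and "how do I create a card in Jira?") are rejected.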

@@ -75,8 +75,8 @@ User message (natural language)
 - "how do I create a card in Jira?" (asking about a different tool) → triggers `card.create`
 - "don't create task yet, just tell me about it" → triggers `card.create`

-### B2. Substring ordering bug
-- "remove card abc123" → classified as `card.move` (not `card.archive`) because `"move card"` is a substring of `"remove card"` and is checked first in the classifier's if-chain. This is a priority-ordering bug where `remove` is misclassified.
+### B2. Substring ordering bug — ✓ RESOLVED
+- ~~"remove card abc123" → classified as `card.move` (not `card.archive`)~~ Fixed in PR #579 — archive patterns now checked before move patterns; "remove card" correctly classifies as `card.archive`.

 ### C. Parser failures on detected intent
 - Even when intent IS detected, parser fails unless user writes exact syntax
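
The B2 ordering bug in the hunk above is easy to reproduce in isolation. The sketch below, in TypeScript with hypothetical function names, first shows the failure mode (a substring check for "move card" also fires on "remove card") and then one way to avoid it: check archive rules before move rules. The `\b` word boundaries and the extra verbs "archive" and "delete" are additions for the example, not something the doc states about PR #579.

```typescript
// Illustrative sketch only; rule contents are assumptions.
type Rule = { intent: string; pattern: RegExp };

// Buggy variant: plain substring checks with "move card" tested first.
function classifyNaive(message: string): string | null {
  const text = message.toLowerCase();
  if (text.indexOf("move card") !== -1) return "card.move"; // also true for "remove card"
  if (text.indexOf("remove card") !== -1) return "card.archive"; // never reached for "remove card"
  return null;
}

// Fixed variant: archive rules come first, and \b stops "move" from
// matching inside "remove" even if the order changes later.
const RULES: Rule[] = [
  { intent: "card.archive", pattern: /\b(?:remove|archive|delete)\s+cards?\b/i },
  { intent: "card.move", pattern: /\bmove\s+cards?\b/i },
];

function classifyOrdered(message: string): string | null {
  for (const rule of RULES) {
    if (rule.pattern.test(message)) return rule.intent;
  }
  return null;
}
```

Ordering alone is enough to fix the reported case; the word boundary is a second line of defence in case a later edit reorders the rules.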
@@ -95,13 +95,13 @@ User message (natural language)

 ## Improvement Ideas

-### Tier 1: Quick wins (no LLM changes)
+### Tier 1: Quick wins (no LLM changes) — ✓ SHIPPED

-1. **Improve intent classifier coverage** — Add stemming, word-distance tolerance, regex-based matching instead of substring. Handle plurals (`tasks` → `task`), word gaps (`create ... task`), and common phrasings.
+1. **Improve intent classifier coverage (PR #579)** — Shipped: compiled regex with word-distance tolerance, stemming/plurals, broader verbs, negative context filtering. 86 unit tests.

-2. **Better error UX** — When parsing fails, show the user the expected format with examples and offer a "rephrase as command" helper in the UI rather than a raw error dump.
+2. **Better error UX (PR #582)** — Shipped: structured `[PARSE_HINT]` JSON payloads with `supportedPatterns` array, `closestPattern`, `exampleInstruction`, and `detectedIntent`; frontend hint card with "try this instead" button that pre-fills chat input; collapsible pattern list.

-3. **Frontend instruction builder** — Add a structured command palette / form UI that lets users build instructions visually instead of typing natural language.
+3. **Frontend instruction builder** — Not yet started. Add a structured command palette / form UI that lets users build instructions visually instead of typing natural language.

 ### Tier 2: LLM-assisted instruction extraction
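
For the `[PARSE_HINT]` payload mentioned in Tier 1 item 2 above, a frontend would need to split the conversational text from the structured hint before rendering the hint card. The TypeScript sketch below shows one possible shape. The four field names come from the doc; everything else is an assumption, in particular the wire format (marker followed by a single JSON object running to the end of the message), the nullability of the fields, and the names `ParsedReply` and `extractParseHint`.

```typescript
// Field names come from the analysis doc; the wire format is assumed.
interface ParseHint {
  supportedPatterns: string[];
  closestPattern: string | null;
  exampleInstruction: string | null;
  detectedIntent: string | null;
}

interface ParsedReply {
  text: string; // conversational part, shown as-is
  hint: ParseHint | null; // structured hint, if one was attached and valid
}

const PARSE_HINT_MARKER = "[PARSE_HINT]";

function asString(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Assumes the marker is followed by one JSON object that runs to the end.
function extractParseHint(message: string): ParsedReply {
  const markerAt = message.indexOf(PARSE_HINT_MARKER);
  if (markerAt === -1) return { text: message, hint: null };

  const text = message.slice(0, markerAt).trim();
  let raw: unknown;
  try {
    raw = JSON.parse(message.slice(markerAt + PARSE_HINT_MARKER.length));
  } catch {
    return { text, hint: null }; // malformed payload: fall back to plain text
  }
  if (typeof raw !== "object" || raw === null) return { text, hint: null };

  const fields = raw as Record<string, unknown>;
  const patterns = fields["supportedPatterns"];
  return {
    text,
    hint: {
      supportedPatterns: Array.isArray(patterns)
        ? patterns.filter((p: unknown): p is string => typeof p === "string")
        : [],
      closestPattern: asString(fields["closestPattern"]),
      exampleInstruction: asString(fields["exampleInstruction"]),
      detectedIntent: asString(fields["detectedIntent"]),
    },
  };
}
```

A "try this instead" button could then copy `hint.exampleInstruction` into the chat input when it is non-null, and the collapsible list could render `hint.supportedPatterns`.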

@@ -119,10 +119,10 @@ User message (natural language)

 9. **Board-context-aware prompting** — Include current board columns, card titles, and labels in the LLM system prompt so it can generate contextually valid instructions (e.g., knowing which columns exist).

-## Testing Considerations
+## Testing Considerations — ✓ SHIPPED (PR #580)

-- **Unit tests for classifier edge cases** — Document current gaps with failing-by-design tests that prove the classifier misses natural language
-- **Integration tests for ChatService proposal flow** — Verify the end-to-end path from natural language → classifier → parser error/success
+- **Unit tests for classifier edge cases** — 86 tests in `LlmIntentClassifierTests.cs` covering all current patterns, natural language, negation, other-tool questions, edge cases (null, empty, long input, special chars)
+- **Integration tests for ChatService proposal flow** — 28 tests in `ChatServiceTests.cs` covering structured syntax → proposal creation, natural language → classifier miss, explicit request → parser failure, and graceful error paths
 - **Mock provider is sufficient for most tests** — The classifier behavior is identical across all providers
 - **Live LLM tests would require** provider configuration and are better suited for manual/E2E testing with `TASKDECK_LLM_PROVIDER=OpenAI` env var

docs/analysis/2026-03-29_manual_testing_consolidated_findings.md

Lines changed: 7 additions & 5 deletions
@@ -203,15 +203,17 @@ The most critical findings are:

 ---

-### OBS-2: Activity view defaults to an archived board
+### OBS-2: Activity view defaults to an archived board ✓ RESOLVED
 **Surface:** `/workspace/activity`
 **Detail:** The board dropdown pre-selects the first board alphabetically, which may be "calendar (Archived)". This shows "No board activity yet" as the cold state. Default should be the most-recently-active non-archived board.
+**Resolution:** Fixed in PR #581 — board selector now sorts non-archived first, then by most-recently-updated descending.

 ---

-### OBS-3: Activity view shows no history for boards with real mutations
+### OBS-3: Activity view shows no history for boards with real mutations ✓ RESOLVED
 **Surface:** `/workspace/activity`
 **Detail:** Boards with confirmed mutations (column creates, card adds/moves/edits, label assigns) show "No board activity yet". Either audit events aren't being recorded for these operations, or the board-history fetch has a scoping bug.
+**Resolution:** Fixed in PR #581 — audit logging wired for all board/card/column/label mutations via `IHistoryService.LogActionAsync` with `SafeLogAsync` resilience wrapper to prevent audit failures from crashing mutations.

 ---
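
The OBS-2 resolution in the hunk above describes a two-key ordering: non-archived boards first, then most recently updated first. A minimal TypeScript sketch of such a comparator follows. The `BoardOption` shape, its field names, and both function names are assumptions for the example; the actual selector code from PR #581 is not shown in this diff.

```typescript
// Hypothetical board shape; timestamps are ISO-8601 strings.
interface BoardOption {
  name: string;
  isArchived: boolean;
  updatedAt: string;
}

// Returns a sorted copy: non-archived boards first, then newest update first.
function sortForActivitySelector(boards: BoardOption[]): BoardOption[] {
  return boards.slice().sort((a, b) => {
    if (a.isArchived !== b.isArchived) return a.isArchived ? 1 : -1;
    return Date.parse(b.updatedAt) - Date.parse(a.updatedAt);
  });
}

// The pre-selected board is the first entry of the sorted list, if any.
function defaultBoard(boards: BoardOption[]): BoardOption | undefined {
  return sortForActivitySelector(boards)[0];
}
```

With this ordering, an archived board such as "calendar (Archived)" can only be pre-selected when every board is archived, which addresses the cold-state problem described in the finding.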

@@ -350,7 +352,7 @@ On `mousedown` on the drag handle, call `window.getSelection()?.removeAllRanges(

 **BUG-M4 (chat markdown):** Add a markdown-to-HTML renderer (e.g., `marked` or `markdown-it`) to the chat message renderer component.

-**BUG-M5 (archive 30s freeze):** Profile the archive action. Likely a synchronous DOM mutation or an un-awaited SignalR broadcast. Move archive board response handling to async/non-blocking.
+**BUG-M5 (archive 30s freeze):** ✓ RESOLVED in PR #578. Root cause: sequential reactive mutations in `deleteBoard()` while BoardView was still mounted caused cascading Vue reactive flushes. Fix: navigate away before clearing state, reorder mutations in `boardCrudStore`, add `finally` block for loading state reset.

 **BUG-M6 (no restore toast):** Add a success toast in the archive store's `restoreBoard` action, matching the pattern used in `createBoard`.
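
The BUG-M5 fix in the hunk above comes down to ordering inside `deleteBoard()`: leave the board view first, clear state second, and reset the loading flag in a `finally` block. The TypeScript sketch below shows that ordering with plain objects standing in for the store and router. Apart from `deleteBoard`, every name here (`BoardState`, `BoardDeps`, `deleteBoardRequest`, `navigateToBoardList`) is hypothetical, and whether the real action navigates before or after the server call is not stated in the doc.

```typescript
// Plain-object stand-ins for the store state and its collaborators.
interface BoardState {
  boards: { id: string }[];
  currentBoardId: string | null;
  loading: boolean;
}

interface BoardDeps {
  deleteBoardRequest(id: string): Promise<void>; // server call (hypothetical)
  navigateToBoardList(): Promise<void>; // resolves once the board view is gone
}

async function deleteBoard(
  state: BoardState,
  deps: BoardDeps,
  id: string,
): Promise<void> {
  state.loading = true;
  try {
    await deps.deleteBoardRequest(id);
    // 1. Leave the board view first, so nothing mounted reacts to step 2.
    if (state.currentBoardId === id) {
      await deps.navigateToBoardList();
      state.currentBoardId = null;
    }
    // 2. Only then clear the board from state.
    state.boards = state.boards.filter((board) => board.id !== id);
  } finally {
    // 3. Always reset the loading flag, even if a step above throws.
    state.loading = false;
  }
}
```

The `finally` block matters for the failure path: without it, a rejected request would leave `loading` stuck at `true`.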

@@ -359,8 +361,8 @@ On `mousedown` on the drag handle, call `window.getSelection()?.removeAllRanges(
 - BUG-L1: Trim card title before toast interpolation.
 - BUG-L2: Hide "DRAG CARD" text on non-hover; show only on `:hover` of the drag handle zone.
 - OBS-1: Show label color swatch in card modal label picker.
-- OBS-2: Default activity view to most-recently-active non-archived board.
-- OBS-3: Investigate audit event recording — ensure board mutations emit audit entries.
+- ~~OBS-2: Default activity view to most-recently-active non-archived board.~~ ✓ Fixed in PR #581
+- ~~OBS-3: Investigate audit event recording — ensure board mutations emit audit entries.~~ ✓ Fixed in PR #581
 - OBS-4: Route-guard Ops Console (and other feature-flagged surfaces) so direct URL access is also gated.
 - OBS-5: Add a tooltip to "PRECISION MODE ACTIVE" explaining its meaning and how to disable it.
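
For the OBS-3 item, the doc says audit logging goes through `IHistoryService.LogActionAsync` behind a `SafeLogAsync` wrapper so that audit failures cannot crash mutations. The TypeScript sketch below illustrates that wrapper pattern in general terms. It is not the project's implementation: the `LogAction` signature, the `onFailure` callback, and the `moveCard` mutation are all hypothetical, since the doc does not show the real signatures.

```typescript
// Stand-in for the audit call; the real signature is not shown in the doc.
type LogAction = (action: string, entityId: string) => Promise<void>;

// Runs the audit call and swallows any failure, reporting it via onFailure.
async function safeLog(
  log: LogAction,
  action: string,
  entityId: string,
  onFailure: (error: unknown) => void = () => {},
): Promise<void> {
  try {
    await log(action, entityId);
  } catch (error) {
    onFailure(error); // e.g. write to an application log; never rethrow
  }
}

// Hypothetical mutation: the move still succeeds when the audit store is down.
async function moveCard(
  cardId: string,
  targetColumn: string,
  log: LogAction,
): Promise<string> {
  const result = `${cardId} -> ${targetColumn}`; // stands in for persistence
  await safeLog(log, "card.move", cardId);
  return result;
}
```

The trade-off of this pattern is that audit entries can be silently lost, so the failure callback should at least record the error somewhere visible.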
