Update two analysis docs (chat-to-proposal gap and manual testing findings) to reflect recent fixes and testing status. Key changes: add Last Updated and status notes; mark Tier 1 improvements shipped (intent classifier regex/stemming/negation fixes, substring ordering bug, PR #579), UX parse hints shipped (PR #582), unit/integration tests shipped (PR #580), and note PR range #578–#582. In manual testing findings mark OBS-2/OBS-3 resolved (PR #581) and BUG-M5 resolved (PR #578), update resolutions and remove duplicate checklist items. Minor editorial clarifications and test counts added.
docs/analysis/2026-03-29_chat_nlp_proposal_gap.md (15 additions, 15 deletions)
@@ -1,8 +1,11 @@
 # Chat-to-Proposal NLP Gap Analysis

 **Date:** 2026-03-29
+**Last Updated:** 2026-03-29
 **Trigger:** Manual testing — user asked chat "can you create new onboarding tasks for people who aren't technical?" and received a parse failure error instead of a proposal.

+> **Status update (2026-03-29):** Tier 1 improvements and testing are now shipped via PRs #578–#582. See resolution notes inline below.
+
 ## Observed Behavior

 The chat assistant replied conversationally but appended:
@@ -15,14 +18,11 @@ The user's natural language request was not translatable into the regex-based in

 There is an architectural gap between three components in the chat-to-proposal pipeline:
-A static keyword matcher that scans the raw user message for substrings like `"create task"`, `"new card"`, `"move card"`, etc. Returns `(IsActionable, ActionIntent)`.
+~~A static keyword matcher that scans the raw user message for substrings like `"create task"`, `"new card"`, `"move card"`, etc.~~ Now uses compiled regex patterns with word-distance matching (`(\w+\s+){0,5}` gaps), stemming/plurals (`tasks?`, `cards?`), broader verb coverage ("set up", "build", "make", "generate"), and negative context filtering for negations ("don't create") and other-tool questions ("how do I create a card in Jira?"). Returns `(IsActionable, ActionIntent)`.

-**Problem:** Brittle substring matching. The message "can you create new onboarding tasks for people who aren't technical?" does NOT match any pattern because:
-- `"create task"` requires those two words adjacent — `"create new onboarding tasks"` has words between them
-- `"new task"` fails for the same reason (plural `"tasks"` with preceding adjective)
-- No fuzzy matching, stemming, or word-distance tolerance
+**Resolved (PR #579):** The classifier now matches "can you create new onboarding tasks for people who aren't technical?" as `card.create`. Null input returns `(false, null)` instead of throwing. Substring ordering bug fixed ("remove card" correctly classified as `card.archive`). 86 unit tests cover all patterns. Redundant `Contains()` fallbacks removed.
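As an illustration of the resolved behavior, here is a minimal Python sketch of gap-tolerant intent matching with negative context filtering. It is not the shipped classifier: the real patterns live in the C# codebase and are not reproduced in this document. The names `classify`, `_CREATE`, `_NEGATION`, and `_OTHER_TOOL`, the negation words other than "don't", and the tool names other than Jira are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Up to five intervening words, mirroring the `(\w+\s+){0,5}` gap quoted above.
_GAP = r"(?:\w+\s+){0,5}"

# Hypothetical verb list built from the verbs named in this document.
_VERBS = r"(?:create|make|build|generate|set\s+up)"

# Verb, optional gap, then a singular or plural noun (`tasks?`, `cards?`).
_CREATE = re.compile(rf"\b{_VERBS}\s+{_GAP}(?:tasks?|cards?)\b", re.IGNORECASE)

# Negative context: a negation shortly before the verb (word list is assumed).
_NEGATION = re.compile(
    rf"\b(?:don't|do\s+not|never)\s+(?:\w+\s+)?{_VERBS}\b", re.IGNORECASE
)

# Negative context: the question is about another tool (tool list is assumed).
_OTHER_TOOL = re.compile(r"\bin\s+(?:jira|trello)\b", re.IGNORECASE)


def classify(message: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (is_actionable, intent); null or empty input is not actionable."""
    if not message:
        return (False, None)
    if _NEGATION.search(message) or _OTHER_TOOL.search(message):
        return (False, None)
    if _CREATE.search(message):
        return (True, "card.create")
    return (False, None)
```

With these assumed patterns, the trigger message from manual testing classifies as `card.create`, while the two false-positive examples (the negation and the Jira question) are filtered out before the create pattern is consulted.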
- "how do I create a card in Jira?" (asking about a different tool) → triggers `card.create`
 - "don't create task yet, just tell me about it" → triggers `card.create`

-### B2. Substring ordering bug
-- "remove card abc123" → classified as `card.move` (not `card.archive`) because `"move card"` is a substring of `"remove card"` and is checked first in the classifier's if-chain. This is a priority-ordering bug where `remove` is misclassified.
+### B2. Substring ordering bug — ✓ RESOLVED
+- ~~"remove card abc123" → classified as `card.move` (not `card.archive`)~~ Fixed in PR #579 — archive patterns now checked before move patterns; "remove card" correctly classifies as `card.archive`.

 ### C. Parser failures on detected intent
 - Even when intent IS detected, parser fails unless user writes exact syntax
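The B2 ordering bug is easy to reproduce in isolation. The following Python sketch uses two hypothetical functions, `classify_buggy` and `classify_fixed`, that stand in for the before and after if-chains; the real classifier is C# and uses regex patterns, so treat this only as a demonstration of why check order matters when one trigger phrase is a substring of another.

```python
from typing import Optional


def classify_buggy(message: str) -> Optional[str]:
    """Checks "move card" first, so "remove card" is caught by the wrong branch."""
    text = message.lower()
    if "move card" in text:  # also true for "remove card ..."
        return "card.move"
    if "remove card" in text:  # never reached for "remove card ..."
        return "card.archive"
    return None


def classify_fixed(message: str) -> Optional[str]:
    """Checks the longer, more specific phrase first."""
    text = message.lower()
    if "remove card" in text:
        return "card.archive"
    if "move card" in text:
        return "card.move"
    return None
```

Reordering is the fix described for PR #579. Anchoring the move pattern on a word boundary would be another way to stop `"move card"` from matching inside `"remove card"`, though this document does not say whether the shipped patterns do that.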
@@ -95,13 +95,13 @@ User message (natural language)
-2. **Better error UX** — When parsing fails, show the user the expected format with examples and offer a "rephrase as command" helper in the UI rather than a raw error dump.
+2. **✓ Better error UX (PR #582)** — Shipped: structured `[PARSE_HINT]` JSON payloads with `supportedPatterns` array, `closestPattern`, `exampleInstruction`, and `detectedIntent`; frontend hint card with "try this instead" button that pre-fills chat input; collapsible pattern list.

-3. **Frontend instruction builder** — Add a structured command palette / form UI that lets users build instructions visually instead of typing natural language.
+3. **Frontend instruction builder** — Not yet started. Add a structured command palette / form UI that lets users build instructions visually instead of typing natural language.
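To show roughly what a structured parse hint could look like, here is a hedged Python sketch that builds and reads back a `[PARSE_HINT]` payload. The four field names come from the description of PR #582 above; everything else is assumed for illustration, including the wire format (marker followed by JSON running to the end of the reply), the function names, and the sample instruction syntax.

```python
import json
from typing import List, Optional

MARKER = "[PARSE_HINT]"  # marker name from this document; placement is assumed


def build_parse_hint(
    detected_intent: str,
    supported_patterns: List[str],
    closest_pattern: str,
    example_instruction: str,
) -> str:
    """Serialize a hint so a frontend could render a "try this instead" card."""
    payload = {
        "detectedIntent": detected_intent,
        "supportedPatterns": supported_patterns,
        "closestPattern": closest_pattern,
        "exampleInstruction": example_instruction,
    }
    return MARKER + json.dumps(payload)


def extract_parse_hint(reply: str) -> Optional[dict]:
    """Return the hint payload if the reply carries one, else None."""
    idx = reply.find(MARKER)
    if idx == -1:
        return None
    # Assumes the JSON payload runs to the end of the reply.
    return json.loads(reply[idx + len(MARKER):])


# Hypothetical instruction syntax, used only to populate the example.
hint_text = build_parse_hint(
    detected_intent="card.create",
    supported_patterns=['create card "<title>"', 'move card <id> to "<column>"'],
    closest_pattern='create card "<title>"',
    example_instruction='create card "Onboarding checklist for non-technical staff"',
)
hint = extract_parse_hint("I could not turn that into a proposal. " + hint_text)
```

In this sketch, a "try this instead" button would pre-fill the chat input with `hint["exampleInstruction"]`, and the collapsible list would be rendered from `hint["supportedPatterns"]`.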

 ### Tier 2: LLM-assisted instruction extraction

@@ -119,10 +119,10 @@ User message (natural language)

 9. **Board-context-aware prompting** — Include current board columns, card titles, and labels in the LLM system prompt so it can generate contextually valid instructions (e.g., knowing which columns exist).

-## Testing Considerations
+## Testing Considerations — ✓ SHIPPED (PR #580)

-- **Unit tests for classifier edge cases** — Document current gaps with failing-by-design tests that prove the classifier misses natural language
-- **Integration tests for ChatService proposal flow** — Verify the end-to-end path from natural language → classifier → parser → error/success
+- **✓ Unit tests for classifier edge cases** — 86 tests in `LlmIntentClassifierTests.cs` covering all current patterns, natural language, negation, other-tool questions, edge cases (null, empty, long input, special chars)
+- **✓ Integration tests for ChatService proposal flow** — 28 tests in `ChatServiceTests.cs` covering structured syntax → proposal creation, natural language → classifier miss, explicit request → parser failure, and graceful error paths
 - **Mock provider is sufficient for most tests** — The classifier behavior is identical across all providers
 - **Live LLM tests would require** provider configuration and are better suited for manual/E2E testing with `TASKDECK_LLM_PROVIDER=OpenAI` env var
docs/analysis/2026-03-29_manual_testing_consolidated_findings.md (7 additions, 5 deletions)
@@ -203,15 +203,17 @@ The most critical findings are:

 ---

-### OBS-2: Activity view defaults to an archived board
+### OBS-2: Activity view defaults to an archived board ✓ RESOLVED
 **Surface:** `/workspace/activity`
 **Detail:** The board dropdown pre-selects the first board alphabetically, which may be "calendar (Archived)". This shows "No board activity yet" as the cold state. Default should be the most-recently-active non-archived board.
+**Resolution:** Fixed in PR #581 — board selector now sorts non-archived first, then by most-recently-updated descending.
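The sort order described in the OBS-2 resolution can be sketched in a few lines of Python. The `Board` shape and the function name `sort_for_selector` are assumptions for the example; the actual selector is part of the frontend and is not shown in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Board:
    name: str
    is_archived: bool
    updated_at: datetime


def sort_for_selector(boards: List[Board]) -> List[Board]:
    """Non-archived boards first; within each group, most recently updated first."""
    # Two stable passes: order by recency, then group by archived flag.
    by_recency = sorted(boards, key=lambda b: b.updated_at, reverse=True)
    return sorted(by_recency, key=lambda b: b.is_archived)


boards = [
    Board("calendar", True, datetime(2026, 3, 29)),
    Board("alpha", False, datetime(2026, 3, 1)),
    Board("beta", False, datetime(2026, 3, 20)),
]
default_board = sort_for_selector(boards)[0]
```

Here the archived "calendar" board sorts last even though it was updated most recently, so the default selection falls on "beta" rather than on whichever board comes first alphabetically.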

 ---

-### OBS-3: Activity view shows no history for boards with real mutations
+### OBS-3: Activity view shows no history for boards with real mutations ✓ RESOLVED
 **Surface:** `/workspace/activity`
 **Detail:** Boards with confirmed mutations (column creates, card adds/moves/edits, label assigns) show "No board activity yet". Either audit events aren't being recorded for these operations, or the board-history fetch has a scoping bug.
+**Resolution:** Fixed in PR #581 — audit logging wired for all board/card/column/label mutations via `IHistoryService.LogActionAsync` with `SafeLogAsync` resilience wrapper to prevent audit failures from crashing mutations.
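The idea behind a resilience wrapper such as `SafeLogAsync` is that the mutation succeeds even when the audit write fails. The Python sketch below illustrates that pattern only; it is not the C# implementation, and `safe_log`, `failing_log`, and `move_card` are hypothetical names introduced for the example.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("audit")


async def safe_log(log_action: Callable[..., Awaitable[Any]], *args: Any) -> bool:
    """Await the audit call; report failure instead of raising it to the caller."""
    try:
        await log_action(*args)
        return True
    except Exception:
        # Record the failure, but never let it abort the mutation itself.
        logger.exception("Audit logging failed; continuing with the mutation")
        return False


async def failing_log(board_id: str, action: str) -> None:
    """Stand-in for an audit backend that is currently unavailable."""
    raise RuntimeError("history store unavailable")


async def move_card(card: dict, target_column: str, log_action) -> dict:
    """Apply the mutation first, then attempt to record it."""
    card["column"] = target_column
    await safe_log(log_action, card["board_id"], "card.move:" + card["id"])
    return card


moved = asyncio.run(
    move_card({"id": "c1", "board_id": "b1", "column": "Todo"}, "Done", failing_log)
)
```

Even with the failing audit backend, `moved["column"]` ends up as `"Done"`. The trade-off of this pattern is that a swallowed audit error leaves a gap in the activity history, so the failure should at least be logged somewhere visible.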

 ---

@@ -350,7 +352,7 @@ On `mousedown` on the drag handle, call `window.getSelection()?.removeAllRanges(

 **BUG-M4 (chat markdown):** Add a markdown-to-HTML renderer (e.g., `marked` or `markdown-it`) to the chat message renderer component.

-**BUG-M5 (archive 30s freeze):** Profile the archive action. Likely a synchronous DOM mutation or an un-awaited SignalR broadcast. Move archive board response handling to async/non-blocking.
+**BUG-M5 (archive 30s freeze):** ✓ RESOLVED in PR #578. Root cause: sequential reactive mutations in `deleteBoard()` while BoardView was still mounted caused cascading Vue reactive flushes. Fix: navigate away before clearing state, reorder mutations in `boardCrudStore`, add `finally` block for loading state reset.

 **BUG-M6 (no restore toast):** Add a success toast in the archive store's `restoreBoard` action, matching the pattern used in `createBoard`.

@@ -359,8 +361,8 @@ On `mousedown` on the drag handle, call `window.getSelection()?.removeAllRanges(
 - BUG-L1: Trim card title before toast interpolation.
 - BUG-L2: Hide "DRAG CARD" text on non-hover; show only on `:hover` of the drag handle zone.
 - OBS-1: Show label color swatch in card modal label picker.
-- OBS-2: Default activity view to most-recently-active non-archived board.