Improve LlmIntentClassifier NLP coverage #579

Merged
Chris0Jeky merged 6 commits into main from enhance/571-intent-classifier-nlp on Mar 29, 2026

@Chris0Jeky
Owner

Summary

  • Fixes #571 (Improve LlmIntentClassifier coverage for natural language phrasing)
  • Replace brittle exact-substring matching with compiled regex patterns supporting word-distance gaps (up to 5 intervening words), plural forms (cards?, tasks?, items?), and broader verb coverage (generate, build, prepare, set up)
  • Add negative context filtering: negations (don't create task yet) and questions about other tools (how do I create a card in Jira?) are suppressed
  • Fix ordering bug where remove card was misclassified as card.move due to substring overlap
  • All regex patterns use RegexOptions.Compiled and a 100ms timeout to prevent catastrophic backtracking
  • Backward compatible: all original exact-match patterns still work
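
A minimal sketch of the word-distance idea described in the summary above. The pattern, class name, and method name are illustrative assumptions, not the patterns shipped in this PR. Only the bounded gap of up to 5 words, the plural nouns, the broader verbs, `RegexOptions.Compiled`, and the 100ms timeout are taken from the description.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GapPatternSketch
{
    // Hypothetical pattern (assumption): a create-style verb, then up to
    // 5 intervening words, then a singular or plural noun.
    private static readonly Regex CreateWithGap = new Regex(
        @"\b(create|add|make|generate|build|prepare|set\s+up)\b(\s+\w+){0,5}\s+(cards?|tasks?|items?)\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase,
        TimeSpan.FromMilliseconds(100));

    public static bool LooksLikeCardCreate(string input)
    {
        try
        {
            return CreateWithGap.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // On timeout, treat the input as a non-match.
            return false;
        }
    }
}
```

Because the gap is bounded (`{0,5}`) rather than `.*`, the number of positions the engine can backtrack over after each verb stays small, which is the property the timeout is meant to back up.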

Test plan

  • "can you create new onboarding tasks for non-technical people?" classifies as card.create
  • "I need three new cards for the sprint" classifies as card.create
  • "how do I create a card in Jira?" does NOT classify as actionable
  • "don't create task yet, just explain" does NOT classify as actionable
  • "remove card abc123" now correctly classifies as card.archive (was card.move)
  • All 14 existing exact-match card.create patterns still pass
  • All existing board, reorder, update, move, archive patterns still pass
  • 86 unit tests covering exact patterns, NLP patterns, negations, other-tool questions, plurals, word-distance gaps, broader verbs, case insensitivity, edge cases
  • Full backend test suite passes (all projects)

…entClassifier

Replace brittle exact-substring matching with compiled regex patterns that
support word-distance gaps, plural forms, and broader verb coverage. Add
negative context filtering for negations and questions about other tools.
Fix "remove card" being matched as "move card" by reordering checks.
Cover natural language phrasing, plural forms, broader verbs, word-distance
gaps, negation filtering, other-tool question filtering, edge cases, and
backward compatibility with all existing exact-match patterns.

Remove overly broad `move.*to` pattern from CardMovePattern to prevent
matching "move on to the next topic". Replace unbounded `.*` in
NegationPattern with bounded word-distance to avoid suppressing intents
in compound sentences like "stop the sprint and then create cards".

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the LlmIntentClassifier by transitioning from basic substring matching to a more robust Regex-based approach. Key enhancements include the ability to detect natural language patterns, handle negations, and filter out questions regarding external tools. It also resolves a classification bug by reordering intent checks. The review feedback suggests simplifying the new helper methods by removing redundant substring checks that are now encompassed by the regex patterns.

Comment on lines +131 to +157

    private static bool MatchesCardCreate(string lower)
    {
        // Exact substring matches (backward compatible)
        if (lower.Contains("create card") || lower.Contains("add card")
            || lower.Contains("create a card") || lower.Contains("add a card")
            || lower.Contains("create task") || lower.Contains("add task")
            || lower.Contains("create a task") || lower.Contains("add a task")
            || lower.Contains("new card") || lower.Contains("new task")
            || lower.Contains("make a card") || lower.Contains("make a task")
            || lower.Contains("make card") || lower.Contains("make task"))
            return true;

        // Regex-based natural language matching
        try
        {
            if (CardCreatePattern.IsMatch(lower))
                return true;
            if (NewCardPattern.IsMatch(lower))
                return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // Fall through: don't match on timeout
        }

        return false;
    }

medium

This method, and the other Matches... methods that follow, can be significantly simplified.

  1. The initial if block with lower.Contains(...) is now redundant. The new regular expressions are supersets that cover these exact-match cases while also providing broader NLP capabilities. Removing these checks simplifies the code without losing backward compatibility.

  2. The try-catch logic can be made more direct by combining the regex checks and returning false from within the catch block.

This refactoring makes the intent clearer and reduces code duplication across all the Matches... methods. The same pattern can be applied to MatchesCardMove, MatchesCardArchive, etc.

Here is a suggested simplification for this method:

    private static bool MatchesCardCreate(string lower)
    {
        try
        {
            // The regex patterns cover both the old exact matches and new NLP variations.
            return CardCreatePattern.IsMatch(lower) || NewCardPattern.IsMatch(lower);
        }
        catch (RegexMatchTimeoutException)
        {
            // On timeout, treat as a non-match for safety.
            return false;
        }
    }

@Chris0Jeky
Owner Author

Self-Review Findings

Regex Performance

  • All patterns use bounded repetition (\s+\w+){0,N} (max 4-6 words) — no unbounded .* in hot paths
  • All regex instances use RegexOptions.Compiled and a 100ms TimeSpan timeout
  • RegexMatchTimeoutException is caught at every call site, falling through gracefully
  • Tested with 200-word input to verify no catastrophic backtracking — completes without hanging
  • Fixed in commit 3: Removed the \bmove\b.*\bto\b alternation from CardMovePattern (unbounded .*) and replaced unbounded .* in NegationPattern with (\s+\w+){0,6}

False Positive Analysis

  • "I deleted the create card button by accident": Still matches as card.archive because "delete" is adjacent to "card". This is a known limitation — detecting past tense / UI references would require POS tagging, which is out of scope for a keyword classifier. Acceptable trade-off.
  • "move on to the next topic": Previously would have matched card.move via the move.*to pattern. Fixed by removing that alternation.
  • Other-tool suppression: Only triggers when BOTH a question pattern (how/what/where...?) AND an other-tool name are present. Commands mentioning other tools (e.g., "create a card like in Jira") correctly pass through. Tested.
  • Negation suppression: Bounded to 6 words between negation and verb, preventing false suppression in compound sentences.
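
The other-tool rule above (suppress only when BOTH a question shape and another tool's name are present) can be sketched as follows. This is a rough illustration under assumptions: the question pattern is the `QuestionAboutHowPattern` quoted in this thread, while the tool list, class name, and method name are hypothetical (only Jira is named in this PR).

```csharp
using System;
using System.Text.RegularExpressions;

public static class OtherToolSuppressionSketch
{
    private static readonly TimeSpan Timeout = TimeSpan.FromMilliseconds(100);

    // Question shape taken from the QuestionAboutHowPattern quoted in this thread.
    private static readonly Regex Question = new Regex(
        @"^\s*(how|what|where|when|why|can\s+i|is\s+it)\b.*\?$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase,
        Timeout);

    // Hypothetical tool list (assumption); only Jira appears in the PR text.
    private static readonly Regex OtherTool = new Regex(
        @"\b(jira|trello|asana)\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase,
        Timeout);

    public static bool ShouldSuppress(string input)
    {
        try
        {
            // Both conditions are required, so commands that merely
            // mention another tool are not suppressed.
            return OtherTool.IsMatch(input) && Question.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }
}
```

With this sketch, "how do I create a card in Jira?" is suppressed, while "create a card like in Jira" is not, since it is not phrased as a question.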

False Negative Analysis

  • "please add these items: ...": Now matches via add ... items regex pattern
  • "can you create new onboarding tasks...": Now matches — "create" + words + "tasks"
  • Remaining gap: Extremely indirect phrasing like "I could use some help organizing my sprint" (no verb+noun pattern) won't match. This is by design — the classifier is intentionally conservative to avoid false positives.

Breaking Changes

  • "remove card" now correctly classifies as card.archive instead of card.move. This fixes the documented ordering bug. Test updated to reflect correct behavior.
  • All 14 original exact-match card.create patterns still pass unchanged.
  • All other existing patterns (move, archive, update, board, reorder) still pass.
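
To illustrate the substring overlap behind the "remove card" fix: the string "remove card" contains "move card", so a `Contains` check for the move intent fires first. The sketch below is a simplified assumption about the old and new shapes, not the classifier's actual code; the method names are hypothetical and only the intent labels come from this PR.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderingBugSketch
{
    // Old shape (assumed): plain substring checks, move checked first.
    public static string ClassifyBySubstring(string lower)
    {
        if (lower.Contains("move card")) return "card.move";
        if (lower.Contains("remove card")) return "card.archive";
        return "none";
    }

    // New shape (assumed): word boundaries, so "move" inside "remove"
    // is no longer treated as a standalone word.
    public static string ClassifyWithBoundaries(string lower)
    {
        if (Regex.IsMatch(lower, @"\bremove\s+cards?\b")) return "card.archive";
        if (Regex.IsMatch(lower, @"\bmove\s+cards?\b")) return "card.move";
        return "none";
    }
}
```

For "remove card abc123", the substring version returns card.move and the boundary version returns card.archive. With `\b` in place, the result no longer depends on check order for this pair, although checking the more specific intent first is still a reasonable safeguard.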

Test Coverage

  • 86 tests total covering: exact patterns (14), natural language (14), negation (6), other-tool questions (3), other-tool commands (2), non-actionable (7), edge cases (3), case insensitivity (5), plural forms (6), broader verbs (4), word-distance (5), card operations (7), board operations (6), reorder (4)
  • Edge cases include: empty string, whitespace-only, very long input, mixed intents

Remaining Risk

  • The items? noun in create patterns could match overly broad phrases like "add these items to my shopping list" — low risk in a Taskdeck chat context where all input is board-related
  • No coverage for multi-line input (messages with newlines) — regex patterns work per-line by default, which is fine for chat messages

@Chris0Jeky
Owner Author

Adversarial Review — PR #579

Critical

None found.

Major

M1. Negation test "stop creating cards" passes for the wrong reason

The test at line 175 asserts this returns non-actionable and attributes it to negation. But NegationPattern requires \b(create|add|...)\b — the word boundary after "create" fails on "creating" (the "i" is a word char). The test actually passes because no positive pattern matches either ("creating" also fails \bcreate\b in CardCreatePattern). This means:

  • If someone later adds gerund support to positive patterns, this test will break and the negation won't save it.
  • The test gives false confidence that negation handles "stop + gerund" forms.

Recommendation: Either (a) add gerund forms to NegationPattern group 3 (e.g., creating|adding|...), or (b) change the test to document that it's non-actionable because neither negation nor positive patterns match, not because negation works.
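
A small sketch of option (a), to make the difference concrete. Both patterns are simplified, hypothetical stand-ins; the real NegationPattern is not reproduced in this thread, and the class and method names are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GerundNegationSketch
{
    // Hypothetical negation pattern with base verb forms only.
    private static readonly Regex BaseFormsOnly = new Regex(
        @"\b(don't|stop)\s+(create|add)\b",
        RegexOptions.IgnoreCase);

    // Same pattern with gerund forms added to the verb group.
    private static readonly Regex WithGerunds = new Regex(
        @"\b(don't|stop)\s+(create|creating|add|adding)\b",
        RegexOptions.IgnoreCase);

    public static bool IsNegatedBaseOnly(string input) => BaseFormsOnly.IsMatch(input);

    public static bool IsNegatedWithGerunds(string input) => WithGerunds.IsMatch(input);
}
```

Under these assumptions, "stop creating cards" is not recognised as a negation by the base-form pattern but is recognised once gerunds are listed, while "don't create task yet" is caught by both.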

M2. Double negatives are suppressed by negation → false negative

"don't not create a task" — the NegationPattern matches "don't" + "not" (1 word) + "create", so it's suppressed. But semantically, "don't not create" means "do create." Similarly, "I can't avoid creating a task" — "avoid" triggers negation but the actual intent is to create. These are admittedly rare in chat, but worth documenting as a known limitation at minimum.

M3. "I deleted the create card button by accident" is a false positive

This input contains the literal substring "create card" so MatchesCardCreate returns card.create. The self-review acknowledges this (calling it a "past tense / UI reference" limitation) but attributes it to card.archive — it's actually card.create because archive checks \bdelete\b which doesn't match "deleted", and the exact substring "create card" fires in the create path. No test covers this case. The old test suite had this as a documented edge case but the PR removed it without replacement.

Recommendation: Add a test documenting this known false positive, even if you choose not to fix it.

Minor

m1. QuestionAboutHowPattern uses unbounded .*

Pattern: ^\s*(how|what|where|when|why|can\s+i|is\s+it)\b.*\?$

The .* is technically safe here because: (a) it only fires when OtherToolPattern already matched, (b) on inputs ending with ? the backtracking is O(1), and (c) the 100ms timeout is the backstop. But it's inconsistent with the disciplined bounded-repetition approach used everywhere else. Consider [^?]* or [\s\S]{0,500} for consistency.
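
For comparison, a sketch of the `[^?]*` variant next to the quoted pattern. The class and method names are hypothetical. One behavioural difference worth knowing before swapping: `[^?]*` cannot cross an earlier question mark, so an input containing two questions no longer matches, and unlike `.` it can span newlines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuestionPatternSketch
{
    private static readonly TimeSpan Timeout = TimeSpan.FromMilliseconds(100);

    // Pattern as quoted in this review.
    private static readonly Regex Unbounded = new Regex(
        @"^\s*(how|what|where|when|why|can\s+i|is\s+it)\b.*\?$",
        RegexOptions.IgnoreCase, Timeout);

    // Suggested alternative: consume anything except '?' before the final '?'.
    private static readonly Regex NoInnerQuestionMark = new Regex(
        @"^\s*(how|what|where|when|why|can\s+i|is\s+it)\b[^?]*\?$",
        RegexOptions.IgnoreCase, Timeout);

    public static bool MatchesUnbounded(string input) => Unbounded.IsMatch(input);

    public static bool MatchesNoInnerQuestionMark(string input) => NoInnerQuestionMark.IsMatch(input);
}
```

Both versions match "how do I create a card in Jira?". For "what? how do I do this in Jira?", only the original `.*` version matches.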

m2. NegationPattern does not cover "no" as a negation word

"no need to create a task" would match CardCreatePattern (create + gap + task) and fire as card.create. The word "no" is missing from the negation group. Similarly, "not" alone (without "do") is missing: "not creating tasks right now" — though this case is saved by the gerund not matching.

m3. Ordering sensitivity with "update board" and BoardRenamePattern

"update board settings" matches BoardRenamePattern (\b(rename|update|edit)\b ... \bboards?\b) and returns board.update. But "update the board's card list" could also match — is "update board" always a rename? The semantic mapping of update → board.update (which the test calls "rename") is potentially confusing. Not a bug, but the intent label could mislead consumers.

m4. No test for regex special characters in input

Input like "create card (urgent) [P0]" or "add task: fix the $PATH issue" — the substring checks would still work fine and fire before regex, but if someone crafts input with regex metacharacters that bypass substring matching and hit the regex path, e.g., "create a card" (multiple spaces), the \s+\w+ groups would handle it correctly. But "create\tcard" (tab character) — ToLowerInvariant() preserves tabs, \s+ matches tabs, so this works. Low risk but worth a test for documentation.

m5. Compiled regex count (10 patterns) — static memory cost

10 compiled regex patterns are held in static fields. Each compiled regex allocates IL code. This is fine for a singleton classifier but worth noting — if this pattern proliferates to other classifiers, consider RegexOptions.NonBacktracking (.NET 7+) as an alternative that avoids the compilation cost and eliminates backtracking risk entirely.
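
A hedged sketch of what the `RegexOptions.NonBacktracking` alternative might look like. It requires .NET 7 or later, and the pattern, class name, and method name are illustrative assumptions rather than code from this PR. The non-backtracking engine does not support lookarounds, backreferences, or atomic groups, so it only applies if none of the 10 patterns rely on those constructs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonBacktrackingSketch
{
    // Hypothetical pattern (assumption). NonBacktracking gives matching time
    // linear in the input length, so no catch block for timeouts is shown here.
    private static readonly Regex CreateWithGap = new Regex(
        @"\b(create|add|make)\b(\s+\w+){0,5}\s+(cards?|tasks?|items?)\b",
        RegexOptions.NonBacktracking | RegexOptions.IgnoreCase);

    public static bool LooksLikeCardCreate(string input) => CreateWithGap.IsMatch(input);
}
```

Whether this is actually cheaper than `RegexOptions.Compiled` for a handful of static patterns is something to measure rather than assume.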

Nits

n1. The Classify_NullInput_ShouldReturnFalse test name says "null input" but actually tests whitespace. Classify accepts string (non-nullable), so true null would be a compiler warning. The test name is misleading.

n2. Classify_VeryLongInput_ShouldNotHang constructs a 200-word input but doesn't assert timing. A Stopwatch assertion (e.g., < 500ms) would make this test actually verify the performance claim rather than just "it didn't crash."

n3. The comment on line 38 of the classifier says "or move + card/task context + to" but that alternation was removed. Stale comment.
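
For n2, a framework-agnostic sketch of a timing measurement that a test could assert on. The pattern, class name, and method name are assumptions; the 200-word input and the 500ms threshold are the figures mentioned above. A warm-up call is included so that one-time compilation cost is not counted.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongInputTimingSketch
{
    // Hypothetical pattern (assumption), with the 100ms timeout from the PR.
    private static readonly Regex Pattern = new Regex(
        @"\b(create|add|make)\b(\s+\w+){0,5}\s+(cards?|tasks?|items?)\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase,
        TimeSpan.FromMilliseconds(100));

    public static TimeSpan MeasureLongInput(int wordCount)
    {
        // A verb followed by filler words and no noun, so the bounded gap
        // is explored and then fails.
        string input = "create " + string.Join(" ", Enumerable.Repeat("filler", wordCount));

        Pattern.IsMatch("warm up"); // exclude first-use cost from the measurement

        Stopwatch stopwatch = Stopwatch.StartNew();
        bool matched = Pattern.IsMatch(input); // a timeout exception would propagate and fail the test
        stopwatch.Stop();

        if (matched)
        {
            throw new InvalidOperationException("Filler input unexpectedly matched.");
        }
        return stopwatch.Elapsed;
    }
}
```

A test could then assert that `MeasureLongInput(200)` is below 500ms. Wall-clock assertions can be flaky on loaded CI machines, so a generous threshold is probably safer than a tight one.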

Overall Assessment

Pass with fixes. The implementation is solid overall — bounded quantifiers, compiled regexes with timeouts, timeout catch blocks, backward-compatible exact-match fast paths, and sensible ordering. The self-review was thorough and caught real issues.

The Major items are primarily about test accuracy (M1, M3) and documentation of known limitations (M2). M1 is the most concerning because it represents a false sense of security in the negation logic — the test passes by coincidence rather than by the mechanism it claims to test. M3 is a regression in test coverage from the old suite.

Recommended before merge:

  1. Fix or re-document the "stop creating cards" test (M1)
  2. Add a test for "I deleted the create card button by accident" documenting the false positive (M3)
  3. Fix stale comment (n3)

…x stale comment

- Split "stop creating cards" out of negation tests into its own test that
  documents the real reason it's non-actionable (gerund form invisible to
  both negation and positive patterns, not negation suppression)
- Add test for known false positive: "I deleted the create card button"
  matches card.create due to literal substring "create card"
- Fix stale comment on CardMovePattern that referenced removed alternation
@Chris0Jeky
Owner Author

Follow-up: Fixes pushed for review findings

Commit 27aed019 addresses the three actionable findings from the adversarial review:

M1 — Gerund test accuracy

Moved "stop creating cards" out of the negation test group into a standalone test Classify_GerundForm_IsNonActionable_ButNotDueToNegation that documents the real mechanism: gerund forms like "creating" don't match \bcreate\b in either the negation or positive patterns. The test assertion message now explains this clearly.

M3 — Missing false-positive documentation

Added Classify_PastTenseNarrative_IsKnownFalsePositive test for "I deleted the create card button by accident" — documents that it matches card.create because the literal substring "create card" is present. Asserts the current (incorrect but known) behavior.

n3 — Stale comment

Fixed the CardMovePattern comment that still referenced the removed move.*to alternation.

All 87 tests pass. No behavior changes — only test accuracy and documentation improvements.

@Chris0Jeky
Owner Author

Addressed Gemini review feedback: removed redundant Contains() substring checks from all Matches... methods in LlmIntentClassifier.cs. The regex patterns already cover those exact-match cases. All 1627 backend tests pass.

@Chris0Jeky Chris0Jeky merged commit 8037d5d into main Mar 29, 2026
8 checks passed
@Chris0Jeky Chris0Jeky deleted the enhance/571-intent-classifier-nlp branch March 29, 2026 22:23
@github-project-automation github-project-automation bot moved this from Pending to Done in Taskdeck Execution Mar 29, 2026
Chris0Jeky added a commit that referenced this pull request Mar 29, 2026
Update two analysis docs (chat-to-proposal gap and manual testing findings) to reflect recent fixes and testing status. Key changes: add Last Updated and status notes; mark Tier 1 improvements shipped (intent classifier regex/stemming/negation fixes, substring ordering bug, PR #579), UX parse hints shipped (PR #582), unit/integration tests shipped (PR #580), and note PR range #578-#582. In manual testing findings mark OBS-2/OBS-3 resolved (PR #581) and BUG-M5 resolved (PR #578), update resolutions and remove duplicate checklist items. Minor editorial clarifications and test counts added.