
Fix capture triage: dash context hint and delimiter separation #643

Merged
Chris0Jeky merged 3 commits into main from fix/614-capture-triage-delimiters on Mar 31, 2026

@Chris0Jeky (Owner)

Summary

  • For dash-separated text (3+ segments), the first segment is now treated as a context hint stored in each task's evidence/description field, rather than being created as a standalone task card. E.g., "ACME Ltd - task1 - task2" produces 2 cards with "ACME Ltd: task1" and "ACME Ltd: task2" as descriptions.
  • Separated dash and semicolon delimiter patterns into distinct regex patterns to allow different handling (dash = context hint; semicolons = equal tasks).
  • Two-segment dash inputs ("fix bug - deploy") now fall through to the single-sentence fallback instead of being incorrectly split into two cards.
  • Existing behavior for structured patterns (checklist, numbered, bullet), semicolons, and single-sentence fallback is preserved.
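The parsing order in this summary can be sketched as a small, self-contained function. This is a hypothetical approximation, not the merged CaptureTriageService code: `TriageSketch`, `Extract`, and `Evidence` are illustrative names, normalization is reduced to Trim() instead of NormalizeTaskTitle, MaxExtractedTasks is ignored, and the structured patterns (checklist, numbered, bullet) that the real service checks first are omitted. The dash regex is the one quoted in the review; the semicolon regex is ;\s+, as restored by the final commit.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TriageSketch
{
    // Spaced dash on one line, e.g. "ACME Ltd - task1" (pattern quoted in the review).
    private static readonly Regex DashDelimiter = new Regex(@"[^\S\n]+-[^\S\n]+");

    // Semicolon followed by at least one whitespace character (final pattern in this PR).
    private static readonly Regex SemicolonDelimiter = new Regex(@";\s+");

    // Split, trim, and drop empty segments.
    private static List<string> Segments(string text, Regex pattern) =>
        pattern.Split(text)
            .Select(s => s.Trim())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToList();

    // Returns task candidates plus an optional context hint.
    public static (List<string> Tasks, string? ContextHint) Extract(string rawText)
    {
        var dash = Segments(rawText, DashDelimiter);
        if (dash.Count >= 3)
        {
            // First segment is the context hint; the remaining segments are tasks.
            return (dash.Skip(1).Distinct().ToList(), dash[0]);
        }

        var semicolon = Segments(rawText, SemicolonDelimiter);
        if (semicolon.Count >= 2)
        {
            // Semicolon segments are equal tasks with no context hint.
            return (semicolon.Distinct().ToList(), null);
        }

        // Single-sentence fallback: one card for the whole input.
        return (new List<string> { rawText.Trim() }, null);
    }

    // "context: task" when a hint exists, otherwise just the task.
    public static string Evidence(string task, string? contextHint) =>
        string.IsNullOrWhiteSpace(contextHint) ? task : $"{contextHint}: {task}";
}
```

Under these assumptions, "ACME Ltd - task1 - task2" yields the tasks task1 and task2 with the hint "ACME Ltd", "fix bug - deploy" falls back to a single card, and a mixed input such as "Project X - task one; task two; task three" takes the semicolon path with the dash kept in the first title.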

Closes #614

Test plan

  • Existing 14 CaptureTriageService tests pass (no regressions)
  • Updated dash-separated test: 3 tasks extracted (not 4), context hint in evidence
  • New test: context hint appears as "ACME Ltd: request documents" in description JSON
  • New test: two-segment dash input falls to single-card fallback
  • New test: structured bullet lines take priority over inline dash splitting
  • Full backend suite: 1952 tests pass (357 Domain + 1170 Application + 413 Api + 4 Cli + 8 Architecture)

For dash-separated text like "ACME Ltd - task1 - task2 - task3",
the first segment now becomes a context hint stored in each task's
evidence/description field rather than being created as a standalone
task. Semicolons and structured patterns are unchanged. Requires 3+
dash segments to trigger context-hint behavior (2 segments fall
through to single-sentence fallback).

Fixes #614
…ority

Update existing dash-separated test to expect 3 tasks (not 4) with
context hint in evidence. Add new tests: context hint appears in
description JSON, two-segment dash input falls to single-card
fallback, structured bullet lines take priority over inline dashes.

@Chris0Jeky (Owner, Author)

Adversarial Self-Review

Changes reviewed

  • CaptureTriageService.cs: Split InlineDelimiterPattern into separate DashDelimiterPattern and SemicolonDelimiterPattern; dash-separated text (3+ segments) now uses the first segment as a context hint in evidence; ExtractTaskCandidates now returns a tuple with an optional context hint; BuildOutputModel prepends the context to evidence.
  • CaptureTriageServiceTests.cs: Updated existing dash test expectations; added 3 new tests (context hint in evidence, two-segment fallback, bullet priority over dashes).

Edge cases examined

  1. SemicolonDelimiterPattern relaxed from ;\s+ to ;\s*: Trailing semicolons produce empty strings that get filtered by .Where(s => !string.IsNullOrWhiteSpace(s)). No issue.
  2. Empty context hint after normalization: Guarded by hasContext = !string.IsNullOrWhiteSpace(contextHint). No issue.
  3. Two-segment dash with semicolons (e.g., "fix bug - deploy; test"): Dash check fails (< 3 segments), falls to semicolons which split into ["fix bug - deploy", "test"]. The dash becomes part of the literal card title. Acceptable behavior -- not a bug but worth noting.
  4. Evidence length clamping: Added explicit MaxTaskEvidenceLength truncation when context hint is prepended. Prevents validation failure on long context + long title.
  5. Deterministic card IDs: BuildDeterministicCardId uses task title (not evidence), so context hint does not affect idempotency keys or card IDs. Correct.
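Edge case 5 depends on the card ID being a function of the task title and not of the evidence. A minimal sketch of that property follows, assuming a SHA-256 hash over a capture identifier and the title; the real BuildDeterministicCardId inputs and hashing scheme are not shown in this PR, so `CardIdSketch` and its parameters are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CardIdSketch
{
    // Hypothetical: a stable Guid derived from a capture identifier and the task title.
    // Evidence (and therefore any context hint) is deliberately not an input.
    public static Guid BuildCardId(string captureId, string taskTitle)
    {
        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes($"{captureId}|{taskTitle}"));

        // A Guid needs exactly 16 bytes, so keep the first 16 bytes of the digest.
        var first16 = new byte[16];
        Array.Copy(hash, first16, 16);
        return new Guid(first16);
    }
}
```

Because evidence is never hashed, prepending "ACME Ltd: " to a task's evidence leaves that task's ID unchanged.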

Potential concerns

  • Behavioral change for existing dash-separated inputs: Previously "A - B - C - D" produced 4 cards; now it produces 3 cards with "A" as context. This is the intended fix per #614 but is a breaking change for any existing captured inputs that relied on the 4-card behavior. Since this is a triage-time computation (not stored state), re-triaging the same input would produce different proposals. The idempotency key change means existing proposals won't conflict.
  • No test for very long context hint: If someone types a 200-char context followed by tasks, the evidence clamping at 280 chars is tested implicitly but not with an explicit edge case test. Low risk given the explicit truncation logic.

Verdict

No blocking issues found. The behavioral change is intentional and aligns with #614 requirements.


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a context-hinting mechanism for task extraction by refactoring the delimiter logic. The ExtractTaskCandidates method now returns a tuple containing task candidates and an optional context hint, which is utilized in the BuildOutputModel to enrich task evidence. The dash-separated parsing logic was specifically updated to treat the first segment as a context hint when three or more segments are present. Feedback was provided to extract a helper method for splitting and filtering text segments to reduce code duplication between the dash and semicolon parsing paths.

Comment on lines +270 to +301
```diff
+var dashSegments = DashDelimiterPattern.Split(rawText)
+    .Select(s => s.Trim())
+    .Where(s => !string.IsNullOrWhiteSpace(s))
+    .ToList();
+
-if (delimiterSegments.Count >= 2)
+if (dashSegments.Count >= 3)
 {
-    foreach (var segment in delimiterSegments)
+    var contextHint = NormalizeTaskTitle(dashSegments[0]);
+    foreach (var segment in dashSegments.Skip(1))
     {
         var normalized = NormalizeTaskTitle(segment);
         if (!string.IsNullOrWhiteSpace(normalized) && seen.Add(normalized))
         {
             candidates.Add(normalized);
             if (candidates.Count >= MaxExtractedTasks)
             {
-                return candidates;
+                return (candidates, contextHint);
             }
         }
     }
 
     if (candidates.Count > 0)
     {
-        return candidates;
+        return (candidates, contextHint);
     }
 }
+
+// Try semicolons: all segments are equal tasks
+var semicolonSegments = SemicolonDelimiterPattern.Split(rawText)
+    .Select(s => s.Trim())
+    .Where(s => !string.IsNullOrWhiteSpace(s))
+    .ToList();
```

medium

There's an opportunity to reduce code duplication. The logic for splitting the raw text by a delimiter, trimming, and filtering empty segments is repeated for both dash and semicolon patterns.

Consider extracting this logic into a private helper method to improve maintainability. For example:

```csharp
private static List<string> GetSegments(string text, Regex pattern)
{
    return pattern.Split(text)
        .Select(s => s.Trim())
        .Where(s => !string.IsNullOrWhiteSpace(s))
        .ToList();
}
```

You could then use this helper in both places:

```csharp
var dashSegments = GetSegments(rawText, DashDelimiterPattern);
// ...
var semicolonSegments = GetSegments(rawText, SemicolonDelimiterPattern);
```

@Chris0Jeky (Owner, Author)

Adversarial Code Review -- PR #643

Files reviewed

  • backend/src/Taskdeck.Application/Services/CaptureTriageService.cs (service changes)
  • backend/tests/Taskdeck.Application.Tests/Services/CaptureTriageServiceTests.cs (test changes)
  • backend/src/Taskdeck.Application/DTOs/CaptureTriageContracts.cs (contract constants, unchanged)

Critical (90-100)

1. Stale InlineDelimiterPattern field left in main branch code -- merge conflict risk (Confidence: 95)

The diff replaces InlineDelimiterPattern with DashDelimiterPattern and SemicolonDelimiterPattern, and all usages are updated. However, looking at the pre-PR file, the old InlineDelimiterPattern field and its usage in ExtractTaskCandidates must be fully removed. The diff appears correct on this front -- the old pattern and its usage are replaced. Verified: no issue here. (Withdrawn on closer inspection of the diff hunks.)

2. DashDelimiterPattern regex [^\S\n]+-[^\S\n]+ matches any spaced hyphen -- false positives on prose that uses spaced dashes (Confidence: 92)

The regex [^\S\n]+-[^\S\n]+ means: one-or-more horizontal whitespace chars, then a literal -, then one-or-more horizontal whitespace chars. This correctly requires spaces around the dash.

However, it will still match spaced dashes in legitimate prose, such as:

  • "Check https://example.com/path - the API endpoint" (a single spaced dash; only two segments, so the >= 3 threshold sends it to the single-sentence fallback)
  • "Use the read-write approach - it is faster - but test first" (three segments, so it would trigger context-hint extraction, treating "Use the read-write approach" as context and "it is faster" and "but test first" as tasks)

The >= 3 threshold mitigates the two-segment case but any prose sentence with two or more - dashes will incorrectly trigger task splitting. This is an inherent design tradeoff, not strictly a bug, but worth documenting.

Recommendation: Consider adding a minimum segment length check (e.g., each segment must be >= 3 words) to reduce false positives on prose. Alternatively, document this known limitation.
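The regex behavior and the suggested word-count guard can be checked with a short sketch. `DashHeuristicSketch` and `PassesMinWords` are hypothetical names; only the regex comes from the PR. With a 3-word minimum, the prose example above still qualifies (its segments have 4, 3 and 3 words), while the motivating "ACME Ltd - task1 - task2" input is rejected, so the threshold as stated would need rethinking before it reduces false positives.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DashHeuristicSketch
{
    // Pattern quoted above: horizontal whitespace, a literal '-', horizontal whitespace.
    private static readonly Regex DashDelimiter = new Regex(@"[^\S\n]+-[^\S\n]+");

    // Split, trim, and drop empty segments.
    public static string[] Split(string text) =>
        DashDelimiter.Split(text)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();

    // Hypothetical guard from the recommendation: 3+ segments, and every segment
    // has at least minWords space-separated words.
    public static bool PassesMinWords(string text, int minWords)
    {
        var segments = Split(text);
        return segments.Length >= 3 &&
               segments.All(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length >= minWords);
    }
}
```

Unspaced hyphens such as "read-write" never match, which confirms that the false positives come only from spaced dashes in prose.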


3. SemicolonDelimiterPattern changed from ;\s+ to ;\s* -- now splits on semicolons with no trailing space (Confidence: 91)

The old InlineDelimiterPattern used ;\s+ (semicolon followed by one or more whitespace chars). The new SemicolonDelimiterPattern uses ;\s* (semicolon followed by zero or more whitespace chars).

This means "foo;bar" (no space after semicolon) now triggers splitting, whereas before it would not. This is a behavioral change not mentioned in the PR description. It could cause unexpected splitting of text containing semicolons in code snippets or URLs (e.g., "check matrix[0;1] values").

File: CaptureTriageService.cs, new line ~33
Fix: If intentional, document the change. If not, revert to ;\s+ to match prior behavior.
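The difference between the two patterns is easy to pin down with Regex.Split. A minimal sketch; `SemicolonPatternSketch` is a hypothetical name and only the two regexes come from the PR. It also shows a side effect relevant to the trailing-semicolon case in point 7: under ;\s+ a final ";" has no whitespace after it and does not match, so "task one; task two;" still yields 2 segments but the second keeps its ";". Whether NormalizeTaskTitle strips that character is not shown in this PR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SemicolonPatternSketch
{
    private static readonly Regex Relaxed = new Regex(@";\s*"); // as first submitted in this PR
    private static readonly Regex Strict = new Regex(@";\s+");  // as restored by the final commit

    // Split, trim, and drop empty segments, mirroring the .Where filter in the PR.
    public static string[] Split(string text, bool relaxed) =>
        (relaxed ? Relaxed : Strict).Split(text)
            .Select(s => s.Trim())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToArray();
}
```

With the relaxed pattern, "foo;bar" and "check matrix[0;1] values" both split into two segments; with the strict pattern, each stays whole.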


Important (80-89)

4. Mixed delimiter input falls through to semicolons after dash check fails -- potentially surprising behavior (Confidence: 85)

Input like "Project X - task one; task two; task three" has only 2 dash segments (below the >= 3 threshold), so dash splitting is skipped. Then semicolons split it into ["Project X - task one", "task two", "task three"] -- the first segment retains the - literally in the task title.

This is a valid edge case where a user mixes delimiters. The semicolons win but the dash becomes part of the first task title. No context hint is extracted. This is arguably correct but produces a messy first task title.

Recommendation: Add a test for mixed-delimiter input to document the expected behavior.


5. Evidence clamping uses TrimEnd() which can produce a trailing colon (Confidence: 82)

In BuildOutputModel, the evidence string is built as $"{contextHint}: {task}", truncated to MaxTaskEvidenceLength (280 chars), and then TrimEnd() is applied. If truncation lands right after the colon and space, TrimEnd() removes the trailing space but leaves the colon. Example: if contextHint is 277 chars and task is "abc", evidence = "{277-char-context}: abc" = 282 chars, truncated to 280 = "{277-char-context}: a" -- this is fine. But if contextHint is exactly 278 chars, the prefix "{278-char-context}: " is already 280 chars, so truncation to 280 yields "{278-char-context}: " and TrimEnd() produces "{278-char-context}:".

This is a minor cosmetic issue but the evidence would end with a bare colon.

File: CaptureTriageService.cs, new BuildOutputModel method
Fix: After TrimEnd(), also trim trailing : characters, or ensure the contextHint itself is clamped to leave room for : plus at least a few task title characters.
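The truncation edge case can be reproduced directly. A minimal sketch, assuming the clamp is a plain Substring followed by TrimEnd() as described; `EvidenceSketch`, `Build`, and the `trimTrailingColon` switch are hypothetical, and only MaxTaskEvidenceLength = 280 comes from the review.

```csharp
#nullable enable
using System;

public static class EvidenceSketch
{
    public const int MaxTaskEvidenceLength = 280; // limit quoted in the review

    // Prepends the context hint, clamps to the limit, then trims trailing whitespace.
    // When trimTrailingColon is true, a dangling ':' left by truncation is removed too
    // (the first fix suggested above).
    public static string Build(string task, string? contextHint, bool trimTrailingColon)
    {
        var evidence = string.IsNullOrWhiteSpace(contextHint) ? task : $"{contextHint}: {task}";
        if (evidence.Length <= MaxTaskEvidenceLength)
        {
            return evidence;
        }

        var clamped = evidence.Substring(0, MaxTaskEvidenceLength).TrimEnd();
        return trimTrailingColon ? clamped.TrimEnd(':', ' ') : clamped;
    }
}
```

Trimming ':' after truncation is the smaller change; reserving room when clamping the context hint (the second suggestion) would instead guarantee that some task text always survives.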


6. NormalizeTaskTitle sentence-split interacts poorly with context hint segments (Confidence: 83)

NormalizeTaskTitle splits on sentence boundaries ((?<=[.!?])\s+) and takes only the first sentence if it is >= 8 chars. For dash-separated input like "ACME Ltd. Europe - request docs - send letter", the context hint becomes NormalizeTaskTitle("ACME Ltd. Europe") which sentence-splits to just "ACME Ltd." (first sentence, 9 chars >= 8).

This means the context hint silently loses " Europe". The same applies to task segments that contain periods.

Recommendation: This is pre-existing behavior in NormalizeTaskTitle but now has broader impact since it applies to context hints too. Consider whether context hints should skip sentence splitting.
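The loss of " Europe" follows directly from the quoted sentence-boundary regex. A minimal sketch; `SentenceSplitSketch.FirstSentenceRule` is a hypothetical stand-in for the relevant part of NormalizeTaskTitle, and its behavior when the first sentence is shorter than 8 characters (keep the whole text) is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceSplitSketch
{
    // Sentence-boundary pattern quoted in the review.
    private static readonly Regex SentenceBoundary = new Regex(@"(?<=[.!?])\s+");

    // Keep only the first sentence when it is at least 8 characters long,
    // otherwise keep the whole (trimmed) text.
    public static string FirstSentenceRule(string text)
    {
        var trimmed = text.Trim();
        var first = SentenceBoundary.Split(trimmed)[0];
        return first.Length >= 8 ? first : trimmed;
    }
}
```

Skipping this rule for context hints, as suggested, would amount to trimming the first dash segment without calling the sentence splitter.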


7. No test for semicolon-only splitting with the new separate pattern (Confidence: 80)

The existing semicolon test (ShouldSplitSemicolonSeparatedText_IntoIndividualTasks) was written against the old combined InlineDelimiterPattern. After this PR, semicolons are handled by a separate code path (SemicolonDelimiterPattern). The existing test input "Friday release prep; update changelog; tag version; notify stakeholders" passes through the dash check first (only 1 dash-segment since there are no - delimiters), then hits the semicolon path.

While the existing test will still pass, there is no test that verifies:

  • "a;b" (no space after semicolon, new behavior from ;\s*)
  • Semicolons inside structured bullet text are ignored (priority check)
  • Trailing semicolons: "task one; task two;" produces exactly 2 tasks, not a phantom empty third

Recommendation: Add edge case tests for the new ;\s* pattern.


Summary

The core design (separating dash/semicolon handling, context-hint-from-first-segment, >= 3 threshold for dashes) is sound and well-motivated. The tuple return change is handled correctly at all call sites. Backward compatibility for structured patterns (bullets, checklists, numbered) is preserved because they are checked first and return early.

Key items to address before merge:

  1. ;\s* vs ;\s+ behavioral change -- decide if splitting on "foo;bar" (no space) is intentional and document/test accordingly
  2. Evidence trailing colon on truncation edge case
  3. Missing edge case tests for mixed delimiters, no-space semicolons, and trailing semicolons

Change SemicolonDelimiterPattern from ;\s* to ;\s+ to avoid
splitting on semicolons without trailing space (e.g., URLs,
code snippets). This preserves the original behavioral contract.
Chris0Jeky merged commit 5b87f63 into main on Mar 31, 2026
22 checks passed
github-project-automation (bot) moved this from Pending to Done in Taskdeck Execution on Mar 31, 2026
Chris0Jeky deleted the fix/614-capture-triage-delimiters branch on March 31, 2026, 17:37