Fix capture triage: dash context hint and delimiter separation#643
Fix capture triage: dash context hint and delimiter separation#643Chris0Jeky merged 3 commits intomainfrom
Conversation
For dash-separated text like "ACME Ltd - task1 - task2 - task3", the first segment now becomes a context hint stored in each task's evidence/description field rather than being created as a standalone task. Semicolons and structured patterns are unchanged. Requires 3+ dash segments to trigger context-hint behavior (2 segments fall through to single-sentence fallback). Fixes #614
…ority Update existing dash-separated test to expect 3 tasks (not 4) with context hint in evidence. Add new tests: context hint appears in description JSON, two-segment dash input falls to single-card fallback, structured bullet lines take priority over inline dashes.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Adversarial Self-ReviewChanges reviewed
Edge cases examined
Potential concerns
VerdictNo blocking issues found. The behavioral change is intentional and aligns with #614 requirements. |
There was a problem hiding this comment.
Code Review
This pull request introduces a context-hinting mechanism for task extraction by refactoring the delimiter logic. The ExtractTaskCandidates method now returns a tuple containing task candidates and an optional context hint, which is utilized in the BuildOutputModel to enrich task evidence. The dash-separated parsing logic was specifically updated to treat the first segment as a context hint when three or more segments are present. Feedback was provided to extract a helper method for splitting and filtering text segments to reduce code duplication between the dash and semicolon parsing paths.
| var dashSegments = DashDelimiterPattern.Split(rawText) | ||
| .Select(s => s.Trim()) | ||
| .Where(s => !string.IsNullOrWhiteSpace(s)) | ||
| .ToList(); | ||
|
|
||
| if (delimiterSegments.Count >= 2) | ||
| if (dashSegments.Count >= 3) | ||
| { | ||
| foreach (var segment in delimiterSegments) | ||
| var contextHint = NormalizeTaskTitle(dashSegments[0]); | ||
| foreach (var segment in dashSegments.Skip(1)) | ||
| { | ||
| var normalized = NormalizeTaskTitle(segment); | ||
| if (!string.IsNullOrWhiteSpace(normalized) && seen.Add(normalized)) | ||
| { | ||
| candidates.Add(normalized); | ||
| if (candidates.Count >= MaxExtractedTasks) | ||
| { | ||
| return candidates; | ||
| return (candidates, contextHint); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (candidates.Count > 0) | ||
| { | ||
| return candidates; | ||
| return (candidates, contextHint); | ||
| } | ||
| } | ||
|
|
||
| // Try semicolons: all segments are equal tasks | ||
| var semicolonSegments = SemicolonDelimiterPattern.Split(rawText) | ||
| .Select(s => s.Trim()) | ||
| .Where(s => !string.IsNullOrWhiteSpace(s)) | ||
| .ToList(); |
There was a problem hiding this comment.
There's an opportunity to reduce code duplication. The logic for splitting the raw text by a delimiter, trimming, and filtering empty segments is repeated for both dash and semicolon patterns.
Consider extracting this logic into a private helper method to improve maintainability. For example:
private static List<string> GetSegments(string text, Regex pattern)
{
return pattern.Split(text)
.Select(s => s.Trim())
.Where(s => !string.IsNullOrWhiteSpace(s))
.ToList();
}You could then use this helper in both places:
var dashSegments = GetSegments(rawText, DashDelimiterPattern);
// ...
var semicolonSegments = GetSegments(rawText, SemicolonDelimiterPattern);
Adversarial Code Review -- PR #643Files reviewed
Critical (90-100)1. Stale The diff replaces 2. The regex However, this will still falsely split on legitimate text containing
The Recommendation: Consider adding a minimum segment length check (e.g., each segment must be >= 3 words) to reduce false positives on prose. Alternatively, document this known limitation. 3. The old This means File: Important (80-89)4. Mixed delimiter input falls through to semicolons after dash check fails -- potentially surprising behavior (Confidence: 85) Input like This is a valid edge case where a user mixes delimiters. The semicolons win but the dash becomes part of the first task title. No context hint is extracted. This is arguably correct but produces a messy first task title. Recommendation: Add a test for mixed-delimiter input to document the expected behavior. 5. Evidence clamping uses In This is a minor cosmetic issue but the evidence would end with a bare colon. File: 6.
This means the context hint silently loses Recommendation: This is pre-existing behavior in 7. No test for semicolon-only splitting with the new separate pattern (Confidence: 80) The existing semicolon test ( While the existing test will still pass, there is no test that verifies:
Recommendation: Add edge case tests for the new SummaryThe core design (separating dash/semicolon handling, context-hint-from-first-segment, Key items to address before merge:
|
Change SemicolonDelimiterPattern from ;\s* to ;\s+ to avoid splitting on semicolons without trailing space (e.g., URLs, code snippets). This preserves the original behavioral contract.
Summary
Closes #614
Test plan