fix: add CJK false positive filtering for question detection#19
Closed
yulin0629 wants to merge 1 commit intoBayramAnnakov:mainfrom
The existing `\?$` pattern only matches ASCII question marks, causing CJK questions (e.g. "這是什麼?", "何ですか") to bypass the false positive filter and be incorrectly queued as corrections.

Changes:
- Support full-width question mark (?, U+FF1F)
- Add CJK question particles: 嗎/吗 (zh), 呢 (zh), か (ja), 까 (ko)
- Reject very short messages (<=4 chars) like "OK", "好" as non-actionable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
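A minimal sketch of the filtering described in this commit message, assuming the patterns are applied to the stripped message text. The helper `is_false_positive` and the constant `MAX_NON_ACTIONABLE_LENGTH` are hypothetical names for illustration; the actual code in `scripts/lib/reflect_utils.py` may be organized differently.

```python
import re

# Illustrative stand-in for the list named in the PR; the real definition
# lives in scripts/lib/reflect_utils.py and may contain other entries.
FALSE_POSITIVE_PATTERNS = [
    re.compile(r"[?\uff1f]$"),    # ASCII "?" or full-width "?" (U+FF1F) at the end
    re.compile(r"[嗎吗呢か까]$"),  # zh / ja / ko question particles at the end
]

# Hypothetical constant: messages of this many characters or fewer are skipped.
MAX_NON_ACTIONABLE_LENGTH = 4


def is_false_positive(message: str) -> bool:
    """Return True when a message should not be queued as a correction."""
    text = message.strip()
    # Very short replies such as "OK" or "好" carry no actionable correction.
    if len(text) <= MAX_NON_ACTIONABLE_LENGTH:
        return True
    # Questions are treated as false positives, whichever script they end in.
    return any(pattern.search(text) for pattern in FALSE_POSITIVE_PATTERNS)
```

Note that a fixed 4-character cutoff counts code points, so it is considerably stricter for CJK text than for English, where four characters rarely form a full sentence.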
Owner
@claude pls review this PR
shohu
added a commit
to shohu/claude-reflect
that referenced
this pull request
Feb 28, 2026
Extends BayramAnnakov#19's scope to cover Japanese, Chinese, and Korean correction detection, not just false positive filtering.

Three areas of improvement:

1. False positive prevention (superset of BayramAnnakov#19):
   - Full-width question mark (?) and CJK question particles (嗎吗呢か까)
   - Short message rejection with CJK-aware threshold (2 chars for CJK vs 4 for ASCII)
   - Non-correction English phrases ("No problem", "don't worry", "never mind", etc.) that previously triggered CORRECTION_PATTERNS in mixed CJK-English text
2. CJK correction pattern detection (new):
   - Japanese: いや、違う、そうじゃなくて、間違ってる、じゃなくて〜にして、やめて、って言った
   - Chinese: 不是、错了/錯了、不要X要Y
   - Korean: 아니、틀렸
3. Test coverage:
   - 22 new tests in TestCJKPatternDetection class
   - Short message rejection, question particles, non-correction phrases
   - All CJK correction patterns with confidence assertions
   - English pattern regression tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
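The CJK-aware threshold and correction patterns listed above could be sketched roughly as follows. This is an illustration only: the Unicode ranges, the `<=` comparison, the subset of patterns, and the names `CJK_CHAR`, `CJK_CORRECTION_PATTERNS`, `is_too_short`, and `looks_like_cjk_correction` are all assumptions, not the code from the referenced commit.

```python
import re

# Assumed ranges: hiragana + katakana, CJK unified ideographs, Hangul syllables.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]")

# A subset of the phrases listed in the commit message, grouped by language.
CJK_CORRECTION_PATTERNS = [
    re.compile(r"違う|間違ってる|そうじゃなくて|やめて"),  # Japanese
    re.compile(r"不是|错了|錯了|不要.+要"),                 # Chinese
    re.compile(r"아니|틀렸"),                               # Korean
]


def is_too_short(message: str) -> bool:
    """Apply a lower length cutoff when the message contains CJK characters."""
    text = message.strip()
    # Assumption: thresholds are inclusive (2 for CJK text, 4 otherwise).
    limit = 2 if CJK_CHAR.search(text) else 4
    return len(text) <= limit


def looks_like_cjk_correction(message: str) -> bool:
    """Return True when a sufficiently long message matches a CJK correction phrase."""
    if is_too_short(message):
        return False
    return any(pattern.search(message) for pattern in CJK_CORRECTION_PATTERNS)
```

The lower cutoff for CJK text reflects that a few characters can already form a complete instruction, whereas four ASCII characters usually cannot.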
xkonjin
reviewed
Mar 8, 2026
xkonjin
left a comment
Quick review pass:
- Main risk area here is UI state transitions, empty/error states, and interaction regressions.
- Good to see test coverage move with the code; I’d still make sure it exercises the unhappy path around UI state transitions, empty/error states, and interaction regressions rather than only the happy path.
- Before merge, I’d smoke-test the behavior touched by reflect_utils.py, test_reflect_utils.py with malformed input / retry / rollback cases, since that’s where this class of change usually breaks.
Owner
Thank you for identifying this CJK gap, @yulin0629! Your PR correctly spotted the full-width … PR #24 by @shohu includes all three of your changes (full-width … Closing in favor of #24; your contribution helped kickstart this improvement and will be credited.
BayramAnnakov
added a commit
that referenced
this pull request
Mar 16, 2026
Merging comprehensive CJK support. All 219 tests pass. Credit to @yulin0629 for identifying the initial CJK gap in PR #19.
Summary

Problem

The existing `\?$` pattern only matches ASCII question marks. CJK questions like "這是什麼?" or "何ですか" bypass the false positive filter and get incorrectly queued as corrections.

Changes

- `scripts/lib/reflect_utils.py` (+5 lines)
  - `FALSE_POSITIVE_PATTERNS`: `\?$` → `[?\uff1f]$` + `[嗎吗呢か까]$`
  - `detect_patterns()`: early return for messages <=4 chars
- `tests/test_reflect_utils.py` (+18 lines)

Test plan

- ?, CJK particles 嗎呢か까, short messages OK/好

🤖 Generated with Claude Code