fix: add CJK false positive filtering for question detection #19

Closed
yulin0629 wants to merge 1 commit into BayramAnnakov:main from yulin0629:fix/cjk-false-positive-filter

@yulin0629

Summary

  • Support full-width question mark (?, U+FF1F) in false positive filter
  • Add CJK question particles: 嗎/吗 (zh), 呢 (zh), か (ja), 까 (ko)
  • Reject very short messages (<=4 chars) like "OK", "好" as non-actionable

Problem

The existing `\?$` pattern only matches ASCII question marks, so CJK questions such as "這是什麼?" ("What is this?") or "何ですか" ("What is it?") bypass the false positive filter and are incorrectly queued as corrections.
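A quick way to see the gap, assuming the pattern is applied with `re.search` (the surrounding code in reflect_utils.py is not shown here, so `ascii_only_is_question` is a hypothetical helper):

```python
import re

# Hypothetical stand-in for the original ASCII-only entry in FALSE_POSITIVE_PATTERNS.
ASCII_QUESTION = re.compile(r"\?$")


def ascii_only_is_question(text: str) -> bool:
    """Return True if the ASCII-only pattern would flag `text` as a question."""
    return ASCII_QUESTION.search(text.strip()) is not None


if __name__ == "__main__":
    # "\uff1f" is the full-width question mark (U+FF1F).
    for sample in ["What is this?", "這是什麼\uff1f", "何ですか"]:
        print(repr(sample), ascii_only_is_question(sample))
```

Only the first sample is flagged: the full-width ? is a different code point from ASCII ?, and "何ですか" ends in the particle か with no question mark at all.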

Changes

scripts/lib/reflect_utils.py (+5 lines)

  • FALSE_POSITIVE_PATTERNS: `\?$` → `[?\uff1f]$`, plus `[嗎吗呢か까]$`
  • detect_patterns(): early return for messages of <=4 chars
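A minimal sketch of the two changes, assuming FALSE_POSITIVE_PATTERNS is a list of compiled regexes matched with `re.search`. The helper `is_filtered_out` and the constant `SHORT_MESSAGE_MAX_LEN` are hypothetical names, and the real code in reflect_utils.py may be structured differently:

```python
import re

# Hypothetical, simplified stand-ins for the two question entries the PR
# describes; the real FALSE_POSITIVE_PATTERNS may be structured differently.
FALSE_POSITIVE_PATTERNS = [
    re.compile(r"[?\uff1f]$"),    # ASCII "?" or full-width "?" (U+FF1F) at the end
    re.compile(r"[嗎吗呢か까]$"),  # trailing zh/ja/ko question particles
]

SHORT_MESSAGE_MAX_LEN = 4  # the "<=4 chars" threshold from the PR description


def is_filtered_out(text: str) -> bool:
    """Return True if a message should NOT be queued as a correction."""
    text = text.strip()
    # Early return for very short messages such as "OK" or "好".
    if len(text) <= SHORT_MESSAGE_MAX_LEN:
        return True
    # Questions (ASCII or CJK) are treated as false positives.
    return any(p.search(text) for p in FALSE_POSITIVE_PATTERNS)
```

Stripping whitespace first (an assumption of this sketch) keeps a trailing space or newline from defeating the `$` anchor. Because the length check runs before the regexes, a 4-character question like "何ですか" is rejected as short rather than as a question.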

tests/test_reflect_utils.py (+18 lines)

  • 3 new tests: short message rejection, CJK question particle, full-width question mark

Test plan

  • All 200 existing tests pass
  • New tests cover full-width ?, CJK particles 嗎呢か까, and short messages OK/好
  • No behavior change for existing English detection patterns

🤖 Generated with Claude Code

The existing `\?$` pattern only matches ASCII question marks, causing
CJK questions (e.g. "這是什麼?", "何ですか") to bypass the false
positive filter and be incorrectly queued as corrections.

Changes:
- Support full-width question mark (?, U+FF1F)
- Add CJK question particles: 嗎/吗 (zh), 呢 (zh), か (ja), 까 (ko)
- Reject very short messages (<=4 chars) like "OK", "好" as non-actionable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@BayramAnnakov
Owner

@claude pls review this PR

shohu added a commit to shohu/claude-reflect that referenced this pull request Feb 28, 2026
Extends BayramAnnakov#19's scope to cover Japanese, Chinese, and Korean correction
detection — not just false positive filtering.

Three areas of improvement:

1. False positive prevention (superset of BayramAnnakov#19):
   - Full-width question mark (?) and CJK question particles (嗎吗呢か까)
   - Short message rejection with CJK-aware threshold (2 chars for CJK vs 4 for ASCII)
   - Non-correction English phrases ("No problem", "don't worry", "never mind", etc.)
     that previously triggered CORRECTION_PATTERNS in mixed CJK-English text

2. CJK correction pattern detection (new):
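   - Japanese: いや ("no"), 違う ("that's wrong"), そうじゃなくて ("not like that"), 間違ってる ("it's mistaken"), じゃなくて〜にして ("not X, do Y instead"), やめて ("stop"), って言った ("I said ...")
   - Chinese: 不是 ("no, it's not"), 错了/錯了 ("wrong", simplified/traditional), 不要X要Y ("not X, Y instead")
   - Korean: 아니 ("no"), 틀렸 ("was wrong")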

3. Test coverage:
   - 22 new tests in TestCJKPatternDetection class
   - Short message rejection, question particles, non-correction phrases
   - All CJK correction patterns with confidence assertions
   - English pattern regression tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
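The extended scope in this commit can be sketched as follows. The regexes below are hypothetical renderings of the listed phrases (their anchoring, ordering, and confidence scores in PR #24 are not shown here). The CJK-aware threshold assumes the 2/4-character limits are inclusive maxima like the original `<=4` rule, and `find_cjk_correction` and `short_message_limit` are illustrative names, not the repo's API:

```python
import re
from typing import Optional

# Hypothetical regexes for the phrases listed in the commit message; the
# actual patterns, anchoring, and confidence scores in PR #24 may differ.
CJK_CORRECTION_PATTERNS = {
    "ja": [r"^いや", r"違う", r"そうじゃなくて", r"間違ってる",
           r"じゃなくて.+にして", r"やめて", r"って言った"],
    "zh": [r"^不是", r"[错錯]了", r"不要.+要"],
    "ko": [r"^아니", r"틀렸"],
}
_COMPILED = {lang: [re.compile(p) for p in pats]
             for lang, pats in CJK_CORRECTION_PATTERNS.items()}

# Rough CJK detection: CJK Unified Ideographs (+ Ext. A), Hiragana/Katakana, Hangul syllables.
_CJK_CHAR = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")


def short_message_limit(text: str) -> int:
    """CJK-aware threshold: 2 chars if the text contains CJK, else 4."""
    return 2 if _CJK_CHAR.search(text) else 4


def find_cjk_correction(text: str) -> Optional[str]:
    """Return the language code of the first matching correction pattern, or None."""
    text = text.strip()
    if len(text) <= short_message_limit(text):
        return None  # too short to be actionable
    for lang, patterns in _COMPILED.items():
        if any(p.search(text) for p in patterns):
            return lang
    return None
```

With these thresholds a two-character reply like "好的" is still dropped, while a three-character correction such as "不是的" reaches the pattern check and matches.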

@xkonjin left a comment

Quick review pass:

  • Main risk area here is UI state transitions, empty/error states, and interaction regressions.
  • Good to see test coverage move with the code; I’d still make sure it exercises the unhappy path around UI state transitions, empty/error states, and interaction regressions rather than only the happy path.
  • Before merge, I’d smoke-test the behavior touched by reflect_utils.py, test_reflect_utils.py with malformed input / retry / rollback cases, since that’s where this class of change usually breaks.

@BayramAnnakov
Owner

Thank you for identifying this CJK gap, @yulin0629! Your PR correctly spotted the full-width ? issue and CJK question particles.

PR #24 by @shohu includes all three of your changes (full-width ?, CJK question particles, short message rejection) plus comprehensive CJK correction pattern detection (13 patterns across Japanese, Chinese, Korean), non-correction English phrase filtering, and 22 tests.

Closing in favor of #24 — your contribution helped kickstart this improvement and will be credited.

BayramAnnakov added a commit that referenced this pull request Mar 16, 2026
Merging comprehensive CJK support. All 219 tests pass. Credit to @yulin0629 for identifying the initial CJK gap in PR #19.