perf: skip no-op merge passes in analysis pipeline #119
Open
dbwls99706 wants to merge 1 commit into chenglou:main from
Summary
The six post-segmentation passes in buildMergedSegmentation unconditionally allocate and populate new output arrays even when the input contains no patterns that would trigger a merge or split. This PR adds a linear early-exit guard at the top of each function that returns the input segmentation unchanged when the relevant pattern is absent.

Each guard is intentionally a cheap necessary-condition scan: if the guard returns false, the pass cannot produce any change. A guard may return true when no actual merge happens (false positive), but it will never skip a pass that would have produced a change (no false negative).
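A minimal sketch of the guard pattern described above, not the code from this PR. The `Segmentation` shape, `mayNeedHyphenSplit`, `runPass`, and `guardedPass` are hypothetical names for illustration; the guard condition borrows the "contains both - and a decimal digit" rule listed for splitHyphenatedNumericRuns, and the digit check is assumed to be ASCII-only here.

```typescript
// Hypothetical, simplified shape. The PR text only says that each pass
// copies four parallel arrays: texts, isWordLike, kinds, starts.
interface Segmentation {
  texts: string[];
  isWordLike: boolean[];
  kinds: string[];
  starts: number[];
}

// Necessary-condition scan: returns true if some text contains both "-"
// and a digit. It stops at the first match, so it is O(n) at worst.
// Assumption: ASCII digits only; the real check may be wider.
function mayNeedHyphenSplit(seg: Segmentation): boolean {
  for (const text of seg.texts) {
    if (text.includes("-") && /[0-9]/.test(text)) return true;
  }
  return false;
}

// Stand-in for an existing pass body. It only rebuilds the four arrays,
// which is the cost the guard is meant to avoid in the no-op case.
function runPass(seg: Segmentation): Segmentation {
  return {
    texts: seg.texts.slice(),
    isWordLike: seg.isWordLike.slice(),
    kinds: seg.kinds.slice(),
    starts: seg.starts.slice(),
  };
}

// Guarded pass: when the guard is false the input object itself is
// returned, so nothing is allocated. A true guard only means the pass
// might change something (false positives are allowed).
function guardedPass(seg: Segmentation): Segmentation {
  if (!mayNeedHyphenSplit(seg)) return seg;
  return runPass(seg);
}
```

Returning the same object on the skip path means callers can detect a no-op with a reference comparison, and the pass stays behavior-preserving as long as the guard has no false negatives.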
Motivation
For text that never hits a given pass (for example, pure CJK has no URLs, no numeric runs, no ASCII punctuation chains, and no hyphenated numbers), each pass still copies four arrays (texts, isWordLike, kinds, starts) element-by-element to produce identical output. The guards are O(n) with early break on first match, and in the no-op cases targeted here, they are cheaper than allocating and populating replacement arrays.

Note: carryTrailingForwardStickyAcrossCJKBoundary is the exception. Its guard triggers on CJK text (adjacent CJK text pairs), so the CJK improvement comes primarily from the other five guards. This guard benefits non-CJK text that would otherwise pay for .slice() copies without any carries to perform.

Changes
src/analysis.ts: added early-exit guards to 6 internal functions:
- mergeUrlLikeRuns: skip when no URL-like run starts exist
- mergeUrlQueryRuns: skip when no URL query boundary segments exist (conservative: isUrlQueryBoundarySegment already requires a :// or www. prefix, so the guard is at least as wide as the actual merge condition)
- mergeNumericRuns: skip when no numeric run segments with decimal digits exist
- mergeAsciiPunctuationChains: skip when no trailing-joiner wordlike text is followed by another wordlike text (a necessary condition for the inner while loop to merge anything)
- splitHyphenatedNumericRuns: skip when no text contains both - and a decimal digit
- carryTrailingForwardStickyAcrossCJKBoundary: skip when no adjacent CJK text pairs exist

layout() hot path changes

Benchmark
Environment: Windows 11, Bun 1.3.11, fake canvas backend.
Method: analyzeText() × 5000 iters, trimmed mean of 20 rounds, alternating patched (P) / original (O) across 3 independent process pairs.

No-pattern text (guards skip all passes):
Pattern-heavy text (guards pass through to existing logic):
Bottom line: In this local benchmark, English plain prose consistently improved across all 3 pairs (~10% in the 150c case). Pattern-free CJK inputs showed improvement in most pairs. No consistent regression was observed on pattern-heavy inputs.
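For readers who want to reproduce the measurement, here is a rough sketch of the "trimmed mean of N rounds" part of the method, not the harness actually used. `trimmedMean`, `timeRound`, and `measure` are hypothetical names, and the 10% trim per side is an assumption because the trim fraction is not stated. The sketch runs in a single process; alternating patched and original builds across separate process pairs, as described in the method, would have to be driven from outside.

```typescript
// Trimmed mean: sort the samples, drop a fraction from each end, and
// average what is left. trimFraction = 0.1 is an assumption.
function trimmedMean(samples: number[], trimFraction = 0.1): number {
  if (samples.length === 0) throw new Error("no samples");
  if (trimFraction < 0 || trimFraction >= 0.5) {
    throw new Error("trimFraction must be in [0, 0.5)");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const drop = Math.floor(sorted.length * trimFraction);
  const kept = sorted.slice(drop, sorted.length - drop);
  const sum = kept.reduce((acc, x) => acc + x, 0);
  return sum / kept.length;
}

// Time one round of `iters` calls, in milliseconds.
function timeRound(fn: () => void, iters: number): number {
  const start = performance.now();
  for (let i = 0; i < iters; i++) fn();
  return performance.now() - start;
}

// Run `rounds` rounds and summarize them with the trimmed mean.
// Defaults mirror the numbers quoted in the method (20 rounds, 5000 iters).
function measure(fn: () => void, rounds = 20, iters = 5000): number {
  const times: number[] = [];
  for (let r = 0; r < rounds; r++) times.push(timeRound(fn, iters));
  return trimmedMean(times);
}
```

Trimming both ends reduces the influence of one-off outliers such as a GC pause or a cold first round, which is presumably why a trimmed mean was preferred over a plain average.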
Test plan
bun test: 84 tests pass, 0 fail