perf: skip no-op merge passes in analysis pipeline by dbwls99706 · Pull Request #119 · chenglou/pretext

dbwls99706 · 2026-04-09T08:47:39Z

Summary

The six post-segmentation passes in buildMergedSegmentation unconditionally allocate and populate new output arrays even when the input contains no patterns that would trigger a merge or split. This PR adds a linear-time early-exit guard at the top of each pass; the guard returns the input segmentation unchanged when the relevant pattern is absent.

Each guard is intentionally a cheap necessary-condition scan: if the guard returns false, the pass cannot produce any change. A guard may return true when no actual merge happens (false positive), but it will never skip a pass that would have produced a change (no false negative).
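The guard contract can be illustrated with a minimal sketch. The `Segmentation` shape and the `mergeTrailingHyphen` pass below are hypothetical and are not the code in src/analysis.ts; they only show a necessary-condition scan that returns the input object itself when no change is possible.

```typescript
// Hypothetical shape: the real segmentation type in src/analysis.ts may differ.
type Segmentation = {
  texts: string[]
  isWordLike: boolean[]
  kinds: string[]
  starts: number[]
}

// Hypothetical pass: merges a segment ending in "-" with the segments that follow it.
function mergeTrailingHyphen(seg: Segmentation): Segmentation {
  // Guard: a necessary condition. If no segment ends in "-", nothing can merge.
  let candidate = false
  for (let i = 0; i < seg.texts.length; i++) {
    if (seg.texts[i]!.endsWith('-')) {
      candidate = true
      break // early break on first match
    }
  }
  if (!candidate) return seg // same object, no allocation

  // Full pass: rebuilds all four arrays, as the unguarded version always did.
  const out: Segmentation = { texts: [], isWordLike: [], kinds: [], starts: [] }
  let i = 0
  while (i < seg.texts.length) {
    const first = i
    let text = seg.texts[i]!
    while (text.endsWith('-') && i + 1 < seg.texts.length) {
      i++
      text += seg.texts[i]!
    }
    out.texts.push(text)
    out.isWordLike.push(seg.isWordLike[first]!)
    out.kinds.push(seg.kinds[first]!)
    out.starts.push(seg.starts[first]!)
    i++
  }
  return out
}
```

A trailing `'b-'` as the last segment is a false positive in this sketch: the guard passes, the full pass runs, and the output is a fresh copy with identical content. The reverse case cannot happen, because any merge requires a segment ending in `-`.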

Motivation

For text that never triggers a given pass (for example, pure CJK text with no URLs, numeric runs, ASCII punctuation chains, or hyphenated numbers), each pass still copies four arrays (texts, isWordLike, kinds, starts) element by element to produce identical output. The guards are O(n) with an early break on the first match, and in the no-op cases targeted here they are cheaper than allocating and populating replacement arrays.

Note: carryTrailingForwardStickyAcrossCJKBoundary is the exception. Its guard fires on CJK text (whenever adjacent CJK text pairs exist), so the pass still runs there and the CJK improvement comes primarily from the other five guards. This guard instead benefits non-CJK text, which would otherwise pay for .slice() copies without any carries to perform.
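As a rough illustration of why this guard fires on CJK input, here is a sketch of an adjacency scan. Both helpers are hypothetical, and the code-point ranges are a simplified assumption; the actual CJK classification in src/analysis.ts may be broader or use a different test.

```typescript
// Assumption: "CJK text" is approximated as "contains a character in one of these
// BMP ranges" (kana, CJK Unified Ideographs, Hangul syllables).
function containsCJK(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i)
    if (
      (c >= 0x3040 && c <= 0x30ff) || // Hiragana and Katakana
      (c >= 0x4e00 && c <= 0x9fff) || // CJK Unified Ideographs
      (c >= 0xac00 && c <= 0xd7af) // Hangul syllables
    ) {
      return true
    }
  }
  return false
}

// Hypothetical guard: true as soon as two neighbouring segments are both CJK.
function hasAdjacentCJKPair(texts: readonly string[]): boolean {
  let previousWasCJK = false
  for (const text of texts) {
    const currentIsCJK = containsCJK(text)
    if (previousWasCJK && currentIsCJK) return true
    previousWasCJK = currentIsCJK
  }
  return false
}
```

Under these assumptions, segmented Chinese prose passes the guard almost immediately, while English prose scans to the end and returns false, which is the case where the copy is avoided.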

Changes

src/analysis.ts — added early-exit guards to 6 internal functions:
- mergeUrlLikeRuns: skip when no URL-like run starts exist
- mergeUrlQueryRuns: skip when no URL query boundary segments exist (conservative — isUrlQueryBoundarySegment already requires :// or www. prefix, so the guard is at least as wide as the actual merge condition)
- mergeNumericRuns: skip when no numeric run segments with decimal digits exist
- mergeAsciiPunctuationChains: skip when no trailing-joiner wordlike text is followed by another wordlike text (necessary condition for the inner while loop to merge anything)
- splitHyphenatedNumericRuns: skip when no text contains both - and a decimal digit
- carryTrailingForwardStickyAcrossCJKBoundary: skip when no adjacent CJK text pairs exist

No changes to existing merge/split logic: the guards only add an early-return path.
No public API changes and no changes to the layout() hot path.
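To make one of the listed conditions concrete, the splitHyphenatedNumericRuns guard ("no text contains both - and a decimal digit") could be sketched as follows. The function names are hypothetical, and treating "decimal digit" as ASCII 0-9 is an assumption; if the real pass recognises other Unicode digits, the guard would need the same definition to avoid false negatives.

```typescript
// Assumption: "decimal digit" means ASCII 0-9 and "-" means U+002D.
function hasHyphenAndDigit(text: string): boolean {
  let sawHyphen = false
  let sawDigit = false
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i)
    if (c === 0x2d) sawHyphen = true
    else if (c >= 0x30 && c <= 0x39) sawDigit = true
    if (sawHyphen && sawDigit) return true // both seen, stop scanning
  }
  return false
}

// Hypothetical guard: the split pass only needs to run if some segment qualifies.
function mightSplitHyphenatedNumericRuns(texts: readonly string[]): boolean {
  return texts.some(hasHyphenAndDigit)
}
```

A segment such as `well-known` or `2024` does not satisfy the condition on its own; `555-1234` does. The check is deliberately wider than the real split rule, in line with the no-false-negative contract described in the Summary.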

Benchmark

Environment: Windows 11, Bun 1.3.11, fake canvas backend.
Method: analyzeText() × 5000 iters, trimmed mean of 20 rounds, alternating patched (P) / original (O) across 3 independent process pairs.
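The measurement loop described above could look roughly like this. It is a sketch rather than the contributor's script: `benchMicros` and `trimmedMean` are hypothetical names, and the trim fraction (10% from each end) is an assumption, since the PR does not state how much was trimmed.

```typescript
// Mean of the samples after dropping the lowest and highest `trimFraction` of them.
function trimmedMean(samples: readonly number[], trimFraction = 0.1): number {
  const sorted = samples.slice().sort((a, b) => a - b)
  const drop = Math.floor(sorted.length * trimFraction)
  const kept = sorted.slice(drop, sorted.length - drop)
  let sum = 0
  for (const x of kept) sum += x
  return sum / kept.length
}

// Runs `fn` `iters` times per round and returns the trimmed mean of the
// per-call time in microseconds across `rounds` rounds.
function benchMicros(fn: () => void, iters = 5000, rounds = 20): number {
  const perCallMicros: number[] = []
  for (let r = 0; r < rounds; r++) {
    const t0 = performance.now()
    for (let i = 0; i < iters; i++) fn()
    const elapsedMs = performance.now() - t0
    perCallMicros.push((elapsedMs * 1000) / iters)
  }
  return trimmedMean(perCallMicros)
}
```

In the setup described here, a function like this would be called once per text in a patched process and once in an original process, with the two alternated to reduce drift between runs.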

No-pattern text (guards skip the passes; on the Chinese inputs the CJK-boundary pass still runs, as noted under Motivation):

Text	P1	O1	P2	O2	P3	O3
Chinese 150c (µs)	122.3	150.9	138.0	117.8	112.5	121.1
English 150c (µs)	39.0	44.4	38.5	41.2	34.6	39.8
Long Chinese 1500c (µs)	1176.4	1226.0	1231.1	1212.9	1050.5	1274.7

Pattern-heavy text (guards pass through to existing logic):

Text	P1	O1	P2	O2	P3	O3
AllPatterns (µs)	51.9	54.4	45.7	42.7	43.2	50.6
URLs (µs)	31.2	36.4	28.7	28.7	27.7	34.2
AppText (µs)	43.4	35.3	41.2	45.0	40.4	49.0

Bottom line: In this local benchmark, English plain prose consistently improved across all 3 pairs (~10% in the 150c case). Pattern-free CJK inputs showed improvement in most pairs. No consistent regression was observed on pattern-heavy inputs.
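The per-pair percentages behind this summary can be recomputed from the first table. The helper name is hypothetical; the numbers are copied from the rows above.

```typescript
// [patched, original] timings in µs for the three process pairs, from the table above.
const english150: [number, number][] = [[39.0, 44.4], [38.5, 41.2], [34.6, 39.8]]
const chinese150: [number, number][] = [[122.3, 150.9], [138.0, 117.8], [112.5, 121.1]]

// Positive means the patched build was faster; negative means it was slower.
function improvementPercent(patched: number, original: number): number {
  return ((original - patched) / original) * 100
}

function mean(values: readonly number[]): number {
  let sum = 0
  for (const v of values) sum += v
  return sum / values.length
}

const englishPerPair = english150.map(([p, o]) => improvementPercent(p, o))
const chinesePerPair = chinese150.map(([p, o]) => improvementPercent(p, o))

console.log(englishPerPair.map(v => v.toFixed(1)), mean(englishPerPair).toFixed(1))
console.log(chinesePerPair.map(v => v.toFixed(1)))
```

The English pairs come out at roughly 12%, 7% and 13%, which averages to about 10%. The Chinese 150c pairs are mixed: the second pair is negative (patched slower), which is why the summary says "most pairs" rather than "all".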

Test plan

- bun test: 84 tests pass, 0 fail
- Benchmark: pattern-free text shows improvement in most pairs; no consistent regression on pattern-heavy inputs
- Pending: browser benchmark verification on macOS (not available to contributor)

perf: skip no-op merge passes in analysis pipeline

45929cc

dbwls99706 wants to merge 1 commit into chenglou:main from dbwls99706:perf/skip-no-op-merge-passes
