
Conversation

@simonklee
Contributor

Note: I wasn't sure whether to add this as an issue or a PR, so I went with a PR,
but feel free to close it if you'd rather discuss the approach first.

The gist of this change is to improve word wrapping performance on large
single-line chunks by switching to a streaming approach instead of
precomputing all word boundaries.

Before: main.mp4

After: word-wrap.mp4

Word wrapping large single-line files (minified JavaScript, continuous
logs) was slow. The old getWrapOffsets() precomputed all word boundary
positions for the entire chunk before wrapping began. Multi-megabyte
files produced arrays with tens of thousands of entries, most unused
since wrapping only needs boundaries within the current wrap width.

Added a hybrid strategy based on chunk size. Chunks larger than
64KB now use findWordWrapPosition(), which scans only up to wrap_width
columns per line and returns the last word boundary within that
window. This stops early instead of walking the full chunk. Smaller
chunks keep the cached approach where the upfront cost pays off
through cache locality.
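The dispatch described above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the project's actual code: the 64KB threshold and the `findWordWrapPosition` name come from the PR text, while the string-based chunk model and the space-only boundary rule are my simplifications.

```typescript
// Sketch of the size-based dispatch described above. Chunks are modeled
// as plain strings with one column per character; real chunks carry
// width metadata and richer word-boundary rules.
const STREAMING_THRESHOLD = 64 * 1024;

// Streaming scan: look at most `wrapWidth` columns past `start` and
// return the position after the last word boundary in that window,
// or the hard-cut position when no boundary exists.
function findWordWrapPosition(chunk: string, start: number, wrapWidth: number): number {
  const end = Math.min(start + wrapWidth, chunk.length);
  if (end === chunk.length) return end; // remainder fits on one line
  let lastBoundary = -1;
  for (let i = start; i < end; i++) {
    if (chunk[i] === " ") lastBoundary = i + 1; // break after the space
  }
  return lastBoundary > start ? lastBoundary : end; // no boundary: hard wrap
}

function wrapChunk(chunk: string, wrapWidth: number): number[] {
  const offsets: number[] = [];
  if (chunk.length > STREAMING_THRESHOLD) {
    // Streaming path: never scans more than wrapWidth columns ahead,
    // so cost is O(lines * wrapWidth) regardless of chunk size.
    let pos = 0;
    while (pos < chunk.length) {
      pos = findWordWrapPosition(chunk, pos, wrapWidth);
      offsets.push(pos);
    }
  } else {
    // Cached path: precompute every word boundary up front, then
    // consume the sorted boundary list while emitting line offsets.
    const boundaries: number[] = [];
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === " ") boundaries.push(i + 1);
    }
    let pos = 0;
    let bi = 0;
    while (pos < chunk.length) {
      const end = Math.min(pos + wrapWidth, chunk.length);
      if (end === chunk.length) {
        offsets.push(end);
        break;
      }
      let last = -1;
      while (bi < boundaries.length && boundaries[bi] <= end) {
        last = boundaries[bi];
        bi++;
      }
      offsets.push(last > pos ? last : end);
      pos = offsets[offsets.length - 1];
    }
  }
  return offsets;
}
```

The key difference is visible in the streaming branch: it only ever inspects one window of `wrapWidth` columns at a time, so a multi-megabyte single-line chunk never pays for boundaries past the current line.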

Note: wrap-break detection now honors `width_method` in the cached
path. This changes semantics for `.wcwidth` and `.no_zwj`
(per-codepoint breaks; ZWJ forces a break), while `.unicode` behavior
is unchanged. This aligns cached offsets with the streaming path and
cursor movement.
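To make the `width_method` note concrete, here is a hypothetical sketch of the break rule (TypeScript illustration; the name `canBreakBetween` and the reduction to a single predicate are mine, and the real `.unicode` path runs full grapheme segmentation rather than this one-codepoint check):

```typescript
// Hypothetical illustration of the width_method-dependent break rule
// described in the note above; names and the simplification are mine.
type WidthMethod = "unicode" | "wcwidth" | "no_zwj";
const ZWJ = 0x200d; // U+200D ZERO WIDTH JOINER

// May the wrapper place a break between codepoint `prev` and `cp`?
function canBreakBetween(method: WidthMethod, prev: number, cp: number): boolean {
  if (method === "unicode") {
    // Grapheme-aware: a ZWJ glues its neighbors, so a joined emoji
    // sequence is never split (stand-in for full segmentation state).
    return prev !== ZWJ && cp !== ZWJ;
  }
  // .wcwidth and .no_zwj: per-codepoint breaks are allowed, and a ZWJ
  // yields a break opportunity instead of joining the sequence.
  return true;
}
```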

The streaming path uses per-codepoint widths without full grapheme
state, so complex emoji or Indic sequences may wrap differently than
cached, but only in chunks over 64KB containing such sequences at wrap
boundaries.

Benchmarks:

Baseline 31a5cc2 (main) -> Current c688b4a (perf/word-wrap)

| Benchmark | Baseline | Current | Delta | Memory |
| --- | ---: | ---: | ---: | ---: |
| TextBufferView wrap (word, width=120, single-line) | 7.76ms | 1.41ms | -81.8% | -15.19 MB |
| TextBufferView wrap (word, width=80, single-line) | 7.76ms | 1.45ms | -81.3% | -15.19 MB |
| TextBufferView wrap (word, width=40, single-line) | 7.84ms | 1.57ms | -79.9% | -15.19 MB |
| TextBufferView wrap (char, width=120, single-line) | 666.44µs | 649.26µs | -2.6% | 888 B |
| TextBufferView wrap (char, width=80, multi-line) | 8.40ms | 8.27ms | -1.5% | 2.84 MB |
| TextBufferView wrap (char, width=80, single-line) | 688.99µs | 685.68µs | -0.5% | 888 B |
| TextBufferView wrap (char, width=40, single-line) | 779.51µs | 777.64µs | -0.2% | 888 B |
| TextBufferView wrap (char, width=40, multi-line) | 9.39ms | 9.38ms | -0.0% | 4.37 MB |
| TextBufferView wrap (char, width=120, multi-line) | 7.75ms | 7.75ms | -0.0% | 2.84 MB |
| TextBufferView wrap (word, width=40, multi-line) | 8.12ms | 8.16ms | +0.5% | 7.09 MB |
| TextBufferView wrap (word, width=120, multi-line) | 5.12ms | 5.17ms | +1.0% | 7.09 MB |
| TextBufferView wrap (word, width=80, multi-line) | 5.42ms | 5.49ms | +1.2% | 7.09 MB |

@kommander
Collaborator

Cool! I like the approach, I've been meaning to do something like that for graphemes as well.

@kommander
Collaborator

@simonklee I added some improvements in #473 after investigating this. It fixes the resize wrapping performance for the minified JS in the demo for me. Does it help for you here as well? I'll also work on memory usage, since not all wrap points are used. I'd go with incremental calculation and caching rather than introducing two different paths in the already complex wrapping logic: calculate one wrap break per line and cache it, accumulating in the cache when resizing a lot (not common), so cached wrap breaks can be reused and a new one is only computed when no suitable one is found for the range.
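For what it's worth, my reading of the incremental scheme sketched in that comment looks roughly like this (illustrative TypeScript, not the actual #473 code; the class name, cache policy, and space-only boundary rule are my assumptions):

```typescript
// Rough sketch of incremental wrap-break calculation with caching, as
// described in the comment above. Break positions accumulate across
// resizes and are reused when one falls inside the requested window.
class WrapBreakCache {
  private breaks: number[] = []; // sorted break positions seen so far

  // Return a break position in (pos, pos + width], reusing the cache
  // when possible and computing (and caching) a new one otherwise.
  getBreak(line: string, pos: number, width: number): number {
    const limit = Math.min(pos + width, line.length);
    // Reuse: last cached break inside the window, if any. A real
    // implementation would also check it is still the best boundary
    // for this window before trusting it.
    let best = -1;
    for (const b of this.breaks) {
      if (b > limit) break;
      if (b > pos) best = b;
    }
    if (best !== -1) return best;
    // Miss: scan the window once for the last word boundary.
    let found = limit;
    for (let i = pos; i < limit; i++) {
      if (line[i] === " ") found = i + 1;
    }
    if (found < line.length) this.insert(found);
    return found;
  }

  private insert(b: number) {
    // Keep the cache sorted and free of duplicates.
    const i = this.breaks.findIndex((x) => x >= b);
    if (i === -1) this.breaks.push(b);
    else if (this.breaks[i] !== b) this.breaks.splice(i, 0, b);
  }
}
```

The appeal of this shape is that a burst of resizes mostly hits the cache, while a fresh wrap only ever computes one break per emitted line.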

@simonklee
Contributor Author

> @simonklee I added some improvements in #473 after investigating this. It fixes the resize wrapping performance for the minified JS in the demo for me. Does it help for you here as well? I'll also work on memory usage, since not all wrap points are used. I'd go with incremental calculation and caching rather than introducing two different paths in the already complex wrapping logic.

Thanks. When I thought more about this, I started regretting adding another code path, so I agree with your observation. I'll have a look at your branch and see how it performs for me.

@simonklee
Contributor Author

Tested your implementation and it works well for me locally; not only that, the fix is 10x simpler.

@simonklee simonklee closed this Jan 5, 2026