Skip to content

Comments

Improve parsing performance by avoiding String reallocations#62

Merged
kornelski merged 1 commit intokornelski:mainfrom
drott:parsingPerf
Jan 17, 2026
Merged

Improve parsing performance by avoiding String reallocations#62
kornelski merged 1 commit intokornelski:mainfrom
drott:parsingPerf

Conversation

@drott
Copy link

@drott drott commented Jan 16, 2026

We gain an about 10.7% performance improvement in the xml-rs integration in Chromium in blink_perf.parser xml parsing benchmark with this change. [1]

The problem in particular with the PullParser.buf is that after .take_buf() the buffer is reset to an empty String, and then single character or token pushes follow, leading to a lot of reallocations.

Instead, reset to an allocated buffer of size 20, which is choosen somewhat arbitraily as a compromise between covering many XML name length and avoiding too large temporary allocations.

Similarly, keep AttributeSet larger instead of needing reallocations to grow it. Accounts for about 1-2% of the perf improvement.

[1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/perf_tests/parser/xml-parser.html?q=xml-parser.html

We gain an about 10.7% performance improvement in the xml-rs integration
in Chromium in blink_perf.parser xml parsing benchmark with this
change. [1]

The problem in particular with the PullParser.buf is that after
.take_buf() the buffer is reset to an empty String, and then single
character or token pushes follow, leading to a lot of reallocations.

Instead, reset to an allocated buffer of size 20, which is choosen
somewhat arbitraily as a compromise between covering many XML name
length and avoiding too large temporary allocations.

Similarly, keep AttributeSet larger instead of needing reallocations to
grow it. Accounts for about 1-2% of the perf improvement.

[1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/perf_tests/parser/xml-parser.html?q=xml-parser.html
@drott
Copy link
Author

drott commented Jan 16, 2026

CC @sesse

@kornelski kornelski merged commit 76159a2 into kornelski:main Jan 17, 2026
2 checks passed
@kornelski
Copy link
Owner

If you're looking for perf improvements, the massive bottleneck is in the lexer reading chars one by one going all the way through the state machine.

This needs to be refactored to allow it to have an inner tight loop collecting entire strings at once.

The best approach to would be to refactor states from being enums to being functions that consume a slice.

https://github.com/cloudflare/lol-html/blob/main/src/parser/state_machine/mod.rs

lol-html obscures this with some clever macros, but at the core it's all about the current state of the parser being code not data. The code can ask for specific input, so it only needs to locally branch on that (loop until <, etc.) instead branching on every possible state/token every time.

@drott
Copy link
Author

drott commented Jan 19, 2026

Thanks for the quick merge and the additional performance considerations/suggestions.

Could you please tag a release? (That enables us to pull this change into Chromium.)

ajperel pushed a commit to chromium/chromium that referenced this pull request Jan 20, 2026
Changes merged in upstream but no version released yet.

Covers kornelski/xml-rs#62

Bug: 470367156
Change-Id: I9e778313b1f7fd88e96419879125dfd30b79dca0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7492973
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org>
Auto-Submit: Dominik Röttsches <drott@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1571637}
mohd-akram pushed a commit to gsource-mirror/chromium-src-third_party-rust that referenced this pull request Jan 20, 2026
Changes merged in upstream but no version released yet.

Covers kornelski/xml-rs#62

Bug: 470367156
Change-Id: I9e778313b1f7fd88e96419879125dfd30b79dca0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7492973
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org>
Auto-Submit: Dominik Röttsches <drott@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1571637}
NOKEYCHECK=True
GitOrigin-RevId: 89730e150cebf10780b580196850b04019935b93
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants