Improve parsing performance by avoiding String reallocations by drott · Pull Request #62 · kornelski/xml-rs

drott · 2026-01-16T15:51:26Z

We gain an about 10.7% performance improvement in the xml-rs integration in Chromium in blink_perf.parser xml parsing benchmark with this change. [1]

The problem in particular with the PullParser.buf is that after .take_buf() the buffer is reset to an empty String, and then single character or token pushes follow, leading to a lot of reallocations.

Instead, reset to an allocated buffer of size 20, which is choosen somewhat arbitraily as a compromise between covering many XML name length and avoiding too large temporary allocations.

Similarly, keep AttributeSet larger instead of needing reallocations to grow it. Accounts for about 1-2% of the perf improvement.

[1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/perf_tests/parser/xml-parser.html?q=xml-parser.html

We gain an about 10.7% performance improvement in the xml-rs integration in Chromium in blink_perf.parser xml parsing benchmark with this change. [1] The problem in particular with the PullParser.buf is that after .take_buf() the buffer is reset to an empty String, and then single character or token pushes follow, leading to a lot of reallocations. Instead, reset to an allocated buffer of size 20, which is choosen somewhat arbitraily as a compromise between covering many XML name length and avoiding too large temporary allocations. Similarly, keep AttributeSet larger instead of needing reallocations to grow it. Accounts for about 1-2% of the perf improvement. [1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/perf_tests/parser/xml-parser.html?q=xml-parser.html

drott · 2026-01-16T15:55:59Z

CC @sesse

kornelski · 2026-01-17T01:43:49Z

If you're looking for perf improvements, the massive bottleneck is in the lexer reading chars one by one going all the way through the state machine.

This needs to be refactored to allow it to have an inner tight loop collecting entire strings at once.

The best approach to would be to refactor states from being enums to being functions that consume a slice.

https://github.com/cloudflare/lol-html/blob/main/src/parser/state_machine/mod.rs

lol-html obscures this with some clever macros, but at the core it's all about the current state of the parser being code not data. The code can ask for specific input, so it only needs to locally branch on that (loop until <, etc.) instead branching on every possible state/token every time.

drott · 2026-01-19T10:24:52Z

Thanks for the quick merge and the additional performance considerations/suggestions.

Could you please tag a release? (That enables us to pull this change into Chromium.)

Changes merged in upstream but no version released yet. Covers kornelski/xml-rs#62 Bug: 470367156 Change-Id: I9e778313b1f7fd88e96419879125dfd30b79dca0 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7492973 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Auto-Submit: Dominik Röttsches <drott@chromium.org> Cr-Commit-Position: refs/heads/main@{#1571637}

Changes merged in upstream but no version released yet. Covers kornelski/xml-rs#62 Bug: 470367156 Change-Id: I9e778313b1f7fd88e96419879125dfd30b79dca0 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7492973 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Auto-Submit: Dominik Röttsches <drott@chromium.org> Cr-Commit-Position: refs/heads/main@{#1571637} NOKEYCHECK=True GitOrigin-RevId: 89730e150cebf10780b580196850b04019935b93

drott force-pushed the parsingPerf branch from a25b35d to 73d7137 Compare January 16, 2026 15:54

kornelski merged commit 76159a2 into kornelski:main Jan 17, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

Improve parsing performance by avoiding String reallocations#62

Improve parsing performance by avoiding String reallocations#62
kornelski merged 1 commit intokornelski:mainfrom
drott:parsingPerf

drott commented Jan 16, 2026 •

edited

Loading

Uh oh!

drott commented Jan 16, 2026

Uh oh!

Uh oh!

kornelski commented Jan 17, 2026

Uh oh!

drott commented Jan 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Comments

Conversation

drott commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

drott commented Jan 16, 2026

Uh oh!

Uh oh!

kornelski commented Jan 17, 2026

Uh oh!

drott commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

drott commented Jan 16, 2026 •

edited

Loading

drott commented Jan 19, 2026 •

edited

Loading