Serialization specializations by l0rinc · Pull Request #122 · l0rinc/bitcoin

l0rinc · 2026-02-19T19:14:55Z

No description provided.

Measure both full block serialization and size computation via `SizeComputer`. `SizeComputer` returns the exact final size of the serialized content without writing any bytes.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 195,610.62 | 5,112.20 | 0.3% | 11.00 | `SerializeBlock`
| 12,061.83 | 82,906.19 | 0.1% | 11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 867,857.55 | 1,152.26 | 0.0% | 8,015,883.90 | 3,116,099.08 | 2.572 | 1,517,035.87 | 0.5% | 10.81 | `SerializeBlock`
| 30,928.27 | 32,332.88 | 0.0% | 221,683.03 | 111,055.84 | 1.996 | 53,037.03 | 0.8% | 11.03 | `SizeComputerBlock`

Merged multiple template methods into a single implementation separated by compile-time `if constexpr` conditions to reduce template bloat: related functionality is grouped into one method, while each branch can still be optimized independently. This unifies methods that were previously related only by similar signatures, and enables the later `SizeComputer` optimizations.

Endianness doesn't affect the final size, so skip the conversion in `SizeComputer`. Fold the existing overloads into one implementation that short-circuits when only the serialized size is needed.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 191,652.29 | 5,217.78 | 0.4% | 10.96 | `SerializeBlock`
| 10,323.55 | 96,865.92 | 0.2% | 11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 614,847.32 | 1,626.42 | 0.0% | 8,015,883.64 | 2,207,628.07 | 3.631 | 1,517,035.62 | 0.5% | 10.56 | `SerializeBlock`
| 26,020.31 | 38,431.52 | 0.0% | 159,390.03 | 93,438.33 | 1.706 | 42,131.03 | 0.9% | 11.00 | `SizeComputerBlock`

Single-byte writes are very common: they are used for every `(u)int8_t`, `std::byte` and `bool`, and for the first byte of every VarInt and CompactSize (the latter prefixes every (pre)vector). It makes sense to bypass the generalized serialization infrastructure these writes don't need:

* `AutoFile` writes no longer need a 4k buffer for a single byte;
* `VectorWriter` and `DataStream` avoid `memcpy`/`insert` calls.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 174,569.19 | 5,728.39 | 0.6% | 10.89 | `SerializeBlock`
| 10,241.16 | 97,645.21 | 0.0% | 11.00 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 615,000.56 | 1,626.01 | 0.0% | 8,015,883.64 | 2,208,340.88 | 3.630 | 1,517,035.62 | 0.5% | 10.56 | `SerializeBlock`
| 25,676.76 | 38,945.72 | 0.0% | 159,390.03 | 92,202.10 | 1.729 | 42,131.03 | 0.9% | 11.00 | `SizeComputerBlock`

Fast-path the common single-byte case and batch multi-byte encodings into a single span write.

Use a single templated read() implementation for fixed and dynamic span extents, and keep the 1-byte read fast path inside that method.

Use a single templated DataStream::write() implementation for fixed and dynamic span extents, keeping the static-extent special cases inside the same method.

Add an explicit append fast path in `VectorWriter::write(std::span<const std::byte>)` and reuse a single source pointer for both insert branches. This removes the overwrite bookkeeping when `nPos` is already at the end, which is the dominant case.

Microbenchmark (`/tmp/serialize_perf.cpp`, g++-14.2, -O3), before: `/tmp/serialize_perf_idea18_before.tsv`, after: `/tmp/serialize_perf_idea18_after2.tsv`:

* VectorWriterWriteSpan32: 19.749404 -> 16.854612 ns/op (-14.658%)
* ReadCompactSize: 12.471655 -> 9.180395 ns/op (-26.390%)
* SerializeUint32: 1.328602 -> 1.258573 ns/op (-5.271%)
* UnserializeUint32: 2.460472 -> 2.469724 ns/op (+0.376%; noise-level)

Use `std::move` when inserting deserialized temporary map/set elements. Each deserialized element is a local temporary that is not used after insertion, so moving it removes extra key/value copies while keeping the code straightforward.

Microbenchmark (`/tmp/serialize_assoc_perf.cpp`, g++-14.2, -O3), before: `/tmp/serialize_assoc_idea27_before.tsv`, after: `/tmp/serialize_assoc_idea27_after2.tsv`:

* UnserializeMap: 117092.799167 -> 92975.440833 ns/op (-20.598%)
* UnserializeSet: 157473.293333 -> 132820.890000 ns/op (-15.655%)
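A minimal sketch of the pattern, assuming elements come from a hypothetical `read_elem` callback in place of real stream deserialization. The temporary pair is moved into the map through the hinted `insert` overload; the same applies to `std::set`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Sketch: rebuild a map from a hypothetical element reader. Each deserialized
// pair is a local temporary that is not used afterwards, so it can be moved.
template <typename K, typename V, typename ReadElem>
std::map<K, V> UnserializeMapSketch(std::size_t count, ReadElem read_elem)
{
    std::map<K, V> m;
    auto hint = m.begin();
    for (std::size_t i = 0; i < count; ++i) {
        std::pair<K, V> item = read_elem(i);
        hint = m.insert(hint, std::move(item)); // move instead of copying key and value
    }
    return m;
}
```

The benefit grows with the cost of copying the key and value, for example heap-allocated strings or nested containers.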

Add a direct single-chunk path for `BasicByte` vector/prevector deserialization when the encoded size fits within one allocation chunk. This avoids loop bookkeeping in the common case.

Microbenchmark (`/tmp/serialize_vector_perf.cpp`, g++-14.2, -O3), before: `/tmp/serialize_vector_try_before.tsv`, after: `/tmp/serialize_vector_try_patch.tsv`:

* VectorUnserialize: 57.370785 -> 55.291175 ns/op (-3.625%)
* PrevectorUnserialize: 43.594430 -> 40.879915 ns/op (-6.226%)

serialize: use size_t counters in byte vector chunk loops

Switch `BasicByte` vector/prevector chunked deserialization counters from `unsigned int` to `size_t`. This removes repeated integer-width conversions in loop control and `std::min` calls.

Microbenchmark (`/tmp/serialize_vector_perf.cpp`, g++-14.2, -O3), before: `/tmp/serialize_vector_try_size_t_before.tsv`, after: `/tmp/serialize_vector_try_size_t_after.tsv`:

* VectorUnserialize: 53.531320 -> 51.489485 ns/op (-3.814%)
* PrevectorUnserialize: 39.887540 -> 39.709020 ns/op (-0.448%)

Bench: `/tmp/ab_bench_score.py ab61b_writevarint_fastpath_2_3_4_reorder_p24 --pairs 24`

* score geomean_ns median: 133.394213552 -> 130.377638706 (-2.26%)
* serialize_perf geomean_ns median: 7.379390515 -> 7.126054177 (-3.43%)
* assoc_rw geomean_ns median: 43638.598005401 -> 43633.326177539 (-0.01%)

Bench: `/tmp/ab_bench_score.py ab72b_writevarint_fastpath_5byte_p24 --pairs 24`

* score geomean_ns median: 130.459794148 -> 128.699255502 (-1.35%)
* serialize_perf geomean_ns median: 7.129743280 -> 6.975918082 (-2.16%)
* assoc_rw geomean_ns median: 43707.674388987 -> 43725.749214815 (+0.04%)

Bench: `/tmp/ab_bench_score.py ab73b_writevarint_fastpath_6byte_p24 --pairs 24`

* score geomean_ns median: 128.608289348 -> 126.638221393 (-1.53%)
* serialize_perf geomean_ns median: 6.975603185 -> 6.811420432 (-2.35%)
* assoc_rw geomean_ns median: 43707.099246200 -> 43745.066824529 (+0.09%)

Bench: `/tmp/ab_bench_score.py ab76b_writevarint_fastpath_7_always_inline_p24 --pairs 24`

* score geomean_ns median: 126.693922531 -> 123.529060714 (-2.50%)
* serialize_perf geomean_ns median: 6.821017413 -> 6.570016636 (-3.68%)
* assoc_rw geomean_ns median: 43706.512507994 -> 43743.488301889 (+0.08%)

l0rinc · 2026-02-20T13:36:38Z

src/bench/checkblock.cpp

```diff
 #include <optional>
 #include <vector>

+static void SizeComputerBlock(benchmark::Bench& bench) {
```


8339284: bench: measure block (size)serialization speed

l0rinc · 2026-02-20T13:37:54Z

src/bench/checkblock.cpp


```diff
+static void SizeComputerBlock(benchmark::Bench& bench) {
+    CBlock block;
+    DataStream(benchmark::data::block413567) >> TX_WITH_WITNESS(block);
```


no header for some reason?!

l0rinc added 15 commits February 19, 2026 20:13

refactor: add explicit static extent to spans

19d97d4

serialize: optimize WriteVarInt writes

bec0cb7

streams: specialize span reads

8a84b59

streams: specialize span writes

90bca75


l0rinc wants to merge 15 commits into `master` from `detached486`
