Measure both full block serialization and size computation via `SizeComputer`.
`SizeComputer` returns the exact final size of the serialized content without writing any bytes.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 195,610.62 | 5,112.20 | 0.3% | 11.00 | `SerializeBlock`
| 12,061.83 | 82,906.19 | 0.1% | 11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 867,857.55 | 1,152.26 | 0.0% | 8,015,883.90 | 3,116,099.08 | 2.572 | 1,517,035.87 | 0.5% | 10.81 | `SerializeBlock`
| 30,928.27 | 32,332.88 | 0.0% | 221,683.03 | 111,055.84 | 1.996 | 53,037.03 | 0.8% | 11.03 | `SizeComputerBlock`
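For readers unfamiliar with the size-only idea, it can be sketched roughly as follows. This is an illustration under assumptions, not Bitcoin Core's `SizeComputer`: `CountingStream`, `BufferStream`, `Point`, `SerializePoint`, and `SerializedSizeSketch` are hypothetical names. The same serialization template runs against a stream that stores bytes and against one that only adds up lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stream that only records how many bytes would be written.
class CountingStream
{
    std::size_t m_size{0};

public:
    void write(const void* /*data*/, std::size_t len) { m_size += len; }
    std::size_t size() const { return m_size; }
};

// Hypothetical stream that actually stores the bytes.
class BufferStream
{
    std::vector<unsigned char> m_data;

public:
    void write(const void* data, std::size_t len)
    {
        const auto* p = static_cast<const unsigned char*>(data);
        m_data.insert(m_data.end(), p, p + len);
    }
    std::size_t size() const { return m_data.size(); }
};

// Hypothetical object with two fixed-width fields (4 + 2 bytes).
struct Point {
    std::uint32_t x;
    std::uint16_t y;
};

// One serialization routine, shared by both stream types.
// Fields are written in native byte order to keep the sketch short.
template <typename Stream>
void SerializePoint(Stream& s, const Point& p)
{
    s.write(&p.x, sizeof(p.x));
    s.write(&p.y, sizeof(p.y));
}

// Computes the serialized size without producing any output bytes.
inline std::size_t SerializedSizeSketch(const Point& p)
{
    CountingStream counter;
    SerializePoint(counter, p);
    return counter.size();
}
```

Calling `SerializedSizeSketch(Point{1, 2})` walks the same code path as a real write but touches no buffer, which is why the size-only benchmark is expected to be much cheaper than full serialization.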
Merge multiple template methods into a single implementation delimited by constexpr conditions to reduce template bloat: related functionality is grouped into one method, yet can still be optimized thanks to C++20 constexpr conditions. This unifies related methods that were previously bound only by similar signatures, and enables `SizeComputer` optimizations later.
Endianness doesn't affect final size, so skip it in `SizeComputer`.
Fold existing overloads into one implementation, short-circuiting logic when only the serialized size is needed.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 191,652.29 | 5,217.78 | 0.4% | 10.96 | `SerializeBlock`
| 10,323.55 | 96,865.92 | 0.2% | 11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 614,847.32 | 1,626.42 | 0.0% | 8,015,883.64 | 2,207,628.07 | 3.631 | 1,517,035.62 | 0.5% | 10.56 | `SerializeBlock`
| 26,020.31 | 38,431.52 | 0.0% | 159,390.03 | 93,438.33 | 1.706 | 42,131.03 | 0.9% | 11.00 | `SizeComputerBlock`
Single-byte writes are very common: they are used for every `(u)int8_t`, `std::byte`, and `bool`, and for the first byte of every VarInt, which is also needed for every (pre)Vector.
It makes sense to bypass the generalized serialization infrastructure where it isn't needed:
* `AutoFile` write no longer needs to allocate a 4k buffer for a single byte;
* `VectorWriter` and `DataStream` avoid memcpy/insert calls.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

| ns/block | block/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 174,569.19 | 5,728.39 | 0.6% | 10.89 | `SerializeBlock`
| 10,241.16 | 97,645.21 | 0.0% | 11.00 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 615,000.56 | 1,626.01 | 0.0% | 8,015,883.64 | 2,208,340.88 | 3.630 | 1,517,035.62 | 0.5% | 10.56 | `SerializeBlock`
| 25,676.76 | 38,945.72 | 0.0% | 159,390.03 | 92,202.10 | 1.729 | 42,131.03 | 0.9% | 11.00 | `SizeComputerBlock`
Fast-path the common single-byte case and batch multi-byte encodes into a single span write.
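As a rough sketch of what this could look like for a VarInt-style encoder (base-128, most significant group first, continuation bit `0x80`, minus one on each carried group): `ByteSink` and `WriteVarIntSketch` are hypothetical names and the code is an assumption about the shape of the change, not the patch itself. Values below `0x80` take the one-byte fast path; larger values are encoded into a small stack buffer from the back and flushed with one `write` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte sink standing in for a real stream.
struct ByteSink {
    std::vector<std::uint8_t> bytes;
    void write_byte(std::uint8_t b) { bytes.push_back(b); }
    void write(const std::uint8_t* data, std::size_t len)
    {
        bytes.insert(bytes.end(), data, data + len);
    }
};

inline void WriteVarIntSketch(ByteSink& os, std::uint64_t n)
{
    // Fast path: values below 0x80 encode as exactly one byte.
    if (n < 0x80) {
        os.write_byte(static_cast<std::uint8_t>(n));
        return;
    }
    // A 64-bit value needs at most ceil(64 / 7) = 10 bytes.
    constexpr std::size_t MAX_LEN{(sizeof(n) * 8 + 6) / 7};
    std::uint8_t buf[MAX_LEN];
    std::size_t pos{MAX_LEN};
    // Fill the buffer from the end: the final byte carries no continuation bit.
    buf[--pos] = static_cast<std::uint8_t>(n & 0x7F);
    while (n > 0x7F) {
        n = (n >> 7) - 1;
        buf[--pos] = static_cast<std::uint8_t>((n & 0x7F) | 0x80);
    }
    // Batch: emit all encoded bytes with a single write.
    os.write(buf + pos, MAX_LEN - pos);
}
```

With this sketch, 127 encodes as `0x7F` through the fast path, while 128 encodes as `0x80 0x00` and 16384 as `0xFF 0x00` through the batched path.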
Use a single templated read() implementation for fixed and dynamic span extents, and keep the 1-byte read fast path inside that method.
Use a single templated DataStream::write() implementation for fixed and dynamic span extents, keeping the static-extent special cases inside the same method.
Add an explicit append fast path in VectorWriter::write(std::span<const std::byte>) and reuse a single source pointer for both insert branches. This removes overwrite bookkeeping when nPos is already at the end, which is the dominant case.

Microbenchmark (/tmp/serialize_perf.cpp, g++-14.2, -O3):
before: /tmp/serialize_perf_idea18_before.tsv
after: /tmp/serialize_perf_idea18_after2.tsv

VectorWriterWriteSpan32: 19.749404 -> 16.854612 ns/op (-14.658%)
ReadCompactSize: 12.471655 -> 9.180395 ns/op (-26.390%)
SerializeUint32: 1.328602 -> 1.258573 ns/op (-5.271%)
UnserializeUint32: 2.460472 -> 2.469724 ns/op (+0.376%; noise-level)
Use std::move when inserting deserialized temporary map/set elements. The value/category is already unique in this context, so this removes extra key/value copies while keeping code straightforward.

Microbenchmark (/tmp/serialize_assoc_perf.cpp, g++-14.2, -O3):
before: /tmp/serialize_assoc_idea27_before.tsv
after: /tmp/serialize_assoc_idea27_after2.tsv

UnserializeMap: 117092.799167 -> 92975.440833 ns/op (-20.598%)
UnserializeSet: 157473.293333 -> 132820.890000 ns/op (-15.655%)
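The effect can be illustrated with a small sketch. `Tracked`, `DecodeEntry`, `FillByCopy`, and `FillByMove` are hypothetical names; `DecodeEntry` merely stands in for unserializing one element from a stream. The only difference between the two loops is whether the temporary is passed to `insert` as an lvalue or via `std::move`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical value type that counts how often it is copy-constructed.
struct Tracked {
    std::string payload;
    static inline int copies{0};

    explicit Tracked(std::string s) : payload{std::move(s)} {}
    Tracked(const Tracked& o) : payload{o.payload} { ++copies; }
    Tracked(Tracked&&) noexcept = default;
};

// Stand-in for decoding one key/value pair from a stream.
inline std::pair<int, Tracked> DecodeEntry(int i)
{
    return {i, Tracked{"value-" + std::to_string(i)}};
}

// Inserts the temporary as an lvalue, so the map copies it.
inline std::map<int, Tracked> FillByCopy(int count)
{
    std::map<int, Tracked> m;
    auto hint{m.begin()};
    for (int i{0}; i < count; ++i) {
        std::pair<int, Tracked> item{DecodeEntry(i)};
        hint = m.insert(hint, item);
    }
    return m;
}

// Inserts the temporary as an rvalue, so the map can move from it.
inline std::map<int, Tracked> FillByMove(int count)
{
    std::map<int, Tracked> m;
    auto hint{m.begin()};
    for (int i{0}; i < count; ++i) {
        std::pair<int, Tracked> item{DecodeEntry(i)};
        hint = m.insert(hint, std::move(item));
    }
    return m;
}
```

Because `item` is never used after the insert, moving from it is safe, and the copy counter stays at zero for `FillByMove` while `FillByCopy` pays at least one copy per element.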
Add a direct single-chunk path for BasicByte vector/prevector deserialization when encoded size fits within one allocation chunk. This avoids loop bookkeeping in the common case.

Microbenchmark (/tmp/serialize_vector_perf.cpp, g++-14.2, -O3):
before: /tmp/serialize_vector_try_before.tsv
after: /tmp/serialize_vector_try_patch.tsv

VectorUnserialize: 57.370785 -> 55.291175 ns/op (-3.625%)
PrevectorUnserialize: 43.594430 -> 40.879915 ns/op (-6.226%)

serialize: use size_t counters in byte vector chunk loops

Switch BasicByte vector/prevector chunked deserialization counters from unsigned int to size_t. This removes repeated integer-width conversions in loop control and std::min calls.

Microbenchmark (/tmp/serialize_vector_perf.cpp, g++-14.2, -O3):
before: /tmp/serialize_vector_try_size_t_before.tsv
after: /tmp/serialize_vector_try_size_t_after.tsv

VectorUnserialize: 53.531320 -> 51.489485 ns/op (-3.814%)
PrevectorUnserialize: 39.887540 -> 39.709020 ns/op (-0.448%)
Bench: /tmp/ab_bench_score.py ab61b_writevarint_fastpath_2_3_4_reorder_p24 --pairs 24
score geomean_ns median: 133.394213552 -> 130.377638706 (-2.26%)
serialize_perf geomean_ns median: 7.379390515 -> 7.126054177 (-3.43%)
assoc_rw geomean_ns median: 43638.598005401 -> 43633.326177539 (-0.01%)
Bench: /tmp/ab_bench_score.py ab72b_writevarint_fastpath_5byte_p24 --pairs 24
score geomean_ns median: 130.459794148 -> 128.699255502 (-1.35%)
serialize_perf geomean_ns median: 7.129743280 -> 6.975918082 (-2.16%)
assoc_rw geomean_ns median: 43707.674388987 -> 43725.749214815 (+0.04%)
Bench: /tmp/ab_bench_score.py ab73b_writevarint_fastpath_6byte_p24 --pairs 24
score geomean_ns median: 128.608289348 -> 126.638221393 (-1.53%)
serialize_perf geomean_ns median: 6.975603185 -> 6.811420432 (-2.35%)
assoc_rw geomean_ns median: 43707.099246200 -> 43745.066824529 (+0.09%)
Bench: /tmp/ab_bench_score.py ab76b_writevarint_fastpath_7_always_inline_p24 --pairs 24
score geomean_ns median: 126.693922531 -> 123.529060714 (-2.50%)
serialize_perf geomean_ns median: 6.821017413 -> 6.570016636 (-3.68%)
assoc_rw geomean_ns median: 43706.512507994 -> 43743.488301889 (+0.08%)
l0rinc
commented
Feb 20, 2026
    #include <optional>
    #include <vector>

    static void SizeComputerBlock(benchmark::Bench& bench) {
8339284: bench: measure block (size)serialization speed
pending comment
    static void SizeComputerBlock(benchmark::Bench& bench) {
        CBlock block;
        DataStream(benchmark::data::block413567) >> TX_WITH_WITNESS(block);
no header for some reason?!