
Serialization specializations #122

Draft
l0rinc wants to merge 15 commits into master from detached486

@l0rinc l0rinc commented Feb 19, 2026

No description provided.

Measure both full block serialization and size computation via `SizeComputer`.
`SizeComputer` returns the exact final size of the serialized content without writing any bytes.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|          195,610.62 |            5,112.20 |    0.3% |     11.00 | `SerializeBlock`
|           12,061.83 |           82,906.19 |    0.1% |     11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          867,857.55 |            1,152.26 |    0.0% |    8,015,883.90 |    3,116,099.08 |  2.572 |   1,517,035.87 |    0.5% |     10.81 | `SerializeBlock`
|           30,928.27 |           32,332.88 |    0.0% |      221,683.03 |      111,055.84 |  1.996 |      53,037.03 |    0.8% |     11.03 | `SizeComputerBlock`
Merge multiple template methods into a single implementation whose branches are selected with `if constexpr`, reducing template bloat: related functionality is grouped into one method, and C++20 `constexpr` conditions still let each instantiation be optimized.
This unifies related methods that were previously linked only by similar signatures, and enables `SizeComputer` optimizations later.
Endianness doesn't affect the final size, so skip the byte-order handling in `SizeComputer`.
Fold the existing overloads into one implementation that short-circuits when only the serialized size is needed.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|          191,652.29 |            5,217.78 |    0.4% |     10.96 | `SerializeBlock`
|           10,323.55 |           96,865.92 |    0.2% |     11.01 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          614,847.32 |            1,626.42 |    0.0% |    8,015,883.64 |    2,207,628.07 |  3.631 |   1,517,035.62 |    0.5% |     10.56 | `SerializeBlock`
|           26,020.31 |           38,431.52 |    0.0% |      159,390.03 |       93,438.33 |  1.706 |      42,131.03 |    0.9% |     11.00 | `SizeComputerBlock`
Single-byte writes are very common: they are used for every `(u)int8_t`, `std::byte`, and `bool`, and for the first byte of every VarInt, which is also needed for every (pre)vector.
It makes sense to bypass the generalized serialization infrastructure where it isn't needed:
* `AutoFile` no longer needs to allocate a 4k buffer to write a single byte;
* `VectorWriter` and `DataStream` avoid memcpy/insert calls.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SizeComputerBlock|SerializeBlock' --min-time=10000

> C compiler ............................ AppleClang 16.0.0.16000026

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|          174,569.19 |            5,728.39 |    0.6% |     10.89 | `SerializeBlock`
|           10,241.16 |           97,645.21 |    0.0% |     11.00 | `SizeComputerBlock`

> C++ compiler .......................... GNU 13.3.0

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          615,000.56 |            1,626.01 |    0.0% |    8,015,883.64 |    2,208,340.88 |  3.630 |   1,517,035.62 |    0.5% |     10.56 | `SerializeBlock`
|           25,676.76 |           38,945.72 |    0.0% |      159,390.03 |       92,202.10 |  1.729 |      42,131.03 |    0.9% |     11.00 | `SizeComputerBlock`
Fast-path the common single-byte case and batch multi-byte encodes into a single span write.
Use a single templated read() implementation for fixed and dynamic span extents, and keep the 1-byte read fast path inside that method.
Use a single templated DataStream::write() implementation for fixed and dynamic span extents, keeping the static-extent special cases inside the same method.
Add an explicit append fast path in VectorWriter::write(std::span<const std::byte>) and reuse a single source pointer for both insert branches. This removes overwrite bookkeeping when nPos is already at the end, which is the dominant case.

Microbenchmark (/tmp/serialize_perf.cpp, g++-14.2, -O3):

before: /tmp/serialize_perf_idea18_before.tsv

after:  /tmp/serialize_perf_idea18_after2.tsv

VectorWriterWriteSpan32: 19.749404 -> 16.854612 ns/op (-14.658%)

ReadCompactSize: 12.471655 -> 9.180395 ns/op (-26.390%)

SerializeUint32: 1.328602 -> 1.258573 ns/op (-5.271%)

UnserializeUint32: 2.460472 -> 2.469724 ns/op (+0.376%; noise-level)
Use std::move when inserting deserialized temporary map/set elements. Each element is a local temporary that is not read again after insertion, so moving it removes extra key/value copies while keeping the code straightforward.
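The effect can be sketched with a value type that counts its own copies. `CopyCounted` and `FillMapByMove` are hypothetical names, and the hand-filled `item` stands in for whatever the real deserialization call produces.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Illustrative value type that records how often it is copied.
struct CopyCounted {
    std::string text;
    static inline int copies{0};
    CopyCounted() = default;
    CopyCounted(const CopyCounted& other) : text{other.text} { ++copies; }
    CopyCounted(CopyCounted&&) noexcept = default;
    CopyCounted& operator=(const CopyCounted&) = default;
    CopyCounted& operator=(CopyCounted&&) noexcept = default;
};

// Fills the map from per-iteration temporaries, moving each one in.
inline void FillMapByMove(std::map<int, CopyCounted>& out, int count)
{
    auto hint{out.begin()};
    for (int i{0}; i < count; ++i) {
        std::pair<int, CopyCounted> item; // stands in for a deserialized element
        item.first = i;
        item.second.text = "value " + std::to_string(i);
        // item is not used again, so it can be moved into the container.
        hint = out.insert(hint, std::move(item));
    }
}

// No element is copied on its way into the map.
void ExampleNoCopies()
{
    CopyCounted::copies = 0;
    std::map<int, CopyCounted> m;
    FillMapByMove(m, 4);
    assert(CopyCounted::copies == 0);
}
```

Passing `item` without `std::move` would select the copying path instead, which is the per-element cost the change removes.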

Microbenchmark (/tmp/serialize_assoc_perf.cpp, g++-14.2, -O3):

before: /tmp/serialize_assoc_idea27_before.tsv

after:  /tmp/serialize_assoc_idea27_after2.tsv

UnserializeMap: 117092.799167 -> 92975.440833 ns/op (-20.598%)

UnserializeSet: 157473.293333 -> 132820.890000 ns/op (-15.655%)
Add a direct single-chunk path for BasicByte vector/prevector deserialization when the encoded size fits within one allocation chunk. This avoids loop bookkeeping in the common case.

Microbenchmark (/tmp/serialize_vector_perf.cpp, g++-14.2, -O3):

before: /tmp/serialize_vector_try_before.tsv

after:  /tmp/serialize_vector_try_patch.tsv

VectorUnserialize: 57.370785 -> 55.291175 ns/op (-3.625%)

PrevectorUnserialize: 43.594430 -> 40.879915 ns/op (-6.226%)

serialize: use size_t counters in byte vector chunk loops

Switch BasicByte vector/prevector chunked deserialization counters from unsigned int to size_t. This removes repeated integer-width conversions in loop control and std::min calls.

Microbenchmark (/tmp/serialize_vector_perf.cpp, g++-14.2, -O3):

before: /tmp/serialize_vector_try_size_t_before.tsv

after:  /tmp/serialize_vector_try_size_t_after.tsv

VectorUnserialize: 53.531320 -> 51.489485 ns/op (-3.814%)

PrevectorUnserialize: 39.887540 -> 39.709020 ns/op (-0.448%)
Bench: /tmp/ab_bench_score.py ab61b_writevarint_fastpath_2_3_4_reorder_p24 --pairs 24

score geomean_ns median: 133.394213552 -> 130.377638706 (-2.26%)

serialize_perf geomean_ns median: 7.379390515 -> 7.126054177 (-3.43%)

assoc_rw geomean_ns median: 43638.598005401 -> 43633.326177539 (-0.01%)
Bench: /tmp/ab_bench_score.py ab72b_writevarint_fastpath_5byte_p24 --pairs 24

score geomean_ns median: 130.459794148 -> 128.699255502 (-1.35%)

serialize_perf geomean_ns median: 7.129743280 -> 6.975918082 (-2.16%)

assoc_rw geomean_ns median: 43707.674388987 -> 43725.749214815 (+0.04%)
Bench: /tmp/ab_bench_score.py ab73b_writevarint_fastpath_6byte_p24 --pairs 24

score geomean_ns median: 128.608289348 -> 126.638221393 (-1.53%)

serialize_perf geomean_ns median: 6.975603185 -> 6.811420432 (-2.35%)

assoc_rw geomean_ns median: 43707.099246200 -> 43745.066824529 (+0.09%)

Bench: /tmp/ab_bench_score.py ab76b_writevarint_fastpath_7_always_inline_p24 --pairs 24

score geomean_ns median: 126.693922531 -> 123.529060714 (-2.50%)

serialize_perf geomean_ns median: 6.821017413 -> 6.570016636 (-3.68%)

assoc_rw geomean_ns median: 43706.512507994 -> 43743.488301889 (+0.08%)
#include <optional>
#include <vector>

static void SizeComputerBlock(benchmark::Bench& bench) {
8339284: bench: measure block (size)serialization speed

pending comment


static void SizeComputerBlock(benchmark::Bench& bench) {
CBlock block;
DataStream(benchmark::data::block413567) >> TX_WITH_WITNESS(block);
@l0rinc l0rinc Feb 20, 2026

no header for some reason?!
