When continuing a reindex scan over already-indexed block files, ImportBlocks and LoadExternalBlockFile emit an info log per file even if no blocks are loaded. On this machine those scans frequently run with nLoaded=0 and sub-second file times, which produces large debug.log churn with little diagnostic value.

This change keeps progress visibility while reducing write pressure:
- only emit the "Reindexing block file" info log when the integer percent changes
- downgrade "Loaded X blocks ..." to debug for the common nLoaded=0, <1s case
- keep info logs for useful signals (loaded blocks or slow files)

Local check (same datadir, 20s startup window):
- debug.log line growth: 183 -> 89 lines
- LoadExternalBlockFile bench stays in the same range after rebuild (~136 ns/op in this environment)
When LoadExternalBlockFile sees a block that is already indexed with BLOCK_HAVE_DATA, we only need to advance to the next block marker.

Previously we advanced with BufferedFile::SkipTo(), which is rewind-preserving and pulls bytes through fread+obfuscation as it walks forward. In no-op reindex passes this does unnecessary work over known block payloads.

This change adds BufferedFile::FastSkipNoRewind() and uses it for known blocks (and out-of-order deferrals) so we can seek to the block end directly. Unknown blocks keep the old SkipTo()+rewind flow so deserialization behavior is unchanged.

Measured on /mnt/my_storage/BitcoinData reindex no-op scan (same machine/config):
- before: 20.88s average per +1% file progress (0->8%)
- after: 18.75s average per +1% file progress (0->8%)

~10% faster in the known-block scan path.
WriteBlockUndo previously serialized block undo data twice: once into a HashWriter and again to the undo file. Add a TeeWriter to hash the exact bytes written, then append the checksum. Expected effect: reduce CPU and allocator pressure on IBD/reindex-chainstate without changing on-disk format.
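The single-pass "hash exactly what you write" pattern can be sketched as follows. This is illustrative only: the real change wraps Bitcoin Core's HashWriter and AutoFile types, whereas here a simple FNV-1a accumulator stands in for the hash and a byte vector stands in for the undo file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in hash accumulator (the real code uses Bitcoin Core's HashWriter).
struct Fnv1aHasher {
    uint64_t state = 1469598103934665603ull;
    void Write(const std::byte* data, size_t len) {
        for (size_t i = 0; i < len; ++i) {
            state ^= static_cast<uint64_t>(std::to_integer<uint8_t>(data[i]));
            state *= 1099511628211ull;
        }
    }
};

// Forwards every byte to both the hasher and the underlying sink, so the
// checksum is computed over exactly the bytes that reach disk, with no
// second serialization pass.
template <typename Hasher, typename Sink>
class TeeWriter {
    Hasher& m_hasher;
    Sink& m_sink;
public:
    TeeWriter(Hasher& hasher, Sink& sink) : m_hasher{hasher}, m_sink{sink} {}
    void Write(const std::byte* data, size_t len) {
        m_hasher.Write(data, len);
        m_sink.insert(m_sink.end(), data, data + len);
    }
};
```

After serialization completes, the hasher's digest is appended to the same sink as the checksum; the bytes on disk are unchanged relative to the two-pass version.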
Add a BlockManager option to hint that block/undo file pages can be dropped after use. When running -reindex/-reindex-chainstate or offline (-connect=0), call posix_fadvise(..., DONTNEED) after reading full blocks and after writing undo. Expected effect: reduce page cache pollution and memory pressure during bulk validation, preserving memory for the UTXO cache and LevelDB and lowering OOM risk once the UTXO set no longer fits.
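The hint itself is a one-liner; a minimal sketch of the best-effort wrapper (the function name is illustrative, not the actual BlockManager API):

```cpp
#include <fcntl.h>
#include <unistd.h>

// After a bulk read/write during -reindex or offline operation, tell the
// kernel that this fd's cached pages are no longer needed, so they can be
// reclaimed before the UTXO cache or LevelDB working set. Best-effort:
// callers ignore the return value and semantics are unchanged.
static int DropFileCache(int fd)
{
#if defined(POSIX_FADV_DONTNEED)
    // offset=0, len=0 means "the whole file".
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
    (void)fd;  // platform without posix_fadvise: silently do nothing
    return 0;
#endif
}
```

Note posix_fadvise returns an error number rather than setting errno, and DONTNEED is only a hint; dirty pages are typically written back rather than discarded.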
Reindex/IBD on memory-constrained systems slows down sharply once the UTXO set no longer fits in the in-memory cache and lookups fall back to LevelDB. The previous 8 MiB hard cap on coinsdb cache leaves an extremely small block cache and write buffer, increasing disk reads and compaction churn. Raise the cap to 64 MiB to give LevelDB enough working set without meaningfully reducing the UTXO cache for large -dbcache values.
Reduce options.block_restart_interval from 8 to 4. Perf profiles during reindex-chainstate show leveldb::Block::Iter::Seek and key comparisons as a noticeable CPU cost once UTXO lookups spill to disk. Tradeoff: slightly larger table blocks in exchange for faster seeks inside data blocks.
Increase the bloom filter policy from 12 to 14 bits per key. When the in-memory UTXO cache cannot hold the full working set, LevelDB lookups dominate; a lower false-positive rate reduces unnecessary block reads and comparator work. Tradeoff: slightly larger filter blocks.
Increase the block tree DB cache cap from 2 MiB to 8 MiB. This reduces disk reads for block index metadata during long reindex/IBD runs at negligible cost to the in-memory coins cache.
During reindex-chainstate/IBD, FlushStateToDisk() runs after every connected block. When the coins cache approaches its size limit, the LARGE cache state can trigger an empty-cache flush, wiping the in-memory UTXO cache and forcing extended IO-bound periods (major faults dominated by LevelDB ReadBlock via CCoinsViewDB::GetCoin) while the cache warms up again. Only wipe the coins cache when explicitly forced or when it exceeds its configured limit (CRITICAL). Periodic writes still occur on the existing m_next_write schedule, and IF_NEEDED flushes still protect against running over the configured cache size.
Set DEFAULT_KERNEL_CACHE to 550 MiB. Replace the import-mode cache split heuristic with a simple kernel cache split that allows the chainstate LevelDB cache (block cache + write buffers) to scale with -dbcache while still leaving most memory for the in-memory UTXO set:
- raise MAX_COINS_DB_CACHE to 512 MiB
- allocate the coinsdb cache as 1/8 of the remaining cache

This keeps cache sizing predictable (no special-cased rules in node startup) and improves IO-bound import/reindex scenarios once UTXO lookups spill to disk. Update the oversized dbcache warning test to match the new default.
Mark SaltedOutpointHasher::operator() noexcept so libstdc++ can recalculate hashes during rehash instead of caching hash codes in unordered_map nodes. On this machine this improves CCoinsViewCache density (more txos per MiB in the UpdateTip cache=...MiB(...txo) logs), which should reduce LevelDB GetCoin reads once the UTXO set no longer fits in memory.
During 800k+ offline import/reindex-style runs, the coins cache can exceed the configured limit by a very small amount (allocator/bucket growth granularity). Treating these tiny overshoots as CRITICAL wipes the entire cache and causes long IO-bound warmup periods. Allow up to 16 MiB of overshoot before declaring the cache CRITICAL.
Reindex/IBD spends a lot of time in CCoinsViewDB::GetCoin, which calls CDBWrapper::Read/Exists for small values. The previous implementation constructed a fresh std::string for every leveldb::DB::Get(), creating allocator churn and exacerbating fragmentation once the UTXO set no longer fits in memory. Reuse a per-thread scratch std::string for successful/failed reads. This keeps the same semantics while reducing malloc/free traffic in the hot LevelDB read path.
CDBWrapper::Read() previously copied every leveldb::DB::Get() value into a DataStream (vector-backed) just to deobfuscate and deserialize it. During reindex/IBD, GetCoin is hot and this extra allocation+memcpy shows up as allocator churn. Deobfuscate the std::string buffer in-place and deserialize using SpanReader over the existing bytes. This keeps behavior identical while reducing copies and transient allocations in the LevelDB read path.
On POSIX, LevelDB frequently serves table reads via mmap (PosixMmapReadableFile::Read ignores the scratch buffer and returns a pointer into the mapping). ReadBlock() still unconditionally allocated a heap scratch buffer on every miss, only to immediately free it when the read did not use it. Teach RandomAccessFile to report whether it requires scratch, and skip the allocation for mmap-backed files. This reduces malloc/free traffic and fragmentation in the GetCoin-heavy path once the UTXO set spills to disk.
Compaction shows up as a major CPU+IO cost during reindex/IBD once validation becomes chainstate-lookup bound (e.g. when the UTXO working set no longer fits in the coins cache). Increase LevelDB write_buffer_size from 1/4 to 1/3 of the per-DB cache budget, using the remainder for the block cache.
LevelDB's memtable flush creates level-0 files roughly write_buffer_size in size, independent of options.max_file_size. When write_buffer_size grows far past max_file_size, large level-0 tables overlap wider key ranges and can increase compaction work. Cap write_buffer_size at options.max_file_size while keeping the existing nCacheSize/3 bias. This keeps level-0 file sizes aligned with the configured max_file_size, and allocates the remainder of the per-DB cache budget to the block cache.
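The budget split described above can be sketched as a small pure function (names and the exact accounting are illustrative; the real code sizes LevelDB's write_buffer_size and block cache inside CDBWrapper setup):

```cpp
#include <algorithm>
#include <cstddef>

struct DBCacheSplit {
    size_t write_buffer_size;  // memtable budget
    size_t block_cache_size;   // remainder goes to the table block cache
};

// Bias one third of the per-DB cache budget toward the memtable, but never
// past max_file_size, so level-0 files (which are roughly memtable-sized)
// stay aligned with the configured table file size.
DBCacheSplit SplitDBCache(size_t cache_bytes, size_t max_file_size)
{
    DBCacheSplit s;
    s.write_buffer_size = std::min(cache_bytes / 3, max_file_size);
    s.block_cache_size = cache_bytes - s.write_buffer_size;
    return s;
}
```

For small -dbcache values the min() is a no-op and the 1/3 bias applies unchanged; the cap only engages once the per-DB budget grows past 3x max_file_size.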
On this machine, offline validation becomes dominated by chainstate LevelDB point lookups once the UTXO working set no longer fits in the in-memory cache. perf sampling shows significant overhead in the mmapped table read path (page faults + LRU bookkeeping) while servicing leveldb::ReadBlock(). Disable read-only table mmaps by default (mmap limit 0) so reads use the pread-based RandomAccessFile implementation. Also stop forcing POSIX_FADV_RANDOM on those fds so the kernel can apply readahead heuristics for sequential scans (e.g. compaction) instead of treating all access as uniformly random.
Perf profiles during offline validation show significant time in malloc/free (e.g. _int_malloc, malloc_consolidate) while the coins cache grows. Construct the CCoinsViewCache node allocator resource with a larger chunk size (1 MiB vs the 256 KiB default) to reduce the frequency of aligned operator new() calls without changing the steady-state memory footprint.
Reindex/import becomes IO-bound once the in-memory UTXO cache can no longer hold the working set. Perf profiles on this machine show leveldb::Block::Iter::Seek as the top CPU hotspot with sustained ~3k random reads/s. Increase the coinsdb share from 1/8 to 1/6 of the remaining -dbcache so the LevelDB block cache can absorb more reads and reduce iowait, while still leaving most memory for the in-memory UTXO set.
SpendCoin is typically preceded by AccessCoin/HaveCoin in ConnectBlock. Avoid the try_emplace() path when the coin is already in cache to reduce unordered_map work in the input spending hot path.
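A sketch of the probe-first shape, with simplified stand-ins for CCoinsMap and the cache entry (the miss branch here just emplaces a fresh entry, standing in for the real backing-view fetch):

```cpp
#include <cstdint>
#include <unordered_map>

struct Coin { int value = 0; bool spent = false; };
using CoinsMap = std::unordered_map<uint64_t, Coin>;

bool SpendCoin(CoinsMap& cache, uint64_t outpoint)
{
    // Common case in ConnectBlock: AccessCoin/HaveCoin already pulled the
    // coin into the cache, so a plain find() succeeds and we skip the
    // heavier insert-or-locate machinery of try_emplace().
    auto it = cache.find(outpoint);
    if (it == cache.end()) {
        // Miss: fall back to emplacing an entry (stand-in for fetching the
        // coin from the backing view in the real code).
        it = cache.try_emplace(outpoint).first;
    }
    if (it->second.spent) return false;
    it->second.spent = true;
    return true;
}
```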
Reindex/import becomes dominated by random chainstate lookups once the in-memory UTXO cache no longer holds the working set. Increasing bits-per-key reduces bloom false positives, avoiding unnecessary table block reads and seeks. Tradeoff: slightly larger filter blocks.
CDBWrapper::Read/Exists serialize keys into a DataStream. Reuse a thread_local buffer so hot LevelDB lookups (eg CCoinsViewDB::GetCoin) avoid per-call heap allocations, reducing allocator churn during IBD/reindex when UTXO reads spill to disk.
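The thread_local scratch pattern looks roughly like this (the serialization step is a stand-in; the real code streams the key's Serialize() output into the buffer):

```cpp
#include <cstddef>
#include <string>

// One buffer per thread, cleared but not shrunk between calls: after the
// first few lookups its capacity covers the largest key seen, and the hot
// path does no heap allocation at all.
static std::string& KeyScratch()
{
    thread_local std::string buf;
    buf.clear();  // keeps capacity, drops contents
    return buf;
}

// Stand-in for key serialization: append raw bytes into the scratch buffer
// and hand back a reference valid until the next call on this thread.
static const std::string& SerializeKey(const void* data, size_t len)
{
    std::string& buf = KeyScratch();
    buf.append(static_cast<const char*>(data), len);
    return buf;
}
```

The main caveat is lifetime: the returned reference is invalidated by the next lookup on the same thread, which is fine for LevelDB's Get/Seek calls that consume the key immediately.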
When the UTXO working set no longer fits in memory, validation becomes dominated by chainstate LevelDB lookups. Reduce unordered_map bucket overhead so more coins fit in the in-memory cache for a given -dbcache, improving hit rate. Tradeoff: slightly more work per lookup due to higher average bucket chain length.
Avoid per-call DataStream allocations when seeking LevelDB iterators by reusing a thread_local key buffer.
EstimateSize() serializes two keys into temporary DataStreams. Reuse thread_local buffers to avoid allocations on repeated calls.
CDBIterator::GetKey previously constructed a DataStream from the underlying key span, copying bytes into a vector. Use SpanReader directly to deserialize the key without an intermediate allocation/copy.
CDBIterator::GetValue previously constructed a DataStream from the value span, allocating/copying each call. Reuse a thread_local DataStream buffer so repeated iteration avoids allocator churn (while still copying to apply obfuscation).
Bloom filter CreateFilter/KeyMayMatch compute bit positions using h % bits. On aarch64, keeping the divisor in 32-bit avoids unnecessary 64-bit division in this hot loop during compactions and filter checks.
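The change is just about the type of the divisor; a sketch (LevelDB's hash value h is already uint32_t, so keeping `bits` in 32 bits produces identical results while letting aarch64 use the cheaper 32-bit udiv):

```cpp
#include <cstdint>

// Bit-position computation from the bloom filter hot loop. With a size_t
// divisor the compiler must emit a 64-bit division; a uint32_t divisor gives
// the same answer for LevelDB's value ranges with a cheaper instruction.
inline uint32_t BloomBitPos(uint32_t h, uint32_t bits)
{
    return h % bits;
}
```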
During reindex-chainstate, cacheCoins can exceed its target size in sudden steps when unordered_map rehashes. If this pushes the cache just a few MiB over the configured limit, Chainstate enters the CRITICAL state and wipes the UTXO cache, leading to long IO-bound warmup periods. Allow a small fixed overshoot (64 MiB) before treating the cache as CRITICAL. On this machine we observed ~34 MiB overshoot at height ~899131 that triggered a full cache wipe; 64 MiB avoids that while still protecting against genuine runaway memory usage.
Chainstate lookups during reindex/IBD are dominated by point reads from LevelDB table files once the UTXO working set no longer fits in memory. Set POSIX_FADV_RANDOM on the permanent RandomAccessFile fd to reduce readahead and wasted IO on cache-unfriendly workloads. This is best-effort and does not change semantics.
validation: skip coinsdb flush on prune-only pruned flushes during IBD

l0rinc (Owner, Author) commented Mar 2, 2026:
const auto empty_cache{(mode == FlushStateMode::FORCE_FLUSH) || fCacheLarge || fCacheCritical};
// Combine all conditions that result in a write to disk.
bool should_write = (mode == FlushStateMode::FORCE_SYNC) || empty_cache || fPeriodicWrite || fFlushForPrune;
// The coins database write is the most expensive part of a flush during IBD.