Skip to content

Inline small contents in nodes#2390

Open
cuihtlauac wants to merge 19 commits intomirage:eiofrom
cuihtlauac:inline-small-objects-v2
Open

Inline small contents in nodes#2390
cuihtlauac wants to merge 19 commits intomirage:eiofrom
cuihtlauac:inline-small-objects-v2

Conversation

@cuihtlauac
Copy link

Summary

This PR continues the work from #2378, implementing inline contents for irmin-pack. Small content values (≤13 bytes raw, <16 bytes serialized) are stored directly within parent node entries rather than as separate pack file entries.

Key changes:

  • Add inline_contents configuration option (default: false)
  • Introduce Inode_v3 format with new magic bytes (S for root, T for non-root)
  • Extend node value type with Contents_inlined variant
  • Update tree module to check content size and inline when appropriate

Benefits:

  • Reduced storage overhead for small values
  • Fewer I/O operations (one read instead of two)
  • Better cache locality

Limitations:

  • Enabling inlining changes node hashes (stores not hash-compatible)
  • 16-byte threshold is fixed
  • Not retroactive (existing contents not automatically inlined)

Test plan

  • All existing tests pass with inlining disabled
  • New inlined_contents test suite validates correctness with inlining enabled
  • Structure verification confirms small contents are stored inline
  • Manual testing with real-world data

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

@cuihtlauac cuihtlauac force-pushed the inline-small-objects-v2 branch from 5dbb839 to b8486dd Compare January 23, 2026 15:58
clecat and others added 19 commits January 29, 2026 08:42
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Call Contents.export in on_contents_inlined to cache the key
- Handle Contents_inlined_3 in add_node_map like regular Contents

Inlined contents are now correctly saved and retrieved.
All inline contents tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Regenerate V3 test archives with new node hashes resulting from the
node_entry format change (adding inlined field). The hash computation
for nodes containing node references has changed.

Changes:
- Add generators (gen_minimal.ml, gen_always.ml, gen_gced.ml) for
  reproducible test archive regeneration
- Regenerate version_3_minimal, version_3_always, version_3_minimal_gced
  archives with Inode_v3 magic bytes ('S', 'T')
- Update hash constants in test_upgrade.ml for n0, c0, n1, c1, c2
- Update expected hex values in test_dispatcher.ml for node_2, commit_2,
  node_3 (hash and magic byte changes)
- Update expected magic byte in test_inode.ml ('R' -> 'S')
- Disable From_v2 upgrade test (V2 archives have incompatible old hashes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add design document (doc/irmin-pack/design/inline_contents.md)
  explaining the motivation, design, and implementation of inlining
  small content values within node entries

- Update version-history.md with Inode_v3 format details and magic
  bytes ('S'/'T' for root/nonroot)

- Enhance conf.mli documentation for inline_contents option

- Add test_inlining_structure test verifying small contents are
  actually stored inline (threshold: 13 bytes raw / 16 bytes serialized)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This implements the core inline contents feature, allowing small content
values (< 16 bytes serialized) to be stored directly within node entries
rather than as separate pack file entries.

Key changes:

Core Irmin:
- Add `Contents_inlined` variant to node value type
- Add `Contents_inlined_3` to tree destruct type
- Extend Tree.of_contents to inline small values when enabled
- Update node interfaces and implementations

Irmin-pack:
- Add Inode_v3_root ('S') and Inode_v3_nonroot ('T') magic bytes
- Add Contents_inlined_value to inode encoding
- Update GC worker, snapshot, and traversal to handle inlined contents
- Update checks and integrity verification

Other:
- Update irmin-git, irmin-graphql, irmin-tezos for new node value type
- Update irmin-pack-tools (tezos_explorer) for new entry kinds
- Update tests to handle inlined contents in assertions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace all numbered variants (Contents_inlined_3, Contents_inlined_5,
Contents_inlined__1, etc.) with a single Contents_inlined name.

OCaml's structural typing for polymorphic variants distinguishes them
by payload type, so the same name works correctly across different
contexts (unit markers, contents+metadata values, keys, hashes).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Benchmark compares write and read performance with and without
inline_contents enabled, for different content sizes (small, medium, large).

Usage: dune exec bench/irmin-pack/bench_inlined_contents.exe -- -n 2000 -c 5 -s small

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This simplifies the tree type by removing Contents_inlined from the
internal representation. The inlining decision is now made at export
time based on serialized content size.

Changes:
- Remove Contents_inlined from tree.t, tree.elt, and related types
- Add should_inline_contents helper in export function
- Inline contents <= 16 bytes at serialization time
- Simplify pattern matches throughout the codebase
- Remove debug print statements

The backend kinded_hash/kinded_key types still support Contents_inlined
for the on-disk format. Import functions convert inlined contents back
to regular Contents values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use Contents.to_value instead of cached_value to ensure value is available
- Add 2 bytes overhead to size check (1 byte variant tag + 1 byte varint length)
- This fixes the inlining structure test which expects 3 inlined contents

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New benchmark suite in test/irmin-pack/bench_inline/ to measure
inline contents performance across different size distributions:

- Uniform (small, large, around-threshold)
- Bimodal (mostly-small, mostly-large)
- Log-normal (small, medium)
- Zipfian (small, medium, steep)

Metrics collected:
- Store size on disk
- Write time
- Read latency percentiles (p50, p99, p999)
- Inlined vs non-inlined content counts

Usage:
  dune exec test/irmin-pack/bench_inline/main.exe -- run --all
  dune exec test/irmin-pack/bench_inline/main.exe -- sweep -t 16

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Benchmarks across multiple size distributions (zipfian, log-normal,
bimodal) showed that 48 bytes provides better read latency improvement
than 16 bytes across most workloads.

Changes:
- Update default inline_contents_max_bytes from 16 to 48
- Make threshold configurable via set_inline_contents_max_bytes
- Add get_inline_contents_max_bytes to query current threshold
- Add 'optimize' command to benchmark for threshold tuning
- Update test expectations for new threshold

Benchmark results showed:
- zipfian-steep: p99 latency 139µs -> 84µs at threshold=48
- zipfian-small: p99 latency 166µs -> 108µs at threshold=48
- log-normal-small: p99 latency 229µs -> 111µs at threshold=48
- mostly-small: p99 latency 162µs -> 87µs at threshold=48

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add README.md with usage instructions for all benchmark commands
- Add gnuplot scripts for generating comparison charts
- Add sample data file and generated PNG plots

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add bench_inline_storage.gp gnuplot script for storage visualization
- Add generate_data.sh to parse CSV output into gnuplot data files
- Add 'make bench-inline' target to run benchmarks and generate plots
- Update README with storage impact documentation and quick start

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add flush before close in Repo.close for read-write stores.
Previously, direct node operations that didn't go through batch()
could leave unflushed data, causing close to fail with Pending_flush.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The kinded_key type changed to support inline contents:
- `Contents of key * metadata` (was `Contents of key`)
- `Contents_inlined of string * metadata` (new)
- `Node of key * key list` (was `Node of key`)

This commit fixes all call sites to use the new tuple format:
- save_tree now returns correct kinded_key with metadata
- Store.key returns full Tree.kinded_key
- Pattern matches updated in examples and tests
- stat.t output updated for Inode_v3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cuihtlauac cuihtlauac force-pushed the inline-small-objects-v2 branch from 2f5ec29 to 88e9295 Compare January 29, 2026 07:47
@cuihtlauac cuihtlauac marked this pull request as ready for review January 29, 2026 08:51
Comment on lines +36 to +37
let inline_contents_enabled = ref false
let set_inline_contents_enabled v = inline_contents_enabled := v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems error-prone, we should either enable it globally or figure out a better way to configure it on a per-store basis. Maybe it could be stored in the config.

Comment on lines +42 to +44
let inline_contents_max_bytes = ref 48
let set_inline_contents_max_bytes v = inline_contents_max_bytes := v
let get_inline_contents_max_bytes () = !inline_contents_max_bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, maybe we should pick a single default value or store this in the config too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants