irmin: use a more compact accumulator for unique tree traversals by craigfe · Pull Request #1770 · mirage/irmin

craigfe · 2022-02-12T12:41:12Z

This PR improves the memory usage of Tree.fold ~unique:`True by a factor of two by introducing a more efficient hashset implementation for small fixed-length strings. This is a first step towards factoring the Tezos snapshot export logic into Irmin core (and so considerably simplifying lib_context). Provided in two commits:

introduces a new internal irmin.data library containing the hashset implementation. The hashset uses open addressing inside a bigstring with a relatively high maximum load factor in order to get close-to-optimal compactness without much runtime overhead.
switches Tree.fold to use this hashset implementation rather than the previous (hash, unit) Stdlib.Hashtbl.t.

The implementation comes with a basic benchmark that compares it to Stdlib.Hashbtl.t:

For now I've also included a Python script that takes the .csv output from the benchmark and generates the above graphs.

codecov-commenter · 2022-02-12T12:51:32Z

Codecov Report

Merging #1770 (44dbf18) into main (1d83f08) will increase coverage by 0.11%.
The diff coverage is 75.25%.

@@            Coverage Diff             @@
##             main    #1770      +/-   ##
==========================================
+ Coverage   67.48%   67.59%   +0.11%     
==========================================
  Files         100      101       +1     
  Lines       12974    13063      +89     
==========================================
+ Hits         8755     8830      +75     
- Misses       4219     4233      +14

Impacted Files	Coverage Δ
src/irmin/hash.ml	`54.54% <11.11%> (-16.29%)`	⬇️
src/irmin/tree.ml	`81.26% <20.00%> (+0.04%)`	⬆️
src/irmin-tezos/schema.ml	`82.92% <50.00%> (-2.08%)`	⬇️
src/irmin/data/fixed_size_string_set.ml	`86.41% <86.41%> (ø)`
src/irmin-unix/fs.ml	`67.74% <0.00%> (+3.22%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d83f08...44dbf18. Read the comment docs.

src/irmin/data/fixed_size_string_set.mli

src/irmin/data/fixed_size_string_set.ml

icristescu · 2022-03-14T09:55:39Z

@craigfe can you move the python scripts (maybe at https://github.com/tarides/tezos-storage-bench or https://github.com/tarides/irmin-tezos?) so that we can merge this

craigfe · 2022-03-15T17:21:06Z

Rebased and dealt with the outstanding comments. Will merge once the CI is passing to unblock #1757.

@Ngoguey42

…min-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-chunk and irmin-bench (3.2.0) CHANGES: ### Added - **irmin-pack** - Add `forbid_empty_dir_persistence` in store configuration. (mirage/irmin#1789, @Ngoguey42) - Add `Store.Snapshot` to expose the inodes for tezos snapshots (mirage/irmin#1757, @icristescu). ### Changed - **irmin** - Add error types in the API or proof verifiers. (mirage/irmin#1791, @icristescu) - Reduced the memory footprint of ``Tree.fold ~uniq:`True`` by a factor of 2. (mirage/irmin#1770, @craigfe)

Ngoguey42 reviewed Feb 14, 2022

View reviewed changes

src/irmin/data/fixed_size_string_set.mli Show resolved Hide resolved

src/irmin/data/fixed_size_string_set.mli Show resolved Hide resolved

src/irmin/data/fixed_size_string_set.ml Outdated Show resolved Hide resolved

src/irmin/data/fixed_size_string_set.ml Show resolved Hide resolved

Ngoguey42 mentioned this pull request Feb 17, 2022

Irmin 3.1 #1780

Closed

Ngoguey42 mentioned this pull request Feb 25, 2022

Irmin 3.2 #1784

Closed

craigfe added 5 commits March 15, 2022 17:12

Add a new internal library containing a compact hashset

19a1198

irmin: use a more efficient hashset for unique folds

88f40c7

irmin.data: respond to review comments

392e5b0

irmin.data: remove python script for processing benchmark results

29a3915

irmin.data: rename initial_capacity to initial_slots

0c2619f

craigfe force-pushed the craigfe@irmin-data-library branch from fbdba25 to 0c2619f Compare March 15, 2022 17:12

Add a CHANGES entry

d3c4c4f

craigfe merged commit e54478b into main Mar 15, 2022

craigfe mentioned this pull request Mar 16, 2022

Expose a type for snapshots in inodes #1757

Merged

samoht deleted the craigfe@irmin-data-library branch March 27, 2022 10:48

icristescu mentioned this pull request Mar 28, 2022

[new release] irmin project (3.2.0) ocaml/opam-repository#21038

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

irmin: use a more compact accumulator for unique tree traversals#1770

irmin: use a more compact accumulator for unique tree traversals#1770
craigfe merged 6 commits intomainfrom
craigfe@irmin-data-library

craigfe commented Feb 12, 2022

Uh oh!

codecov-commenter commented Feb 12, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icristescu commented Mar 14, 2022

Uh oh!

craigfe commented Mar 15, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

craigfe commented Feb 12, 2022

Uh oh!

codecov-commenter commented Feb 12, 2022

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icristescu commented Mar 14, 2022

Uh oh!

craigfe commented Mar 15, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants