irmin: use a more compact accumulator for unique tree traversals #1770
Merged
Codecov Report
Coverage Diff

| | main | #1770 | +/- |
|---|---|---|---|
| Coverage | 67.48% | 67.59% | +0.11% |
| Files | 100 | 101 | +1 |
| Lines | 12974 | 13063 | +89 |
| Hits | 8755 | 8830 | +75 |
| Misses | 4219 | 4233 | +14 |
Ngoguey42
reviewed
Feb 14, 2022
Contributor
@craigfe can you move the Python scripts (maybe to https://github.com/tarides/tezos-storage-bench or https://github.com/tarides/irmin-tezos?) so that we can merge this?
fbdba25 to 0c2619f
Member
Author
Rebased and dealt with the outstanding comments. Will merge once the CI is passing to unblock #1757.
icristescu pushed a commit to icristescu/opam-repository that referenced this pull request Mar 28, 2022
…min-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-chunk and irmin-bench (3.2.0)

CHANGES:

### Added

- **irmin-pack**
  - Add `forbid_empty_dir_persistence` in store configuration. (mirage/irmin#1789, @Ngoguey42)
  - Add `Store.Snapshot` to expose the inodes for tezos snapshots (mirage/irmin#1757, @icristescu).

### Changed

- **irmin**
  - Add error types in the API or proof verifiers. (mirage/irmin#1791, @icristescu)
  - Reduced the memory footprint of ``Tree.fold ~uniq:`True`` by a factor of 2. (mirage/irmin#1770, @craigfe)
This PR improves the memory usage of ``Tree.fold ~uniq:`True`` by a factor of two by introducing a more efficient hashset implementation for small fixed-length strings. This is a first step towards factoring the Tezos snapshot export logic into Irmin core (and so considerably simplifying `lib_context`). Provided in two commits:

- introduces a new internal `irmin.data` library containing the hashset implementation. The hashset uses open addressing inside a bigstring with a relatively high maximum load factor in order to get close-to-optimal compactness without much runtime overhead.
- switches `Tree.fold` to use this hashset implementation rather than the previous `(hash, unit) Stdlib.Hashtbl.t`.

The implementation comes with a basic benchmark that compares it to `Stdlib.Hashtbl.t`:

[Figure: benchmark graphs comparing the hashset with `Stdlib.Hashtbl.t`]

For now I've also included a Python script that takes the `.csv` output from the benchmark and generates the above graphs.
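
To make the data-structure idea concrete, here is a minimal sketch of an open-addressing set for fixed-length strings backed by one flat `Bigarray` buffer. It is not the code from this PR: the module name `Compact_set`, its functions, the 0.9 maximum load factor, the doubling growth policy, the use of `Hashtbl.hash`, and the choice to reserve the all-zero string as the "empty slot" marker are all assumptions made for illustration.

```ocaml
(* Illustrative sketch only: a set of fixed-length strings stored in a single
   flat buffer with open addressing (linear probing). All names here are
   hypothetical and are not the API of the [irmin.data] library. *)
module Compact_set : sig
  type t

  val create : elt_length:int -> initial_slots:int -> t
  val add : t -> string -> unit
  val mem : t -> string -> bool
  val cardinal : t -> int
end = struct
  module A = Bigarray.Array1

  type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) A.t

  type t = {
    elt_length : int;  (* every element has exactly this many bytes *)
    mutable slots : int;  (* number of slots in [data] *)
    mutable data : buffer;  (* [slots * elt_length] bytes, no per-entry boxing *)
    mutable cardinal : int;
  }

  (* Assumption: an all-zero slot means "empty", so the all-zero string
     itself cannot be stored. *)
  let alloc ~elt_length ~slots : buffer =
    let b = A.create Bigarray.char Bigarray.c_layout (slots * elt_length) in
    A.fill b '\000';
    b

  let create ~elt_length ~initial_slots =
    if elt_length <= 0 || initial_slots <= 0 then
      invalid_arg "Compact_set.create";
    let data = alloc ~elt_length ~slots:initial_slots in
    { elt_length; slots = initial_slots; data; cardinal = 0 }

  let cardinal t = t.cardinal

  let slot_is_empty t i =
    let off = i * t.elt_length in
    let rec go j =
      j = t.elt_length || (A.get t.data (off + j) = '\000' && go (j + 1))
    in
    go 0

  let slot_equals t i s =
    let off = i * t.elt_length in
    let rec go j =
      j = t.elt_length || (A.get t.data (off + j) = s.[j] && go (j + 1))
    in
    go 0

  let read_slot t i =
    String.init t.elt_length (fun j -> A.get t.data ((i * t.elt_length) + j))

  let write_slot t i s =
    String.iteri (fun j c -> A.set t.data ((i * t.elt_length) + j) c) s

  (* Linear probing: returns the slot holding [s], or the first empty slot on
     its probe path. Terminates because [add] always keeps one slot empty. *)
  let find_slot t s =
    let rec probe i =
      if slot_is_empty t i || slot_equals t i s then i
      else probe ((i + 1) mod t.slots)
    in
    probe (Hashtbl.hash s mod t.slots)

  let insert t s =
    let i = find_slot t s in
    if slot_is_empty t i then begin
      write_slot t i s;
      t.cardinal <- t.cardinal + 1
    end

  (* Double the buffer and re-insert every stored element. *)
  let grow t =
    let old = { t with cardinal = t.cardinal } in
    t.slots <- 2 * old.slots;
    t.data <- alloc ~elt_length:t.elt_length ~slots:t.slots;
    t.cardinal <- 0;
    for i = 0 to old.slots - 1 do
      if not (slot_is_empty old i) then insert t (read_slot old i)
    done

  let add t s =
    if String.length s <> t.elt_length then
      invalid_arg "Compact_set.add: wrong element length";
    if String.equal s (String.make t.elt_length '\000') then
      invalid_arg "Compact_set.add: the all-zero string is reserved";
    (* Assumed maximum load factor of 0.9, checked in integer arithmetic. *)
    if (t.cardinal + 1) * 10 > t.slots * 9 then grow t;
    insert t s

  let mem t s =
    String.length s = t.elt_length && not (slot_is_empty t (find_slot t s))
end
```

Each element in this layout costs `elt_length` bytes plus the slack left by the load factor, with no per-entry bucket or boxed key, which is the kind of saving the description attributes to the new hashset. A traversal that must visit each hash once would call `mem` before processing a node and `add` afterwards.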