Skip to content

irmin: use a more compact accumulator for unique tree traversals#1770

Merged
craigfe merged 6 commits intomainfrom
craigfe@irmin-data-library
Mar 15, 2022
Merged

irmin: use a more compact accumulator for unique tree traversals#1770
craigfe merged 6 commits intomainfrom
craigfe@irmin-data-library

Conversation

@craigfe
Copy link
Member

@craigfe craigfe commented Feb 12, 2022

This PR improves the memory usage of Tree.fold ~unique:`True by a factor of two by introducing a more efficient hashset implementation for small fixed-length strings. This is a first step towards factoring the Tezos snapshot export logic into Irmin core (and so considerably simplifying lib_context). Provided in two commits:

  1. introduces a new internal irmin.data library containing the hashset implementation. The hashset uses open addressing inside a bigstring with a relatively high maximum load factor in order to get close-to-optimal compactness without much runtime overhead.

  2. switches Tree.fold to use this hashset implementation rather than the previous (hash, unit) Stdlib.Hashtbl.t.

The implementation comes with a basic benchmark that compares it to Stdlib.Hashbtl.t:

output

For now I've also included a Python script that takes the .csv output from the benchmark and generates the above graphs.

@codecov-commenter
Copy link

Codecov Report

Merging #1770 (44dbf18) into main (1d83f08) will increase coverage by 0.11%.
The diff coverage is 75.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1770      +/-   ##
==========================================
+ Coverage   67.48%   67.59%   +0.11%     
==========================================
  Files         100      101       +1     
  Lines       12974    13063      +89     
==========================================
+ Hits         8755     8830      +75     
- Misses       4219     4233      +14     
Impacted Files Coverage Δ
src/irmin/hash.ml 54.54% <11.11%> (-16.29%) ⬇️
src/irmin/tree.ml 81.26% <20.00%> (+0.04%) ⬆️
src/irmin-tezos/schema.ml 82.92% <50.00%> (-2.08%) ⬇️
src/irmin/data/fixed_size_string_set.ml 86.41% <86.41%> (ø)
src/irmin-unix/fs.ml 67.74% <0.00%> (+3.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d83f08...44dbf18. Read the comment docs.

@Ngoguey42 Ngoguey42 mentioned this pull request Feb 17, 2022
@Ngoguey42 Ngoguey42 mentioned this pull request Feb 25, 2022
@icristescu
Copy link
Contributor

@craigfe can you move the python scripts (maybe at https://github.com/tarides/tezos-storage-bench or https://github.com/tarides/irmin-tezos?) so that we can merge this

@craigfe craigfe force-pushed the craigfe@irmin-data-library branch from fbdba25 to 0c2619f Compare March 15, 2022 17:12
@craigfe
Copy link
Member Author

craigfe commented Mar 15, 2022

Rebased and dealt with the outstanding comments. Will merge once the CI is passing to unblock #1757.

@craigfe craigfe merged commit e54478b into main Mar 15, 2022
@samoht samoht deleted the craigfe@irmin-data-library branch March 27, 2022 10:48
icristescu pushed a commit to icristescu/opam-repository that referenced this pull request Mar 28, 2022
…min-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-chunk and irmin-bench (3.2.0)

CHANGES:

### Added

- **irmin-pack**
  - Add `forbid_empty_dir_persistence` in store configuration. (mirage/irmin#1789,
    @Ngoguey42)
  - Add `Store.Snapshot` to expose the inodes for tezos snapshots (mirage/irmin#1757,
    @icristescu).

### Changed

- **irmin**
  - Add error types in the API or proof verifiers. (mirage/irmin#1791, @icristescu)
  - Reduced the memory footprint of ``Tree.fold ~uniq:`True`` by a factor of 2.
    (mirage/irmin#1770, @craigfe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants