
Expose a type for snapshots in inodes #1757

Merged

icristescu merged 1 commit into mirage:main from icristescu:expose_inode on Mar 24, 2022

Conversation

@icristescu (Contributor) commented Feb 6, 2022

I added a test that shows how this is used in Tezos; see also tarides/tezos#19.

@codecov-commenter commented Feb 6, 2022

Codecov Report

Merging #1757 (5240d7d) into main (e54478b) will decrease coverage by 0.09%.
The diff coverage is 57.27%.

@@            Coverage Diff             @@
##             main    #1757      +/-   ##
==========================================
- Coverage   64.63%   64.54%   -0.10%     
==========================================
  Files         116      118       +2     
  Lines       14159    14375     +216     
==========================================
+ Hits         9152     9278     +126     
- Misses       5007     5097      +90     
 | Impacted Files                       | Coverage <diff>  | Δ
 | --                                   | --               | --
 | src/irmin-pack/mem/irmin_pack_mem.ml | 63.88% <16.66%>  | -9.45% ⬇️
 | src/irmin-pack/unix/inode.ml         | 33.33% <33.33%>  | ø
 | src/irmin-pack/inode.ml              | 78.65% <52.38%>  | -1.88% ⬇️
 | src/irmin-pack/unix/snapshot.ml      | 53.78% <53.78%>  | ø
 | src/irmin-pack/unix/pack_store.ml    | 85.00% <83.33%>  | -0.19% ⬇️
 | src/irmin-pack/unix/ext.ml           | 65.41% <95.65%>  | +6.32% ⬆️
 | src/irmin-git/atomic_write.ml        | 82.11% <0.00%>   | -1.33% ⬇️
 | src/irmin-test/store.ml              | 94.93% <0.00%>   | -0.06% ⬇️
 | src/irmin/commit.ml                  | 63.98% <0.00%>   | +0.32% ⬆️
 | src/irmin-unix/fs.ml                 | 68.38% <0.00%>   | +0.64% ⬆️
 ... and 1 more


@icristescu icristescu force-pushed the expose_inode branch 2 times, most recently from a47de7b to 0d7ce97 on February 16, 2022 10:51
@icristescu (Contributor, Author):

Thanks for the suggestions, @Ngoguey42.

@Ngoguey42 Ngoguey42 mentioned this pull request Feb 17, 2022
@Ngoguey42 Ngoguey42 mentioned this pull request Feb 25, 2022
@icristescu icristescu marked this pull request as draft March 1, 2022 14:28
@icristescu icristescu added the area/backend (About the backend) and users/tezos (Related to Tezos users) labels on Mar 1, 2022
@icristescu icristescu force-pushed the expose_inode branch 4 times, most recently from 48227d5 to ebf3177 on March 8, 2022 15:07
@icristescu icristescu marked this pull request as ready for review March 8, 2022 15:37
@icristescu (Contributor, Author) commented Mar 8, 2022

The on_disk options use an index with the same log_size as the store (2_500_000).
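
A minimal sketch of that configuration, assuming an instantiation `Index` of the index library's functor; the module name, flags, and path are illustrative, not the benchmark's actual code:

```ocaml
(* Hypothetical scratch index for the [on_disk] runs, created with the
   same [log_size] as the store's own index (2_500_000 entries). *)
let snapshot_index_log_size = 2_500_000

let snapshot_index path =
  (* [Index.v] constructs an on-disk index at [path]; [fresh:true]
     starts from an empty one. *)
  Index.v ~fresh:true ~log_size:snapshot_index_log_size path
```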

  • export rolling
 |                          | master1  |    master2    |     mem1      |     mem2      |     disk1      |     disk2
 | --                       | --       | --            | --            | --            | --             | --
 | -- main metrics --       |          |               |               |               |                | 
 | CPU time elapsed         |    3m54s |    3m37s  93% |    2m49s  72% |    2m50s  73% |     6m50s 175% |     6m50s 175%
 | Wall time elapsed        |    4m36s |    4m18s  93% |    3m33s  77% |    3m37s  79% |     7m23s 160% |     7m24s 161%
 | TZ-transactions per sec  |        0 |        0 -nan |        0 -nan |        0 -nan |         0 -nan |         0 -nan
 | TZ-operations per sec    |        0 |        0 -nan |        0 -nan |        0 -nan |         0 -nan |         0 -nan
 | Context.add per sec      |        0 |        0 -nan |        0 -nan |        0 -nan |         0 -nan |         0 -nan
 | tail latency (1)         |      n/a |      n/a  nan |      n/a  nan |      n/a  nan |       n/a  nan |       n/a  nan
 |                          |          |               |               |               |                | 
 | -- resource usage --     |          |               |               |               |                | 
 | disk IO (total)          |          |               |               |               |                | 
 |   IOPS (op/sec)          |  305_880 |  328_602 107% |  729_855 239% |  721_967 236% |   994_263 325% |   994_407 325%
 |   throughput (bytes/sec) | 21.823 M | 23.444 M 107% | 52.018 M 238% | 51.456 M 236% | 336.519 M  15x | 336.551 M  15x
 |   total (bytes)          |  5.097 G |  5.097 G 100% |  8.773 G 172% |  8.773 G 172% | 137.840 G  27x | 137.832 G  27x
 | disk IO (read)           |          |               |               |               |                | 
 |   IOPS (op/sec)          |  305_880 |  328_602 107% |  729_855 239% |  721_967 236% |   994_250 325% |   994_395 325%
 |   throughput (bytes/sec) | 21.823 M | 23.444 M 107% | 52.018 M 238% | 51.456 M 236% | 326.256 M  15x | 326.279 M  15x
 |   total (bytes)          |  5.097 G |  5.097 G 100% |  8.773 G 172% |  8.773 G 172% | 133.636 G  26x | 133.625 G  26x
 | disk IO (write)          |          |               |               |               |                | 
 |   IOPS (op/sec)          |        0 |        0 -nan |        0 -nan |        0 -nan |        13  inf |        13  inf
 |   throughput (bytes/sec) |        0 |        0 -nan |        0 -nan |        0 -nan |  10.263 M  inf |  10.272 M  inf
 |   total (bytes)          |        0 |        0 -nan |        0 -nan |        0 -nan |   4.204 G  inf |   4.207 G  inf
 |                          |          |               |               |               |                | 
 | max memory usage (bytes) |  7.492 G |  7.136 G  95% |  1.622 G  22% |  1.622 G  22% |   0.541 G 7.2% |   0.508 G 6.8%
 | mean CPU usage           |      85% |      84%      |      79%      |      79%      |       92%      |       92%     

The size of the snapshot on disk after the export is 3.6G with master and 3.7G with the new branch (but this is a rolling snapshot; the difference is probably larger in the full case).

  • import rolling
 |                          |  master1  |    master2     |     mem1      |     mem2      |     disk1     |     disk2     |  disk+always1  |  disk+always2
 | --                       | --        | --             | --            | --            | --            | --            | --             | --
 | -- main metrics --       |           |                |               |               |               |               |                | 
 | CPU time elapsed         |     8m33s |     8m41s 102% |    4m19s  50% |    4m21s  51% |    6m14s  73% |    6m13s  73% |     7m30s  88% |     7m30s  88%
 | Wall time elapsed        |     8m32s |     8m40s 102% |    4m29s  53% |    4m31s  53% |    6m12s  73% |    6m11s  73% |     7m29s  88% |     7m28s  88%
 | TZ-transactions per sec  |         0 |         0 -nan |        0 -nan |        0 -nan |        0 -nan |        0 -nan |         0 -nan |         0 -nan
 | TZ-operations per sec    |         0 |         0 -nan |        0 -nan |        0 -nan |        0 -nan |        0 -nan |         0 -nan |         0 -nan
 | Context.add per sec      |     2.182 |     2.148  98% |    4.332 198% |    4.295 197% |    2.997 137% |    3.002 138% |     2.487 114% |     2.488 114%
 | tail latency (1)         |   0.032 s |   0.019 s  59% |  0.014 s  45% |  0.020 s  64% |  0.018 s  58% |  0.012 s  38% |   0.025 s  78% |   0.019 s  61%
 |                          |           |                |               |               |               |               |                | 
 | -- resource usage --     |           |                |               |               |               |               |                | 
 | disk IO (total)          |           |                |               |               |               |               |                | 
 |   IOPS (op/sec)          |   525_307 |   517_185  98% |  118_896  23% |  117_891  22% |  452_969  86% |  453_657  86% |   584_282 111% |   584_580 111%
 |   throughput (bytes/sec) | 193.181 M | 190.203 M  98% | 13.208 M 6.8% | 13.097 M 6.8% | 50.549 M  26% | 50.621 M  26% | 184.648 M  96% | 184.679 M  96%
 |   total (bytes)          |  99.140 G |  99.156 G 100% |  3.415 G 3.4% |  3.415 G 3.4% | 18.889 G  19% | 18.885 G  19% |  83.168 G  84% |  83.143 G  84%
 | disk IO (read)           |           |                |               |               |               |               |                | 
 |   IOPS (op/sec)          |   525_142 |   517_022  98% |   43_419 8.3% |   43_052 8.2% |  400_729  76% |  401_330  76% |   497_698  95% |   497_955  95%
 |   throughput (bytes/sec) | 175.879 M | 173.147 M  98% |  1.468 M 0.8% |  1.455 M 0.8% | 27.072 M  15% | 27.113 M  15% | 164.691 M  94% | 164.711 M  94%
 |   total (bytes)          |  90.261 G |  90.264 G 100% |  0.379 G 0.4% |  0.379 G 0.4% | 10.116 G  11% | 10.115 G  11% |  74.179 G  82% |  74.154 G  82%
 | disk IO (write)          |           |                |               |               |               |               |                | 
 |   IOPS (op/sec)          |       166 |       163  98% |   75_477 456x |   74_839 452x |   52_240 316x |   52_327 316x |    86_584 523x |    86_625 523x
 |   throughput (bytes/sec) |  17.302 M |  17.056 M  99% | 11.741 M  68% | 11.642 M  67% | 23.478 M 136% | 23.508 M 136% |  19.958 M 115% |  19.968 M 115%
 |   total (bytes)          |   8.879 G |   8.892 G 100% |  3.036 G  34% |  3.036 G  34% |  8.773 G  99% |  8.770 G  99% |   8.989 G 101% |   8.990 G 101%
 |                          |           |                |               |               |               |               |                | 
 | max memory usage (bytes) |   4.701 G |   4.660 G  99% |  5.207 G 111% |  5.113 G 109% |  0.503 G  11% |  0.485 G  10% |   0.482 G  10% |   0.531 G  11%
 | mean CPU usage           |      100% |      100%      |      96%      |      96%      |     101%      |     100%      |      100%      |      100%     


The disk+always configuration imports the store with the always indexing strategy and reuses the index to track visited inodes.
The new branch stores all visited hashes, including those of inodes, whereas before only the root nodes were stored (which probably explains the difference in maxrss between master and mem).
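
A rough sketch of what "reusing the index to track visited inodes" amounts to, assuming an index-like structure with `mem` and `replace`; it illustrates the idea rather than the PR's actual code:

```ocaml
(* With the [always] indexing strategy every written object is indexed,
   so membership in the index tells us whether an inode has already been
   imported. [Index], [mem] and [replace] stand in for the real API. *)
let save_if_unseen index h save =
  if not (Index.mem index h) then begin
    save h;
    (* remember the hash so later occurrences of this inode are skipped *)
    Index.replace index h ()
  end
```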

The corresponding Tezos commit is [here](icristescu/tezos@58f6f3d).

Thanks @Ngoguey42 and @maiste for the recording branch; it was very useful for benchmarking :)

@Ngoguey42 (Contributor) left a comment:

Initial review

- if [on_disk] is [`Reuse] the store's index is reused. *)

val export :
  ?on_disk:on_disk ->
Contributor:

Does ~on_disk:`Reuse make sense in the context of export?

@icristescu (Contributor, Author):

No, it doesn't; I wanted to reuse the same type for import and export.
I replaced on_disk with string for export. I don't like it so much, because it's less clear, but if you think it's better I'll leave it like this.

@Ngoguey42 (Contributor) commented Mar 16, 2022:

Let's just remove the `Reusable tag from the current types. That way most of the symmetry is kept.

Contributor:

I actually meant removing `Reusable from export only, and not from import.

?on_disk:[ `Path of string ] in export and ?on_disk:[ `Path of string | `Reuse ] in import.

`Reuse was a good feature; I think it should be in this PR. Sorry for the ambiguous message :(
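
To make the agreed asymmetry concrete, a small sketch; everything apart from the two on_disk option types is a placeholder rather than the PR's actual signature:

```ocaml
module type Snapshot_sketch = sig
  type repo

  (* export never reuses the store's index, so only [`Path _] is accepted *)
  val export : ?on_disk:[ `Path of string ] -> repo -> unit Lwt.t

  (* import can either build a fresh index at [`Path _] or [`Reuse] the
     store's own index *)
  val import : ?on_disk:[ `Path of string | `Reuse ] -> repo -> unit Lwt.t
end
```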

@Ngoguey42 (Contributor) left a comment:

Second part of my review :)

I still have to review the inode code.

in
let entry_of_address = function
  | Compress.Offset offset -> entry_of_offset offset
  | Hash h -> entry_of_hash h
Contributor:

FYI: reaching this line is a sign that h is dangling.

In Tezos this only happens when importing a full snapshot, because of a small bug in check_protocol_commit_consistency that has since been fixed.

You can leave your code as is.

Comment on lines 1712 to 1720
let get_key ~index h =
  match index h with
  | None ->
      Fmt.failwith
        "get_key: You are trying to save to the backend an inode \
         deserialized using [Irmin.Type] that used to contain pointer(s) \
         to inodes which are unknown to the backend. Hash: %a"
        pp_hash h
  | Some key -> key
Contributor:

I think it would be better for index to have the index:(hash -> key) signature (instead of index:(hash -> key option)). You could then raise the exception from a place that has more context.
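
A minimal sketch of that suggestion, assuming the backend lookup has type hash -> key option; the names are placeholders, not the PR's code:

```ocaml
(* Wrap the backend lookup once, at a call site that knows the context,
   so that [index] itself has type [hash -> key] and [get_key] no longer
   has to produce the error message. *)
let index_exn ~find ~pp_hash ~context h =
  match find h with
  | Some key -> key
  | None ->
      Fmt.failwith "%s: inode hash %a is unknown to the backend" context
        pp_hash h

(* [get_key ~index h] then reduces to [index h], with
   [~index:(index_exn ~find ~pp_hash ~context:"import")] supplied where
   the context is known. *)
```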

@Ngoguey42 (Contributor) left a comment:

Thanks :)

@icristescu icristescu merged commit 021a6a5 into mirage:main Mar 24, 2022
@icristescu icristescu deleted the expose_inode branch March 24, 2022 18:00
icristescu pushed a commit to icristescu/opam-repository that referenced this pull request Mar 28, 2022
…min-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-chunk and irmin-bench (3.2.0)

CHANGES:

### Added

- **irmin-pack**
  - Add `forbid_empty_dir_persistence` in store configuration. (mirage/irmin#1789,
    @Ngoguey42)
  - Add `Store.Snapshot` to expose the inodes for tezos snapshots (mirage/irmin#1757,
    @icristescu).

### Changed

- **irmin**
  - Add error types in the API of proof verifiers. (mirage/irmin#1791, @icristescu)
  - Reduced the memory footprint of ``Tree.fold ~uniq:`True`` by a factor of 2.
    (mirage/irmin#1770, @craigfe)

Labels

area/backend (About the backend), users/tezos (Related to Tezos users)

Projects

None yet


6 participants