CIDv2 idea: include the heights of trees in the CID #62

@EdSchouten

The other week I was brainstorming whether it would be possible to use IPLD as a data storage/transfer format for a future version of Bazel’s Remote Execution protocol, most notably its Content Addressable Storage (CAS). See bazelbuild/remote-apis#250 for details. Where this use case differs from IPFS is that Bazel remote execution follows a more traditional client-server model. There is no immediate intent to use peer-to-peer sharing of objects.

One thing I was thinking about was what an efficient algorithm for replicating a DAG would look like, either from a client to a server (i.e., uploading source code to build) or from a server to a client (i.e., downloading build outputs). Considering that IPLD/IPFS relies on chunking more heavily than Bazel does right now, it’d be pretty important for build clients/servers to use heavy parallelism to transfer such data.

That said, you do want to place bounds on the amount of parallelism to prevent resource exhaustion in case of large data sets. It’s fine for a DAG to have large fanout, and it’s fine for a DAG to be deep. But if a DAG has large fanout and is also very deep, then it might be necessary to limit the amount of parallelism when traversing the DAG to avoid keeping too many partially replicated blocks in memory.
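To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a complete tree with uniform fanout, with height counted in blocks so that a leaf has height 1. The function names are made up for illustration. In the worst case, an unbounded parallel traversal could have every non-leaf block partially replicated at once, while a sequential depth-first walk keeps only one non-leaf block open per level.

```python
def internal_blocks(fanout: int, height: int) -> int:
    """Count non-leaf blocks in a complete tree (a leaf has height 1).

    This is the worst-case number of partially replicated blocks if the
    traversal fans out without any bound.
    """
    return sum(fanout ** level for level in range(height - 1))


def sequential_in_flight(height: int) -> int:
    """Non-leaf blocks a sequential depth-first walk keeps open: one per level."""
    return max(height - 1, 0)
```

For a fanout of 100 and a height of 4, that is 1 + 100 + 10000 = 10101 blocks in the unbounded case versus 3 in the sequential case. A practical algorithm would presumably want to sit somewhere in between.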

One piece of information that would be very useful for implementing a properly bounded parallel replication algorithm is tree height. If this information were attached to every link, then a replication algorithm could at any point make smart choices about how aggressively to fan out when traversing.
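As a minimal sketch of what such a choice could look like, assume every link carries the height of the subtree it points to, and that the traversal has a budget of blocks it may hold in memory. The `Link` type, the `plan_fanout` function, and the budget model are all hypothetical and not part of any existing IPLD or CID API. The idea is to only run as many children concurrently as can each still finish a sequential depth-first walk, which needs one block per level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    cid: str     # placeholder for a real CID
    height: int  # hypothetical: height of the linked subtree in blocks, leaf = 1


def plan_fanout(links: list[Link], budget: int) -> int:
    """Return how many children to traverse concurrently.

    `budget` is the number of blocks this subtree may hold in memory,
    including the parent block that carries `links`.
    """
    if not links:
        return 0
    tallest = max(link.height for link in links)
    remaining = budget - 1  # the parent stays in memory while children load
    if remaining < tallest:
        raise ValueError("budget too small for even a sequential walk")
    # Every concurrent child gets a share that still covers the tallest subtree.
    return min(len(links), remaining // tallest)
```

With a budget of 17 blocks, 50 links to leaves (height 1) could be fetched 16 at a time, whereas 50 links to subtrees of height 8 would be limited to 2 at a time. Without the height on each link, the traversal cannot tell these two cases apart until it has already fetched the children.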

Unfortunately, this information is not part of CIDv0/CIDv1, meaning that if a storage system using IPLD wants to use such information, it would need to track it out of band. Alternatively, one could use IPLD with a custom link system, but that has all sorts of unfortunate implications, such as being unable to export build results into IPFS for archiving purposes.

My question is therefore whether a future version of CID, if ever created, could include tree height (in blocks) as well.
