Should we import datasets as CAR files? #23

Related to #18, I noticed that when we write tests that transfer a dataset (such as a folder or file) rather than randomly generated bytes, we import the data into UnixFS from the file system, i.e. from regular files on disk. If our intent is to truly test some of the datasets on https://awesome.ipfs.io/datasets/ as they exist on IPFS, the reliable way to bring them in is to export them as CAR files and import those into the tests.

The reason is that a UnixFS import from regular files is not guaranteed to produce the exact same DAG or root CID. Several variables affect how the DAG is built, such as the chunking strategy, the use of raw leaves, etc. The only reliable way to know you have the exact same DAG is to use CAR files.

This might also make sense for writing scripts to download datasets. As long as IPFS is, unfortunately, not as fast as HTTP from a fast hosted site, it will be much more efficient to import into the seeds' blockstore from a CAR file served over a CDN. That also means we don't have to download datasets ahead of time; we can probably just fetch them as part of the test, which makes things more reproducible on CI.
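To illustrate why re-importing the same bytes can change the root, here is a toy sketch in Python. It is not real UnixFS or CID computation: `toy_root` is a hypothetical helper that splits the input into fixed-size chunks, hashes each chunk as a leaf (optionally through a made-up wrapper standing in for non-raw leaves), and hashes the ordered leaf digests into a single-level root. Real importers build deeper balanced or trickle layouts and encode nodes as dag-pb, but the effect is the same: identical bytes plus different settings give a different root.

```python
import hashlib


def toy_root(data: bytes, chunk_size: int, raw_leaves: bool) -> str:
    """Toy stand-in for a UnixFS import: NOT real CIDs or real UnixFS encoding.

    Splits `data` into fixed-size chunks, hashes each chunk as a leaf, and
    hashes the concatenated leaf digests to produce a single-level root.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    leaf_digests = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Hypothetical encoding: raw leaves hash the bytes directly, while
        # "wrapped" leaves hash a prefixed envelope, mimicking how a UnixFS
        # node wrapper changes the leaf block's bytes (and thus its hash).
        block = chunk if raw_leaves else b"toy-unixfs-wrapper:" + chunk
        leaf_digests.append(hashlib.sha256(block).digest())
    # The root commits to the ordered list of child hashes.
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


if __name__ == "__main__":
    payload = b"same file contents " * 100_000  # identical input bytes (~1.9 MB)
    settings = [(262_144, False), (262_144, True), (1_048_576, True)]
    for size, raw in settings:
        root = toy_root(payload, size, raw)
        print(f"chunk={size:>9} raw_leaves={raw!s:<5} root={root[:16]}")
```

Running it prints three different roots for the same payload. That is why matching the CIDs published for a dataset only works if we load the exact blocks, e.g. from a CAR export, instead of re-chunking the files ourselves.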

Anyway, I'm curious to get your thoughts, @adlrocha. Also, does this make sense? It may not be obvious if you haven't worked much with UnixFS files and DAG structures.
