My thoughts on this project #9

@martindurant

This was recently brought to my attention. I am glad that you are able to get better performance than standard fsspec.

First, a couple of notes:

  • fsspec provides several (in-memory) caching mechanisms. The default is "readahead", which suits typical, more or less sequential access patterns but performs poorly for HDF5. "first" is often better if most of the metadata is near the start of the file.
  • fsspec also has file-based caching, of either whole files or partial files.
  • The kerchunk project can scan the metadata of HDF5 files once, store it elsewhere (e.g., in a JSON file), and provide fast, parallel reads thereafter.
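
To illustrate the first note, here is a toy model of the two strategies. It is a sketch only, not fsspec's actual implementation: the classes `CountingFetcher`, `ReadAheadToy` and `FirstChunkToy`, the block size, and the access pattern are all made up for illustration. The pattern alternates between small "metadata" reads near the start of the file and "data" reads far away, which is roughly what HDF5 access looks like.

```python
class CountingFetcher:
    """Hypothetical stand-in for a remote store; counts range requests."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    def __call__(self, start, end):
        self.calls += 1
        return self.data[start:end]


class ReadAheadToy:
    """Toy readahead: on a miss, fetch the range plus one block ahead
    and replace the single buffer."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize, self.fetcher, self.size = blocksize, fetcher, size
        self.start = self.end = 0
        self.buffer = b""

    def read(self, start, end):
        end = min(end, self.size)
        if not (self.start <= start and end <= self.end):
            self.start = start
            self.end = min(self.size, end + self.blocksize)
            self.buffer = self.fetcher(self.start, self.end)
        return self.buffer[start - self.start:end - self.start]


class FirstChunkToy:
    """Toy "first" cache: keep only the first block, fetch the rest directly."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize, self.fetcher, self.size = blocksize, fetcher, size
        self.head = None

    def read(self, start, end):
        end = min(end, self.size)
        if start >= self.blocksize:
            return self.fetcher(start, end)  # entirely outside the first block
        if self.head is None:
            self.head = self.fetcher(0, min(self.blocksize, self.size))
        part = self.head[start:end]
        if end > self.blocksize:
            part += self.fetcher(self.blocksize, end)  # tail beyond the block
        return part


def count_requests(cache_cls, data, pattern, blocksize=1024):
    """Replay an access pattern and return how many fetches were issued."""
    fetcher = CountingFetcher(data)
    cache = cache_cls(blocksize, fetcher, len(data))
    for start, end in pattern:
        assert cache.read(start, end) == data[start:end]
    return fetcher.calls


data = bytes(range(256)) * 40  # 10240 bytes of dummy content
# Alternate "metadata" reads near the start with distant "data" reads.
pattern = [(0, 64), (5000, 5100), (100, 200),
           (8000, 8100), (300, 400), (9000, 9050)]

readahead_calls = count_requests(ReadAheadToy, data, pattern)
first_calls = count_requests(FirstChunkToy, data, pattern)
print(readahead_calls, first_calls)
```

In this toy pattern the readahead buffer is thrown away on every jump, so every read becomes a request, while the first-block cache serves the repeated metadata reads from memory. In fsspec itself the strategy is chosen when opening a buffered remote file, e.g. by passing `cache_type="first"` to `open`.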

Secondly, may I suggest that you consider upstreaming this code to fsspec, so that many users can get automatic access? It could even become the default caching mechanism for HDF5 in the same way that fsspec provides a parquet module optimised to that format.
