My thoughts on this project

This was recently brought to my attention. I am glad that you are able to get better performance than standard fsspec. 

First, a couple of notes:
- fsspec provides several in-memory caching mechanisms. The default, "readahead", works well for more or less sequential reads but poorly for HDF5. "first", which keeps the first block of the file cached, is often better if most of the metadata is near the start of the file
- fsspec also has a file-based cache, either whole files or partial files
- the [kerchunk project](https://github.com/fsspec/kerchunk) can scan the metadata of HDF5 files once, store it _elsewhere_ (e.g., in a JSON file), and provide fast, parallel reads thereafter
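
To illustrate the first note, here is a rough sketch comparing the "readahead" and "first" caches on an access pattern that keeps returning to the start of the file, loosely like HDF5 metadata lookups. The in-memory `DATA` buffer, the counting `fetcher`, the offsets, and the block size are all made up for the demo, and it calls the caches' internal `_fetch` method directly instead of going through a real remote file.

```python
from fsspec.caching import caches  # registry mapping cache_type names to classes

# Stand-in for a remote file: 16 KiB of bytes (an assumption for this demo).
DATA = bytes(range(256)) * 64
BLOCKSIZE = 1024


def count_fetches(cache_type):
    """Replay a header-then-chunk access pattern; return the number of backend fetches."""
    n_fetches = 0

    def fetcher(start, end):
        # Plays the role of a remote range request; counts every call.
        nonlocal n_fetches
        n_fetches += 1
        return DATA[start:end]

    cache = caches[cache_type](BLOCKSIZE, fetcher, len(DATA))

    # Alternate between "metadata" at the start of the file and distant data chunks.
    for offset in (5000, 9000, 13000):
        header = cache._fetch(0, 64)
        chunk = cache._fetch(offset, offset + 256)
        assert header == DATA[:64]
        assert chunk == DATA[offset:offset + 256]

    return n_fetches


fetch_counts = {name: count_fetches(name) for name in ("readahead", "first")}
print(fetch_counts)
```

With a real filesystem the same choice is made when opening the file, e.g. `fs.open(path, cache_type="first")`, where `fs` and `path` stand for whatever filesystem and file you are using.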

Secondly, may I suggest that you consider upstreaming this code to fsspec, so that many users can get automatic access? It could even become the default caching mechanism for HDF5 in the same way that fsspec provides a parquet module optimised to that format.
