Question: How did you mitigate ERA5 I/O bottlenecks / huge dataset reads? #180

@liname3

Hi, authors!

I’m trying to fine-tune your excellent Aurora model for my downstream tasks, and I’m running into a serious I/O bottleneck with ERA5 because of the dataset size: data loading dominates and GPU utilization stays low. At the moment I store each training sample as a separate HDF5 file, but I suspect the per-file overhead and random reads are killing throughput. Could you share how you handled the ERA5 “huge data + I/O bottleneck” problem in your pipeline?

If you have scripts, configs, or pointers in the repo for preprocessing and the recommended data layout, I’d really appreciate them. Thanks!

Thank you again for your work and your time.
