Hi, authors!
I’m trying to fine-tune your excellent Aurora model on my downstream tasks, and I’m running into a serious I/O bottleneck with ERA5 due to the dataset size: data loading dominates each training step and GPU utilization stays low. At the moment I store each training sample as a separate HDF5 file, but I suspect the per-file overhead and random reads are killing throughput. Could you share how you handled the ERA5 “huge data + I/O bottleneck” problem in your pipeline?
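For concreteness, here is a minimal sketch of my current loading pattern (file paths and variable names like `t2m` are placeholders, not my actual pipeline) — every sample drawn in random order pays a fresh file-open plus metadata read:

```python
import glob

import h5py
import numpy as np


def load_sample(path):
    """Open one per-sample HDF5 file and read all of its datasets.

    One open()/close() per sample, so random shuffling turns every
    training step into small scattered reads.
    """
    with h5py.File(path, "r") as f:
        return {name: np.asarray(f[name]) for name in f.keys()}


# Placeholder directory; in training these paths are shuffled each epoch.
paths = sorted(glob.glob("era5_samples/*.h5"))
```

Is this per-sample-file layout the wrong approach, e.g. compared to consolidating samples into larger chunked stores?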
If you have scripts/configs or pointers in the repo for preprocessing and a recommended on-disk layout, I’d really appreciate them. Thanks!
Thank you again for your work and your time.