Allow postponing dataset integrity checks in NextGenHDFDataset #1323
NeoLegends wants to merge 1 commit into master
Why do you put "chore:" into the description? Where do you get this from? This is inconsistent with our normal commit messages, so we should not use it.

Does this really need to be a new option? I personally prefer to have fewer options if possible, especially if they are not really needed. Can't we just always do those integrity checks lazily?

Please check the failing tests. Please use PyCharm to see those inspections directly.

I guess @JackTemaki and/or @patrick-wilken should otherwise review this.
I don't really have an opinion for or against that here -- it's mainly detecting data format errors early vs. saving time. If you're sure your data is good, there is no reason to do these checks eagerly.
Follow-up from #1315, where @JackTemaki noticed that NextGenHDFDataset goes through the entire data on startup to perform integrity checks -- and indeed RETURNN startup with that dataset is quite slow. This PR allows delaying these checks to training time, saving startup time at the cost of detecting data errors only later on.
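
To illustrate the trade-off, here is a minimal sketch of eager vs. postponed checks. It is not the actual NextGenHDFDataset code: the class `LazyCheckedDataset`, the `eager_checks` flag, and the in-memory dict standing in for the HDF file are all hypothetical, and the check itself (a simple length comparison) is only a placeholder for whatever the real integrity checks do.

```python
from typing import Dict, List, Set


class LazyCheckedDataset:
    """Hypothetical stand-in for a dataset with optional eager integrity checks."""

    def __init__(self, seqs: Dict[str, List[int]], expected_dim: int, eager_checks: bool = True):
        self._seqs = seqs  # assumption: a dict replaces the HDF file for this sketch
        self._expected_dim = expected_dim
        self._checked: Set[str] = set()
        if eager_checks:
            # Eager mode: visit every sequence once at startup.
            # Errors surface immediately, but startup scales with dataset size.
            for tag in seqs:
                self._check(tag)

    def _check(self, tag: str) -> None:
        """Validate a single sequence; each sequence is checked at most once."""
        if tag in self._checked:
            return
        actual = len(self._seqs[tag])
        if actual != self._expected_dim:
            raise ValueError(f"seq {tag!r}: expected dim {self._expected_dim}, got {actual}")
        self._checked.add(tag)

    def get_data(self, tag: str) -> List[int]:
        # Postponed mode: the check runs on first access, i.e. during training.
        self._check(tag)
        return self._seqs[tag]
```

With `eager_checks=False`, construction returns immediately and a malformed sequence only raises once it is actually requested, which is the behaviour this PR makes possible.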