I noticed that you can get substantial training speedups by passing, e.g., `num_workers=8` to the DataLoader here:
https://github.com/HazyResearch/hyperbolics/blob/master/pytorch/pytorch_hyperbolic.py#L279
I haven't yet tested whether the same change is safe for the other three DataLoader instantiations.
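For reference, here is a minimal sketch of what the change looks like in general. It does not reproduce the actual line in `pytorch_hyperbolic.py`: the toy dataset (random index pairs with float targets), `batch_size=64`, and the helper name `make_loader` are hypothetical stand-ins. Only the `num_workers` argument is the point. How much it helps likely depends on the number of CPU cores and on how expensive each batch is to build.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers: int = 8) -> DataLoader:
    # Hypothetical toy data: 1000 random index pairs with float targets,
    # standing in for whatever the real script loads.
    pairs = torch.randint(0, 100, (1000, 2))
    targets = torch.rand(1000)
    dataset = TensorDataset(pairs, targets)
    # num_workers > 0 makes the DataLoader build batches in background
    # worker processes instead of in the main training process.
    return DataLoader(dataset, batch_size=64, shuffle=True, num_workers=num_workers)


# The __main__ guard matters on platforms that start workers with "spawn"
# (Windows, macOS by default), where each worker re-imports this module.
if __name__ == "__main__":
    loader = make_loader(num_workers=8)
    n_batches = sum(1 for _ in loader)
    print(n_batches)  # 1000 samples / 64 per batch -> 16 batches (last one partial)
```

Setting `num_workers=0` restores the original single-process behaviour, which makes it easy to compare timings before and after the change.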