The dataset pipeline looks like this:
[file_path, file_path, ...] -> map(load_func) -> map(random_crop_func) -> shuffle(960) -> batch(96)
It runs faster without parallel calls of the functions passed to the map operations: 9 sec. without them vs. 27 sec. with them.
The dataset consists of 6202 files of 256x256 images and some labels.
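
To make the stage order concrete, here is a minimal pure-Python sketch of what such a pipeline does to 6202 elements. It is only an illustration of the semantics, not the framework's implementation: `map_stage`, `shuffle_stage`, `batch_stage` and `build_batches` are hypothetical helpers, the `load_func` and `random_crop_func` bodies are stand-ins that do no real I/O, the file names are made up, and the crop size of 224 is an assumption (the actual crop size is not given above).

```python
import random
from itertools import islice


def load_func(path):
    # Stand-in: pretend we loaded a 256x256 image (no real file I/O here).
    return {"path": path, "height": 256, "width": 256}


def random_crop_func(sample, rng, crop=224):
    # Stand-in: pick a random top-left corner for a crop x crop window.
    # crop=224 is an assumed value, not taken from the original pipeline.
    top = rng.randrange(sample["height"] - crop + 1)
    left = rng.randrange(sample["width"] - crop + 1)
    return {**sample, "top": top, "left": left, "crop": crop}


def map_stage(items, func):
    # Sequential map: apply func to one element at a time, lazily.
    for item in items:
        yield func(item)


def shuffle_stage(items, buffer_size, rng):
    # Buffered shuffle: fill a buffer, then for each new element emit a
    # randomly chosen buffered element and put the new one in its slot.
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch_stage(items, batch_size):
    # Group consecutive elements; the last batch may be smaller.
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def build_batches(n_files=6202, buffer_size=960, batch_size=96, seed=0):
    rng = random.Random(seed)
    paths = [f"file_{i:04d}" for i in range(n_files)]  # hypothetical names
    loaded = map_stage(paths, load_func)
    cropped = map_stage(loaded, lambda s: random_crop_func(s, rng))
    shuffled = shuffle_stage(cropped, buffer_size, rng)
    return list(batch_stage(shuffled, batch_size))


if __name__ == "__main__":
    batches = build_batches()
    print(len(batches), len(batches[0]), len(batches[-1]))
```

With these numbers one pass yields 64 full batches of 96 plus a final batch of 58 (64 * 96 = 6144, 6202 - 6144 = 58), assuming the partial batch is kept. Because the shuffle buffer holds only 960 elements, the first batch can only contain files from roughly the first 960 + 96 positions of the input list, so the shuffle is local rather than global.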