Dataset is significantly slower with multiprocessing

The dataset pipeline looks like this:
[file_path, file_path, ...] -> map(load_func) -> map(random_crop_func) -> shuffle(960) -> batch(96)

The pipeline runs faster when the functions passed to the map operations are not called in parallel: 9 sec. without parallel calls vs. 27 sec. with them.

The dataset consists of 6202 files of 256x256 images and some labels.
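
For anyone who wants to reproduce the comparison, here is a minimal sketch of the pipeline shape described above. The post does not name the framework, so this uses only the Python standard library and is not the original code. The helper names (`shuffle`, `batch`, `run_pipeline`), the `num_workers` parameter, and the use of a thread pool for the parallel variant are all assumptions made for illustration; `load_func` and `random_crop_func` are passed in by the caller.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def shuffle(items, buffer_size, rng):
    """Buffered shuffle: fill a buffer, then emit one random element per new item."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(items, batch_size):
    """Group items into lists of batch_size; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def run_pipeline(paths, load_func, random_crop_func, num_workers=None, seed=0):
    """Run one pass over paths and return (batch_sizes, elapsed_seconds).

    num_workers=None applies both map steps sequentially.
    An integer applies them through a thread pool (an assumption; the
    post does not say how its parallel calls are implemented).
    """
    rng = random.Random(seed)
    start = time.perf_counter()
    if num_workers is None:
        mapped = map(random_crop_func, map(load_func, paths))
        sizes = [len(b) for b in batch(shuffle(mapped, 960, rng), 96)]
    else:
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # Executor.map submits its tasks eagerly and yields results in input order.
            mapped = pool.map(random_crop_func, pool.map(load_func, paths))
            sizes = [len(b) for b in batch(shuffle(mapped, 960, rng), 96)]
    elapsed = time.perf_counter() - start
    return sizes, elapsed
```

Calling `run_pipeline(paths, load_func, random_crop_func)` and then again with, say, `num_workers=8` gives two timings to compare. With 6202 files and a batch size of 96, either variant should yield 64 full batches and one final batch of 58.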