This is a benchmarking suite for data loading libraries.
This repository contains the code and experiments accompanying the paper:
@inproceedings{ofeidis2024overview,
title={An overview of the data-loader landscape: Comparative performance analysis},
author={Ofeidis, Iason and Kiedanski, Diego and Tassiulas, Leandros},
booktitle={2024 IEEE International Conference on Big Data (BigData)},
pages={360--367},
year={2024},
organization={IEEE}
}
If you use this repository in your research or find it helpful, please cite our work.
Create a `.env` file in the repository root with the following variables:
DOCKER_NAME=<name of org>/<name of container>:<version>
DYNACONF_AWS_ACCESS_KEY_ID=<aws id>
DYNACONF_AWS_SECRET_ACCESS_KEY=<aws secret>
DYNACONF_BUCKET_NAME=<bucket name in aws> # needs to exist before running experiments

- Clone this repository
- Build the docker container:

      ./scripts/build.sh

- Run the container:

      ./scripts/run.sh

- Run all the experiments:

      ./experiments/run_all.sh
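Before building the container, it can help to confirm that the `.env` file defines all four variables listed above. The following is a minimal sketch, not part of the repository: `check_env_file` is a hypothetical helper name, and it only checks that each key is present with a non-empty value, not that the credentials are valid.

```shell
#!/usr/bin/env bash
# Sketch: verify that a .env file defines every variable this README lists.
# check_env_file is a hypothetical helper, not a script shipped with the repo.

check_env_file() {
    local env_file="${1:-.env}"
    local missing=0
    local key

    if [ ! -f "$env_file" ]; then
        echo "missing file: $env_file" >&2
        return 1
    fi

    for key in DOCKER_NAME DYNACONF_AWS_ACCESS_KEY_ID \
               DYNACONF_AWS_SECRET_ACCESS_KEY DYNACONF_BUCKET_NAME; do
        # Require "KEY=" at the start of a line, followed by at least one character.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            echo "missing or empty: $key" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

One possible use is to gate the build on the check, for example `check_env_file .env && ./scripts/build.sh`.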
- Create the file `~/.aws/credentials` with the following content:

      [default]
      aws_access_key_id = <aws id>
      aws_secret_access_key = <aws secret>

- Make sure that an S3 bucket is created with the name defined above and that it is accessible with the credentials provided.
- Download the `get_ecr` script, which fetches the latest docker image, and save it as `get_ecr.sh`:

      wget -O get_ecr.sh https://raw.githubusercontent.com/kiedanski/dataloader-benchmarks/main/scripts/get_erc.sh && chmod +x get_ecr.sh

- Download the latest docker image locally:

      ./get_ecr.sh

- Download the run script:

      wget https://raw.githubusercontent.com/kiedanski/dataloader-benchmarks/main/scripts/run.sh && chmod +x run.sh

- Execute the run command to get into the docker container:

      ./run.sh

- Run all the experiments:

      ./experiments/run_all.sh
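The steps above require that the S3 bucket exists and is reachable with the configured credentials. One way to check this before starting a long run is sketched below. It assumes the AWS CLI is installed on the host, which this README does not state, and `bucket_is_reachable` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that an S3 bucket exists and is accessible with the current
# credentials. Assumes the AWS CLI (`aws`) is installed; this helper is
# illustrative and not part of the repository.

bucket_is_reachable() {
    local bucket="$1"

    if [ -z "$bucket" ]; then
        echo "usage: bucket_is_reachable <bucket name>" >&2
        return 2
    fi

    # `aws s3api head-bucket` exits with status 0 only when the bucket exists
    # and the caller is allowed to access it.
    aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1
}
```

For example, `bucket_is_reachable "<bucket name>" && ./run.sh` would only enter the container when the bucket check succeeds.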
Inside the container run:

    python src/plots/download_results.py
    python src/plots/generate_plots.py
| Dataset | Mode | Pytorch | FFCV | Hub | Deep Lake | Torchdata | Webdataset | Squirrel | NVIDIA DALI |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | default | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | remote | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❓ | ❌ |
| | filtering | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ❓ | ❌ |
| | multi-gpu | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RANDOM | default | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | remote | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❓ | ❌ |
| | filtering | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ❓ | ❌ |
| | multi-gpu | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CoCo | default | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | remote | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ❓ | ❌ |
| | filtering | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❓ | ❌ |
| | multi-gpu | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |