Skip to content

write imagenet to beton takes 6 hours #400

@Si1kyRiver

Description

@Si1kyRiver

Hi, by using the code below, it takes > 6hours to write imagenet1k to beton, which is nontrivial.

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# Your dataset (`torch.utils.data.Dataset`) of (image, label) pairs
my_dataset = make_my_dataset()
write_path = '/output/path/for/converted/ds.beton'

# Pass a type for each data field
writer = DatasetWriter(write_path, {
    # Tune options to optimize dataset size, throughput at train-time
    'image': RGBImageField(
        max_resolution=256
    ),
    'label': IntField()
})

# Write dataset
writer.from_indexed_dataset(my_dataset)

Is there a solution or any tricks?
Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions