Sen2-100k dataset size issue #30

@SoundWavesHello

Description

It seems that every available method of downloading the dataset comes up about 6k samples short. Using Hugging Face's load_dataset(), downloading the tar.xz files manually, git lfs, etc. all run into the same issue as of this date.

The downloaded dataset ends up containing ~94k samples (94164 samples per my most recent attempt at this), which makes attempts to reproduce the work or leverage the excellent dataset/dataloader work done already quite challenging.

Eyeballing it, data_10.tar.xz looks like the most likely culprit: the other .tar.xz files are each around 7.8 GB, while data_10.tar.xz is only 3.25 GB.
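For anyone who wants to check this per archive rather than by eye, here is a minimal sketch, assuming the archives have been downloaded into a local directory and are named data_*.tar.xz. The helper names (`count_tar_members`, `flag_small_archives`, `summarize`) and the 0.6 size ratio are my own placeholders, not part of the dataset's tooling. It counts regular files per archive, which is not necessarily the same as the number of samples if a sample spans several files.

```python
import statistics
import tarfile
from pathlib import Path


def count_tar_members(archive_path):
    """Count regular files inside a tar archive (compression is auto-detected)."""
    with tarfile.open(archive_path, mode="r:*") as tar:
        return sum(1 for member in tar if member.isfile())


def flag_small_archives(sizes_by_name, ratio=0.6):
    """Return archive names whose size is below `ratio` times the median size.

    The 0.6 default is an arbitrary threshold chosen for illustration.
    """
    median = statistics.median(sizes_by_name.values())
    return sorted(name for name, size in sizes_by_name.items() if size < ratio * median)


def summarize(data_dir):
    """Print size in bytes and file count for every data_*.tar.xz in data_dir."""
    paths = sorted(Path(data_dir).glob("data_*.tar.xz"))
    sizes = {p.name: p.stat().st_size for p in paths}
    for p in paths:
        print(p.name, sizes[p.name], count_tar_members(p))
    if sizes:
        print("suspiciously small:", flag_small_archives(sizes))
```

Calling `summarize("path/to/download/dir")` (the path is hypothetical) should make it clear whether one archive holds noticeably fewer files than the rest, or whether the missing samples are spread across all of them.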

It's certainly possible I'm missing something, but I haven't been able to figure out an effective way around this issue. Any assistance in the matter would be appreciated!
