slow transfer speeds from URL sources #113

@axelmagn

I am working on ingesting the RPV2 dataset into GCS buckets using GCP Storage Transfer jobs. Speeds seem to be incredibly slow (roughly 100 KB/s to 1 MB/s), and at this rate it will take weeks to transfer the files. The bottleneck could still be on my end, but it increasingly looks like the host is either throttling connections or overloaded on I/O.
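For a rough sense of scale, here is a back-of-the-envelope sketch of how the observed rates translate into transfer time. The total size used below (1 TB) is a hypothetical placeholder, not the actual size of the dataset, and `transfer_days` is just an illustrative helper name.

```python
def transfer_days(total_bytes: float, bytes_per_second: float) -> float:
    """Return the days needed to move total_bytes at a steady rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    seconds_per_day = 86_400
    return total_bytes / bytes_per_second / seconds_per_day


# Hypothetical placeholder (1 TB); NOT the real dataset size.
ASSUMED_TOTAL_BYTES = 1e12

# The two ends of the throughput range observed above.
for label, rate in [("100 KB/s", 100e3), ("1 MB/s", 1e6)]:
    days = transfer_days(ASSUMED_TOTAL_BYTES, rate)
    print(f"{label}: {days:.1f} days per assumed TB")
```

Under that assumption, each terabyte takes well over a week even at the fast end of the range, so the estimate scales linearly with whatever the real total turns out to be.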

Can you shed any light on how this dataset is hosted, or on what the best transfer methods would be at scale? I've already prototyped a small pipeline on sampled data and would like to scale it up in a reasonable timeframe.
