Description
Is your feature request related to a problem? Please describe.
When generating large-scale SpatialBench datasets (e.g., SF1000 or higher), there is currently no way to write generated data directly to S3. This creates two significant limitations:
- Local storage bottlenecks: large-scale datasets can run to hundreds of gigabytes or even terabytes, quickly exhausting local disk space; the SF1000 Trip table alone can exceed 500 GB.
- Workflow inefficiency: the current workflow requires generating data locally first, then manually uploading it to S3 with separate tools (AWS CLI, rclone, etc.), which is time-consuming and error-prone.
Describe the solution you'd like
Add support for S3 URIs in the --output-dir parameter, enabling the tool to stream generated data directly to S3 without requiring local storage:
```bash
# Current workflow:
spatialbench-cli --scale-factor 1000 --output-dir ./data
# Then manually: aws s3 cp ./data s3://my-bucket/spatialbench/sf1000 --recursive

# Proposed workflow:
spatialbench-cli --scale-factor 1000 --output-dir s3://my-bucket/spatialbench/sf1000
```
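A minimal sketch of how the output path could be resolved, assuming a Rust implementation on top of the object_store crate (with its aws feature enabled); the resolve_output_store helper, the crate choice, and the env-based credential handling are illustrative assumptions, not SpatialBench's actual code:

```rust
use std::sync::Arc;

use object_store::aws::AmazonS3Builder;
use object_store::local::LocalFileSystem;
use object_store::path::Path;
use object_store::ObjectStore;

/// Hypothetical helper: pick an output backend from the --output-dir value.
/// Anything starting with "s3://" is routed to S3; everything else stays on
/// the local filesystem. The generator would then write through the returned
/// ObjectStore handle instead of std::fs.
fn resolve_output_store(output_dir: &str) -> object_store::Result<(Arc<dyn ObjectStore>, Path)> {
    if let Some(rest) = output_dir.strip_prefix("s3://") {
        // Split "bucket/prefix/..." into the bucket name and the key prefix.
        let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
        let store: Arc<dyn ObjectStore> = Arc::new(
            // Region and credentials come from the usual AWS environment variables.
            AmazonS3Builder::from_env()
                .with_bucket_name(bucket)
                .build()?,
        );
        Ok((store, Path::from(prefix)))
    } else {
        let store: Arc<dyn ObjectStore> = Arc::new(LocalFileSystem::new_with_prefix(output_dir)?);
        Ok((store, Path::from("")))
    }
}
```

Writing every table through an ObjectStore handle like this would also let the generator upload each file in parts as it is produced, so no table ever has to be fully materialized on local disk.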