g-bertozzi/Ocean-Hackathon-Datasets

Filename: coastal-radar/multithread-vectors-fetcher.py

This script downloads and manages hourly coastal radar vector data products from Ocean Networks Canada (ONC) using the ONC Python SDK. It supports multi-threaded downloads, periodic manifest saving, and mid-run recovery after interruptions.

Features

Multi-threaded downloading (default: 10 worker threads)

Download manifest (manifest.csv) tracks each file and its status (success / failed)

Automatic retry on rerun: only missing or failed files are re-queued

Periodic manifest saving (default: every 90 seconds)

Provenance file (provenance.yaml) records API calls and dataset context

Mid-run interrupt support: rerun safely after stopping

Download Layout

    coastal-radar/
        data/
            vectors(2024)/          # downloaded .tuv files
        metadata/
            vectors(2024)/
                manifest.csv        # download status tracking
                provenance.yaml     # provenance and API metadata

Manifest Behavior

manifest.csv is the record of all attempted downloads.

On rerun:

success → skipped

failed or missing → retried

The manifest is updated as each download finishes and is saved to disk periodically (default: every 90 seconds).
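The rerun rule above can be sketched as a small filter over the manifest. This is an illustration, not the script's actual code: the column names `filename` and `status`, the helper names `load_manifest` and `files_to_download`, and the reading of "missing" as "absent from the manifest or absent from disk" are all assumptions.

```python
import csv
from pathlib import Path


def load_manifest(path):
    """Return {filename: status} from a manifest CSV, or {} if it does not exist.

    Assumes hypothetical columns named 'filename' and 'status'.
    """
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(newline="") as f:
        return {row["filename"]: row["status"] for row in csv.DictReader(f)}


def files_to_download(requested, manifest, data_dir):
    """Return the requested files that still need a download attempt.

    A file is skipped only if the manifest marks it 'success' and it is
    present in data_dir; failed, unlisted, or absent files are re-queued.
    """
    data_dir = Path(data_dir)
    todo = []
    for name in requested:
        done = manifest.get(name) == "success" and (data_dir / name).exists()
        if not done:
            todo.append(name)
    return todo
```

With a manifest that lists `a.tuv` as success and `b.tuv` as failed, requesting `a.tuv`, `b.tuv`, and `c.tuv` would queue only the last two, provided `a.tuv` is still on disk.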

Reruns and Interruptions

If the script stops or fails, rerun it with the same time period.

Already successful files are skipped.

Failed files are retried automatically.

Changing Time Periods

Using the same period repeatedly is safe.

Changing the period without clearing the manifest merges old and new requests: entries from the old period remain in the manifest, and files from the new period are added to it.

Best practice for testing:

Run on a short subset of the intended period (e.g. one week).

If that run succeeds, rerun for the full period.

Resetting Progress

To force a complete re-download:

    Delete metadata/vectors(YYYY)/manifest.csv.

    Optionally delete any files in data/vectors(YYYY)/.

    Rerun with the desired time period.
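The first two steps can also be scripted. The sketch below assumes the directory layout shown under Download Layout, with `root` pointing at the coastal-radar/ folder; the function name `reset_progress` and its parameters are hypothetical and not part of the fetcher script.

```python
from pathlib import Path


def reset_progress(root, year, delete_data=False):
    """Delete the manifest for a year and, optionally, its downloaded files.

    root is assumed to be the coastal-radar/ folder. provenance.yaml is
    left in place.
    """
    root = Path(root)
    manifest = root / "metadata" / f"vectors({year})" / "manifest.csv"
    manifest.unlink(missing_ok=True)  # missing_ok requires Python 3.8+

    if delete_data:
        data_dir = root / "data" / f"vectors({year})"
        if data_dir.exists():
            for path in data_dir.iterdir():
                if path.is_file():
                    path.unlink()
```

Calling `reset_progress("coastal-radar", 2024, delete_data=True)` would clear both the manifest and the downloaded files for 2024, after which the script can be rerun with the desired period.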

Concurrency and Retry Logic

Downloads run in parallel worker threads.

A queue distributes files to workers.

Each worker updates the manifest with success or failed status.

Failed files remain in the manifest and are retried on rerun.

A background thread saves the manifest periodically to preserve progress.
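The pattern described above (a shared queue, worker threads that record a status per file, and a background saver) might look roughly like the following. It is a minimal sketch using only the standard library: `run_downloads`, the `download_one` callable, and the CSV columns are hypothetical stand-ins, and the real script's ONC SDK calls are not shown.

```python
import csv
import queue
import threading


def run_downloads(filenames, download_one, manifest_path,
                  workers=10, save_every=90.0):
    """Download files in parallel; return {filename: 'success' | 'failed'}.

    download_one is a hypothetical callable that fetches one file and
    raises an exception on failure.
    """
    manifest = {}
    lock = threading.Lock()      # guards the shared manifest dict
    stop = threading.Event()     # tells the saver thread to exit
    tasks = queue.Queue()
    for name in filenames:
        tasks.put(name)

    def save():
        # Copy under the lock, then write without holding it.
        with lock:
            rows = sorted(manifest.items())
        with open(manifest_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["filename", "status"])
            writer.writerows(rows)

    def worker():
        while True:
            try:
                name = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            try:
                download_one(name)
                status = "success"
            except Exception:
                status = "failed"
            with lock:
                manifest[name] = status

    def saver():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(save_every):
            save()

    saver_thread = threading.Thread(target=saver, daemon=True)
    saver_thread.start()
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    stop.set()
    saver_thread.join()
    save()  # final save so the last results are not lost
    return manifest
```

Catching the exception inside the worker keeps one bad file from killing its thread, and the periodic save means an interrupted run loses at most one save interval of progress.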

Best Practices

Keep the time period fixed for production runs.

For testing, run a small subset first, then clear the manifest and downloaded files before a full run.

Rerun with the same date range if recovering from an interruption.
