GitHub - g-bertozzi/Ocean-Hackathon-Datasets

Filename: coastal-radar/multithread-vectors-fetcher.py

This script downloads and manages hourly coastal radar vector data products from Ocean Networks Canada (ONC) using the ONC Python SDK. It supports multi-threaded downloads, periodic manifest saving, and mid-run recovery after interruptions.

Features

Multi-threaded downloading (default: 10 worker threads)

Download manifest (manifest.csv) tracks each file and its status (success / failed)

Automatic retry on rerun: only missing or failed files are re-queued

Periodic manifest saving (default: every 90 seconds)

Provenance file (provenance.yaml) records API calls and dataset context

Mid-run interrupt support: rerun safely after stopping

Download Layout

    coastal-radar/
        data/
            vectors(2024)/          # downloaded .tuv files
        metadata/
            vectors(2024)/
                manifest.csv        # download status tracking
                provenance.yaml     # provenance and API metadata

Manifest Behavior

manifest.csv is the record of all attempted downloads.

On rerun:

success → skipped

failed or missing → retried

Manifest is updated continuously during execution.
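The rerun rule above can be sketched as a small filter. This is an illustration only, not the script's actual code: the column names `filename` and `status`, the helper `files_to_queue`, and the example file names are assumptions.

```python
import csv
import io


def files_to_queue(manifest_rows, requested_files):
    """Return the requested files that are missing from the manifest or not marked success."""
    status_by_file = {row["filename"]: row["status"] for row in manifest_rows}
    return [name for name in requested_files if status_by_file.get(name) != "success"]


# Hypothetical manifest content; the real manifest.csv may use different columns.
manifest_text = "filename,status\na.tuv,success\nb.tuv,failed\n"
rows = list(csv.DictReader(io.StringIO(manifest_text)))

# a.tuv succeeded earlier, b.tuv failed, c.tuv was never attempted.
queue = files_to_queue(rows, ["a.tuv", "b.tuv", "c.tuv"])
print(queue)
```

Only `b.tuv` (failed) and `c.tuv` (missing) end up in the queue; `a.tuv` is skipped.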

Reruns and Interruptions

If the script stops or fails, rerun with the same time period.

Already successful files are skipped.

Failed files are retried automatically.

Changing Time Periods

Using the same period repeatedly is safe.

Changing the period without clearing the manifest will merge old and new requests. Old files remain in the manifest, new ones are added.
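A minimal sketch of that merge, assuming the manifest is held in memory as a mapping from file name to status. The helper `merge_requests`, the file names, and the `pending` placeholder status are hypothetical and not taken from the script.

```python
def merge_requests(manifest, requested_files):
    """Keep every existing manifest entry and add entries for newly requested files."""
    merged = dict(manifest)  # old entries are kept untouched
    for name in requested_files:
        # "pending" is a hypothetical placeholder for files not yet attempted.
        merged.setdefault(name, "pending")
    return merged


old_manifest = {"jan-file-00.tuv": "success", "jan-file-01.tuv": "failed"}
new_period_files = ["jan-file-01.tuv", "feb-file-00.tuv"]

merged = merge_requests(old_manifest, new_period_files)
print(merged)
```

The merged manifest now covers both periods, which is why clearing it before switching periods keeps the record easier to interpret.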

Best practice for testing:

Run on a short subset of the intended period (e.g. one week).

If successful, re-run for the full period.

Resetting Progress

To force a complete re-download:

    Delete metadata/vectors(YYYY)/manifest.csv.

    Optionally delete any files in data/vectors(YYYY)/.

    Re-run with the desired time period.
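The reset steps above could also be scripted. The sketch below assumes the directory layout shown under Download Layout; the helper `reset_progress` and the `example.tuv` file are hypothetical and not part of the repository. It is demonstrated on a temporary directory so that nothing real is deleted.

```python
import tempfile
from pathlib import Path


def reset_progress(base_dir, year, delete_data=False):
    """Delete the manifest for a year and, optionally, the downloaded files.

    Returns the number of data files removed.
    """
    base = Path(base_dir)
    manifest = base / "metadata" / f"vectors({year})" / "manifest.csv"
    manifest.unlink(missing_ok=True)  # missing_ok requires Python 3.8+

    removed = 0
    if delete_data:
        data_dir = base / "data" / f"vectors({year})"
        if data_dir.is_dir():
            for path in data_dir.iterdir():
                if path.is_file():
                    path.unlink()
                    removed += 1
    return removed


# Demonstration on a throwaway copy of the layout.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "coastal-radar"
    meta_dir = base / "metadata" / "vectors(2024)"
    data_dir = base / "data" / "vectors(2024)"
    meta_dir.mkdir(parents=True)
    data_dir.mkdir(parents=True)
    (meta_dir / "manifest.csv").write_text("filename,status\n")
    (data_dir / "example.tuv").write_text("")

    removed = reset_progress(base, 2024, delete_data=True)
    manifest_gone = not (meta_dir / "manifest.csv").exists()
```

Leaving `delete_data` at its default of `False` clears only the manifest, matching the "optionally" in the second step.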

Concurrency and Retry Logic

Downloads run in parallel worker threads.

A queue distributes files to workers.

Each worker updates the manifest with success or failed status.

Failed files remain in the manifest and are retried on rerun.

A background thread saves the manifest periodically to preserve progress.
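The pattern described above can be sketched with the standard library alone. This is not the script's implementation: `fake_download` stands in for the real ONC SDK call, and `run_downloads`, `save`, and the file names are hypothetical. The defaults of 10 workers and a 90-second save interval come from the Features list.

```python
import queue
import threading


def fake_download(name):
    """Stand-in for the real download call; fails for names containing 'bad'."""
    if "bad" in name:
        raise RuntimeError("simulated failure")


def run_downloads(files, num_workers=10, save_interval=90.0,
                  download=fake_download, save=lambda snapshot: None):
    manifest = {}            # file name -> "success" or "failed"
    lock = threading.Lock()  # guards manifest across threads
    stop = threading.Event()

    work = queue.Queue()
    for name in files:
        work.put(name)

    def worker():
        # All items are queued before the workers start, so an empty queue means done.
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return
            try:
                download(name)
                status = "success"
            except Exception:
                status = "failed"
            with lock:
                manifest[name] = status

    def saver():
        # Event.wait returns False on timeout, so this saves once per interval
        # until stop is set.
        while not stop.wait(save_interval):
            with lock:
                snapshot = dict(manifest)
            save(snapshot)

    saver_thread = threading.Thread(target=saver, daemon=True)
    saver_thread.start()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()

    stop.set()
    saver_thread.join()
    save(dict(manifest))  # final save once all workers have finished
    return manifest


result = run_downloads(["a.tuv", "bad.tuv", "c.tuv"], num_workers=3, save_interval=0.05)
print(result)
```

Taking a snapshot under the lock and writing it outside the lock keeps slow disk writes from blocking the workers.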

Best Practices

Keep the time period fixed for production runs.

For testing, run a small subset first, then clear the manifest and downloaded files before a full run.

Rerun with the same date range if recovering from an interruption.

