
Conversation

@lbesnard (Collaborator) commented Aug 19, 2025

Draft PR to showcase how CSIRO could create Parquet dataset to be ingested by AODN

@codecov-commenter commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.81%. Comparing base (9e1e881) to head (9542303).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #193   +/-   ##
=======================================
  Coverage   66.81%   66.81%           
=======================================
  Files          29       29           
  Lines        4752     4752           
=======================================
  Hits         3175     3175           
  Misses       1577     1577           

☔ View full report in Codecov by Sentry.


@lbesnard (Collaborator, Author) commented Aug 19, 2025

@Rosspet
Steps:

  1. Convert the input CSIRO NetCDF files with the script
    csiro_underway_netcdf_conversion.py. It converts them to NetCDF4 and creates a TIME variable from the global attributes (an illustrative sketch is included after this list). Run it over every file, e.g. for f in `fd . -t f | grep nc`; do ./csiro_underway_netcdf_conversion.py $f; done
  2. Upload the modified NetCDF4 files to the local input MinIO, under s3://[INPUT_BUCKET_NAME]/[DATASET_NAME_LOCATION]/ (an upload sketch is included after this list). More info
  3. Create the dataset configuration, using one NetCDF file as the point of truth: cloud_optimised_create_dataset_config -f s3://[INPUT_BUCKET_NAME]/[DATASET_NAME_LOCATION]/in2017_v02uwy.nc -c parquet -d vessel_underway_csiro --s3fs-opts '{"key": "minioadmin", "secret": "minioadmin", "client_kwargs": {"endpoint_url": "http://localhost:9000"}}'. More info
  4. The generated config file should be quite similar to the one in this PR; however, the one in this PR is more complete and works
  5. What needs to be modified in the config file:
    • add_variables (variables created from global attributes, for example voyage and ship_name; see doc)
    • global attributes (acknowledgement, citations, project, author ...)
    • partition keys (similar to indexes in a database; they are ordered, see doc)
    • run_settings, with the input and output buckets, the number of files to process at once ...
    • drop_variables: some variables were multidimensional and were being flattened, causing memory to explode; these variables were not needed
  6. Run poetry install --with dev and restart the shell if needed
  7. Run cloud_optimised_vessel_underway_csiro. This runs the processing locally
  8. Check the associated notebook, which also points to the local MinIO bucket with the snippet below (a quick pandas check against the output bucket is also sketched after this list):
aodn = GetAodn(
    bucket_name="aodn-cloud-optimised",
    prefix="",
    s3_fs_opts={
        "key": "minioadmin",
        "secret": "minioadmin",
        "client_kwargs": {
            "endpoint_url": "http://127.0.0.1:9000"
        },
    },
)
  9. Speak to @mhidas and @craigrose from the pipeline core team to see how to implement the library so it processes one file at a time
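
For step 1, the conversion script in this PR is the point of truth. Purely as an illustration of the idea (write NetCDF4 and derive a TIME variable from global attributes), a minimal xarray sketch could look like this; the record dimension name and the time_coverage_* attribute names are assumptions, not necessarily what the real script uses:

import sys

import pandas as pd
import xarray as xr


def convert(path: str) -> None:
    ds = xr.open_dataset(path)
    n_obs = ds.sizes["obs"]  # assumed name of the record dimension
    # Derive a TIME coordinate from (assumed) global attributes.
    time = pd.date_range(
        ds.attrs["time_coverage_start"],
        ds.attrs["time_coverage_end"],
        periods=n_obs,
    )
    ds = ds.assign_coords(TIME=("obs", time))
    # Write the result back out as NetCDF4.
    ds.to_netcdf(path.replace(".nc", "_nc4.nc"), format="NETCDF4")


if __name__ == "__main__":
    convert(sys.argv[1])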
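
For step 2, any S3 client works against the local MinIO. One way, reusing the same s3fs options as above, is sketched here; the bucket and prefix values are placeholders for [INPUT_BUCKET_NAME] and [DATASET_NAME_LOCATION]:

import glob

import s3fs

# Placeholders: substitute the real [INPUT_BUCKET_NAME] and [DATASET_NAME_LOCATION].
INPUT_BUCKET = "imos-data"
DATASET_LOCATION = "vessel_underway_csiro"

fs = s3fs.S3FileSystem(
    key="minioadmin",
    secret="minioadmin",
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)

# Upload every converted NetCDF4 file in the current directory.
for path in glob.glob("*.nc"):
    fs.put(path, f"{INPUT_BUCKET}/{DATASET_LOCATION}/{path}")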
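
For steps 7-8, besides the notebook, a quick sanity check of the Parquet output straight from the local MinIO bucket can be done with pandas. The exact prefix of the Parquet dataset under aodn-cloud-optimised is an assumption here; check the run_settings in the config for the real one:

import pandas as pd

# Assumed output location inside the bucket used by the notebook above.
df = pd.read_parquet(
    "s3://aodn-cloud-optimised/vessel_underway_csiro.parquet/",
    storage_options={
        "key": "minioadmin",
        "secret": "minioadmin",
        "client_kwargs": {"endpoint_url": "http://127.0.0.1:9000"},
    },
)
print(df.head())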

I won't merge this PR, but this is a good working example

@lbesnard self-assigned this Aug 19, 2025