This repository was archived by the owner on Apr 6, 2026. It is now read-only.
Intro
The idea behind this script is to compare incoming netcdf files against a set of known, working files in the Navigator. This will help catch the metadata changes that have surprised us consistently over the past few years. As of April 30 2020 there will be no core developers around, so this will help automate some of the day-to-day maintenance of the Navigator and reduce the workload on @dwayne-hart. Basically, if the script doesn't like a data file, it's not our problem: the file won't be ingested into the Navigator.
Yes, I'm using python here so I don't get yelled at...
Architecture
Command-line script with 3 required arguments: dataset name, template file, and incoming files.
Super minimal conda environment (called data-sentinel), with netcdf4 being the only primary dependency.
Dataset name: corresponds to the dataset_key entry in the template file.
Template file: a JSON file with the following structure:

```json
{
  "dataset_key": {
    "rules": {
      "check_attrs": [],
      "check_dimensions_identical": true OR false,
      "check_unlimited_time_dim": true OR false,
      "check_variables": true OR false
    },
    "known_files": {
      "filename_regex_pattern": "path_to_valid_file"
    }
  }
}
```
Incoming files: A list of file paths, nothing complicated:
file1.nc
file2.nc
...
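To make the template structure concrete, here is a minimal sketch of how an incoming filename could be matched against the template's filename_regex_pattern keys. The function name, template contents, and paths below are illustrative assumptions, not the script's actual code:

```python
import re

def match_known_file(template, dataset, filename):
    """Return the known-file path whose regex matches `filename`, or None."""
    known_files = template[dataset]["known_files"]
    for pattern, known_path in known_files.items():
        if re.search(pattern, filename):
            return known_path
    return None

# Hypothetical template entry, mirroring the JSON structure above
template = {
    "giops_daily": {
        "rules": {
            "check_attrs": [],
            "check_dimensions_identical": True,
            "check_unlimited_time_dim": True,
            "check_variables": True,
        },
        "known_files": {r"giops.*\.nc$": "/data/known/giops_daily.nc"},
    }
}

print(match_known_file(template, "giops_daily", "giops_20200430.nc"))
# → /data/known/giops_daily.nc
```

Files whose names match no pattern fall through to None and are reported separately (see Behaviour below).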
Usage
main.py --dataset DATASET --template TEMPLATE [incoming]
[incoming] is a list of the files to be tested. This argument may be a file path, or standard input (a pipe).
e.g.:

python main.py --dataset giops_daily --template ./my_template.json < incoming_files
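A minimal sketch (assumed, not the script's verbatim code) of how the [incoming] argument can accept either a file path or a pipe on stdin, using argparse's optional positional with a stdin default:

```python
import argparse
import io
import sys

def parse_cli(argv, stdin=sys.stdin):
    parser = argparse.ArgumentParser(description="data-sentinel sketch")
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--template", required=True)
    parser.add_argument("incoming", nargs="?", type=argparse.FileType("r"),
                        default=stdin,
                        help="file listing incoming paths; defaults to stdin")
    args = parser.parse_args(argv)
    # One incoming file path per line; blank lines are ignored
    return [line.strip() for line in args.incoming if line.strip()]

# Simulate: python main.py --dataset giops_daily --template t.json < incoming_files
fake_stdin = io.StringIO("file1.nc\nfile2.nc\n")
paths = parse_cli(["--dataset", "giops_daily", "--template", "t.json"],
                  stdin=fake_stdin)
print(paths)  # → ['file1.nc', 'file2.nc']
```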
Behaviour
The script will read in the template file (json), iterate over the incoming files, match the name of each file against the filename_regex_pattern entries, and compare its metadata to the known file defined in the template.
It will report how many (and which) files passed, failed, or didn't match any filename_regex_pattern.
Infrastructure
This script will sit in front of the index tool: when we download new data files, the script will be run to produce a list of passing files, which in turn will be indexed. Any files that fail can be handled case by case.
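The wiring could look something like the sketch below. The paths, the output convention (passing files on stdout), and the indexer invocation are all assumptions for illustration, not the actual deployment:

```shell
# List freshly downloaded files, run the sentinel over them,
# and keep only the files that pass for indexing.
ls /data/incoming/*.nc > incoming_files
python main.py --dataset giops_daily --template ./my_template.json \
    < incoming_files > passing_files
index_tool passing_files   # placeholder for the real index-tool invocation
```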