All references to 'toolbox' are subject to change upon appropriate package naming.
Copyright 2025-2026 The National Oceanography Centre and The Contributors
The documentation for this package is available here
Please note that the documentation is still under construction.
Click here for a guide on how to build your own steps.
Requires Python >= 3.10.
For a local, editable version of the toolbox:

```bash
git clone https://github.com/NOC-OBG-Autonomy/toolbox.git
cd toolbox
# create/activate a virtual environment
pip install -e .
```

The toolbox pipeline provides a flexible, modular framework for defining, executing, and visualising multi-step data-processing workflows, such as those used for autonomous underwater glider missions.
Each pipeline is composed of a series of steps that are automatically discovered, registered, and executed in sequence. This enables users to build, extend, and visualise complex data workflows using a simple YAML configuration file.
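For orientation, a pared-down configuration might look like the sketch below; it follows the schema of the fuller example later in this README, with placeholder file paths:

```yaml
pipeline:
  name: "Minimal Pipeline"
  description: "Load a glider mission file and export it"
  visualisation: false
  steps:
    - name: "Load OG1"
      parameters:
        file_path: "path/to/mission_OG1.nc"
      diagnostics: false
    - name: "Data Export"
      parameters:
        export_format: "hdf5"
        output_path: "path/to/mission_export.h5"
      diagnostics: false
```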
The Pipeline class orchestrates the flow of data through a sequence of modular “steps.”
Each step performs a specific processing task (e.g., data loading, quality control, profile detection, export).
Key characteristics:
| Characteristic | Description |
|---|---|
| Configuration-driven | Users define the workflow in a YAML file describing each step, its parameters, and diagnostics options. |
| Dynamic discovery | New steps are auto-registered via decorators and discovered dynamically when imported. |
| Composable | Steps can be nested to form sub-pipelines (e.g., a calibration block within a larger workflow). |
| Context-aware | Each step passes its results into a shared context, which is used as input for subsequent steps. |
| Visualisable | A Graphviz diagram can be automatically generated to visualise pipeline structure and dependencies. |
**Initialising a pipeline**

```python
from toolbox.pipeline import Pipeline

pipeline = Pipeline(config_path="my_pipeline.yaml")
```

On initialisation, the pipeline:

- Loads the pipeline configuration from YAML.
- Discovers all available step classes using `@register_step`.
- Validates step dependencies defined in `STEP_DEPENDENCIES`.
**Registering steps**

Each step inherits from `BaseStep` and registers itself via a decorator:

```python
from toolbox.steps.base_step import register_step, BaseStep

@register_step
class LoadOG1(BaseStep):
    step_name = "Load OG1"

    def run(self):
        self.log("Loading NetCDF data...")
        # Do processing
        return self.context
```

Registered steps are stored in a global registry (`REGISTERED_STEPS`) and are automatically imported at runtime by `toolbox.steps.discover_steps()`.
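As a small illustrative check, assuming `REGISTERED_STEPS` lives alongside `register_step` in `toolbox.steps.base_step` and maps `step_name` to step class (that layout is an assumption, not documented here):

```python
from toolbox.steps import discover_steps
from toolbox.steps.base_step import REGISTERED_STEPS  # assumed location of the global registry

discover_steps()                 # import all step modules so their @register_step decorators run
print(sorted(REGISTERED_STEPS))  # assumed dict keyed by step_name, e.g. "Derive CTD", "Load OG1", ...
```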
**Running the pipeline**

Running the pipeline executes each step in order, passing context forward:

```python
results = pipeline.run()
```

Internally:

- Each step is instantiated via `create_step()`.
- The `run()` method is called.
- The returned context (e.g., a processed `xarray.Dataset`) is merged and passed to the next step.

A minimal sketch of this loop is shown below.
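This is a rough, non-authoritative sketch of that loop, not the toolbox's actual implementation; the `create_step()` signature and the dict-style context merge are assumptions for illustration:

```python
def run_pipeline(step_configs, create_step):
    """Illustrative only: instantiate each step, run it, and merge its context forward."""
    context = {}
    for step_config in step_configs:
        step = create_step(step_config, context)  # assumed create_step(config, context) signature
        result = step.run()                       # each step returns its updated context
        context.update(result)                    # merged context is passed to the next step
    return context
```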
**Diagnostics and visualisation**

Steps can optionally include diagnostic plots or summaries by setting:

```yaml
diagnostics: true
```

If the visualisation option is enabled, a Graphviz diagram of the pipeline is generated:

```yaml
pipeline:
  visualize: true
```

The diagram shows step dependencies and flow:

```
[Load OG1] → [Derive CTD*] → [Find Profiles] → [Data Export]
*diagnostics enabled
```
**Exporting the configuration**

The entire pipeline configuration can be exported to a YAML file for reproducibility:

```python
pipeline.export_config("exported_pipeline.yaml")
```
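The exported file can then be fed straight back into the constructor shown earlier to reproduce the run; a minimal round trip, assuming the exported YAML needs no edits, looks like:

```python
from toolbox.pipeline import Pipeline

# Rebuild the pipeline from the exported configuration and rerun it.
rerun = Pipeline(config_path="exported_pipeline.yaml")
results = rerun.run()
```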
An example YAML configuration for a simple pipeline:
```yaml
pipeline:
  name: "Data Processing Pipeline"
  description: "Process and analyze multi-dimensional glider data"
  visualisation: false
  steps:
    - name: "Load OG1"
      parameters:
        file_path: "../../examples/data/OG1/Doombar_648_R.nc"
        add_meta: false
      diagnostics: false
    - name: "Derive CTD"
      parameters:
        interpolate_latitude_longitude: true
      diagnostics: true
    - name: "Find Profiles"
      parameters:
        gradient_thresholds: [0.02, -0.02]
      diagnostics: false
    - name: "Data Export"
      parameters:
        export_format: "hdf5"
        output_path: "../../examples/data/OG1/exported_Doombar_648_R.nc"
```

A full breakdown can be found here: Developer Guide.
TL;DR:
- Create a file under `toolbox/steps/custom/variables` (e.g., `qc_salinity.py`).
- Define a class that inherits from `BaseStep`.
- Decorate it with `@register_step` and provide a unique `step_name`.
- Implement a `run()` method that processes data and returns an updated context.

A hypothetical skeleton of such a step is sketched below.
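In this sketch the module path follows the TL;DR above, but the class name, the `"dataset"` context key, and the QC thresholds are illustrative assumptions rather than documented API:

```python
# toolbox/steps/custom/variables/qc_salinity.py  (hypothetical example)
from toolbox.steps.base_step import register_step, BaseStep

@register_step
class QCSalinity(BaseStep):
    step_name = "QC Salinity"

    def run(self):
        self.log("Flagging out-of-range salinity values...")
        ds = self.context["dataset"]  # assumption: the working xarray.Dataset is stored under this key
        # Flag values outside a plausible open-ocean range (illustrative thresholds).
        ds["salinity_qc_flag"] = (ds["salinity"] < 2) | (ds["salinity"] > 41)
        return self.context
```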
- Declarative workflow: Define what happens, not how it’s executed.
- Extensible design: Add new steps without modifying core code.
- Integrated diagnostics: Flag and visualise data quality issues inline.
- Portable & reproducible: YAML configurations make it easy to rerun or share pipelines.