From d014a639077b2c8b29d54b8d06905e49f047af8e Mon Sep 17 00:00:00 2001 From: aleksandrina-streltsova Date: Sun, 15 Feb 2026 13:32:11 +0300 Subject: [PATCH 1/2] Update `README.md` and temporarily move sections related to CLI to a separate `cli.md` file. --- README.md | 305 ++++++++++++++++++++++--------------------------- cli.md | 151 ++++++++++++++++++++++++ docs/index.rst | 44 +++---- 3 files changed, 305 insertions(+), 195 deletions(-) create mode 100644 cli.md diff --git a/README.md b/README.md index 7f32cd9..7c11282 100644 --- a/README.md +++ b/README.md @@ -1,162 +1,151 @@ # EASE (hElikite dAta proceSsing codE) -This library supports Helikite campaigns by unifying field-collected data, generating quicklooks, and performing quality control on instrument recordings. It is now available on PyPI, can be used via a command‐line interface (CLI), and also runs in Docker containers if needed. +This library supports Helikite campaigns by unifying field-collected data, generating quicklooks, +and performing quality control on instrument recordings. +It is available on PyPI and is designed to be used as a Python package within Jupyter notebooks. ## Table of Contents -1. [Getting Started](#getting-started) - 1. [Pip Installation](#pip-installation) - 2. [Docker](#docker) - 3. [Makefile](#makefile) + +1. [Installation](#installation) 2. [Using the Library](#using-the-library) -3. [Cleaner](#cleaner) -4. [Documentation & Examples](#documentation--examples) -5. [Command-line Usage](#command-line-usage) -6. [Development](#development) + + 1. [Level 0 (Cleaner)](#level-0-cleaner) + 2. [Level 1](#level-1) + 3. [Level 1.5](#level-15) + 4. [Level 2](#level-2) + 5. [Configuration](#configuration) +3. [Documentation & Examples](#documentation--examples) +4. [Development](#development) + 1. [The Instrument class](#the-instrument-class) 2. [Adding more instruments](#adding-more-instruments) -7. [Configuration](#configuration) - 1. [Application constants](#application-constants) - 2. 
[Runtime configuration](#runtime) -# Getting Started +# Installation -## Pip Installation +## Installation for standard users -Helikite is published on [PyPI](https://pypi.org/project/helikite-data-processing/). To install it via `pip`, run: +Helikite is published on PyPI. To install it with `pip`, run: ```bash pip install helikite-data-processing ``` -After installation, the CLI is available as a system command: +## Installation for contributors -```bash -helikite --help +For an isolated development environment, or if you prefer Poetry for dependency management: + +**Clone the repository** + +``` +git clone https://github.com/EERL-EPFL/helikite-data-processing.git +cd helikite-data-processing ``` -## Docker +**Install dependencies with Poetry** -> **Note:** Docker usage is now optional. For most users, installing via pip is the recommended approach. +``` +poetry install +``` -### Building and Running with Docker -1. **Build the Docker image:** +# Using the Library - ```bash - docker build -t helikite . - ``` +Helikite is intended to be used as an importable Python package. The standard workflow is organized into multiple +processing levels, typically executed through Jupyter notebooks. These notebooks allow interactive control during +processing, such as selecting flight takeoff and landing times or marking outliers and flags +(for example, hovering periods). -2. **Generate project folders and create the configuration file:** +The library also supports automatic detection of outliers and flags, enabling fully non-interactive processing. +Automatic runs may produce results that differ from manual review, so they should be used with caution. - ```bash - docker run \ - -v ./inputs:/app/inputs \ - -v ./outputs:/app/outputs \ - helikite:latest generate_config - ``` +An example script that processes all flights from a campaign in non-interactive mode is available +[here](./notebooks/execute_all.py). -3. 
**Preprocess the configuration file:** +If the library is installed from source, run the following from the project root to view usage instructions: - ```bash - docker run \ - -v ./inputs:/app/inputs \ - -v ./outputs:/app/outputs \ - helikite:latest preprocess - ``` +``` +poetry run python ./notebooks/execute_all.py --help +``` -4. **Process data and generate plots:** - ```bash - docker run \ - -v ./inputs:/app/inputs \ - -v ./outputs:/app/outputs \ - helikite:latest - ``` +## Level 0 (Cleaner) -You can also use the pre-built image from GitHub Packages: +Level 0 synchronizes timestamps across instruments and merges their data into a unified structure. -```bash -docker run \ - -v ./inputs:/app/inputs \ - -v ./outputs:/app/outputs \ - ghcr.io/eerl-epfl/helikite-data-processing:latest generate_config -``` +See the [Level 0 notebook](notebooks/level0_DataProcessing.ipynb) for a detailed example, or the `execute_level0` +function in the [script](./notebooks/execute_all.py). -## Makefile -The Makefile provides simple commands for common tasks: +## Level 1 -```bash -make build # Build the Docker image -make generate_config # Generate the configuration file in the inputs folder -make preprocess # Preprocess data and update the configuration file -make process # Process data and generate plots (output goes into a timestamped folder) -``` +Level 1 performs quality control, averages humidity and temperature measurements, calculates flight altitude +using the barometric equation, and applies instrument-specific processing. -# Using the Library +See the [Level 1 notebook](notebooks/level1_DataProcessing.ipynb) for a detailed example, or the `execute_level1` +function in the [script](./notebooks/execute_all.py). -Helikite can be used both as a standalone CLI tool and as an importable Python package. -For non-programmers, the CLI is the simplest way to use the library. 
-For programmers, the library can be imported and used in your own scripts: -```python -import helikite -from helikite.processing import preprocess, sorting -from helikite.constants import constants +## Level 1.5 -# For example, to generate a configuration file programmatically: -preprocess.generate_config() -``` +Level 1.5 detects flags that indicate environmental or flight conditions, such as hovering, pollution exposure, +or cloud immersion. -A complete list of available functions and modules is documented on the [auto-published documentation site](https://eerl-epfl.github.io/helikite-data-processing/). +See the [Level 1.5 notebook](notebooks/level1_5_DataProcessing.ipynb) for a detailed example, or the `execute_level1_5` +function in the [script](./notebooks/execute_all.py). -# Cleaner -The `cleaner` module is designed to tidy up output folders generated by the application. For instructions on how to use it, refer to the [Level 0 notebook](./notebooks/level0.ipynb). +## Level 2 -# Documentation & Examples +Level 2 averages data to 10-second intervals and can merge flights into a final campaign dataset. -For full API documentation, usage examples, and tutorials, please visit the [Helikite Data Processing Documentation](https://eerl-epfl.github.io/helikite-data-processing/). +See the [Level 2 notebook](notebooks/level2_DataProcessing.ipynb) for a detailed example, or the `execute_level2` +function in the [script](./notebooks/execute_all.py). -The `notebooks` folder also contains a [Level 0 processing example](./notebooks/level0.ipynb) that demonstrates how to use the library for basic data processing tasks. -# Command-line Usage +## Configuration -Once installed (via pip or Docker), you can use the CLI to run the three main stages of the application: +Each notebook uses a configuration file, and the same file is applied to all processing levels for a given flight. +An example configuration: -1. 
**Generate a configuration file:** - This creates a config file in your `inputs` folder. - ```bash - helikite generate-config - ``` +``` +flight: "1" +flight_date: 2025-11-26 +flight_suffix: "A" + +output_schema: "ORACLES_25_26" +campaign_data_dirpath: /home/EERL/data/ORACLES/Helikite/2025-2026/Data/ +processing_dir: "./outputs/2025-2026" +``` -2. **Preprocess:** - Scans the input folder, associates raw instrument files to configurations, and updates the config file. - ```bash - helikite preprocess - ``` +Where: -3. **Process:** - Processes the input data based on the configuration, normalizes timestamps, and generates plots. - (Running without any command runs this stage.) - ```bash - helikite - ``` +* `campaign_data_dirpath` contains folder `2025-11-26_A` corresponding to an individual campaign flight +* `output_schema` defines plot and output formatting (see available schemas [here](./helikite/classes/output_schemas.py)) + +Custom schemas can be registered using `OutputSchemas.register(name, SCHEMA)` +if default configurations do not match campaign needs. + +A complete list of available modules and functions is documented on the auto-published documentation site. + + +# Documentation & Examples + +Full API documentation is available on the +[Helikite Data Processing Documentation](https://eerl-epfl.github.io/helikite-data-processing/) site. -For detailed help on any command, append `--help` (e.g., `helikite preprocess --help`). # Development ## The Instrument class -The structure of the Instrument class allows specific data cleaning activities to be overridden for each instrument that inherits from it. The main application (in `helikite.py`) calls these class methods to process the data. +All instruments implement a shared interface that allows instrument-specific behavior to override default processing. +Data processing components call these methods during workflow execution. 
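
As an illustration of that shared interface, below is a minimal sketch of the hook methods an instrument can override. The method names follow this README; the base-class internals and the header string checked in the subclass are simplified, illustrative stand-ins, not the actual `helikite.instruments` implementation:

```python
# Simplified sketch of the Instrument hook methods; the real base class
# lives in helikite.instruments and differs in its internals.
class Instrument:
    name = "generic"

    def file_identifier(self, first_lines_of_csv) -> bool:
        """Return True if the CSV header matches this instrument."""
        raise NotImplementedError

    def read_data(self, filename):
        """Parse the raw instrument file into a DataFrame."""
        raise NotImplementedError

    def data_corrections(self, df):
        """Apply instrument-specific corrections (default: none)."""
        return df

    def set_time_as_index(self, df):
        """Convert instrument timestamps into a common DatetimeIndex."""
        raise NotImplementedError

    def __repr__(self) -> str:
        """Short label used in certain plots."""
        return self.name.upper()


class FlightComputer(Instrument):
    """Toy subclass showing an override (label "FC" as in this README)."""
    name = "fc"

    def file_identifier(self, first_lines_of_csv) -> bool:
        # Hypothetical header check, for illustration only
        return first_lines_of_csv[0].startswith("SBI,DateTime")
```

The processing code can then ask each registered instrument whether a given raw file belongs to it, and delegate parsing, corrections, and time indexing to the matching class.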
-## Adding more instruments -The configuration file is generated during the `generate_config`/`preprocess` steps by iterating over the instantiated classes imported in `helikite/instruments/__init__.py`. To add a new instrument, create a subclass of `Instrument` and import it in `__init__.py`. +## Adding more instruments -Firstly, the class should inherit from `Instrument` and set a unique name (e.g., for the `MCPC` instrument): +New instrument classes should inherit from `Instrument` and define a unique name. Example: ```python def __init__(self, *args, **kwargs) -> None: @@ -164,70 +153,52 @@ def __init__(self, *args, **kwargs) -> None: self.name = 'mcpc' ``` -The minimum functions required are: - -- `file_identifier()`: Accepts the first 50 lines of a CSV file and returns `True` if it matches the instrument’s criteria (typically checking header content). - - ```python - # Example for the pico instrument: - def file_identifier(self, first_lines_of_csv) -> bool: - if ("win0Fit0,win0Fit1,win0Fit2,win0Fit3,win0Fit4,win0Fit5,win0Fit6," - "win0Fit7,win0Fit8,win0Fit9,win1Fit0,win1Fit1,win1Fit2") in first_lines_of_csv[0]: - return True - return False - ``` - -- `set_time_as_index()`: Converts the instrument's timestamp information into a common pandas `DateTimeIndex`. - - ```python - # Example for the filter instrument: - def set_time_as_index(self, df: pd.DataFrame) -> pd.DataFrame: - df['DateTime'] = pd.to_datetime( - df['#YY/MM/DD'].str.strip() + ' ' + df['HR:MN:SC'].str.strip(), - format='%y/%m/%d %H:%M:%S' - ) - df.drop(columns=["#YY/MM/DD", "HR:MN:SC"], inplace=True) - df.set_index('DateTime', inplace=True) - return df - ``` - -For more details and examples, refer to the [auto-published documentation](https://eerl-epfl.github.io/helikite-data-processing/). 
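
The timestamp-to-index conversion shown above can be exercised on a tiny synthetic frame. The column names are taken from the filter-instrument example; the data values are invented for illustration:

```python
import pandas as pd

# A tiny synthetic frame mimicking the filter instrument's raw time
# columns (values invented for illustration).
df = pd.DataFrame({
    "#YY/MM/DD": ["22/02/09 ", "22/02/09 "],
    "HR:MN:SC": [" 10:21:58", " 10:21:59"],
    "conc": [1.0, 2.0],
})

# Same steps as set_time_as_index: build a DateTime column, drop the
# raw time columns, and promote DateTime to the index.
df["DateTime"] = pd.to_datetime(
    df["#YY/MM/DD"].str.strip() + " " + df["HR:MN:SC"].str.strip(),
    format="%y/%m/%d %H:%M:%S",
)
df = df.drop(columns=["#YY/MM/DD", "HR:MN:SC"]).set_index("DateTime")
print(df.index[0])  # → 2022-02-09 10:21:58
```

Once every instrument exposes a common `DateTimeIndex` like this, the per-instrument frames can be aligned and merged on time during level 0 processing.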
- -# Configuration - -There are three sources of configuration parameters: - -## Application constants - -These are defined in `helikite/constants.py` and include settings such as filenames, folder paths for inputs/outputs, logging formats, and default plotting parameters. - -## Runtime configuration - -The runtime configuration is stored in `config.yaml` (located in your `inputs` folder). This file is generated during the `generate_config` or `preprocess` steps. It holds runtime arguments for each instrument (e.g., file locations, time adjustments, and plotting settings). - -Below is an example snippet from a generated `config.yaml`: - -```yaml -global: - time_trim: - start: 2022-09-29 10:21:58 - end: 2022-09-29 12:34:36 -ground_station: - altitude: null - pressure: null - temperature: 7.8 -instruments: - filter: - config: filter - date: null - file: /app/inputs/220209A3.TXT - pressure_offset: null - time_offset: - hour: 5555 - minute: 0 - second: 0 -plots: - altitude_ground_level: false - grid: - resample_seconds: 60 +Required methods include: + +### `file_identifier()` + +Determines whether a CSV file belongs to the instrument by inspecting header lines. + +```python +def file_identifier(self, first_lines_of_csv) -> bool: + if ("win0Fit0,win0Fit1,win0Fit2,win0Fit3,win0Fit4,win0Fit5,win0Fit6," + "win0Fit7,win0Fit8,win0Fit9,win1Fit0,win1Fit1,win1Fit2") in first_lines_of_csv[0]: + return True + return False ``` + +### `read_data()` + +Parses raw instrument data. + +### `data_corrections()` + +Applies instrument-specific corrections. + +### `set_time_as_index()` + +Converts the instrument's timestamp information into a common pandas `DateTimeIndex`. 
+
+```python
+def set_time_as_index(self, df: pd.DataFrame) -> pd.DataFrame:
+    df['DateTime'] = pd.to_datetime(
+        df['#YY/MM/DD'].str.strip() + ' ' + df['HR:MN:SC'].str.strip(),
+        format='%y/%m/%d %H:%M:%S'
+    )
+    df.drop(columns=["#YY/MM/DD", "HR:MN:SC"], inplace=True)
+    df.set_index('DateTime', inplace=True)
+    return df
+```
+
+### `__repr__()`
+
+Returns a short instrument label used in certain plots (for example, `"FC"` for the flight computer).
+
+Additional implementation details and examples are available in the auto-published documentation.
+
+
+# Command-line Interface (Outdated)
+
+The CLI is not up to date with the main processing workflow.
+If CLI usage is still required, refer to the legacy documentation: `./cli.md`.
+
diff --git a/cli.md b/cli.md
new file mode 100644
index 0000000..ec8e40c
--- /dev/null
+++ b/cli.md
@@ -0,0 +1,151 @@
+# Command-line Interface (Outdated)
+
+> **Note:** The CLI is not up to date with the main processing functionality.
+> Please refer to the [README.md](./README.md) for the currently recommended way of using the library.
+
+Helikite can be used both as a standalone CLI tool and as an importable Python package.
+For non-programmers, the CLI is the simplest way to use the library.
+For programmers, the library can be imported and used in your own scripts:
+
+```python
+import helikite
+from helikite.processing import preprocess, sorting
+from helikite.constants import constants
+
+# For example, to generate a configuration file programmatically:
+preprocess.generate_config()
+```
+## Table of Contents
+
+1. [Docker](#docker)
+2. [Makefile](#makefile)
+3. [Usage](#usage)
+4. [Configuration](#configuration)
+    1. [Application constants](#application-constants)
+    2. [Runtime configuration](#runtime-configuration)
+
+# Docker
+
+> **Note:** Docker usage is now optional. For most users, installing via pip is the recommended approach.
+
+## Building and Running with Docker
+
+1. 
**Build the Docker image:** + + ```bash + docker build -t helikite . + ``` + +2. **Generate project folders and create the configuration file:** + + ```bash + docker run \ + -v ./inputs:/app/inputs \ + -v ./outputs:/app/outputs \ + helikite:latest generate_config + ``` + +3. **Preprocess the configuration file:** + + ```bash + docker run \ + -v ./inputs:/app/inputs \ + -v ./outputs:/app/outputs \ + helikite:latest preprocess + ``` + +4. **Process data and generate plots:** + + ```bash + docker run \ + -v ./inputs:/app/inputs \ + -v ./outputs:/app/outputs \ + helikite:latest + ``` + +You can also use the pre-built image from GitHub Packages: + +```bash +docker run \ + -v ./inputs:/app/inputs \ + -v ./outputs:/app/outputs \ + ghcr.io/eerl-epfl/helikite-data-processing:latest generate_config +``` + +# Makefile + +The Makefile provides simple commands for common tasks: + +```bash +make build # Build the Docker image +make generate_config # Generate the configuration file in the inputs folder +make preprocess # Preprocess data and update the configuration file +make process # Process data and generate plots (output goes into a timestamped folder) +``` + +# Usage +After installation, the CLI is available as a system command: + +```bash +helikite --help +``` + +1. **Generate a configuration file:** + This creates a config file in your `inputs` folder. + ```bash + helikite generate-config + ``` + +2. **Preprocess:** + Scans the input folder, associates raw instrument files to configurations, and updates the config file. + ```bash + helikite preprocess + ``` + +3. **Process:** + Processes the input data based on the configuration, normalizes timestamps, and generates plots. + (Running without any command runs this stage.) + ```bash + helikite + ``` + +For detailed help on any command, append `--help` (e.g., `helikite preprocess --help`). 
+ +# Configuration + +There are three sources of configuration parameters: + +## Application constants + +These are defined in `helikite/constants.py` and include settings such as filenames, folder paths for inputs/outputs, logging formats, and default plotting parameters. + +## Runtime configuration + +The runtime configuration is stored in `config.yaml` (located in your `inputs` folder). This file is generated during the `generate_config` or `preprocess` steps. It holds runtime arguments for each instrument (e.g., file locations, time adjustments, and plotting settings). + +Below is an example snippet from a generated `config.yaml`: + +```yaml +global: + time_trim: + start: 2022-09-29 10:21:58 + end: 2022-09-29 12:34:36 +ground_station: + altitude: null + pressure: null + temperature: 7.8 +instruments: + filter: + config: filter + date: null + file: /app/inputs/220209A3.TXT + pressure_offset: null + time_offset: + hour: 5555 + minute: 0 + second: 0 +plots: + altitude_ground_level: false + grid: + resample_seconds: 60 +``` \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index c5098cb..3dc543c 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -7,7 +7,9 @@ Welcome to Helikite Data Processing's Documentation! Overview -------- -Helikite Data Processing is a Python library designed to support Helikite campaigns by unifying field-collected data, generating quicklooks, and performing quality control on instrument data. Whether you’re a non-programmer who simply needs to run the provided command-line interface (CLI) or a developer looking to integrate its powerful API into your own workflows, this documentation will guide you every step of the way. +This library supports Helikite campaigns by unifying field-collected data, generating quicklooks, +and performing quality control on instrument recordings. +It is available on PyPI and is designed to be used as a Python package within Jupyter notebooks. 
Installation & Environment Setup ---------------------------------- @@ -22,12 +24,6 @@ Helikite is published on PyPI: https://pypi.org/project/helikite-data-processing pip install helikite-data-processing -After installation, the CLI is available as a system command: - -.. code-block:: bash - - helikite --help - 2. Setting Up a Poetry Environment For an isolated development environment or if you prefer Poetry for dependency management: @@ -45,12 +41,6 @@ For an isolated development environment or if you prefer Poetry for dependency m poetry install -- **Run the CLI within Poetry:** - - .. code-block:: bash - - poetry run helikite --help - 3. Using Jupyter Notebooks Helikite includes several Jupyter notebooks demonstrating various processing workflows. To work with these notebooks: @@ -61,11 +51,13 @@ Helikite includes several Jupyter notebooks demonstrating various processing wor poetry run jupyter lab -- **Open the notebooks** from the ``notebooks/`` folder. Notable examples include: - - ``level0.ipynb`` or ``level0_tutorial.ipynb``: An introductory tutorial covering basic processing. - - ``OutlierRemoval.ipynb``: Demonstrates techniques for identifying and removing outliers. - - ``FeatureFlagging.ipynb``: Shows how to apply feature flags to control processing features. - - ``metadata.ipynb``: Provides examples for handling metadata. +- **Open the notebooks** from the ``notebooks/`` folder. + - ``level0_DataProcessing``: Level 0 processing tutorial. Level 0 synchronizes timestamps across instruments and merges their data into a unified structure. + - ``level1_DataProcessing``: Level 1 processing tutorial. Level 1 performs quality control, averages humidity and temperature measurements, calculates flight altitude +using the barometric equation, and applies instrument-specific processing. + - ``level1_5_DataProcessing``: Level 1.5 processing tutorial. 
Level 1.5 detects flags that indicate environmental or flight conditions, such as hovering, pollution exposure, +or cloud immersion. + - ``level2_DataProcessing``: Level 2 processing tutorial. Level 2 averages data to 10-second intervals and can merge flights into a final campaign dataset. Using the Library ----------------- @@ -89,21 +81,17 @@ Below is the auto-generated API reference documentation that covers all modules, Notebooks & Tutorials ---------------------- -A collection of Jupyter notebooks in the ``notebooks/`` folder provides practical, step-by-step examples of common workflows. These include: - -- **Level 0 Tutorial:** An introductory guide covering basic data processing steps. -- **Outlier Removal:** Detailed techniques for outlier detection and removal. -- **Feature Flagging:** How to enable and apply feature flags within your processing pipeline. -- **Metadata Handling:** Examples for processing and utilizing metadata. +A collection of Jupyter notebooks in the ``notebooks/`` folder provides practical, +step-by-step examples of common workflows. .. toctree:: :maxdepth: 2 :caption: Notebooks & Tutorials - notebooks/level0_tutorial - notebooks/OutlierRemoval - notebooks/FeatureFlagging - notebooks/metadata + notebooks/level0_DataProcessing + notebooks/level1_DataProcessing + notebooks/level1_5_DataProcessing + notebooks/level2_DataProcessing Additional Resources -------------------- From 36e6f5565042e3a8c1b18e5245e3732873df7a5c Mon Sep 17 00:00:00 2001 From: aleksandrina-streltsova Date: Sun, 15 Feb 2026 13:52:44 +0300 Subject: [PATCH 2/2] Update library version --- CHANGELOG.md | 15 +++++++++++++++ helikite/__init__.py | 2 +- pyproject.toml | 2 +- 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 640d0ad..511aaf8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,21 @@ All notable changes to this project will be documented in this file. 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.4] - 2026-02-15
+
+### Added
+- Output format specification keeping final data files and plots consistent across flights in a campaign.
+- Level 1, 1.5, and 2 data processors analogous to `Cleaner` for level 0.
+- Automatic instrument detection at level 0 based on the flight files.
+- Automatic flag detection based on [[Beck et al., 2022]](https://doi.org/10.5194/amt-15-4195-2022).
+- Instrument classes for CO2 and MicroAeth.
+- Instrument methods.
+- Flight processing configuration.
+
+### Changed
+- Modified pressure-based time synchronization, allowing arbitrarily large time offsets.
+- Modified the CPC3007, Tapir, and STAP instrument classes.
+
 ## [1.1.3] - 2025-09-09
 
 ### Added
diff --git a/helikite/__init__.py b/helikite/__init__.py
index 8785620..bbf0edb 100644
--- a/helikite/__init__.py
+++ b/helikite/__init__.py
@@ -1,6 +1,6 @@
 # from .helikite import app # noqa
 from helikite.classes.cleaning import Cleaner # noqa
 
-__version__ = "1.1.3"
+__version__ = "1.1.4"
 __appname__ = "helikite-data-processing"
 __description__ = "Library to generate quicklooks and data quality checks on Helikite campaigns"
diff --git a/pyproject.toml b/pyproject.toml
index 1409bee..b14eb13 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "helikite-data-processing"
-version = "1.1.3"
+version = "1.1.4"
 description = "Library to generate quicklooks and data quality checks on Helikite campaigns"
 authors = ["Evan Thomas "]
 readme = "README.md"