This repository provides an openEO-based, Docker-compatible, reproducible testing framework for downscaling Earth Observation (EO) data. It supports a modular data processing and machine learning pipeline for climate data, powered by STAC, Dask, and LightGBM. The tests validate processing and modeling logic using both public and restricted datasets.
- openEO-compatible test pipelines using openeo-processes-dask
- STAC-based data loading for ERA5, SEAS5, DEM, and EMO1 products
- Hybrid Improved Precipitation downscaling framework using LightGBM
- Docker-based isolated runtime with all dependencies
- Flexible Makefile-based automation
- AWS-secured workflows for SEAS5 datasets
| Dataset | Access | Spatial Extent | Temporal Extent | Notes |
|---|---|---|---|---|
| ERA5 | Public | 2°E–20°E, 40°N–52°N (Alps region) | 2000–2020 (daily) | Used in open pytests |
| SEAS5 | Requires AWS credentials | Same as ERA5 | 12 initializations (Aug 2021 to Jul 2022) | Used in closed pytests only |
| EMO1, DEM | Public | Same as ERA5 | 2000–2022 used | EMO1 contains downstream targets |
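
The spatial and temporal extents in the table map directly onto the `bbox` and `datetime` fields of a STAC API item search. The sketch below only builds such a request body from the ERA5 row; it does not contact any endpoint. The helper name `build_search_params` and the collection id `era5-alps` are hypothetical and are not taken from this repository.

```python
import json

# Values taken from the ERA5 row of the dataset table.
ALPS_BBOX = [2.0, 40.0, 20.0, 52.0]  # [west, south, east, north] in degrees
ERA5_INTERVAL = ("2000-01-01", "2020-12-31")


def build_search_params(collection, bbox, start, end):
    """Return a STAC API item-search body for one collection (hypothetical helper)."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be [west, south, east, north]")
    return {
        "collections": [collection],
        "bbox": list(bbox),
        # STAC expects a closed RFC 3339 interval written as "start/end".
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
    }


# "era5-alps" is a made-up collection id used only for illustration.
params = build_search_params("era5-alps", ALPS_BBOX, *ERA5_INTERVAL)
print(json.dumps(params, indent=2))
```

The same body could be reused for the EMO1 and DEM rows by swapping the collection id and the interval, since all datasets share the ERA5 bounding box.
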

    make setup

This builds the Docker image from the Dockerfile and runs the pre-processing and downscaling pipeline in `/app/tests/test_downscaleml_pipeline.py`.

    make shell

You'll drop into a Docker shell with the micromamba environment pre-activated.

    make clean

Some tests require access to SEAS5 data through AWS-authenticated STAC endpoints. These tests only work if valid AWS credentials are provided.

Set your AWS credentials as environment variables:

    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret

Then, run:

    make run TEST_FILE=/app/tests/pytest_A.py

The open test pipeline (`test_downscaleml_pipeline.py`):

- Loads open ERA5, DEM, and EMO1 datasets via STAC
- Performs resampling, cube merging, and `sin_cos_doy` feature expansion
- Saves output as `.zarr` and registers it with raster2stac
- Trains pixel-based LightGBM models for the target variable
- Validates predictions and saves them as Zarr

The closed test pipeline (`pytest_A.py`):

- Same workflow as the open pipeline, but includes SEAS5 datasets
- Requires valid AWS credentials
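
The `sin_cos_doy` feature expansion used in both pipelines encodes the day of year as a point on the unit circle, so that late December and early January end up as neighbours. The following is a minimal standard-library sketch of that idea, not the implementation registered in `openeo-processes-dask`; the function signature and the period of 365.25 days are assumptions made for illustration.

```python
import math
from datetime import date


def sin_cos_doy(day, period=365.25):
    """Encode a date's day of year as a (sin, cos) pair on the unit circle.

    Illustrative only: the period of 365.25 days is an assumption.
    """
    doy = day.timetuple().tm_yday  # 1..366
    angle = 2.0 * math.pi * doy / period
    return math.sin(angle), math.cos(angle)


# 31 December and 1 January are far apart as raw day numbers (366 vs 1),
# but close together in the encoded feature space.
s_dec, c_dec = sin_cos_doy(date(2020, 12, 31))
s_jan, c_jan = sin_cos_doy(date(2021, 1, 1))
distance = math.hypot(s_dec - s_jan, c_dec - c_jan)
print(f"distance between 31 Dec and 1 Jan: {distance:.4f}")
```

Using two features instead of one avoids the artificial jump at the year boundary that a raw day-of-year predictor would give a tree-based model such as LightGBM.
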

Main components:

- downScaleML: core downscaling package
- raster2stac: Zarr-to-STAC converter
- openeo-processes-dask: local execution of openEO processes
- LightGBM + scikit-learn: pixel-based regression models
- Dask: parallel computation backend
- Micromamba: lightweight conda environment manager

To run the open test pipeline in a local micromamba environment:

    micromamba env create -f environment.yml
    micromamba activate openEO_downScaleML
    pip install -r test_requirements.txt
    pytest tests/test_downscaleml_pipeline.py

Repository layout:

    .
    ├── Dockerfile
    ├── Makefile
    ├── environment.yml
    ├── test_requirements.txt
    ├── tests/
    │   ├── test_downscaleml_pipeline.py   # open test pipeline
    │   ├── pytest_A.py                    # closed test pipeline (requires AWS)
    │   └── ...
    └── app/
        └── test_data/                     # output and intermediate results

- The `sin_cos_doy` and `raster2stac` operations are registered openEO processes from `openeo-processes-dask`.
- The `.zarr` and STAC metadata outputs are stored in `/app/test_data/`.
- `pytest_A.py` and any reference to SEAS5 require valid AWS credentials.
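
Because the closed tests depend on AWS credentials, it can help to check for them before starting a run. The sketch below is one possible pre-flight check and is not part of this repository; the helper name `missing_aws_vars` is hypothetical, and only the two variable names come from the setup instructions.

```python
import os

# Variable names as given in the credential setup instructions.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required AWS variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Skip closed tests; missing:", ", ".join(missing))
else:
    print("AWS credentials found; closed tests can run.")
```

In a pytest suite, the result of such a check could serve as the condition of a `pytest.mark.skipif` marker, so that SEAS5 tests are skipped rather than failing when credentials are absent.
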
Distributed under an open-source license aligned with interTwin project guidelines.
This work is part of the interTwin project, and integrates components from the broader openEO ecosystem.