gavinevans commented Dec 3, 2025

Addresses https://github.com/metoppv/mo-blue-team/issues/791

Description
The main aim of this PR is to add functionality for clustering realizations (ensemble members) from a primary forecast source, and then matching realizations from secondary forecast sources with each cluster.

File breakdown:

  • doc/source/examples/realization_cluster_and_match_example_data.py: A worked example, providing another instance of the worked-example template added in Template for adding worked examples (v2) #2238. It is intended to help readers understand the rest of the functionality added in this PR.
  • envs/environment_a.yml: Updated the environment to add the esmf-regrid, scikit-learn and kmedoids packages.
  • improver/cli/realization_cluster_and_match.py: CLI for the RealizationClusterAndMatch plugin.
  • improver/clustering/__init__.py
  • improver/clustering/clustering.py: This file provides a thin wrapper around the scikit-learn and kmedoids packages. Realization clustering is only intended to use KMedoids clustering, but I've added this more generic capability as it might have future applications.
  • improver/clustering/realization_clustering.py: This contains the core algorithms added in this PR. The plugins added in this file are described further below.
  • improver/regrid/landsea.py: I've added some functionality for using the ESMF (Earth System Modeling Framework) area-weighted regridding scheme (usable with iris via the esmf_regrid package). I chose ESMF area-weighted regridding because it made it easier to avoid producing masked values after regridding.
  • improver_tests/acceptance/SHA256SUMS: Updated checksums.
  • improver_tests/acceptance/test_realization_cluster_and_match.py: Acceptance tests for the new CLI.
  • improver_tests/clustering/__init__.py
  • improver_tests/clustering/test_clustering.py: Unit tests for the generic clustering plugin.
  • improver_tests/clustering/test_realization_clustering.py: Unit tests for the plugins related to Realization Clustering.
  • improver_tests/regrid/test_RegridLandSea.py: An additional unit test for the ESMF area-weighted regridding.
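As a rough illustration of why area-weighted regridding helps avoid masked output, the sketch below coarsens a grid by averaging whole blocks of source cells. It assumes equal-area source cells that nest exactly inside the target cells (real ESMF regridding computes the overlap areas between arbitrary cells) and represents masked points with `None`. The `mdtol` parameter is modelled on the missing-data tolerance of `iris.analysis.AreaWeighted`: a target cell is masked only if the masked fraction of its area exceeds `mdtol`. The function `block_area_weighted_mean` is hypothetical and not part of improver.

```python
from typing import List, Optional

# A 2-D field as a list of rows; None marks a masked (missing) point.
Grid = List[List[Optional[float]]]


def block_area_weighted_mean(source: Grid, block: int, mdtol: float = 1.0) -> Grid:
    """Coarsen `source` by averaging non-overlapping `block` x `block` tiles.

    Hypothetical helper. Assumes all source cells have equal area and each one
    lies entirely inside a single target cell, so the area weights are equal.
    """
    n_rows, n_cols = len(source), len(source[0])
    if n_rows % block or n_cols % block:
        raise ValueError("grid dimensions must be divisible by block")
    target: Grid = []
    for i in range(0, n_rows, block):
        row: List[Optional[float]] = []
        for j in range(0, n_cols, block):
            tile = [source[i + di][j + dj] for di in range(block) for dj in range(block)]
            valid = [v for v in tile if v is not None]
            masked_fraction = 1 - len(valid) / len(tile)
            if not valid or masked_fraction > mdtol:
                # Too much of this target cell is covered by masked source points.
                row.append(None)
            else:
                # With equal areas, the area-weighted mean over the unmasked
                # area reduces to the plain mean of the valid points.
                row.append(sum(valid) / len(valid))
        target.append(row)
    return target


field = [
    [1.0, 3.0, 2.0, 2.0],
    [5.0, None, 4.0, 6.0],
]
print(block_area_weighted_mean(field, block=2))             # [[3.0, 3.5]]
print(block_area_weighted_mean(field, block=2, mdtol=0.0))  # [[None, 3.5]]
```

With the default `mdtol=1.0`, a coarse cell is masked only when every contributing source point is masked, so isolated missing points do not propagate into the regridded field.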

Plugin breakdown:
improver/clustering/realization_clustering.py contains three plugins:

  • RealizationClustering is a thin wrapper around the more generic FitClustering plugin, specialised for clustering realizations.
  • RealizationToClusterMatcher is a fairly complex plugin that matches realizations from secondary forecast sources to pre-computed clusters using an approach based on mean squared error.
  • RealizationClusterAndMatch is a large plugin that contains some key functionality itself and stitches together functionality from the other plugins. The main aims of this plugin are to:
    • Cluster the primary forecast source using KMedoids clustering. KMedoids clustering is used because it identifies one member of each cluster as the medoid, so that this member can be interpreted as representative of the cluster. Prior to clustering, the primary forecast source is regridded using area-weighted regridding to emphasise similar spatial features, which can help the clustering.
    • Match secondary forecast sources with the clustered primary forecast source. This matching uses one of two methods, which are handled slightly differently. The first method applies when the secondary forecast source has at least as many realizations as there are clusters, as might be the case for an ensemble NWP forecast. The second method applies when the secondary forecast source has fewer realizations than there are clusters, for example, a deterministic nowcast.
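The clustering step described above can be sketched as follows: each realization is flattened into a vector of grid-point values, a pairwise dissimilarity matrix is computed, and medoids are chosen so that every cluster is represented by one of its own members. This is a pure-Python sketch of the simple alternating k-medoids algorithm, not the plugin's code; the kmedoids package provides faster algorithms such as FasterPAM. Using mean squared error as the dissimilarity, initialising from the first `n_clusters` realizations, and the names `k_medoids` and `mse` are all assumptions, and the regridding step is omitted.

```python
from typing import List, Sequence, Tuple

# One realization, flattened to a 1-D sequence of grid-point values.
Field = Sequence[float]


def mse(a: Field, b: Field) -> float:
    """Mean squared error between two flattened fields of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def k_medoids(
    realizations: List[Field], n_clusters: int, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids clustering of realizations (hypothetical helper).

    Returns (medoids, labels): medoids[k] is the index of the realization that
    represents cluster k, and labels[i] is the cluster of realization i.
    """
    n = len(realizations)
    dist = [[mse(realizations[i], realizations[j]) for j in range(n)] for i in range(n)]

    def assign(medoids: List[int]) -> List[int]:
        # Each realization joins the cluster whose medoid is closest to it.
        return [min(range(n_clusters), key=lambda k: dist[i][medoids[k]]) for i in range(n)]

    # Deterministic initialisation from the first n_clusters realizations (an assumption).
    medoids = list(range(n_clusters))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for k in range(n_clusters):
            # Keep the old medoid if the cluster is empty (only possible with duplicate fields).
            members = [i for i in range(n) if labels[i] == k] or [medoids[k]]
            # The new medoid is the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


ensemble = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0], [0.0, 0.2]]
medoids, labels = k_medoids(ensemble, n_clusters=2)
print(medoids, labels)  # [0, 2] [0, 0, 1, 1, 0]
```

Because each medoid is an actual realization, the representative member of each cluster can be carried forward directly, which is the property the description gives for choosing KMedoids.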

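The two matching cases could be sketched as below. The description states only that matching is based on mean squared error and that the two cases are handled slightly differently, so both rules here are assumptions: when there are enough secondary realizations, each cluster receives a distinct realization chosen to minimise the total MSE against the cluster medoids; otherwise each cluster takes its lowest-MSE realization, so a realization can be reused. `match_to_clusters` is a hypothetical name, not the RealizationToClusterMatcher API.

```python
from itertools import permutations
from typing import List, Sequence

# One realization, flattened to a 1-D sequence of grid-point values.
Field = Sequence[float]


def mse(a: Field, b: Field) -> float:
    """Mean squared error between two flattened fields of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def match_to_clusters(medoids: List[Field], secondary: List[Field]) -> List[int]:
    """Return, for each cluster, the index of the secondary realization matched to it.

    Hypothetical helper; both matching rules below are illustrative assumptions.
    """
    n_clusters, n_secondary = len(medoids), len(secondary)
    cost = [[mse(m, s) for s in secondary] for m in medoids]
    if n_secondary >= n_clusters:
        # Enough realizations (e.g. an NWP ensemble): give each cluster a distinct
        # realization, minimising the total MSE. Brute force is fine for small
        # ensembles; scipy.optimize.linear_sum_assignment scales better.
        best = min(
            permutations(range(n_secondary), n_clusters),
            key=lambda p: sum(cost[k][p[k]] for k in range(n_clusters)),
        )
        return list(best)
    # Too few realizations (e.g. a deterministic nowcast): each cluster takes its
    # closest realization, so a realization may be matched to several clusters.
    return [min(range(n_secondary), key=lambda j: cost[k][j]) for k in range(n_clusters)]


medoid_fields = [[0.0, 0.0], [5.0, 5.0]]
print(match_to_clusters(medoid_fields, [[4.8, 5.1], [0.2, -0.1], [9.0, 9.0]]))  # [1, 0]
print(match_to_clusters(medoid_fields, [[0.3, 0.3]]))  # [0, 0]
```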
improver_test_data PR:
metoppv/improver_test_data#115

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

… so that it can work even if the esmf_regrid package is not available.

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 91.06628% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.42%. Comparing base (84a8944) to head (d70d5f0).
⚠️ Report is 153 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| improver/clustering/realization_clustering.py | 90.31% | 31 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2259      +/-   ##
==========================================
- Coverage   98.39%   95.42%   -2.97%     
==========================================
  Files         124      151      +27     
  Lines       12212    15592    +3380     
==========================================
+ Hits        12016    14879    +2863     
- Misses        196      713     +517     

gavinevans marked this pull request as ready for review December 4, 2025 16:36
…ng from the RealizationClusterAndMatch plugin. This provides some way of knowing at what forecast periods temporal interpolation should be later applied.
…ring strings within a coordinate isn't supported in netCDF, as far as I can tell.