Add quantile mapping and associated tests #2264
Codecov Report
❌ Patch coverage is
Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master    #2264      +/-   ##
    ==========================================
    - Coverage   98.39%   95.19%   -3.20%
    ==========================================
      Files         124      150      +26
      Lines       12212    15323    +3111
    ==========================================
    + Hits        12016    14587    +2571
    - Misses        196      736     +540
gavinevans
left a comment
Thanks @maxwhitemet 👍
I've added some comments below.
        return np.interp(quantiles, empirical_quantiles, sorted_values)


    def quantile_mapping(
I think it might be good to name this something else to avoid a quantile_mapping function and a QuantileMapping class in the same file.
Thank you.
I have changed the name to 'map_quantiles'. Please let me know if this needs changing.
    # Create a copy of the forecast_cube or forecast_to_calibrate cube to hold
    # output data and preserve metadata.
    output_cube = (
        forecast_cube.copy()
        if forecast_to_calibrate is None
        else forecast_to_calibrate.copy()
    )

    # Extract data, handling masked arrays
    if np.ma.is_masked(reference_cube.data):
        reference_data_flat = reference_cube.data.filled().flatten()
    else:
        reference_data_flat = reference_cube.data.flatten()

    if np.ma.is_masked(forecast_cube.data):
        forecast_data_flat = forecast_cube.data.filled().flatten()
    else:
        forecast_data_flat = forecast_cube.data.flatten()

    # Determine values to map and output shape
    if forecast_to_calibrate is None:
        # Use forecast_cube data
        if np.ma.is_masked(output_cube.data):
            values_to_map_flat = output_cube.data.filled().flatten()
        else:
            values_to_map_flat = output_cube.data.flatten()
        output_shape = forecast_cube.shape
        output_mask = (
            forecast_cube.data.mask if np.ma.is_masked(forecast_cube.data) else None
        )
    else:
        # Use provided cube's data
        output_cube = forecast_to_calibrate.copy()
        if np.ma.is_masked(forecast_to_calibrate.data):
            values_to_map_flat = forecast_to_calibrate.data.filled().flatten()
        else:
            values_to_map_flat = forecast_to_calibrate.data.flatten()
        output_shape = forecast_to_calibrate.shape
        output_mask = (
            forecast_to_calibrate.data.mask
            if np.ma.is_masked(forecast_to_calibrate.data)
            else None
        )
I think that you could put this into a separate method / function, so that the process method is simpler. You could even put the pattern below into a method / function, given that you re-use it a number of times:

    if np.ma.is_masked(forecast_cube.data):
        forecast_data_flat = forecast_cube.data.filled().flatten()
    else:
        forecast_data_flat = forecast_cube.data.flatten()
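For illustration only, the repeated pattern could be factored out roughly as below. This is a minimal sketch, not code from the PR: the helper name `flatten_cube_data` is hypothetical, and it takes the array directly rather than a cube. Note one small difference from the quoted pattern: it always returns a plain `ndarray`, even when given a masked array with no masked points.

```python
import numpy as np


def flatten_cube_data(data):
    """Return a flat ndarray, filling masked points if any are present.

    Hypothetical helper (name not from the PR) wrapping the repeated
    if/else pattern quoted above.
    """
    if np.ma.is_masked(data):
        # At least one point is masked: replace masked points with the
        # array's fill_value before flattening.
        return data.filled().flatten()
    # No masked points: drop any (empty) mask and flatten.
    return np.asarray(data).flatten()


# Illustrative input, not data from the PR.
example = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [False, False]],
    fill_value=-1.0,
)
flat = flatten_cube_data(example)
```

Each of the three if/else blocks in the quoted code would then reduce to a single call on the relevant cube's `.data`.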
I have now removed the use of .filled() as I was concerned this would introduce changes to the statistics. Instead, the code now only processes unmasked data points, and later reinserts the mask where it was.
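A rough sketch of the approach described in this reply, under stated assumptions: the function name `apply_to_unmasked` and the `func` argument are hypothetical stand-ins for the actual mapping step, and the updated PR code may differ. The idea is to transform only the valid points and then put the original mask back.

```python
import numpy as np


def apply_to_unmasked(data, func):
    """Apply func to the unmasked points of data and restore the mask.

    Hypothetical helper: func stands in for the real mapping step and
    must return one output value per input value.
    """
    mask = np.ma.getmaskarray(data)  # all False for a plain ndarray
    values = np.ma.getdata(data)
    result = values.astype(float)  # astype returns a copy by default
    # Only valid points take part in the calculation, so fill values
    # cannot leak into the statistics.
    result[~mask] = func(values[~mask])
    if mask.any():
        return np.ma.masked_array(result, mask=mask)
    return result


# Illustrative input, not data from the PR.
example = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
scaled = apply_to_unmasked(example, lambda v: v * 10.0)
```

Compared with `.filled()`, the values underneath masked points never enter `func`, which is the concern raised above about changing the statistics.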
I think that you may as well move these tests into a quantile_mapping directory to match the pattern of the other tests for calibration methods.
- Move functionality into QuantileMapping class
- Remove redundancy
- Increase variable name clarity
- Refactor into smaller functions

2. Additions:
- Improved readability of docstrings
- Fixed improper masked array handling
In addition to the feedback received, I have implemented the below modifications:
- Made lots of changes to docstrings, such that now:
  - More extensive documentation has moved from private to public methods
  - Removed redundant Args sections in private methods, defined elsewhere.
- Masked arrays
  - I was concerned about what would happen if the reference cube and the post-processed forecast cube had differing mask locations. Thus I have added handling that may require further discussion: combine the masks such that only points that are valid in both cubes are used to build the CDFs.
  - Removed redundant use of np.where for non-masked arrays as I discovered this is implicitly handled in np.ma.where
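To make the mask-combination idea concrete for discussion, here is a minimal sketch. It assumes both arrays share the same shape; the function name `valid_in_both` is hypothetical and the PR's actual implementation may be organised differently.

```python
import numpy as np


def valid_in_both(reference, forecast):
    """Return values at points unmasked in both arrays, plus the combined mask.

    Hypothetical helper; assumes reference and forecast have the same shape.
    """
    # A point is excluded if it is masked in either input.
    combined_mask = np.ma.getmaskarray(reference) | np.ma.getmaskarray(forecast)
    valid = ~combined_mask
    reference_values = np.ma.getdata(reference)[valid]
    forecast_values = np.ma.getdata(forecast)[valid]
    return reference_values, forecast_values, combined_mask


# Illustrative inputs with masks in different locations.
reference = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[True, False, False, False])
forecast = np.ma.masked_array([10.0, 20.0, 30.0, 40.0], mask=[False, False, True, False])
ref_vals, fc_vals, combined = valid_in_both(reference, forecast)
```

With this choice both CDFs are built from the same set of locations, at the cost of discarding points that are valid in only one of the cubes.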
improver/cli/quantile_mapping.py
Outdated
    *,
    mapping_method: str = "floor",
    preservation_threshold: float = None,
    forecast_to_calibrate: cli.inputcube = None,
I have removed the option to provide the third cube from the plugin and this CLI script. Thank you.
improver/cli/quantile_mapping.py
Outdated
    reference_cube: cli.inputcube,
    forecast_cube: cli.inputcube,
I have implemented your suggestion, though I have excluded the portion of the 'cubes' docstring on land-sea masking that is handled by the estimate_emos_coefficients plugin here.
Please could you let me know if I should add this?
Addresses #1007
This PR implements quantile mapping into the IMPROVER repo, adding a quantile mapping module, CLI, unit tests, and acceptance tests.
A demonstration of the plugin's functionality is available here.
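For readers unfamiliar with the technique, the sketch below shows the general idea of quantile mapping on flat arrays. It is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, quantiles are taken by linear interpolation over evenly spaced positions, and tied forecast values and the mapping_method / preservation_threshold options are not handled.

```python
import numpy as np


def forecast_quantiles(forecast):
    """Quantile (0 to 1) of each forecast value within the forecast itself.

    Hypothetical helper; assumes the forecast values are distinct.
    """
    sorted_forecast = np.sort(forecast)
    positions = np.linspace(0.0, 1.0, sorted_forecast.size)
    return np.interp(forecast, sorted_forecast, positions)


def inverted_cdf(quantiles, population):
    """Value of population at the requested quantiles (inverse empirical CDF)."""
    sorted_values = np.sort(population)
    empirical_quantiles = np.linspace(0.0, 1.0, sorted_values.size)
    return np.interp(quantiles, empirical_quantiles, sorted_values)


def map_quantiles_sketch(reference, forecast):
    """Replace each forecast value with the reference value at the same quantile."""
    return inverted_cdf(forecast_quantiles(forecast), reference)


# Illustrative data, not taken from the PR.
reference = np.array([10.0, 20.0, 30.0, 40.0])
forecast = np.array([3.0, 1.0, 4.0, 2.0])
mapped = map_quantiles_sketch(reference, forecast)
```

The mapped forecast keeps the rank order of the original forecast but takes on the distribution of the reference data.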
Testing: