
Conversation

@rodrigoalmeida94

EWB Pull Request

Description

Adds Receiver Operating Characteristic Skill Score metric implementation.

This probabilistic metric has been found to be relatively insensitive to the rarity of hydro-climatological events, which makes it suitable for inclusion in the benchmarking suite. [ref]
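For background (not part of this PR's code), the ROC skill score is commonly defined relative to a no-skill climatological reference, whose ROC curve is the diagonal with AUC = 0.5, giving ROCSS = 2 * AUC - 1. A minimal sketch under that assumption:

```python
# Background sketch (common definition, not the EWB implementation):
# ROCSS = 2 * AUC - 1, where AUC is the area under the ROC curve and
# a climatological (no-skill) forecast has AUC = 0.5.

def trapezoid_auc(pofd, pod):
    """Area under the ROC curve via the trapezoidal rule.

    Assumes (POFD, POD) points are sorted by increasing false-alarm rate.
    """
    area = 0.0
    for i in range(1, len(pofd)):
        area += 0.5 * (pod[i] + pod[i - 1]) * (pofd[i] - pofd[i - 1])
    return area

def roc_skill_score(pofd, pod):
    """ROCSS = 2 * AUC - 1: 1 is perfect discrimination, 0 is no skill."""
    return 2.0 * trapezoid_auc(pofd, pod) - 1.0

# Perfect discrimination: POD jumps to 1 while POFD is still 0.
print(roc_skill_score([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
# Diagonal (climatology-like) curve: no skill.
print(roc_skill_score([0.0, 1.0], [0.0, 1.0]))  # 0.0
```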

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Unit tests

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@nicholasloveday

Let me know if you have any performance issues with ROC in the scores package. I have some ideas on how to significantly speed up the AUC calculation there!

@aaTman
Collaborator

aaTman commented Jan 26, 2026

@nicholasloveday I think I ran into a scores bug with dask here. It seems that there's code in roc_impl.py that isn't compatible with dask, specifically line 132:

...
if fcst.max().item() > 1 or fcst.min().item() < 0:
...

Here .item() isn't a method on dask arrays. I can open an issue at some point after my presentation.

Traceback:

"""
Traceback (most recent call last):
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 198, in _call_possibly_missing_method
    method = getattr(arg, name)
AttributeError: 'Array' object has no attribute 'item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 490, in _process_worker
    r = call_item()
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/parallel.py", line 607, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
            ~~~~^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/evaluate.py", line 409, in compute_case_operator
    _evaluate_metric_and_return_df(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        forecast_ds=aligned_forecast_ds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        **metric_kwargs,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/evaluate.py", line 565, in _evaluate_metric_and_return_df
    metric_result = metric.compute_metric(
        forecast_data,
        target_data,
        **kwargs,
    )
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 138, in compute_metric
    return self._compute_metric(forecast, target, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 708, in _compute_metric
    roc_curve_data = super()._compute_metric(forecast, target, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 684, in _compute_metric
    return scores.probability.roc_curve_data(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        binary_forecast,
        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
        weights=None,
        ^^^^^^^^^^^^^
    )
    ^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/scores/plotdata/roc_impl.py", line 132, in roc
    if fcst.max().item() > 1 or fcst.min().item() < 0:
       ~~~~~~~~~~~~~~~^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 210, in func
    return _call_possibly_missing_method(self.data, name, args, kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 200, in _call_possibly_missing_method
    duck_array_ops.fail_on_dask_array_input(arg, func_name=name)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/core/duck_array_ops.py", line 117, in fail_on_dask_array_input
    raise NotImplementedError(msg % func_name)
NotImplementedError: 'item' is not yet a valid method on dask arrays
"""

@nicholasloveday

Hi @aaTman, you need to set check_args=False for it to work with dask. I'll update scores so that this is handled automatically in the future, as someone else was caught out by this too.

@tennlee

tennlee commented Jan 26, 2026

The error message indicates that the dask team would ideally implement the required functionality if time allowed (NotImplementedError: 'item' is not yet a valid method on dask arrays). We can put something into scores to make life nicer for users, such as emitting a warning instead when we encounter a NotImplementedError. It might also be worth cross-posting to the dask issue tracker so they are aware there is user demand for a fix. I'm happy to do that next week when our development schedule allows, but if you feel like it, you may want to check the dask issue tracker for an existing issue.
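The warning-instead-of-crash idea could look something like the sketch below (my illustration, not the scores implementation; the stand-in classes mimic a dask-backed reduction whose .item() raises NotImplementedError):

```python
import warnings

def check_fcst_bounds(fcst):
    """Sketch of the suggested softening: if the backend cannot perform the
    eager .item() check (e.g. dask arrays), warn instead of crashing.
    Not the actual scores implementation."""
    try:
        if fcst.max().item() > 1 or fcst.min().item() < 0:
            raise ValueError("fcst values must lie in [0, 1]")
    except NotImplementedError:
        warnings.warn(
            "Skipping [0, 1] bounds check: the array backend does not "
            "support .item() (e.g. dask arrays)."
        )

# Hypothetical stand-ins mimicking dask's behaviour from the traceback.
class _LazyScalar:
    def item(self):
        raise NotImplementedError("'item' is not yet a valid method on dask arrays")

class _LazyArray:
    def max(self):
        return _LazyScalar()
    def min(self):
        return _LazyScalar()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_fcst_bounds(_LazyArray())
print(len(caught))  # 1 -- the check degraded to a warning
```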

@aaTman
Collaborator

aaTman commented Jan 26, 2026

Thanks @nicholasloveday and @tennlee! I need to work out exactly how to pass check_args=False through the current workflow, ideally without hardcoding it.
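One way to avoid hardcoding the flag is to default it through the metric's kwargs while letting callers override it. A sketch with hypothetical names (roc_curve_data_stub stands in for scores.probability.roc_curve_data so the example is self-contained; the real signature may differ):

```python
# Hypothetical plumbing sketch: default check_args=False via metric_kwargs
# rather than baking it into the call site.

def roc_curve_data_stub(fcst, obs, thresholds, check_args=True, weights=None):
    """Stand-in for scores.probability.roc_curve_data; records the flags
    the underlying call would receive."""
    return {"check_args": check_args, "weights": weights}

def compute_roc(fcst, obs, thresholds, **metric_kwargs):
    # Default to skipping eager validation (needed for dask-backed inputs),
    # but let a caller explicitly re-enable it.
    metric_kwargs.setdefault("check_args", False)
    return roc_curve_data_stub(fcst, obs, thresholds, weights=None, **metric_kwargs)

print(compute_roc([0.2, 0.8], [0, 1], [0.5])["check_args"])                   # False
print(compute_roc([0.2, 0.8], [0, 1], [0.5], check_args=True)["check_args"])  # True
```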

@tennlee

tennlee commented Jan 26, 2026

No problem. I wasn't really across this issue until 20 minutes ago, so I don't have anything to contribute on recommended workarounds until I've had some time to fully grok what's going on.
