
Conversation

@rodrigoalmeida94

EWB Pull Request

Description

Adds Receiver Operating Characteristic Skill Score metric implementation.

This probabilistic metric has been found to be relatively insensitive to the rarity of hydro-climatological events, which makes it suitable for inclusion in the benchmarking suite. [ref]
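For background (not part of this PR's code), the ROC skill score is commonly defined relative to a no-skill climatological reference, whose ROC curve is the diagonal with AUC = 0.5, giving ROCSS = 2 * AUC - 1. A minimal sketch under that assumption:

```python
# Background sketch (common definition, not the EWB implementation):
# ROCSS = 2 * AUC - 1, where AUC is the area under the ROC curve and
# a climatological (no-skill) forecast has AUC = 0.5.

def trapezoid_auc(pofd, pod):
    """Area under the ROC curve via the trapezoidal rule.

    Assumes (POFD, POD) points are sorted by increasing false-alarm rate.
    """
    area = 0.0
    for i in range(1, len(pofd)):
        area += 0.5 * (pod[i] + pod[i - 1]) * (pofd[i] - pofd[i - 1])
    return area

def roc_skill_score(pofd, pod):
    """ROCSS = 2 * AUC - 1: 1 is perfect discrimination, 0 is no skill."""
    return 2.0 * trapezoid_auc(pofd, pod) - 1.0

# Perfect discrimination: POD jumps to 1 while POFD is still 0.
print(roc_skill_score([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
# Diagonal (climatology-like) curve: no skill.
print(roc_skill_score([0.0, 1.0], [0.0, 1.0]))  # 0.0
```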

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Unit tests

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@nicholasloveday

Let me know if you have any performance issues with ROC in the scores package. I have some ideas on how to significantly speed up the AUC calculation there!

@aaTman
Collaborator

aaTman commented Jan 26, 2026

@nicholasloveday I think I ran into a scores bug with dask here. It seems that there's code in roc_impl.py that isn't compatible with dask, specifically line 132:

...
if fcst.max().item() > 1 or fcst.min().item() < 0:
...

Here .item() isn't a method on dask arrays. I can open an issue at some point after my presentation.

Traceback:

"""
Traceback (most recent call last):
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 198, in _call_possibly_missing_method
    method = getattr(arg, name)
AttributeError: 'Array' object has no attribute 'item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 490, in _process_worker
    r = call_item()
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/joblib/parallel.py", line 607, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
            ~~~~^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/evaluate.py", line 409, in compute_case_operator
    _evaluate_metric_and_return_df(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        forecast_ds=aligned_forecast_ds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        **metric_kwargs,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/evaluate.py", line 565, in _evaluate_metric_and_return_df
    metric_result = metric.compute_metric(
        forecast_data,
        target_data,
        **kwargs,
    )
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 44, in _compute_metric_with_docstring
    return _original_compute_metric(self, *args, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 138, in compute_metric
    return self._compute_metric(forecast, target, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 708, in _compute_metric
    roc_curve_data = super()._compute_metric(forecast, target, **kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/src/extremeweatherbench/metrics.py", line 684, in _compute_metric
    return scores.probability.roc_curve_data(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        binary_forecast,
        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
        weights=None,
        ^^^^^^^^^^^^^
    )
    ^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/scores/plotdata/roc_impl.py", line 132, in roc
    if fcst.max().item() > 1 or fcst.min().item() < 0:
       ~~~~~~~~~~~~~~~^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 210, in func
    return _call_possibly_missing_method(self.data, name, args, kwargs)
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/computation/ops.py", line 200, in _call_possibly_missing_method
    duck_array_ops.fail_on_dask_array_input(arg, func_name=name)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/taylor/code/ExtremeWeatherBench/.venv/lib/python3.13/site-packages/xarray/core/duck_array_ops.py", line 117, in fail_on_dask_array_input
    raise NotImplementedError(msg % func_name)
NotImplementedError: 'item' is not yet a valid method on dask arrays
"""

@nicholasloveday

Hi @aaTman, you need to set check_args=False for it to work with dask. I'll update scores so that this is handled automatically in the future, as someone else was caught out by this too.

@tennlee

tennlee commented Jan 26, 2026

The error message indicates that the dask team would ideally implement the required functionality if time allowed (NotImplementedError: 'item' is not yet a valid method on dask arrays). We can put something into scores to make life nicer for users, such as emitting a warning instead when we encounter a NotImplementedError. It might also be worth cross-posting to the dask issue tracker so they are aware there is user demand for a fix. I'm happy to do that next week when our development schedule allows, but if you feel like it, you may want to check the dask issue tracker for an existing issue.
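The warning-instead-of-crash idea could look something like the sketch below (my illustration, not the scores implementation; the stand-in classes mimic a dask-backed reduction whose .item() raises NotImplementedError):

```python
import warnings

def check_fcst_bounds(fcst):
    """Sketch of the suggested softening: if the backend cannot perform the
    eager .item() check (e.g. dask arrays), warn instead of crashing.
    Not the actual scores implementation."""
    try:
        if fcst.max().item() > 1 or fcst.min().item() < 0:
            raise ValueError("fcst values must lie in [0, 1]")
    except NotImplementedError:
        warnings.warn(
            "Skipping [0, 1] bounds check: the array backend does not "
            "support .item() (e.g. dask arrays)."
        )

# Hypothetical stand-ins mimicking dask's behaviour from the traceback.
class _LazyScalar:
    def item(self):
        raise NotImplementedError("'item' is not yet a valid method on dask arrays")

class _LazyArray:
    def max(self):
        return _LazyScalar()
    def min(self):
        return _LazyScalar()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_fcst_bounds(_LazyArray())
print(len(caught))  # 1 -- the check degraded to a warning
```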

@aaTman
Collaborator

aaTman commented Jan 26, 2026

Thanks @nicholasloveday and @tennlee! I need to work out exactly how to pass check_args=False through the current workflow, ideally without hardcoding it.
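One way to avoid hardcoding the flag is to default it through the metric's kwargs while letting callers override it. A sketch with hypothetical names (roc_curve_data_stub stands in for scores.probability.roc_curve_data so the example is self-contained; the real signature may differ):

```python
# Hypothetical plumbing sketch: default check_args=False via metric_kwargs
# rather than baking it into the call site.

def roc_curve_data_stub(fcst, obs, thresholds, check_args=True, weights=None):
    """Stand-in for scores.probability.roc_curve_data; records the flags
    the underlying call would receive."""
    return {"check_args": check_args, "weights": weights}

def compute_roc(fcst, obs, thresholds, **metric_kwargs):
    # Default to skipping eager validation (needed for dask-backed inputs),
    # but let a caller explicitly re-enable it.
    metric_kwargs.setdefault("check_args", False)
    return roc_curve_data_stub(fcst, obs, thresholds, weights=None, **metric_kwargs)

print(compute_roc([0.2, 0.8], [0, 1], [0.5])["check_args"])                   # False
print(compute_roc([0.2, 0.8], [0, 1], [0.5], check_args=True)["check_args"])  # True
```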

@tennlee

tennlee commented Jan 26, 2026

No problem. I wasn't really across this issue until 20 minutes ago, so I don't have anything to contribute on recommended workarounds until I've had some time to fully grok what's going on.
