prediction type error when using ranked probability score metric in tuning

## The problem

I'm unable to use the experimental `ranked_prob_score()` to optimize a hyperparameter. I can use it to evaluate a model on new data without trouble, but when I run `tune_grid()`, all models fail with an error that I haven't had time to track down (see the reprex for the full output):

```r
→ A | error:   `type` should be one of `raw`, `numeric`, `class`, `prob`,
<truncated>
```

I have the development version of {yardstick} (the `main` branch on GitHub) installed, but the rest of the tidyverse is from CRAN (see the folded session info at the end). I first encountered the problem while using `ranked_prob_score()` with the unpublished extension {ordered}.

## Reproducible example

``` r
library(tidymodels)
set.seed(0)

# training and testing data (for tuning procedure)
house_data <-
  MASS::housing[rep(seq(nrow(MASS::housing)), MASS::housing$Freq), -5]
house_split <- initial_split(house_data, prop = .8)
house_train <- training(house_split)
house_test <- testing(house_split)

# prep for fit & tuning
house_rec <- recipe(Sat ~ Infl + Type + Cont, data = house_train)
house_spec <- multinom_reg() |>
  set_engine("nnet") |>
  set_args(penalty = tune())
house_spec |> 
  extract_parameter_set_dials() |> 
  grid_regular(levels = 3) |> 
  print() -> house_grid
#> # A tibble: 3 × 1
#>        penalty
#>          <dbl>
#> 1 0.0000000001
#> 2 0.00001     
#> 3 1
house_prep <- prep(house_rec)

# fit, predict, and evaluate (succeeds)
house_spec |>
  set_args(penalty = .001) |>
  fit(formula(house_prep), data = house_train) |>
  print() -> house_fit
#> parsnip model object
#> 
#> Call:
#> nnet::multinom(formula = Sat ~ Infl + Type + Cont, data = data, 
#>     decay = ~0.001, trace = FALSE)
#> 
#> Coefficients:
#>        (Intercept) InflMedium  InflHigh TypeApartment TypeAtrium TypeTerrace
#> Medium  -0.3605754  0.3510484 0.6186474    -0.4228095  0.1616948   -0.712358
#> High    -0.1098962  0.6235826 1.6182775    -0.7607390 -0.4459065   -1.485253
#>         ContHigh
#> Medium 0.3152534
#> High   0.5166651
#> 
#> Residual Deviance: 2767.968 
#> AIC: 2795.968
bind_cols(
  house_test,
  predict(house_fit, new_data = bake(house_prep, house_test), type = "class"),
  predict(house_fit, new_data = bake(house_prep, house_test), type = "prob")
) |>
  as_tibble() |>
  print() -> house_pred
#> # A tibble: 337 × 8
#>    Sat    Infl  Type  Cont  .pred_class .pred_Low .pred_Medium .pred_High
#>    <ord>  <fct> <fct> <fct> <ord>           <dbl>        <dbl>      <dbl>
#>  1 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  2 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  3 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  4 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  5 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  6 Low    Low   Tower Low   Low             0.386        0.269      0.345
#>  7 Medium Low   Tower Low   Low             0.386        0.269      0.345
#>  8 Medium Low   Tower Low   Low             0.386        0.269      0.345
#>  9 Medium Low   Tower Low   Low             0.386        0.269      0.345
#> 10 Medium Low   Tower Low   Low             0.386        0.269      0.345
#> # ℹ 327 more rows
brier_class(house_pred, truth = Sat, starts_with(".pred_"), -.pred_class)
#> # A tibble: 1 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 brier_class multiclass     0.315
ranked_prob_score(house_pred, truth = Sat, starts_with(".pred_"), -.pred_class)
#> # A tibble: 1 × 3
#>   .metric           .estimator .estimate
#>   <chr>             <chr>          <dbl>
#> 1 ranked_prob_score multiclass     0.217

# tune using Brier class (succeeds)
house_res <- tune_grid(
  house_spec,
  preprocessor = house_rec,
  resamples = vfold_cv(house_train),
  grid = house_grid,
  metrics = metric_set(brier_class)
)
# tune using ranked probability score (fails)
house_res <- tune_grid(
  house_spec,
  preprocessor = house_rec,
  resamples = vfold_cv(house_train),
  grid = house_grid,
  metrics = metric_set(ranked_prob_score)
)
#> → A | error:   `type` should be one of `raw`, `numeric`, `class`, `prob`, `conf_int`,
#>                `pred_int`, `quantile`, `time`, `survival`, `linear_pred`, or `hazard`.
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x19
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> There were issues with some computations   A: x60
#> 
```

<sup>Created on 2025-10-24 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
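
A possible next diagnostic step, sketched here rather than run as part of the reprex: compare the class vectors of the metric that tunes successfully (`brier_class`) and the one that fails (`ranked_prob_score`), then print the full notes that the warning above points to. The idea that the metric's class is what drives the prediction `type` is an assumption on my part and is not confirmed; the snippet also assumes the development version of {yardstick} is installed and that the failing `tune_grid()` call has already been run in the same session.

```r
library(tidymodels)

# Compare the class vectors of the working and the failing metric.
# Assumes the development version of {yardstick}, which provides
# ranked_prob_score().
class(brier_class)
class(ranked_prob_score)

# .Last.tune.result only exists after a tuning function has run in this
# session, so guard the call before printing the full error notes.
if (exists(".Last.tune.result")) {
  show_notes(.Last.tune.result)
}
```

If the two class vectors differ, that difference would be a reasonable place to start looking, but it does not by itself establish the cause.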

<details>

<summary>session info</summary>

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-10-24
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#>  quarto   1.8.25 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.5   2025-04-23 [1] CRAN (R 4.2.3)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.2.3)
#>  evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.2.3)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.2.3)
#>  fs            1.6.6   2025-04-12 [1] CRAN (R 4.2.3)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.2.3)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.2.3)
#>  knitr         1.50    2025-03-16 [1] CRAN (R 4.2.3)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.2.0)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.2.3)
#>  rlang         1.1.6   2025-04-11 [1] CRAN (R 4.2.3)
#>  rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.2.3)
#>  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.2.3)
#>  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.2.3)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.2.3)
#>  xfun          0.53    2025-08-19 [1] CRAN (R 4.2.3)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.2.3)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
```

<sup>Created on 2025-10-24 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

</details>
