-
Notifications
You must be signed in to change notification settings - Fork 19
Open
Description
Hi! Congratluations on the great package!
I have come across an unexpected behaviour of the PseudoR2 function: it returns a wrong value of McFadden's Pseudo-R2 for binomial GLMs that model successes out of failures. It appears to work fine with binomial GLMs that model binary responses. I provide a reprex with overdispersed binomial GLM, it shows the same warning as my own data which are not overdispersed:
In res["Tjur"] <- unname(diff(tapply(y.hat.resp, y, mean, na.rm = TRUE))) : number of items to replace is not a multiple of replacement length
Previous reported issues with PseudoR2 did not deal with binomial GLMs explicitly, as far as I can tell: #19
Thank you for your time!!
library(DescTools)
library(DHARMa)
#> Warning: package 'DHARMa' was built under R version 4.2.3
#> This is DHARMa 0.4.6. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
library(faraway)
#> Warning: package 'faraway' was built under R version 4.2.2
library(performance)
#> Warning: package 'performance' was built under R version 4.2.3
# Binomial GLM: k success out of K trials
data(troutegg, package="faraway") # Survival of trout eggs
m1 <- glm(cbind(survive,total-survive) ~ location+period, family=binomial, troutegg)
summary(m1)
#>
#> Call:
#> glm(formula = cbind(survive, total - survive) ~ location + period,
#> family = binomial, data = troutegg)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -4.8305 -0.3650 -0.0303 0.6191 3.2434
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 4.6358 0.2813 16.479 < 2e-16 ***
#> location2 -0.4168 0.2461 -1.694 0.0903 .
#> location3 -1.2421 0.2194 -5.660 1.51e-08 ***
#> location4 -0.9509 0.2288 -4.157 3.23e-05 ***
#> location5 -4.6138 0.2502 -18.439 < 2e-16 ***
#> period7 -2.1702 0.2384 -9.103 < 2e-16 ***
#> period8 -2.3256 0.2429 -9.573 < 2e-16 ***
#> period11 -2.4500 0.2341 -10.466 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1021.469 on 19 degrees of freedom
#> Residual deviance: 64.495 on 12 degrees of freedom
#> AIC: 157.03
#>
#> Number of Fisher Scoring iterations: 5
# Overdispersed binomial (depending on who you ask)
DHARMa::testDispersion(m1, alternative = "greater")#>
#> DHARMa nonparametric dispersion test via sd of residuals fitted vs.
#> simulated
#>
#> data: simulationOutput
#> dispersion = 1.0411, p-value = 0.144
#> alternative hypothesis: greater
performance::check_overdispersion(m1)
#> # Overdispersion test
#>
#> dispersion ratio = 5.330
#> Pearson's Chi-Squared = 63.964
#> p-value = < 0.001
#> Overdispersion detected.
PseudoR2(m1, which = "McFadden")
#> Warning in res["Tjur"] <- unname(diff(tapply(y.hat.resp, y, mean, na.rm =
#> TRUE))): número de items para para sustituir no es un múltiplo de la longitud
#> del reemplazo
#> McFadden
#> 0.9989086
1 - logLik(m1)/logLik(update(m1, . ~ 1))
#> 'log Lik.' 0.8715584 (df=8)
# Binomial GLM: 0 vs 1 (dying vs surviving the Titanic's maiden voyage)
m2 <- glm(Survived ~ ., data=Untable(Titanic), family=binomial)
summary(m2)
#>
#> Call:
#> glm(formula = Survived ~ ., family = binomial, data = Untable(Titanic))
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.0812 -0.7149 -0.6656 0.6858 2.1278
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.6853 0.2730 2.510 0.0121 *
#> Class2nd -1.0181 0.1960 -5.194 2.05e-07 ***
#> Class3rd -1.7778 0.1716 -10.362 < 2e-16 ***
#> ClassCrew -0.8577 0.1573 -5.451 5.00e-08 ***
#> SexFemale 2.4201 0.1404 17.236 < 2e-16 ***
#> AgeAdult -1.0615 0.2440 -4.350 1.36e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 2769.5 on 2200 degrees of freedom
#> Residual deviance: 2210.1 on 2195 degrees of freedom
#> AIC: 2222.1
#>
#> Number of Fisher Scoring iterations: 4
PseudoR2(m2, which = "McFadden")
#> McFadden
#> 0.2019875
1 - logLik(m2)/logLik(update(m2, . ~ 1))
#> 'log Lik.' 0.2019875 (df=6)
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] performance_0.10.5 faraway_1.0.8 DHARMa_0.4.6 DescTools_0.99.45
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.10 nloptr_2.0.1 cellranger_1.1.0 compiler_4.2.0
#> [5] highr_0.9 class_7.3-20 tools_4.2.0 lme4_1.1-29
#> [9] boot_1.3-28 digest_0.6.29 nlme_3.1-157 evaluate_0.15
#> [13] lifecycle_1.0.3 rootSolve_1.8.2.3 lattice_0.20-45 rlang_1.1.1
#> [17] reprex_2.0.2 Matrix_1.4-1 cli_3.6.1 rstudioapi_0.13
#> [21] yaml_2.3.5 mvtnorm_1.1-3 expm_0.999-6 xfun_0.40
#> [25] fastmap_1.1.0 e1071_1.7-9 httr_1.4.3 withr_2.5.0
#> [29] stringr_1.5.0 knitr_1.39 fs_1.5.2 vctrs_0.6.3
#> [33] gld_2.6.4 grid_4.2.0 glue_1.6.2 data.table_1.14.2
#> [37] R6_2.5.1 readxl_1.4.0 lmom_2.8 rmarkdown_2.14
#> [41] minqa_1.2.4 magrittr_2.0.3 splines_4.2.0 htmltools_0.5.6
#> [45] MASS_7.3-57 insight_0.19.5 Exact_3.1 stringi_1.7.6
#> [49] proxy_0.4-26
Created on 2023-10-04 with reprex v2.0.2
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels
