bug in predict for two-class cases? #14

@luca-scr

There seems to be a mistake in how predictions are computed in the two-class case.

Here is a reproducible example:

library(polyreg)
library(MLmetrics)

data(kyphosis, package = "rpart")
kyphosis$y <- ifelse(kyphosis$Kyphosis == "absent", 1, 0)
kyphosis$Kyphosis <- NULL
mod <- glm(y ~ ., data = kyphosis, family = binomial())
mod
# Coefficients:
# (Intercept)          Age       Number        Start  
#     2.03693     -0.01093     -0.41060      0.20651  
#
# Degrees of Freedom: 80 Total (i.e. Null);  77 Residual
# Null Deviance:	    83.23 
# Residual Deviance: 61.38 	AIC: 69.38
# Probability scale (type = "response"), thresholded at 0.5:
table(ifelse(predict(mod, type = "response") > 0.5, 1, 0), kyphosis$y)
#    0  1
# 0  7  3
# 1 10 61
Accuracy(ifelse(predict(mod, type = "response") > 0.5, 1, 0), kyphosis$y)
# 0.8395062
# Default link scale, thresholded at 0.5 (not a valid rule; shown for comparison):
table(ifelse(predict(mod) > 0.5, 1, 0), kyphosis$y)
#    0  1
# 0 10  8
# 1  7 56
Accuracy(ifelse(predict(mod) > 0.5, 1, 0), kyphosis$y)
# 0.8148148

data(kyphosis, package = "rpart")
kyphosis <- kyphosis[,c(2:4,1)]
kyphosis$Kyphosis <- as.character(kyphosis$Kyphosis)
pf <- polyFit(kyphosis, deg = 1, use = "glm")
pf$fit 
# Coefficients:
# (Intercept)           V1           V2           V3  
#     2.03693     -0.01093     -0.41060      0.20651  
# 
# Degrees of Freedom: 80 Total (i.e. Null);  77 Residual
# Null Deviance:	    83.23 
# Residual Deviance: 61.38 	AIC: 69.38

OK, the same model is fitted, but computing predictions:

table(predict(pf, kyphosis), kyphosis$Kyphosis)
#         absent present
# absent      56       7
# present      8      10
Accuracy(predict(pf, kyphosis), kyphosis$Kyphosis)
# 0.8148148

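seems to be wrong: it reproduces the link-scale result from predict(mod) > 0.5 (accuracy 0.8148148) rather than the probability-scale one (0.8395062). Looking at the code, you can see: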

# glm case
  if (is.null(object$glmMethod)) { # only two classes
    pre <- predict(object$fit, plm.newdata)
    pred <- ifelse(pre > 0.5, object$classes[1], object$classes[2])
  } 

IMHO the prediction returned is on the link scale (see help(predict.glm)), but it should be on the probability scale, i.e. type = "response". Alternatively, if it stays on the link scale, the threshold should be pre > 0. However, I prefer to reason in terms of the probability scale.
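
For what it's worth, here is a minimal sketch of the two equivalent rules and of what the fixed branch might look like. It uses only base R and the kyphosis data from rpart. The helper predict_two_class() is a hypothetical name of mine, not polyreg's actual code, and I am assuming the first class is the one modelled as y = 1 (here "absent"), as the matching coefficients suggest.

```r
# Sketch only: refit the same glm as in the example above.
data(kyphosis, package = "rpart")
kyphosis$y <- ifelse(kyphosis$Kyphosis == "absent", 1, 0)
kyphosis$Kyphosis <- NULL
mod <- glm(y ~ ., data = kyphosis, family = binomial())

eta <- predict(mod)                     # link (log-odds) scale, the default
p   <- predict(mod, type = "response")  # probability scale

# The two scales are related by the inverse logit, plogis().
scale_ok <- isTRUE(all.equal(unname(plogis(eta)), unname(p)))

# Equivalent rules: p > 0.5 on the probability scale, eta > 0 on the link scale.
by_prob <- unname(ifelse(p > 0.5, 1, 0))
by_link <- unname(ifelse(eta > 0, 1, 0))
rules_agree <- identical(by_prob, by_link)

# Hypothetical helper (NOT polyreg's code): the two-class branch with
# predictions requested on the probability scale before thresholding.
# Assumes classes[1] is the class coded as 1 when the model was fitted.
predict_two_class <- function(fit, newdata, classes) {
  pre <- predict(fit, newdata, type = "response")
  unname(ifelse(pre > 0.5, classes[1], classes[2]))
}

labels <- predict_two_class(mod, kyphosis, c("absent", "present"))
```

With this, table(labels, ifelse(kyphosis$y == 1, "absent", "present")) should agree with the probability-scale table from glm above, up to the row and column labels.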
