I think it would be better to use a barplot in place of the point plot when plotting the PDP of a categorical variable with the autoplot function, as suggested here. I am providing a minimal reproducible example here:
library(randomForest) # for randomForest()
library(pdp)
library(ggplot2) # for geom_col()
# dummy data
categorical <- c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B')
numerical <- c(1, 2, 3, 4, 1, 2, 3, 4)
target <- c(100, 200, 300, 400, 500, 600, 700, 800)
data <- data.frame(categorical, numerical, target)
data$categorical <- factor(data$categorical)
set.seed(101) # for reproducibility
mod.rf <- randomForest(target ~ ., data = data)
cat.pdp <- partial(mod.rf, pred.var = c("categorical"))
# default point plot
autoplot(cat.pdp, contour = TRUE)
# attempt at a barplot: add geom_col() on top of the default plot
autoplot(cat.pdp, contour = TRUE) +
  geom_col()
As we can see, the points are still plotted, and the y-axis now starts at 0, unlike the point plot, where the y-axis limits are fitted to the data.
It would be better if the point plot were replaced by a barplot.
For reference, the second autoplot() call in the code (the one with geom_col() added) is what I tried in order to generate the barplot.
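In the meantime, one possible workaround is to skip autoplot() and build the bars directly with ggplot2 from the data frame that partial() returns. This is only a sketch under a few assumptions: it relies on the partial dependence values being stored in a column named yhat, it passes the training data explicitly through the train argument, and the 10% axis padding (the pad variable) is an arbitrary choice of mine, not something pdp provides.

```r
library(randomForest)
library(pdp)
library(ggplot2)

# same dummy data as in the example above
data <- data.frame(
  categorical = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  numerical   = c(1, 2, 3, 4, 1, 2, 3, 4),
  target      = c(100, 200, 300, 400, 500, 600, 700, 800)
)

set.seed(101)  # for reproducibility
mod.rf <- randomForest(target ~ ., data = data)

# partial dependence for the categorical predictor;
# as.data.frame() drops the extra "partial" class so ggplot() sees a plain data frame
pd <- as.data.frame(partial(mod.rf, pred.var = "categorical", train = data))

# hypothetical padding: 10% of the spread of the partial dependence values
pad <- 0.1 * diff(range(pd$yhat))

# bars only (no points); coord_cartesian() zooms the y-axis without dropping the bars
p <- ggplot(pd, aes(x = categorical, y = yhat)) +
  geom_col() +
  coord_cartesian(ylim = range(pd$yhat) + c(-pad, pad))

print(p)
```

Note that zooming the y-axis of a barplot away from 0 can exaggerate the difference between categories, so the coord_cartesian() line may be worth dropping if the bars are meant to be compared by height.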