Hello,
My goal is to summarize a number of variables of very different nature and scales with inline plots.
I would like to be able to manually provide the parameters to density or histogram plots.
The defaults use the information from each variable, but do not work well for me in all cases.
When I provide my own value, it applies to every plot in the column.
set.seed(1030)
data <- data.frame(Age = rnorm(40, mean = 44, sd = 20),
                   Sex = factor(rbinom(40, 1, prob = c(0.4, 0.6)),
                                levels = 0:1,
                                labels = c("Male", "Female")),
                   X = runif(40, 10, 20),
                   Y = c(rbeta(40, 0.15, 0.3) * 40))
library(gtExtras)
library(tidyr)
library(dplyr)
data_l <- data %>%
  pivot_longer(cols = X:Y, names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarize(Mean = mean(Value),
            SD = sd(Value),
            Value = list(Value)) %>%
  mutate(Value1 = Value)
data_l %>%
  gt() %>%
  gt_plt_dist(Value,
              type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>%
  gt_plt_dist(Value1,
              type = "density", line_color = "purple", fill_color = "green", same_limit = FALSE)
With type = "histogram", the upper plot is much better, but the lower one looks worse (to me).
data_l %>%
  gt() %>%
  gt_plt_dist(Value,
              type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>%
  gt_plt_dist(Value1,
              type = "density", line_color = "purple", fill_color = "green", same_limit = FALSE, bw = .8)
OK, this is better, but I'd prefer to adjust it per case.
For the histogram I used the Freedman-Diaconis rule as implemented in R, so now it looks a bit more like the U-shaped beta distribution:
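fd_binwidth <- function(x) {
  # Number of bins suggested by the Freedman-Diaconis rule
  num_bins <- nclass.FD(x)
  # Spread the observed range evenly over that many bins
  data_range <- max(x) - min(x)
  bin_width <- data_range / num_bins
  return(bin_width)
}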
data_l %>%
  gt() %>%
  gt_plt_dist(Value,
              type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>%
  gt_plt_dist(Value1,
              type = "histogram", line_color = "purple", fill_color = "green", same_limit = FALSE, bw = fd_binwidth)
which gives a little bit better result.
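To show why a single value does not suit both variables, here is a small base-R sketch that computes the Freedman-Diaconis width for each variable separately. The helper names `fd_width_raw` and `fd_width_bins` are mine and only for illustration; the data is regenerated here so the snippet runs on its own, so the numbers will not match the table above exactly.

```r
# Freedman-Diaconis bin width computed directly: h = 2 * IQR(x) / n^(1/3)
fd_width_raw <- function(x) 2 * IQR(x) / length(x)^(1/3)

# Width implied by the bin count from nclass.FD(), as in fd_binwidth()
fd_width_bins <- function(x) diff(range(x)) / nclass.FD(x)

set.seed(1030)
values <- list(X = runif(40, 10, 20),
               Y = rbeta(40, 0.15, 0.3) * 40)

# One column per variable, one row per way of computing the width
widths <- sapply(values, function(v) {
  c(raw = fd_width_raw(v), from_bins = fd_width_bins(v))
})
print(round(widths, 2))
```

Because `nclass.FD()` rounds the bin count up, the width derived from it is generally a little narrower than the raw formula. Either way, the suggested width depends on each variable's own spread, which is why I would like to set it per row.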

But is there any trick, any way to tell the function to use a different BW for different variables, e.g. 0.1 for variable 1, 0.5 for variable 2, and so on? A named vector, a list, etc.?
Or maybe these table rows could be made separately, row-by-row in a loop / map, each with appropriate BW, and then somehow combined into the final table?
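For reference, a rough sketch of the row-by-row idea that bypasses `gt_plt_dist()` entirely: draw one ggplot2 density per row with its own bandwidth, then place the images into the table with `gt::text_transform()` and `gt::ggplot_image()`. The bandwidth values in `bws`, the image height and the aspect ratio are arbitrary assumptions for illustration, and I have not verified that the result matches the styling `gt_plt_dist()` produces.

```r
library(dplyr)
library(tidyr)
library(gt)
library(ggplot2)

set.seed(1030)
data <- data.frame(X = runif(40, 10, 20),
                   Y = rbeta(40, 0.15, 0.3) * 40)

data_l <- data %>%
  pivot_longer(cols = X:Y, names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarize(Mean = mean(Value), SD = sd(Value), Value = list(Value))

# Hypothetical per-variable bandwidths, keyed by variable name
bws <- c(X = 0.5, Y = 3)

# One small density plot per row, each drawn with its own bandwidth
plots <- Map(function(v, bw) {
  ggplot(data.frame(v = v), aes(x = v)) +
    geom_density(bw = bw, colour = "purple", fill = "green") +
    theme_void()
}, data_l$Value, bws[data_l$Variable])

# Replace the cells of the list column with the rendered images
tab <- data_l %>%
  gt() %>%
  text_transform(
    locations = cells_body(columns = Value),
    fn = function(x) {
      vapply(plots, ggplot_image, character(1), height = 30, aspect_ratio = 4)
    }
  )
tab
```

The same pattern should carry over to histograms by swapping `geom_density(bw = ...)` for `geom_histogram(binwidth = ...)` and keying the bin widths by variable in the same way.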