Is it possible to use different bandwidth / binwidth for different variables / rows?

Hello,
My goal is to summarize a number of variables of very different nature and scales with inline plots.
I would like to be able to manually provide the parameters to density or histogram plots.

The defaults are derived from each variable's data, but they do not work well for me in every case.
When I supply my own value, it applies to every plot in the column.

```{r}
set.seed(1030)
data <- data.frame(Age = rnorm(40, mean=44, sd =20),
                   Sex = factor(rbinom(40, 1, prob = c(0.4, 0.6)),
                                levels = 0:1, 
                                labels = c("Male", "Female")),
                   X = runif(40, 10, 20),
                   Y = c(rbeta(40, 0.15, 0.3) * 40))


library(gtExtras)
library(tidyr)
library(dplyr)

data_l <- data %>%
  pivot_longer(cols = X:Y, names_to = "Variable", values_to = "Value" ) %>% 
  group_by(Variable) %>%
  summarize(Mean= mean(Value),
            SD = sd(Value),
            Value = list(Value)) %>% 
  mutate(Value1 = Value)

data_l %>%
  gt() %>% 
  gt_plt_dist( Value,
               type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>% 
  gt_plt_dist(Value1,
              type = "density", line_color = "purple", fill_color = "green", same_limit = FALSE) 
```

<img width="786" height="269" alt="Image" src="https://github.com/user-attachments/assets/95cf48c3-3d75-4dce-b2b9-b4a6b0291cda" />

With `type = "histogram"`, the upper plot looks much better, but the lower one looks worse (to me).

<img width="618" height="176" alt="Image" src="https://github.com/user-attachments/assets/d8e86a11-1f1c-45ed-bda4-32b9a0e47296" />

```{r}

data_l %>%
  gt() %>% 
  gt_plt_dist( Value,
               type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>% 
  gt_plt_dist(Value1,
              type = "density", line_color = "purple", fill_color = "green", same_limit = FALSE, bw = .8) 
```

OK, this is better, but I'd prefer to tune it per variable:

<img width="609" height="154" alt="Image" src="https://github.com/user-attachments/assets/992e2a07-a98a-4ab4-87de-fde2251c2f02" />
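
One reason a single `bw` cannot suit every row is that the data-driven default differs a lot between variables. A minimal sketch, using only base R and variables simulated the same way as above (the exact draws will differ from `data`), compares the default rule of `stats::density()`, `bw.nrd0()`, per variable. The list name `vars` and the comparison itself are only illustrative.

```r
# Variables simulated in the same way as X and Y above
set.seed(1030)
vars <- list(
  X = runif(40, 10, 20),
  Y = rbeta(40, 0.15, 0.3) * 40
)

# Default bandwidth that stats::density() would pick for each variable
default_bw <- vapply(vars, stats::bw.nrd0, numeric(1))
print(default_bw)

# How a single shared bandwidth compares with each default:
# a ratio far from 1 means the shared value over- or under-smooths that row
shared_bw <- 0.8
print(shared_bw / default_bw)
```

If the ratios differ strongly between rows, one shared value will smooth one variable too much and another too little, which is the effect visible in the plots above.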

For the histogram, I used the Freedman-Diaconis rule as implemented in R (`nclass.FD()`), so the plot now looks a bit more like the U-shaped beta distribution:
```{r}
fd_binwidth <- function(x) {
  num_bins <- nclass.FD(x)
  data_range <- max(x) - min(x)
  bin_width <- data_range / num_bins
  return(bin_width)
}

data_l %>%
  gt() %>% 
  gt_plt_dist( Value,
               type = "boxplot", line_color = "purple", fill_color = "green", same_limit = FALSE) %>% 
  gt_plt_dist(Value1,
              type = "histogram", line_color = "purple", fill_color = "green", same_limit = FALSE, bw = fd_binwidth) 
```
This gives a slightly better result.
<img width="630" height="167" alt="Image" src="https://github.com/user-attachments/assets/3565cc10-ef34-4cfa-b916-0a50582bad48" />
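
For reference, the raw Freedman-Diaconis bin width is `2 * IQR(x) / n^(1/3)`. `nclass.FD()` rounds the implied number of bins up to a whole number, so the width from `fd_binwidth()` should be equal to or slightly smaller than the raw value. A sketch in base R, with variables simulated the same way as above, computes one width per variable. The helper `fd_raw()` and the `widths` table are hypothetical names used for illustration.

```r
set.seed(1030)
vars <- list(
  X = runif(40, 10, 20),
  Y = rbeta(40, 0.15, 0.3) * 40
)

# Raw Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
fd_raw <- function(x) 2 * stats::IQR(x) / length(x)^(1 / 3)

# Bin width implied by the whole number of bins from nclass.FD()
fd_binwidth <- function(x) diff(range(x)) / grDevices::nclass.FD(x)

# One bin width per variable, side by side
widths <- data.frame(
  Variable = names(vars),
  fd_raw = vapply(vars, fd_raw, numeric(1)),
  fd_binwidth = vapply(vars, fd_binwidth, numeric(1)),
  row.names = NULL
)
print(widths)
```

These per-variable values are what I would like to hand to the plotting function row by row.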

But is there any trick or other way to tell the function to use a different BW for each variable, e.g. 0.1 for variable 1, 0.5 for variable 2, and so on? Perhaps a named vector, a list, etc.?

Or maybe these table rows could be made separately, row-by-row in a loop / map, each with appropriate BW, and then somehow combined into the final table?
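
To make the row-by-row idea concrete, here is a rough sketch that bypasses `gt_plt_dist()` entirely: it draws one `ggplot2` density per variable, each with its own bandwidth, and places the images into a placeholder column with `gt::text_transform()` and `gt::ggplot_image()`. This is an assumption about a possible workaround, not something gtExtras documents for this purpose. The bandwidth values and the names `bw_by_var`, `density_plot()`, and `summary_df` are made up for illustration.

```r
library(gt)
library(ggplot2)

set.seed(1030)
vars <- list(
  X = runif(40, 10, 20),
  Y = rbeta(40, 0.15, 0.3) * 40
)

# Hypothetical per-variable bandwidths, looked up by variable name
bw_by_var <- c(X = 0.5, Y = 2)

# One small density plot per variable, each with its own bandwidth
density_plot <- function(values, bw) {
  ggplot(data.frame(x = values), aes(x = x)) +
    geom_density(bw = bw, colour = "purple", fill = "green") +
    theme_void()
}
plots <- Map(density_plot, vars, bw_by_var[names(vars)])

# Summary table with a placeholder column for the inline plots
summary_df <- data.frame(
  Variable = names(vars),
  Mean = vapply(vars, mean, numeric(1)),
  SD = vapply(vars, sd, numeric(1)),
  Density = NA,
  row.names = NULL
)

tbl <- text_transform(
  gt(summary_df),
  locations = cells_body(columns = Density),
  fn = function(x) {
    # Render each plot to an inline image tag, in row order
    vapply(plots, ggplot_image, character(1),
           height = 30, aspect_ratio = 5, USE.NAMES = FALSE)
  }
)
tbl
```

For histograms, the same pattern should work with `geom_histogram(binwidth = ...)` in place of `geom_density(bw = ...)`. The drawback is that the styling `gt_plt_dist()` applies has to be recreated by hand.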


Is it possible to use different bandwidth / binwidth for different variables / rows? #153