renaming grouping functions?

We have several functions whose names suggest that they shuffle "grouped" data, but I find them a bit confusing:

```
designit::shuffle_grouped_data(
  batch_container, allocate_var,
  keep_together_vars = c(), keep_separate_vars = c(),
  n_min = NA, n_max = NA, n_ideal = NA,
  subgroup_var_name = NULL, report_grouping_as_attribute = FALSE,
  prefer_big_groups = FALSE, strict = TRUE, fullTree = FALSE, maxCalls = 1e+06
)

designit::mk_subgroup_shuffling_function(subgroup_vars, restrain_on_subgroup_levels = c(), n_swaps = 1)

designit::shuffle_with_subgroup_formation(
  subgroup_object, subgroup_allocations,
  keep_separate_vars = c(), report_grouping_as_attribute = FALSE
)
```

I guess they come from the invivo example and are tailored to that.

Maybe I didn't fully understand them, but none of them worked for a simple grouping problem I had.

* I had ~50 patients: 25 with one and 25 with two measurements.
* I wanted to put them in batches such that samples of the same patient were put in the same batch.
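
To make the requirement concrete, here is a minimal base R sketch with toy data. Everything in it is an assumption for illustration: the column names, the helper `patients_kept_together()`, and the choice of 5 batches are made up and are not part of designit.

```r
# toy data mirroring the description above:
# 25 patients with one measurement, 25 patients with two
samples <- data.frame(
  patient_id = c(
    sprintf("P%02d", 1:25),
    rep(sprintf("P%02d", 26:50), each = 2)
  )
)

# hypothetical helper: TRUE if no group is split across batches
patients_kept_together <- function(group, batch) {
  all(tapply(batch, group, function(b) length(unique(b)) == 1))
}

# allocate whole patients (not single samples) to 5 batches,
# then let every sample inherit the batch of its patient
set.seed(1)
patients <- unique(samples$patient_id)
patient_batch <- setNames(
  sample(rep(1:5, length.out = length(patients))),
  patients
)
samples$batch <- unname(patient_batch[samples$patient_id])

# holds by construction, since the batch was drawn per patient
patients_kept_together(samples$patient_id, samples$batch)
```

This only shows the constraint itself. It ignores batch sizes and any scoring, which is what the shuffling during optimization would have to respect in addition.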

I thought a function "shuffle_grouped_data" would do this, given the variable names that form the groups (in my case Patient ID).

Iakov helped with a solution for that particular case, which, a bit more generalized, could be part of the package:

```
# not parametrized: `Subject ID` and `Run` are hard-coded
library(dplyr)

keep_groups_together <- function(bc, i) {
  d <- bc$get_samples(include_id = TRUE) |>
    mutate(location_id = row_number())
  # select random src location
  src_id <- d |>
    # exclude empty locations
    filter(!is.na(.sample_id)) |>
    sample_n(1) |>
    pull(location_id)
  stopifnot(length(src_id) == 1)

  # find all samples with the same `Subject ID` as the chosen source sample
  all_src_id <- d |>
    filter(
      # exclude empty locations
      !is.na(.sample_id),
      # we are searching for matching samples
      `Subject ID` == d$`Subject ID`[src_id]
    ) |>
    pull(location_id)

  dst_id <- d |>
    filter(
      # we don't want source locations
      !location_id %in% all_src_id
    ) |>
    group_by(`Subject ID`) |>
    # we only choose empty or location of "lonely" samples
    filter(is.na(.sample_id) | n() == 1) |>
    # find suitable Run with enough space
    group_by(Run) |>
    filter(n_distinct(location_id) >= length(all_src_id)) |>
    ungroup() |>
    # choose destination Run by index, because sample(x, 1) draws from 1:x
    # when x is a single number
    filter(Run == unique(Run)[sample.int(n_distinct(Run), 1)]) |>
    sample_n(length(all_src_id)) |>
    pull(location_id)
  list(
    src = c(all_src_id, dst_id),
    dst = c(dst_id, all_src_id)
  )
}
```
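
A rough sketch of what "a bit more generalized" could look like: a generator that takes the grouping and batch column names and returns a shuffle function with the same `(bc, i)` signature and the same `src`/`dst` return value as above. The name `mk_keep_together_shuffle()` and both of its parameters are hypothetical, and `mock_bc` is only a stand-in list that mimics `bc$get_samples(include_id = TRUE)` so the example runs without a real batch container.

```r
library(dplyr)

# hypothetical generator, not part of designit
mk_keep_together_shuffle <- function(group_var, batch_var) {
  function(bc, i) {
    d <- bc$get_samples(include_id = TRUE) |>
      mutate(location_id = row_number())
    occupied <- filter(d, !is.na(.sample_id))

    # pick one random occupied location and collect its whole group
    src_group <- occupied[[group_var]][sample.int(nrow(occupied), 1)]
    all_src_id <- occupied$location_id[occupied[[group_var]] %in% src_group]

    candidates <- d |>
      # source locations cannot be destinations
      filter(!location_id %in% all_src_id) |>
      group_by(across(all_of(group_var))) |>
      # only empty locations or samples that are alone in their group
      filter(is.na(.sample_id) | n() == 1) |>
      group_by(across(all_of(batch_var))) |>
      # keep batches with enough room for the whole source group
      filter(n() >= length(all_src_id)) |>
      ungroup()

    batches <- unique(candidates[[batch_var]])
    stopifnot(length(batches) > 0)
    # index-based draws avoid the sample(x, 1) pitfall for a single number
    dst_batch <- batches[sample.int(length(batches), 1)]
    dst_pool <- candidates$location_id[candidates[[batch_var]] == dst_batch]
    dst_id <- dst_pool[sample.int(length(dst_pool), length(all_src_id))]

    list(
      src = c(all_src_id, dst_id),
      dst = c(dst_id, all_src_id)
    )
  }
}

# stand-in for a batch container: 3 runs with 3 locations each,
# patient "A" is currently split across runs 1 and 2
layout <- data.frame(
  .sample_id = c(1, 2, NA, 3, NA, NA, 4, NA, NA),
  patient = c("A", "B", NA, "A", NA, NA, "C", NA, NA),
  Run = rep(1:3, each = 3)
)
mock_bc <- list(get_samples = function(include_id = TRUE) layout)

propose <- mk_keep_together_shuffle("patient", "Run")
set.seed(1)
proposal <- propose(mock_bc, 1)
```

For the case in this issue the call would be `mk_keep_together_shuffle("Subject ID", "Run")`. Like the original, it fails with an error when no batch has enough suitable locations; whether that is the right behaviour is open for discussion.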

But then I wonder: should we discuss the naming of all those functions so that it's clearer what they do?

@ingitwetrust and @idavydov what are your thoughts?


renaming grouping functions? #51
