Any way to keep extra columns when running a pairwise_ function? #29

@hope-data-science

I have benefited from widyr::pairwise_count for years, and it is really fast. Recently, however, I needed all the pairwise combinations within each group, and this time I wanted to keep the group ID in the result. My usual approach is to mutate a copy of the ID column (usually named "id2"), group by the original column, and then call pairwise_count with the copy as the feature. But that is really slow!
Let me give an example:

> library(dplyr)
> library(widyr)
> dat <- tibble(group = rep(1:5, each = 2),
+                   letter = c("a", "b",
+                              "a", "c",
+                              "a", "c",
+                              "b", "e",
+                              "b", "f"))
> 
> # count the number of times two letters appear together
> pairwise_count(dat, letter, group)

# A tibble: 8 x 3
  item1 item2     n
  <chr> <chr> <dbl>
1 b     a         1
2 c     a         2
3 a     b         1
4 e     b         1
5 f     b         1
6 a     c         2
7 b     e         1
8 b     f         1

Is there any way to keep the group number in the result, as in the output below?

library(dplyr)
library(widyr)

dat <- tibble(group = rep(1:5, each = 2),
                  letter = c("a", "b",
                             "a", "c",
                             "a", "c",
                             "b", "e",
                             "b", "f"))

dat %>% 
  mutate(group2 = group) %>% 
  group_by(group) %>% 
  pairwise_count(letter, group2) %>% 
  ungroup()

# A tibble: 10 x 4
   group item1 item2     n
   <int> <chr> <chr> <dbl>
 1     1 b     a         1
 2     1 a     b         1
 3     2 c     a         1
 4     2 a     c         1
 5     3 c     a         1
 6     3 a     c         1
 7     4 e     b         1
 8     4 b     e         1
 9     5 f     b         1
10     5 b     f         1

But this becomes rather slow as the number of groups grows. Is there any way to make it faster?
Thanks.
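For comparison, here is a minimal sketch of one possible alternative that avoids calling pairwise_count once per group: a plain dplyr self-join on the group column. This is not a widyr feature, and it has not been benchmarked here, so treat any speed benefit as an assumption. The name `within_group_pairs` is made up for illustration, and the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

dat <- tibble(group = rep(1:5, each = 2),
              letter = c("a", "b",
                         "a", "c",
                         "a", "c",
                         "b", "e",
                         "b", "f"))

# Keep one row per (group, letter) so repeated letters are not double counted
items <- distinct(dat, group, letter)

# Hypothetical name; pairs every letter with every other letter in the same group.
# Assumes dplyr >= 1.1.0 for the `relationship` argument.
within_group_pairs <- items %>%
  inner_join(items, by = "group", suffix = c("1", "2"),
             relationship = "many-to-many") %>%
  filter(letter1 != letter2) %>%               # drop self-pairs such as (a, a)
  count(group, item1 = letter1, item2 = letter2, name = "n")

within_group_pairs
```

On the example data this should give the same ten group/item1/item2 rows as the grouped pairwise_count call above, although the row order and the type of `n` (integer rather than double) may differ.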
