Frequent Itemset Clustering (Apriori and ECLAT) by Wander03 · Pull Request #210 · tidymodels/tidyclust

Wander03 · 2025-06-19T23:17:37Z

@kbodwin

Relates to other conversations about column-based clustering, e.g. Consider partition data reduction algorithm #66
Adds a partition mode with engine arules to tidyclust (freq_itemsets)
Adds custom cluster and predict functions for freq_itemsets()
Adds extract_predictions(), which reformats predict() output into a more readable format
Adds augment_itemset_predict(), which reformats predict() output for metric functions (e.g. in yardstick)

Note:

devtools::check() resulted in a warning about code dependencies on purrr and stringr

EmilHvitfeldt

Two more things:

Add the following to utils::globalVariables() in aaa.R: .pred_item item preds row_id setNames truth_value
Add exported functions to _pkgdown.yml

I think I would like to chat about these prediction types in #211 before going through with this PR.

EmilHvitfeldt · 2025-06-24T20:44:16Z

R/extract_predictions.R

+#' @return A data frame with items as columns and non-NA values as rows.
+#' @export
+
+extract_predictions <- function(pred_output) {


I'm not a big fan of this function, and I think most of the problem is the name.

I'm afraid users would confuse it with collect_predictions().

Please enlighten me, @kbodwin, @Wander03: do users need access to both formats of this input/output?

The idea here is essentially that we wanted to respect the tidyclust predict() output structure; namely, a one-column tibble. But the output of predictions in column-based clustering like association rules is not cluster assignments, but matrix completion.

What we arrived at was to return a list-col, where each element of the column represents the matrix completion result for that row of the test data.

However, in most use cases, the user wouldn't really need this list-col and would instead want the completed matrix. So, extract_predictions() was created to take the tidyclust output object and reconfigure it as the data matrix with predicted completions inserted.

We definitely have no issue with renaming it. But I believe a helper function like this is much needed for methods of this structure, unless we choose to expand the structures that predict() itself is allowed to return.
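
To make the two shapes concrete, here is a minimal base R sketch of turning per-row prediction results into a completed data matrix. The column names `item` and `.pred_item` echo names mentioned in this PR, but the exact structure of the real predict() output is an assumption, and `complete_matrix()` is a hypothetical helper, not the package's extract_predictions().

```r
# Assumed shape: one list element per test row, each a small data frame
# with one value per item (observed values plus predicted fill-ins).
preds <- list(
  data.frame(item = c("beer", "milk", "bread"), .pred_item = c(0.8, 1.0, 0.3)),
  data.frame(item = c("beer", "milk", "bread"), .pred_item = c(0.1, 0.9, 1.0))
)

# Hypothetical helper: rebuild a rows-by-items data frame from the list.
complete_matrix <- function(preds, items) {
  # Align every element to the same item order before stacking.
  rows <- lapply(preds, function(p) p$.pred_item[match(items, p$item)])
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- items
  out
}

completed <- complete_matrix(preds, items = c("beer", "milk", "bread"))
```

Each row of `completed` corresponds to one row of the test data and each column to one item, which is the form most users would want to work with.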

EmilHvitfeldt · 2025-06-24T20:45:17Z

R/extract_fit_summary.R

+#' @export
+extract_fit_summary.itemsets <- function(object, ...,
+                                         call = rlang::caller_env(n = 0)) {
+  rlang::abort(


please convert all rlang::abort() calls to use {cli}, see 13f30dd for inspiration, or tag me if you need help
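
As a rough illustration of that conversion (not the code from commit 13f30dd, which is not shown in this thread), the sketch below uses a hypothetical validator, `check_min_support()`, to show an rlang::abort() call rewritten with cli::cli_abort(). The error text is invented for the example.

```r
# Hypothetical validator, used only to illustrate the conversion.
check_min_support <- function(min_support, call = rlang::caller_env()) {
  if (!is.numeric(min_support) || length(min_support) != 1) {
    # Before: rlang::abort("`min_support` must be a single number.", call = call)
    # After: cli inline markup formats the argument name and class.
    cli::cli_abort(
      "{.arg min_support} must be a single number, not {.cls {class(min_support)}}.",
      call = call
    )
  }
  invisible(min_support)
}

# Capture the error instead of stopping the script.
err <- tryCatch(check_min_support("a"), error = function(e) e)
```

Passing `call` through keeps the reported error context pointing at the user-facing function rather than at the internal helper.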

EmilHvitfeldt · 2025-06-24T20:46:36Z

tests/testthat/test-extract_centroids.R

+toy_df <- data.frame(
+  'beer'    = c(F, T, T, T, F),
+  'milk'    = c(T, F, T, T, T),
+  'bread'   = c(T, T, F, T, T),
+  'diapers' = c(T, T, T, T, T),
+  'eggs'    = c(F, T, F, F, F)
+)


Suggested change

-toy_df <- data.frame(
-  'beer'    = c(F, T, T, T, F),
-  'milk'    = c(T, F, T, T, T),
-  'bread'   = c(T, T, F, T, T),
-  'diapers' = c(T, T, T, T, T),
-  'eggs'    = c(F, T, F, F, F)
-)
+toy_df <- data.frame(
+  "beer" = c(FALSE, TRUE, TRUE, TRUE, FALSE),
+  "milk" = c(TRUE, FALSE, TRUE, TRUE, TRUE),
+  "bread" = c(TRUE, TRUE, FALSE, TRUE, TRUE),
+  "diapers" = c(TRUE, TRUE, TRUE, TRUE, TRUE),
+  "eggs" = c(FALSE, TRUE, FALSE, FALSE, FALSE)
+)

This does two things: it uses " instead of ' for quoting, and it spells out TRUE and FALSE in full.

This should be changed everywhere.
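
One reason to prefer the full names: in R, `T` and `F` are ordinary variables that can be reassigned, while `TRUE` and `FALSE` are reserved words. A short sketch of the hazard:

```r
# T is just a variable that happens to default to TRUE, so it can be shadowed.
T <- FALSE
shadowed <- c(T, TRUE)  # the first element is now FALSE

# Remove the shadowing binding so T resolves to the base value again.
rm(T)

# TRUE <- FALSE would be an error: reserved words cannot be assigned to.
```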

EmilHvitfeldt · 2025-06-24T21:18:08Z

R/augment_itemset_predict.R

+#' @export
+
+augment_itemset_predict <- function(pred_output, truth_output) {


All exported functions need examples.

I would also like to see an example to help determine how it is used.

EmilHvitfeldt · 2025-06-24T21:18:57Z

tests/testthat/test-extract_centroids.R

 })
+
+test_that("extract_centroids errors for freq_itemsets", {
+  set.seed(1234)


please add skip_if_not_installed("arules") to all tests that use freq_itemsets()
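
A minimal sketch of the pattern, with a placeholder test body (a real test would fit freq_itemsets() where the placeholder expectation is):

```r
library(testthat)

test_that("freq_itemsets tests are skipped when arules is missing", {
  # Skips (rather than fails) on machines without the suggested package.
  skip_if_not_installed("arules")

  # Placeholder expectation; only reached when arules is available.
  expect_true(requireNamespace("arules", quietly = TRUE))
})
```

Placing the skip at the top of each test keeps the suite passing where arules is not installed.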

EmilHvitfeldt · 2025-06-24T21:28:22Z

R/extract_cluster_assignment.R

+  items <- attr(object, "item_names")
+  itemsets <- arules::DATAFRAME(object)
+
+  itemset_list <- lapply(strsplit(gsub("[{}]", "", itemsets$items), ","), stringr::str_trim)


more stringr https://stringr.tidyverse.org/articles/from-base.html
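
The later commit message "use base R rather than stringr" suggests the intent was to drop the stringr call. A base R sketch of the same parsing step, using illustrative strings in place of the real `itemsets$items` values:

```r
# Illustrative itemset labels; the real values would come from itemsets$items.
items_chr <- c("{beer,diapers}", "{milk, bread}", "{eggs}")

# Strip the braces, split on commas, and trim whitespace with base trimws().
no_braces <- gsub("[{}]", "", items_chr)
itemset_list <- lapply(strsplit(no_braces, ",", fixed = TRUE), trimws)
```

Here `trimws()` stands in for `stringr::str_trim()`; both remove leading and trailing whitespace by default.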

EmilHvitfeldt · 2025-06-24T21:28:27Z

R/predict_helpers.R

+  # Extract frequent itemsets and their supports
+  items <- attr(object, "item_names")
+  itemsets <- arules::DATAFRAME(object)
+  frequent_itemsets <- lapply(strsplit(gsub("[{}]", "", itemsets$items), ","), stringr::str_trim)


more stringr https://stringr.tidyverse.org/articles/from-base.html

EmilHvitfeldt · 2025-06-24T21:28:34Z

R/predict_helpers.R

+
+    # Create result data frame
+    data.frame(
+      item = stringr::str_remove_all(items, "`"), # Remove backticks from item names


more stringr https://stringr.tidyverse.org/articles/from-base.html
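
For this particular call, a base R replacement could look like the sketch below (the item names are made up for illustration):

```r
# Illustrative item names, some wrapped in backticks.
items <- c("`beer`", "milk", "`whole milk`")

# fixed = TRUE treats the backtick as a literal character, not a regex.
clean <- gsub("`", "", items, fixed = TRUE)
```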

EmilHvitfeldt · 2025-06-24T21:30:01Z

R/extract_predictions.R

+
+  # Process each observation and combine results using reduce
+  result_df <- data_frames %>%
+    purrr::reduce(.f = ~ {


please use the reduce() from compat-purrr.R
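
The compat-purrr.R helper is not shown in this thread. Assuming it behaves like a thin wrapper over base R's Reduce(), the folding step can be sketched as follows; an explicit function is used because it is not confirmed here whether the helper accepts the `~` lambda shorthand. The data frames are made up for illustration.

```r
# Illustrative per-observation results to be combined.
data_frames <- list(
  data.frame(row_id = 1L, beer = 0.8),
  data.frame(row_id = 2L, beer = 0.1),
  data.frame(row_id = 3L, beer = 0.5)
)

# Fold the list left to right, stacking each data frame onto the accumulator.
result_df <- Reduce(function(acc, df) rbind(acc, df), data_frames)
```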

EmilHvitfeldt · 2025-06-24T21:30:26Z

R/extract_cluster_assignment.R

+  unique_non_zero_clusters <- unique(non_zero_clusters)
+
+  # Map each unique non-zero cluster to a new cluster starting from Cluster_1
+  cluster_map <- setNames(paste0(prefix, seq_along(unique_non_zero_clusters)), unique_non_zero_clusters)


Suggested change

-  cluster_map <- setNames(paste0(prefix, seq_along(unique_non_zero_clusters)), unique_non_zero_clusters)
+  cluster_map <- stats::setNames(paste0(prefix, seq_along(unique_non_zero_clusters)), unique_non_zero_clusters)
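
For context, a small sketch of what this mapping does, using made-up cluster ids and assuming `prefix` is "Cluster_" as the code comment in the hunk suggests:

```r
prefix <- "Cluster_"
non_zero_clusters <- c(3, 7, 3, 5)  # illustrative raw cluster ids
unique_non_zero_clusters <- unique(non_zero_clusters)

# Names are the original ids (coerced to character); values are the new labels.
cluster_map <- stats::setNames(
  paste0(prefix, seq_along(unique_non_zero_clusters)),
  unique_non_zero_clusters
)

# Look up the new label for every original id.
assigned <- cluster_map[as.character(non_zero_clusters)]
```

Namespacing the call as `stats::setNames()` means the package does not need to import setNames or declare it as a global.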

Wander03 · 2025-07-03T23:53:02Z

Hi Emil! I believe that I addressed all your comments, please let me know if I missed something or if there is something else I need to edit.

Wander03 added 30 commits November 10, 2024 17:14

created bsaed function for frequent itemsets and association rules

c785217

testing new functions

4fed387

clustering for freq itemsets function

f70d6f6

progress!

7fb5ede

fix conditions

f2a51f5

fix conditions

f8787f4

Update text

ea1c695

change method to mining_method

2d30942

create vignette for freq itemsets

9459749

bug fixing

144036c

fixed name

1c19e9d

premptive changes

6ae24be

bug fixing freq itemsets

94dcf46

code formatting

f5c8dd1

bug fixes

d01188f

updating cluster functions

1074d74

save average supports for each cluster (to be used in predict)

fdeac74

predict not saving output

02dbce1

some change

ecf7486

fixed predcit! Proba is now put in N/A spots

cff078c

remove avg support tracker (unused)

4d45209

change best cluster to prioritize size then support

1f8f633

add predict to vingette

vignette testing

690c385

predict output formated & cutoff implemented

ae9917b

create holder for extract_predictions function (placeholder name)

a6826c3

hard code cutoff

cf3e82b

change predict formating

c6acf01

something>

d81fc52

extract_predictions complete! (still needs a better name)

cfb5d74

move detail text

5fc3153

Wander03 added 16 commits May 6, 2025 20:35

hide predict dataframe from arules::inspect()

0f9e947

remove `` from predict output item names

e12ae62

added note

3bc1e43

re-roder doesnt matter for fit

ace0ce4

hide freq itemset output from auto displaying when extracting cluster assignment

2ed1a98

rename col name in predict

18566d2

vignettes update with new info from thesis

4315559

added header descriptions about functions

083cb4d

added convergence limit and warning message

1923aea

update freq_itemsets extract_fit_summary and ? information

3bfdba4

vignette update

39a92eb

create test cases for freq_itemsets

d68410d

move min_support tuning to dials

19c58ae

Merged upstream/main into main

b59df0b

re-ran test cases

fe2537d

remove assoc_rules

96ad9f2

Wander03 mentioned this pull request Jun 19, 2025

add freq_itemsets tunable param tidymodels/dials#390

Open

EmilHvitfeldt reviewed Jun 24, 2025

Wander03 added 11 commits July 3, 2025 14:50

Add the following

6e5a28f

.pred_item item preds row_id setNames truth_value to utils::globalVariables() in aaa.R

rename extract_predictions to extract_itemset_predictions

1ae32d8

add example to `extract_itemset_predictions`

rename extract_predictions to extract_itemset_predictions

f808e10

Add exported functions to _pkgdown.yml

0629618

convert all rlang::abort() calls to use {cli}

e22c3c6

edit toy_df and toy_pred to use " instead of ' and TRUE/FALSE instead of T/F

842f8d7

add example to augment_itemset_predict

305b078

add skip_if_not_installed("arules") to all tests that use freq_itemsets()

5860141

use base R rather than stringr

32705bf

use the reduce() from compat-purrr.R

7b2751c

stats::setNames

fbe29b3

Wander03 commented Jun 19, 2025 • edited by EmilHvitfeldt
3 participants