[ENH]: Implement a Repeated Group K Fold CV #281

@fraimondo

Which feature do you want to include?

We need a repeated (non-deterministic) group k-fold cross-validation splitter.

How do you imagine this integrated in julearn?

Something like this, from @kaurao (generated with ChatGPT):

import numpy as np

class RepeatedGroupKFold:
    def __init__(self, n_splits=5, n_repeats=5, random_state=None):
        self.n_splits = n_splits
        self.n_repeats = n_repeats
        self.random_state = np.random.RandomState(random_state)

    def split(self, X, y=None, groups=None):
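        if groups is None:
            raise ValueError("Groups must be provided for RepeatedGroupKFold.")

        # Accept lists as well as arrays, then collect the distinct group labels
        groups = np.asarray(groups)
        unique_groups = np.unique(groups)
        if len(unique_groups) < self.n_splits:
            raise ValueError("n_splits cannot exceed the number of unique groups.")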

        for repeat in range(self.n_repeats):
            # Shuffle groups before each repeat
            shuffled_groups = self.random_state.permutation(unique_groups)

            folds = np.array_split(shuffled_groups, self.n_splits)

            for fold_groups in folds:
                test_idx = np.isin(groups, fold_groups)
                train_idx = ~test_idx

                yield np.where(train_idx)[0], np.where(test_idx)[0]

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits * self.n_repeats
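
A quick sanity check of the idea above can be sketched as a standalone function, so that it runs on its own without the class. The helper name `repeated_group_kfold_indices`, the `seed` parameter, and the toy data are assumptions made for illustration, and it uses `np.random.default_rng` rather than `RandomState`. It is a minimal sketch of the same shuffle-then-split logic, not a proposed julearn API.

```python
import numpy as np


def repeated_group_kfold_indices(groups, n_splits=5, n_repeats=5, seed=None):
    """Yield (train_idx, test_idx) pairs; hypothetical helper for illustration."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if len(unique_groups) < n_splits:
        raise ValueError("n_splits cannot exceed the number of unique groups.")
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        # A fresh permutation of the group labels for every repeat
        shuffled = rng.permutation(unique_groups)
        for fold_groups in np.array_split(shuffled, n_splits):
            test_mask = np.isin(groups, fold_groups)
            yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy data (assumed): 12 samples belonging to 6 groups, 2 samples per group
groups = np.repeat(np.arange(6), 2)
splits = list(repeated_group_kfold_indices(groups, n_splits=3, n_repeats=2, seed=0))

# No group may appear on both sides of any split
no_leakage = all(
    set(groups[train]).isdisjoint(groups[test]) for train, test in splits
)

# Within one repeat, every sample is used for testing exactly once
first_repeat_test = np.sort(np.concatenate([test for _, test in splits[:3]]))
full_coverage = np.array_equal(first_repeat_test, np.arange(len(groups)))
```

The two checks cover the properties that matter for a group-aware splitter: groups never leak between train and test, and each repeat is a complete partition of the samples. Because the groups are reshuffled per repeat, the fold composition can differ between repeats while the total number of splits stays at `n_splits * n_repeats`.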

Do you have a sample code that implements this outside of julearn?

Anything else to say?

No response

Labels: enhancement (New feature or request)