CV on preprocessed training set #1

@silvanhi

Hey there Richard,
I like your YouTube tutorials very much. I believe there is an issue in your script:
https://github.com/RichardOnData/YouTube-Scripts/blob/master/R%20Tutorial%20(ML)%20-%20tidymodels.Rmd
on line 298:
folds <- vfold_cv(trainingSet_processed, v = 5, repeats = 5)
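Shouldn't you use
folds <- vfold_cv(trainingSet, v = 5, repeats = 5)
instead? I believe some information leaks when you apply CV to the preprocessed training set rather than the unpreprocessed one: any preprocessing parameters estimated from the data (e.g. means used for normalization) have then already seen the rows that are later held out in each fold. So, step 1: create the splits on the raw training set, and step 2: apply the recipe within each split.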
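Here is a minimal sketch of what I mean. It makes several assumptions that are not taken from your script: `mtcars` stands in for the data, the recipe has a single illustrative normalization step, and the model is a plain linear regression. The point is only that the folds come from the raw training set and the unprepped recipe goes into a workflow, so it gets re-estimated on the analysis part of every fold.

```r
library(tidymodels)

set.seed(123)

# Assumption: mtcars is only stand-in data for this sketch.
split <- initial_split(mtcars, prop = 0.8)
trainingSet <- training(split)

# Define the recipe, but do NOT prep/bake it on the full training set.
# Assumption: step_normalize() is an illustrative preprocessing step.
rec <- recipe(mpg ~ ., data = trainingSet) %>%
  step_normalize(all_numeric_predictors())

# Step 1: create the folds on the raw (unpreprocessed) training set.
folds <- vfold_cv(trainingSet, v = 5, repeats = 5)

# Step 2: bundle recipe and model in a workflow; fit_resamples() then
# estimates the recipe on each fold's analysis set and applies it to
# that fold's assessment set.
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

res <- fit_resamples(wf, resamples = folds)
metrics <- collect_metrics(res)
metrics
```

With this setup the assessment rows of a fold never contribute to the normalization statistics used for that fold.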
Please correct me if I'm wrong :)
Best,
Silvan