CV on preprocessed training set #1
Open
Description
Hey there Richard,
I like your tutorials on YouTube very much. I believe there is an issue in your script:
https://github.com/RichardOnData/YouTube-Scripts/blob/master/R%20Tutorial%20(ML)%20-%20tidymodels.Rmd
on line 298:
folds <- vfold_cv(trainingSet_processed, v = 5, repeats = 5)
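Shouldn't you use
folds <- vfold_cv(trainingSet, v = 5, repeats = 5)
instead? I believe information leaks when you apply CV to the preprocessed training set rather than to the raw one: any parameters the recipe estimates (for example, means used for normalization) were computed from all training rows, so each fold's held-out rows have already influenced the preprocessing. So, step 1: create the splits on the raw training set, and step 2: apply the recipe within each split.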
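Here is a minimal sketch of the two steps as I understand them. It is not taken from your script: the `mtcars` data, the formula, the normalization step, and the linear model are stand-ins I picked for illustration, and only `trainingSet`, `folds`, and the `vfold_cv()` call mirror the names above. The idea is to leave the recipe unprepped and let `fit_resamples()` estimate it on each fold's analysis set.

```r
library(tidymodels)

set.seed(123)

# Stand-in data (assumption): mtcars replaces the tutorial's data set
split <- initial_split(mtcars, prop = 0.8)
trainingSet <- training(split)

# Define the recipe on the raw training set, but do not prep() or bake() it here
rec <- recipe(mpg ~ wt + hp + disp, data = trainingSet) %>%
  step_normalize(all_numeric_predictors())

# Step 1: create the resamples on the raw (unpreprocessed) training set
folds <- vfold_cv(trainingSet, v = 5, repeats = 5)

# Step 2: bundle recipe and model in a workflow; the recipe is then
# estimated on each fold's analysis set and applied to its assessment set
# (the linear model is a placeholder, not the tutorial's model)
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

res <- fit_resamples(wf, resamples = folds)
collect_metrics(res)
```

With this setup the held-out rows of each fold never contribute to the normalization statistics, which is the leakage I am worried about in the current version.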
Please correct me if I'm wrong :)
Best,
Silvan