Which feature do you want to include?
For extreme class imbalance (1 to 5% positive samples), F1 should not be calculated in each fold and then averaged. Instead, the true positives, false positives, and false negatives should be counted in each fold and summed, and a single final F1 score calculated from the totals. This avoids the bias introduced by computing F1 per fold (where it can also be undefined, if a test set contains no positive samples).
The two estimates converge when the classes are balanced.
Forman et al.
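To make the difference concrete, here is a minimal sketch comparing the two estimates. The per-fold counts are invented purely for illustration, and `f1_from_counts` is a hypothetical helper, not a julearn or scikit-learn function.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns None when the score is undefined."""
    denom = 2 * tp + fp + fn
    return None if denom == 0 else 2 * tp / denom


# Hypothetical per-fold (TP, FP, FN) counts, invented for illustration only.
# The last fold has no positives and no positive predictions.
fold_counts = [(2, 1, 0), (0, 1, 1), (1, 0, 2), (0, 0, 0)]

# Approach 1: F1 per fold, then the mean over the folds where it is defined.
per_fold = [f1_from_counts(*counts) for counts in fold_counts]
defined = [score for score in per_fold if score is not None]
mean_f1 = sum(defined) / len(defined)

# Approach 2: sum the counts over all folds, then compute F1 once.
tp, fp, fn = (sum(column) for column in zip(*fold_counts))
pooled_f1 = f1_from_counts(tp, fp, fn)

print(f"mean of per-fold F1: {mean_f1:.3f}")
print(f"F1 from pooled counts: {pooled_f1:.3f}")
```

With these made-up counts the two numbers differ, and the averaged version has to drop (or arbitrarily score) the fold in which F1 is undefined.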
How do you imagine this integrated in julearn?
Retain the true positives, false positives, and false negatives for each fold, and then calculate a final F1 score from their totals.
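One possible shape for this is an accumulator that is updated once per fold and scored at the end. The sketch below is only an illustration: `PooledF1` and its methods are hypothetical names, not existing julearn API, and the fold labels are invented. In practice the per-fold predictions would come from the cross-validation loop.

```python
from dataclasses import dataclass


@dataclass
class PooledF1:
    """Hypothetical accumulator; not part of julearn or scikit-learn."""

    tp: int = 0
    fp: int = 0
    fn: int = 0

    def update(self, y_true, y_pred, pos_label=1):
        """Add one test fold's counts for the positive class."""
        for truth, pred in zip(y_true, y_pred):
            if pred == pos_label and truth == pos_label:
                self.tp += 1
            elif pred == pos_label:
                self.fp += 1
            elif truth == pos_label:
                self.fn += 1

    def score(self):
        """F1 over all folds seen so far; NaN if it is still undefined."""
        denom = 2 * self.tp + self.fp + self.fn
        return float("nan") if denom == 0 else 2 * self.tp / denom


# Hypothetical (y_true, y_pred) pairs per test fold, invented for illustration.
folds = [
    ([0, 0, 0, 1], [0, 0, 1, 1]),  # one true positive, one false positive
    ([0, 0, 0, 0], [0, 0, 0, 0]),  # no positives: per-fold F1 would be undefined
    ([0, 1, 0, 0], [0, 0, 0, 0]),  # one false negative
]

metric = PooledF1()
for y_true, y_pred in folds:
    metric.update(y_true, y_pred)

pooled_f1 = metric.score()
print(metric, pooled_f1)
```

Keeping only three integers per fold means a fold without positive samples simply contributes nothing to the totals rather than producing an undefined score.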
Do you have a sample code that implements this outside of julearn?
Anything else to say?
No response