Related: #20
Currently, none of the computed measures is useful for highly imbalanced classes.
Take for example sick:
https://www.openml.org/t/3021
I would particularly like to see the "mean" measures computed (they are also helpful for comparison with D3M, cc @joaquinvanschoren).
On the other hand, the "weighted" measures are not computed, but they seem to duplicate the measures without a prefix, which are also weighted by class size:
https://www.openml.org/a/evaluation-measures/mean-weighted-f-measure
https://www.openml.org/a/evaluation-measures/f-measure
That is not entirely clear from the documentation, though. If the f-measure documentation is actually accurate (which I don't think it is), that would be worse, because it would then be unclear for which class the f-measure is reported.
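
To illustrate why the distinction matters, here is a minimal sketch in plain Python. It assumes that "mean" denotes the unweighted average of the per-class f-measures and that "weighted" denotes the average weighted by class size; I have not checked this against the server code. The labels and the 95/5 split are made up for illustration and are not the actual class distribution of sick.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Convention chosen here: F1 is 0 when the class is never predicted or hit.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted ("mean"/macro) average: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Average weighted by class size: the majority class dominates."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical toy data: a classifier that always predicts the majority class.
y_true = ["negative"] * 95 + ["sick"] * 5
y_pred = ["negative"] * 100

print(per_class_f1(y_true, y_pred))  # F1 for "sick" is 0.0
print(mean_f1(y_true, y_pred))       # about 0.487
print(weighted_f1(y_true, y_pred))   # about 0.926
```

Under these assumptions, the weighted score looks fine even though the minority class is never predicted, while the unweighted mean exposes the problem.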