Add harmonization of metrics to Metric API #9
Conversation
Signed-off-by: Miro Dudik <mdudik@gmail.com>
api/METRICS.md
| _Alternative proposals_:
| * `MeanLoss(<loss>)`
| * `OverallLoss(<loss>)`
| * the object `<loss>` doubles as (1) the loss evaluator and (2) the moment implementing the objective
Can we foresee any situation where clubbing them together will hurt us in the future?
Why not just `LossMoment(<loss>)`? Then it's clear that the moment takes the loss evaluator. With `OverallLoss` and the others you mention, it's still not entirely clear from the name that it's a moment.
One use case where this might bite us is if we want to add support for various kinds of sample weights, e.g.:
- balanced loss (the re-weighting happens on the level of the moment, but not on the level of the loss evaluator)
- counterfactual loss (this style of re-weighting corrects for partial observability in the data; again, it is only incurred on the moment level, not on the loss level)

I would prefer calling it `SomethingLoss` to better match what we do for the classification moments, where I think we will simply call them `<Metric>` rather than `<Metric>Moment`. Also, I don't expect that the user would ever specify this objective explicitly (it would be derived automatically from the BGL constraint object).
The special case you mention could still be handled fairly simply with if-else, so I'm not too worried.
Hmm, then how do we distinguish between loss evaluator and loss moment in terms of names?
It occurred to me that we may need a special loss object if we ever want to implement demographic parity for regression, and I just heard that somebody will be working on it this summer, so we definitely need both a loss evaluator object and a loss moment object.
I don't think it's necessarily a problem that we have moments called `BoundedGroupLoss` and `MeanLoss`, and loss evaluators called `SquareLoss`, `AbsoluteLoss`, and `ZeroOneLoss`. But if you feel strongly about this, we could maybe change the evaluators to `SquareLossEvaluator` etc.?
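To make the evaluator-vs-moment split concrete, here is a minimal sketch under the naming proposed in this thread. The class names `SquareLoss` and `BoundedGroupLoss` come from the comment above, but the method names and bodies (`eval`, `group_losses`, `upper_bound`) are illustrative assumptions, not fairlearn's actual implementation:

```python
import numpy as np

class SquareLoss:
    """Loss *evaluator*: maps (y_true, y_pred) to per-sample losses."""
    def eval(self, y_true, y_pred):
        return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2

class BoundedGroupLoss:
    """Loss *moment*: wraps an evaluator and adds group-level logic.

    Sample re-weighting schemes (balanced loss, counterfactual loss)
    would live here, on the moment level, leaving the evaluator untouched.
    """
    def __init__(self, loss, upper_bound=None):
        self.loss = loss              # the evaluator is a constructor argument
        self.upper_bound = upper_bound

    def group_losses(self, y_true, y_pred, sensitive_features, sample_weight=None):
        losses = self.loss.eval(y_true, y_pred)
        if sample_weight is not None:  # moment-level re-weighting
            losses = losses * np.asarray(sample_weight, dtype=float)
        groups = np.asarray(sensitive_features)
        return {g: float(losses[groups == g].mean()) for g in np.unique(groups)}

# The same evaluator object could also back a MeanLoss objective,
# which is the "doubling" discussed in the quoted proposal.
moment = BoundedGroupLoss(SquareLoss(), upper_bound=0.1)
print(moment.group_losses([1.0, 0.0, 1.0], [0.5, 0.0, 0.0], ["a", "a", "b"]))
# {'a': 0.125, 'b': 1.0}
```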
| | `mean_absolute_error` | G,D,R,Max | class, reg | sklearn | class only: `error_rate` |
| | `mean_squared_error` | G,Max | prob, reg | sklearn | - |
| | `r2_score` | G,Min | reg | sklearn | - |
| | `_mean_overprediction` | G | class, reg | | - |
When do you think you'll have an alternative to these underscores?
I'd like to ask the community what their thoughts are given that we're doing somewhat non-standard things here.
| _r_ ⋅ _metric_(_a_) - _metric_(\*) ≤ ε,
| _r_ ⋅ _metric_(\*) - _metric_(_a_) ≤ ε.
|
| * `DemographicParity` and `EqualizedOdds` have the same calling convention as `<Metric>Parity`
Does this mean that we should be able to run direct comparisons between things in metrics and things in moments to make sure they agree that they're calculating the same thing?
Not quite, but almost... there's a small discrepancy between metrics and moments:

- `<metric>_difference` and `<metric>_ratio` look at the difference / ratio between the max and the min
- `<Metric>Parity` looks at the max difference / ratio between any group and the overall metric

I think the final solution would be to enable a flag `relative_to_overall` in `<metric>_difference` and `<metric>_ratio`: with the flag set to True they would match the moments' functionality, and with it set to False they would keep the current behavior. I'm not sure we need to put that in this proposal, but it should definitely go in the user guide.
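To pin down the discrepancy, here is a sketch of both semantics side by side. `metric_difference` is a hypothetical stand-in for the generated `<metric>_difference` functions, and `relative_to_overall` is only the flag proposed above, not an existing argument:

```python
import numpy as np
from sklearn.metrics import accuracy_score

def metric_difference(metric, y_true, y_pred, sensitive_features,
                      relative_to_overall=False):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    groups = np.asarray(sensitive_features)
    by_group = {g: metric(y_true[groups == g], y_pred[groups == g])
                for g in np.unique(groups)}
    if relative_to_overall:
        # <Metric>Parity semantics: largest gap between any group and overall.
        overall = metric(y_true, y_pred)
        return max(abs(v - overall) for v in by_group.values())
    # Current <metric>_difference semantics: max minus min across groups.
    return max(by_group.values()) - min(by_group.values())

y_true, y_pred, sf = [1, 0, 1, 1], [1, 0, 0, 1], ["a", "a", "b", "b"]
print(metric_difference(accuracy_score, y_true, y_pred, sf))                            # 0.5
print(metric_difference(accuracy_score, y_true, y_pred, sf, relative_to_overall=True))  # 0.25
```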
Great catch, Richard. I think this definitely needs to be documented well.
Given that this is actually already largely implemented, can we finalise this and get it merged?
riedgar-ms
left a comment
A few points I'm not quite sure about.
| * For each _base metric_, we provide the list of predefined derived metrics, using D, R, Min, Max to refer to the transformations from the table above, and G to refer to `<metric>_group_summary`. We follow these rules:
|   * always provide G (except for demographic parity and equalized odds, which do not make sense as group-level metrics)
|   * provide D and R for confusion-matrix metrics
|   * provide Min for score functions (worst-case score)
I can't think of any such cases, but are there scores people could come up with that would work the other way round? Or would you just consider them losses then?
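As a concrete reading of these rules, the sketch below enumerates the derived names that would be generated for the regression metrics quoted earlier. Only `<metric>_group_summary`, `<metric>_difference`, and `<metric>_ratio` are named explicitly in this conversation; the `_group_min`/`_group_max` suffixes for the Min/Max transformations are my assumption:

```python
# Hypothetical encoding of the derivation rules; not fairlearn code.
DERIVED_SUFFIX = {
    "G":   "_group_summary",
    "D":   "_difference",
    "R":   "_ratio",
    "Min": "_group_min",   # assumed suffix
    "Max": "_group_max",   # assumed suffix
}

# Rows taken from the metric table quoted earlier in this review.
BASE_METRICS = {
    "mean_absolute_error": ["G", "D", "R", "Max"],
    "mean_squared_error":  ["G", "Max"],
    "r2_score":            ["G", "Min"],
}

for name, transforms in BASE_METRICS.items():
    for t in transforms:
        print(name + DERIVED_SUFFIX[t])
# mean_absolute_error_group_summary, mean_absolute_error_difference, ...
```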
romanlutz
left a comment
Looks great. We should get to this work ASAP to avoid having more releases before we finalize this. If you need me to put some time into any of this, please reach out.