
Add harmonization of metrics to Metric API#9

Merged
MiroDudik merged 10 commits into fairlearn:master from MiroDudik:more-metrics-API
May 29, 2020

Conversation

@MiroDudik
Member

No description provided.

Signed-off-by: Miro Dudik <mdudik@gmail.com>
@MiroDudik MiroDudik requested review from riedgar-ms and romanlutz May 4, 2020 18:26
api/METRICS.md Outdated
_Alternative proposals_:
* `MeanLoss(<loss>)`
* `OverallLoss(<loss>)`
* the object `<loss>` doubles as (1) the loss evaluator and (2) the moment implementing the objective
Member

Can we foresee any situation where clubbing them together will hurt us in the future?
Why not just `LossMoment(<loss>)`? Then it's clear that the moment takes the loss evaluator. With `OverallLoss` and the others you mention, it's still not entirely clear that it's a moment (at least not from the name).

Member Author

One use case where this might bite us is if we want to add support for various kinds of sample weights, e.g.:

* balanced loss (the re-weighting happens at the level of the moment, but not at the level of the loss evaluator)
* counterfactual loss (this style of re-weighting corrects for partial observability in the data; again, it applies only at the moment level, not at the loss level)

I would prefer calling it `SomethingLoss` to better match what we do for the classification moments, where I think we will simply call them `<Metric>` rather than `<Metric>Moment`. Also, I don't expect the user to ever specify this objective explicitly (it would be automatically derived from the BGL constraint object).
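The split being discussed could be sketched like this (all class names here are illustrative, not the final API): the loss evaluator only scores predictions, while the moment owns aggregation and any moment-level re-weighting.

```python
import numpy as np

class SquareLoss:
    """Illustrative loss evaluator: scores predictions elementwise."""
    def eval(self, y_true, y_pred):
        return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2

class MeanLoss:
    """Illustrative moment/objective wrapping a loss evaluator.

    Re-weighting (e.g. balanced or counterfactual weights) happens here,
    at the moment level; the loss evaluator itself stays weight-agnostic.
    """
    def __init__(self, loss):
        self.loss = loss

    def value(self, y_true, y_pred, sample_weight=None):
        losses = self.loss.eval(y_true, y_pred)
        if sample_weight is None:
            return float(np.mean(losses))
        w = np.asarray(sample_weight, dtype=float)
        return float(np.sum(w * losses) / np.sum(w))

objective = MeanLoss(SquareLoss())
print(objective.value([0.0, 1.0], [0.5, 0.0]))           # unweighted mean
print(objective.value([0.0, 1.0], [0.5, 0.0], [1, 3]))   # moment-level weights
```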

Member

The special case you mention could still be handled fairly simply with if-else, so I'm not too worried.

Hmm, then how do we distinguish between loss evaluator and loss moment in terms of names?

Member Author

It occurred to me that we may need a special loss object if we ever want to implement demographic parity for regression, and I just heard that somebody will be working on that this summer, so we definitely need both a loss evaluator object and a loss moment object.

I don't think it's necessarily a problem that we have moments called `BoundedGroupLoss` and `MeanLoss`, and loss evaluators called `SquareLoss`, `AbsoluteLoss`, and `ZeroOneLoss`. But if you feel strongly about this, we could maybe rename the evaluators to `SquareLossEvaluator` etc.?
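The two-object naming scheme can be sketched as a plain loss evaluator plus a function computing the worst per-group loss, which is the quantity a `BoundedGroupLoss`-style moment would bound (names are illustrative, not the final API).

```python
import numpy as np

def absolute_loss(y_true, y_pred):
    # Illustrative evaluator, in the spirit of the proposed AbsoluteLoss.
    return np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))

def worst_group_loss(loss, y_true, y_pred, groups):
    """Largest per-group mean loss -- what a BoundedGroupLoss-style moment
    would constrain to be at most epsilon (names illustrative)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return max(
        float(np.mean(loss(y_true[groups == g], y_pred[groups == g])))
        for g in np.unique(groups)
    )

print(worst_group_loss(absolute_loss, [0, 1, 2, 3], [0, 0, 2, 0], ["a", "a", "b", "b"]))
```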

|`mean_absolute_error` | G,D,R,Max | class, reg | sklearn | class only: `error_rate` |
|`mean_squared_error`| G,Max | prob, reg | sklearn | - |
|`r2_score`| G,Min | reg | sklearn | - |
|`_mean_overprediction` | G | class, reg | | - |
Member

When do you think you'll have an alternative to these underscores?

Member Author

I'd like to ask the community what their thoughts are given that we're doing somewhat non-standard things here.
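For reference, one plausible reading of `_mean_overprediction` from the table above (illustrative; not necessarily the exact fairlearn implementation) is the mean positive part of the residual:

```python
import numpy as np

def mean_overprediction(y_true, y_pred):
    """One plausible definition: mean positive part of (y_pred - y_true).
    An underprediction counterpart would use the negative part analogously."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.clip(resid, 0.0, None)))

# Residuals are [1, -1, 0]; only the overprediction of 1 counts.
print(mean_overprediction([1, 1, 1], [2, 0, 1]))
```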

_r_ &sdot; _metric_(_a_) - _metric_(\*) &le; &epsilon;, <br>
_r_ &sdot; _metric_(\*) - _metric_(_a_) &le; &epsilon;.

* `DemographicParity` and `EqualizedOdds` have the same calling convention as `<Metric>Parity`
Member

Does this mean that we should be able to run direct comparisons between things in metrics and things in moments to make sure they agree that they're calculating the same thing?

Member Author

Not quite, but almost... there's a small discrepancy between metrics and moments:

* `<metric>_difference` and `<metric>_ratio` look at the difference / ratio between the max and the min
* `<Metric>Parity` looks at the max difference / ratio between any group and the overall metric

I think the final solution would be to add a flag `relative_to_overall` to `<metric>_difference` and `<metric>_ratio`: when the flag is `True`, they match the functionality of the moments; when `False`, they keep the current functionality. I'm not sure we need that in this proposal, but I will definitely put it in the user guide.
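The proposed flag could behave roughly like this sketch (the function name and signature are illustrative, not the final API):

```python
import numpy as np

def metric_difference(values_by_group, overall, relative_to_overall=False):
    """Sketch of the proposed flag.

    relative_to_overall=False: max group value minus min group value
    relative_to_overall=True:  largest absolute gap between any group and
                               the overall metric, matching the moments
    """
    vals = np.asarray(list(values_by_group.values()), dtype=float)
    if relative_to_overall:
        return float(np.max(np.abs(vals - overall)))
    return float(np.max(vals) - np.min(vals))

by_group = {"a": 0.75, "b": 0.25}
print(metric_difference(by_group, overall=0.5))                            # 0.5 (max - min)
print(metric_difference(by_group, overall=0.5, relative_to_overall=True))  # 0.25 (vs. overall)
```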

Member

Great catch, Richard. I think this definitely needs to be documented well.

@riedgar-ms
Member

Given that this is actually already largely implemented, can we finalise this, and get it merged?

MiroDudik added 2 commits May 25, 2020 21:54
Signed-off-by: Miro Dudik <mdudik@gmail.com>
Signed-off-by: Miro Dudik <mdudik@gmail.com>
Member

@riedgar-ms left a comment

A few points I'm not quite sure about.

Signed-off-by: Miro Dudik <mdudik@gmail.com>
* For each _base metric_, we provide the list of predefined derived metrics, using D, R, Min, Max to refer to the transformations from the table above, and G to refer to `<metric>_group_summary`. We follow these rules:
  * always provide G (except for demographic parity and equalized odds, which do not make sense as group-level metrics)
  * provide D and R for confusion-matrix metrics
  * provide Min for score functions (worst-case score)
Member

I can't think of any such cases, but are there scores people could come up with that would work the other way round? Or would you just consider them losses then?
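The G and Min transformations from the quoted rules could look roughly like this (a sketch assuming a dict-based summary layout; not the exact fairlearn API):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def group_summary(metric, y_true, y_pred, sensitive_features):
    """Sketch of the G transformation: evaluate a base metric overall and
    per sensitive-feature group (dict layout is illustrative)."""
    sf = np.asarray(sensitive_features)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "overall": metric(y_true, y_pred),
        "by_group": {g: metric(y_true[sf == g], y_pred[sf == g])
                     for g in np.unique(sf)},
    }

def group_min(metric, y_true, y_pred, sensitive_features):
    """Sketch of the Min transformation: worst-case group score,
    the derived metric proposed for score functions."""
    summary = group_summary(metric, y_true, y_pred, sensitive_features)
    return min(summary["by_group"].values())

summary = group_summary(accuracy_score, [1, 0, 1, 0], [1, 0, 0, 0], ["a", "a", "b", "b"])
print(summary)
```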

Member

@romanlutz left a comment

Looks great. We should get to this work ASAP to avoid having more releases before we finalize it. If you need me to put some time into any of this, please reach out.

