Conversation
Signed-off-by: Miro Dudik <mdudik@gmail.com>
I've been thinking a bit about this over the weekend, and there are two slightly different sets of changes required. The first is at the level of the entire dashboard JSON: I think that we should merge the […] The second is the trickier one, and involves the contents of those dictionaries in the […]
```python
{
    "prediction_type": "binary_classification" or "probabilistic_binary_classification" or "regression",
```
Any reason why we're omitting multiclass classification?
Because we don't have any support for it yet, but we can definitely add other prediction types in the future.
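A minimal sketch of how that restriction might be enforced at load time; the constant and function names here are illustrative, not part of any existing fairlearn API:

```python
# Hypothetical validator for the currently supported prediction types.
# "multiclass_classification" could be added to this set later without
# changing the rest of the schema.
SUPPORTED_PREDICTION_TYPES = {
    "binary_classification",
    "probabilistic_binary_classification",
    "regression",
}

def validate_prediction_type(dashboard_dict):
    """Raise if the dashboard dictionary declares an unsupported prediction type."""
    prediction_type = dashboard_dict.get("prediction_type")
    if prediction_type not in SUPPORTED_PREDICTION_TYPES:
        raise ValueError(
            f"Unsupported prediction_type: {prediction_type!r}; "
            f"expected one of {sorted(SUPPORTED_PREDICTION_TYPES)}"
        )
    return prediction_type
```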
| "name": "y_true", | ||
| "values": [0, 1, 1, 1, 0], | ||
| }, | ||
| "sample_weight": { |
This is user-provided, not the one set within ExponentiatedGradient, right?
If so, perhaps it's worth documenting this with a short comment
Will do. This is just an example of an array that we may want to pass to the metrics, since many metrics accept this kind of argument.
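To illustrate how a user-provided `sample_weight` array would be threaded through to a metric: `weighted_accuracy` below is a self-contained stand-in for any metric that accepts a `sample_weight` keyword, as many sklearn and fairlearn metrics do. It is a sketch, not an existing function.

```python
# Illustrative only: a metric with the common sample_weight signature.
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, weighted by the user-provided weights."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = sum(sample_weight)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / total

y_true = [0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0]
# User-provided weights, not the ones set internally by ExponentiatedGradient.
sample_weight = [1.0, 2.0, 1.0, 2.0, 1.0]
```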
| "sensitive_feature gender" : { # an example feature | ||
| "name": "gender", | ||
| "values": [0, 1, 0, 0, 2], | ||
| "value_names": ["female", "male", "non-binary"], |
Should there be a 'type' field in here, so things like 'prediction' and 'sensitive_feature' don't have to go into the key?
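For concreteness, the alternative being suggested might look like this, with the role moved out of the key and into a `type` field (field names here are illustrative, not settled):

```python
# Sketch of the 'type' field alternative: keys stay free-form, and the
# role of each entry is declared explicitly.
entry_with_type_field = {
    "gender": {
        "type": "sensitive_feature",
        "name": "gender",
        "values": [0, 1, 0, 0, 2],
        "value_names": ["female", "male", "non-binary"],
    },
}
```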
```python
},
"cache": [
    {
        "function": string,  # python function name; we could either limit to fairlearn.metrics
```
Use fully qualified names for sure.
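A fully qualified name in the `"function"` field can be resolved at load time with `importlib`. A minimal sketch, using a stdlib function (`statistics.mean`) as a stand-in for something like `fairlearn.metrics.<...>`:

```python
import importlib

def resolve_function(qualified_name):
    """Resolve a fully qualified function name like 'pkg.module.func'."""
    module_name, _, attr = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Stand-in example; a dashboard would pass the name stored in "function".
mean = resolve_function("statistics.mean")
```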
| "return_value": { | ||
| "overall": 0.11, | ||
| "by_group": { | ||
| "keys": [0, 1, 2], |
Are the 'keys' necessary if we require all categoricals to be integer-encoded?
| "<array_key>" : { # the keys can be arbitrary strings; not sure we need to force any convention, but see examples below | ||
| "name": string, # the name of a feature would be the feature name, of a prediction vector would be the model name | ||
| "values": number[], | ||
| "value_names": string[], # an optional field to encode categorical data |
Presumably we also specify that extra keys (e.g. inserted by AzureML) are to be preserved.
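Putting the fragments above together, a complete instance of the proposed schema might look as follows. The outer keys, the metric name, and the numbers are illustrative; extra keys (e.g. ones inserted by AzureML) would simply be preserved alongside these.

```python
# Illustrative full example of the proposed dashboard dictionary.
example_dashboard_dict = {
    "prediction_type": "binary_classification",
    "true_y": {  # outer keys are arbitrary strings
        "name": "y_true",
        "values": [0, 1, 1, 1, 0],
    },
    "sample_weight": {  # user-provided, not set within ExponentiatedGradient
        "name": "sample_weight",
        "values": [1.0, 2.0, 1.0, 2.0, 1.0],
    },
    "sensitive_feature gender": {  # an example feature
        "name": "gender",
        "values": [0, 1, 0, 0, 2],
        "value_names": ["female", "male", "non-binary"],
    },
    "cache": [
        {
            "function": "fairlearn.metrics.selection_rate",  # fully qualified
            "return_value": {
                "overall": 0.11,  # made-up value
                "by_group": {
                    "keys": [0, 1, 2],
                },
            },
        },
    ],
}
```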
This is something I've started cogitating on again, in the context of the AzureML (now MLFlow) integration. We do want to enable composability, but also avoid saving lots of copies of […]

Then again, we also need an API which allows users to 'mess around in a notebook' without having to set up a bunch of prerequisites. A small example of this is how the dashboard doesn't require model and sensitive feature names, but will generate them itself if invoked without them. In contrast […]