Conversation
@davidslater The poisoning scenario adds this metric automatically, right in the code. Is this an acceptable way to make it available to other scenarios through the config/metric/task section?
I like adding it to the config part of instrumentation. |
@davidslater So far I've added (besides per-class accuracy) a confusion matrix, and precision and recall for each class. Two questions: Will you check that I'm computing the right thing (I described my understanding in the comments) and that the output is in a sufficiently useful format (dict, array, etc.)? Can you think of any other metrics or statistics that would be nice to have?
davidslater
left a comment
Let's move the metrics to task, as they take in the paired y and y_pred as inputs.
armory/metrics/statistical.py
Outdated
    if y_pred.ndim == 2:
        y_pred = np.argmax(y_pred, axis=1)
    N = len(np.unique(y))
If y_pred is 2D, you can use that to derive N. (Or at least check to ensure that they match).
I may be misunderstanding, but N is the number of classes, not the total number of items. Hence length of np.unique(y) and not length of y. I don't think we can assume every class will show up in y_pred. For that matter it seems a little risky to assume they will all be present in y.
If y_pred is 2D, then it outputs either logits or probability distributions over the set of predicted classes, so you can do N = y_pred.shape[1].
If y_pred is 1D, however, that doesn't work.
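A minimal sketch of the suggestion, assuming y_pred is a 2D array of per-class scores (logits or probabilities). The function name and structure here are illustrative, not Armory's actual implementation:

```python
import numpy as np

def confusion_matrix(y, y_pred):
    """Confusion matrix C where C[i, j] counts items of true class i
    predicted as class j. Assumes y_pred is 2D: one row of per-class
    scores (logits or probabilities) per example."""
    N = y_pred.shape[1]  # number of classes, from the score dimension
    y_pred = np.argmax(y_pred, axis=1)
    C = np.zeros((N, N), dtype=int)
    for true, pred in zip(y, y_pred):
        C[true, pred] += 1
    return C
```

Deriving N from the score dimension means classes absent from a particular batch still get their (all-zero) rows and columns.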
I think this implicitly assumes that the classes are all integers from 0 to N - 1. However, if y has missing classes, then there will be some misalignment.
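A small illustration of the misalignment (hypothetical data, just to show the failure mode):

```python
import numpy as np

# If class 1 never appears in y, len(np.unique(y)) undercounts the
# classes, and the remaining labels no longer line up with the rows
# of an N x N matrix.
y = np.array([0, 2, 2])        # class 1 is absent
N = len(np.unique(y))          # gives 2, but labels run up to 2
assert N == 2
assert y.max() == 2            # label 2 would index outside a 2x2 matrix
```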
Oh right, of course. Is it true that Armory scenarios will always have a 2D y_pred? Or it just depends on how the meter and probes are set up? So far the only source of a 1D y_pred I've encountered is my own unit tests, but I can expand those to 2D and then get N the way you described.
Right now it's dependent on the underlying model, unfortunately.
Well, after all, if a class is totally absent from y, its row in the matrix would be all zeros, since no items actually belong to it. So maybe what I need to do is make this a dictionary after all and key it with class labels, so if one is missing, it will at least be clear which rows correspond to which classes. Alternatively, I could add a row of zeros at the index of each missing class label, but that only works for missing labels less than the greatest non-missing label.
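The dictionary idea could look something like this (a hypothetical helper, assuming y and y_pred are 1D label arrays):

```python
import numpy as np

def confusion_counts(y, y_pred):
    """Confusion counts keyed by (true_label, predicted_label), so a
    missing class is simply absent from the keys rather than causing
    a silent row misalignment in a dense N x N matrix."""
    counts = {}
    for true, pred in zip(y, y_pred):
        key = (int(true), int(pred))
        counts[key] = counts.get(key, 0) + 1
    return counts
```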
Let's just have the function assume that y_pred is 2D (and add that to the docstring). Other things can be handled by the user.
armory/metrics/statistical.py
Outdated
    total_selected = C[:, class_].sum()
    precision = tp / total_selected

    # recall: true positives / number of actual items in class_
per-class recall is the exact same as per-class accuracy, which I didn't realize till now. Is it still useful to have two separate per_class_accuracy and per_class_precision_and_recall functions?
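The equivalence can be checked numerically. For class k, recall is TP_k divided by the number of items truly in class k, which is exactly accuracy restricted to those items. An illustrative sketch (made-up data, not the repo's code):

```python
import numpy as np

y = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

for k in np.unique(y):
    mask = (y == k)
    recall = np.sum(y_pred[mask] == k) / mask.sum()   # TP_k / actual count of class k
    per_class_acc = np.mean(y_pred[mask] == y[mask])  # accuracy on class-k items only
    assert recall == per_class_acc
```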
See my two recent comments. Beyond that, I think what needs to be done is:
For #492