In Equation 1 of the paper, you compute the gradient of the element-wise product of the output and the ground-truth one-hot label with respect to the input feature vector. This finds the features that contribute most to the ground-truth class logit. For a softmax output, ideally we want the true-class logit to go toward positive infinity while the other logits go toward negative infinity.
So my question is: why not compute a more classical cross-entropy loss here, instead of just the sum of the true-class logits, i.e.

```python
one_hot = torch.sum(output * one_hot_sparse)
```
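For concreteness, a minimal sketch of the two variants I have in mind, assuming `output` are raw logits and `one_hot_sparse` is a one-hot label tensor (the toy model and variable names here are hypothetical, only the snippet's two names come from the code):

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the model: a feature vector through a linear layer.
torch.manual_seed(0)
features = torch.randn(1, 8, requires_grad=True)  # input feature vector
layer = torch.nn.Linear(8, 3)
output = layer(features)                          # raw logits, shape (1, 3)

label = torch.tensor([1])
one_hot_sparse = F.one_hot(label, num_classes=3).float()

# Variant 1 (as in the snippet): backprop only the true-class logit.
true_logit = torch.sum(output * one_hot_sparse)
grad_logit = torch.autograd.grad(true_logit, features, retain_graph=True)[0]

# Variant 2 (the question): backprop a cross-entropy loss, which also
# pushes the competing logits down through the softmax normalizer.
loss = F.cross_entropy(output, label)
grad_ce = torch.autograd.grad(loss, features)[0]

print(grad_logit.shape, grad_ce.shape)
```

The difference is visible at the logit level: the gradient of the true-class logit with respect to the logits is just the one-hot vector, while the gradient of cross-entropy is `softmax(output) - one_hot`, so cross-entropy attributes importance to features that suppress the wrong classes as well.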