
Why not use cross entropy for determining which features to mask? #11

@BrianPugh

Description


In Equation 1 of the paper, you compute the gradient of the element-wise product of the output and the ground-truth one-hot label with respect to the input feature vector. This is to find the features that contribute most to the ground-truth class logit. For a softmax output, ideally we want the true-label logit to go toward positive infinity while the other logits go toward negative infinity.
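To make the Equation-1 idea concrete, here is a minimal numpy sketch (my own toy setup, not code from the repo): for a linear classifier, the gradient of the true-class score `sum(one_hot * z)` with respect to the input features is exactly the true class's weight row, so its magnitude ranks which input features feed the ground-truth logit.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # toy linear classifier: 3 classes, 5 features
x = rng.normal(size=5)        # input feature vector
t = 1                         # ground-truth class index
one_hot = np.eye(3)[t]

z = W @ x                     # logits
score = np.sum(one_hot * z)   # the quantity differentiated in Eq. 1 (= z[t])

# Analytic gradient of the score w.r.t. the input features: d(score)/dx = W[t]
grad_x = W.T @ one_hot

# Features with the largest |gradient| contribute most to the true-class logit,
# so these are the candidates for masking.
ranked_features = np.argsort(-np.abs(grad_x))
```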

So my question is: why compute just the sum of the true logits,

one_hot = torch.sum(output * one_hot_sparse)

rather than a more classical cross-entropy loss?
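The two objectives give different gradient signals with respect to the logits, which is the crux of the question. A small numpy sketch (my own illustration, not from the repo): the gradient of the true-logit sum is just the one-hot vector, touching only the true class, while the gradient of cross-entropy is `softmax(z) - one_hot`, which also pushes down the competing logits in proportion to their softmax probability.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 0.5, -1.0])   # example logits
t = 0                            # ground-truth class index
one_hot = np.eye(3)[t]

# Objective A (sum of true logits): d/dz sum(one_hot * z) = one_hot,
# so only the true-class logit receives any gradient.
grad_logit = one_hot
print(grad_logit)  # [1. 0. 0.]

# Objective B (cross-entropy, -log softmax(z)[t]): d/dz = softmax(z) - one_hot,
# so the other logits get pushed down as well.
grad_ce = softmax(z) - one_hot
print(grad_ce)
```

Note that the cross-entropy gradient sums to zero across classes, so it redistributes mass between the true class and its competitors rather than only rewarding the true logit.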
