Open
Description
I saw the following function in your code:
def softmax_with_temperature(self, x, beta, d = 1):
    M, _ = x.max(dim=d, keepdim=True)
    x = x - M # subtract maximum value for stability
    exp_x = torch.exp(beta*x)
    exp_x_sum = exp_x.sum(dim=d, keepdim=True)
    return exp_x / exp_x_sum
However, in your paper, m_p(q) = softmax(beta * kp * n_p(q)).
Based on that formula, it seems that x = x * M would be more appropriate than x = x - M.
Is that right?
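One way to check this numerically is sketched below. Because M is constant along the softmax dimension, exp(beta * (x - M)) equals exp(-beta * M) * exp(beta * x), and the common factor cancels in the normalization, so the subtraction should leave the softmax output unchanged. The sketch uses plain Python lists instead of torch tensors so it runs without dependencies; the helper names, the scores, and the beta value are illustrative assumptions, not taken from the repository, and the kp factor from the paper's formula is not modelled.

```python
import math

def softmax(values):
    """Plain softmax over a list of floats (no stabilisation)."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_temperature(values, beta):
    """1-D analogue of the quoted function: subtract the max, then scale by beta."""
    m = max(values)
    return softmax([beta * (v - m) for v in values])

def softmax_multiply_by_max(values, beta):
    """Hypothetical variant from the question: x = x * M instead of x = x - M."""
    m = max(values)
    return softmax([beta * (v * m) for v in values])

scores = [0.2, 0.5, 0.9]  # illustrative values only
beta = 2.0                # illustrative temperature

reference = softmax([beta * v for v in scores])   # softmax(beta * x), no shift
shifted = softmax_with_temperature(scores, beta)  # softmax(beta * (x - M))
scaled = softmax_multiply_by_max(scores, beta)    # softmax(beta * x * M)

# The shifted version should match the reference; the multiplied one should not.
same_as_reference = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(reference, shifted)
)
scaled_differs = any(
    not math.isclose(a, b, rel_tol=1e-9) for a, b in zip(reference, scaled)
)
print(same_as_reference, scaled_differs)
```

If the first flag is true and the second is also true, the subtraction only affects numerical stability, while multiplying by M changes the effective temperature.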