I am confused by the implementation of pseudo-labelling in this library (`lib/algs/pseudo_label.py`). In particular, `forward()` contains:
```python
y_probs = y.softmax(1)
onehot_label = self.__make_one_hot(y_probs.max(1)[1]).float()
gt_mask = (y_probs > self.th).float()
gt_mask = gt_mask.max(1)[0]  # reduce_any
lt_mask = 1 - gt_mask  # logical not
p_target = gt_mask[:,None] * 10 * onehot_label + lt_mask[:,None] * y_probs
output = model(x)
loss = (-(p_target.detach() * F.log_softmax(output, 1)).sum(1) * mask).mean()
return loss
```
Why is `gt_mask` multiplied by 10 when computing `p_target`? What is the meaning of the 10 here?
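To make the question concrete, here is a small numeric check of what `p_target` becomes for a single confident prediction (I substitute `torch.nn.functional.one_hot` for the private `self.__make_one_hot`, and `th = 0.95` is an assumed threshold value, not one I found in the library):

```python
import torch
import torch.nn.functional as F

th = 0.95
y = torch.tensor([[4.0, 0.0, 0.0]])  # one confident 3-class prediction

y_probs = y.softmax(1)  # ~[0.965, 0.018, 0.018], so max prob > th
# stand-in for self.__make_one_hot
onehot_label = F.one_hot(y_probs.max(1)[1], num_classes=3).float()
gt_mask = (y_probs > th).float().max(1)[0]  # 1.0 for this example
lt_mask = 1 - gt_mask                       # 0.0 for this example
p_target = gt_mask[:, None] * 10 * onehot_label + lt_mask[:, None] * y_probs

print(p_target)  # tensor([[10., 0., 0.]]) -- the target sums to 10, not 1
```

So for a confident example the target is not a probability distribution at all, which is the part I find puzzling.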
Also, I believe `lt_mask` marks the examples whose max probability is below the threshold, which should therefore be ignored when computing the loss. However, `p_target` includes the term `+ lt_mask[:,None] * y_probs`, so those examples still contribute.
This seems to differ from what is described in the paper. If you are implementing a variant of the pseudo-labelling loss function, could you point me to that paper?
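For comparison, here is a minimal sketch of the loss I expected from the paper as I understand it: hard pseudo-labels only for confident examples, with low-confidence examples dropped from the loss entirely. The function name `expected_pseudo_label_loss` and its arguments are my own hypothetical names, not from the library:

```python
import torch
import torch.nn.functional as F

def expected_pseudo_label_loss(y, x, model, th=0.95):
    """y: teacher logits on unlabelled data, shape (B, C); x: the inputs."""
    y_probs = y.softmax(1)
    conf, pseudo = y_probs.max(1)   # per-example confidence and hard label
    keep = (conf > th).float()      # 1 where confident, 0 otherwise
    output = model(x)
    # cross-entropy against the hard pseudo-label, zeroed where not confident
    ce = F.cross_entropy(output, pseudo, reduction="none")
    return (ce * keep).mean()
```

Under this reading there is no soft `y_probs` term for below-threshold examples, which is why the `+ lt_mask[:,None] * y_probs` part surprised me.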