Hi, I am studying your approach alongside your implementation, and I have a question. In the paper, Equation 12 computes the BNM loss with the batch size B as the divisor. However, in BNM/DA/BNM/train_image.py at L#164 the loss is computed with torch.mean(). Since the SVD of a B × C matrix produces min(B, C) singular values, when the class number C is smaller than the batch size B, s_tgt has length C rather than B, so torch.mean() divides by C instead of B. Wouldn't that be inconsistent with the original equation? Why not explicitly divide by the batch size?
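To make the discrepancy concrete, here is a minimal sketch (not the repo's actual code; the sizes B and C and the variable names are hypothetical) showing that when C < B, torch.mean() over the singular values divides by C, while Equation 12 divides by B:

```python
import torch

B, C = 36, 12  # hypothetical: batch size larger than the class count
# Softmax outputs: each row is a probability distribution over C classes
probs = torch.softmax(torch.randn(B, C), dim=1)

# SVD of a B x C matrix yields min(B, C) singular values -- here C of them
s_tgt = torch.linalg.svdvals(probs)
assert s_tgt.numel() == min(B, C)

loss_mean = -torch.mean(s_tgt)       # divides the nuclear norm by min(B, C)
loss_paper = -torch.sum(s_tgt) / B   # divides by batch size B, as in Eq. 12

# When C < B the two losses differ exactly by the constant factor B / C
assert torch.isclose(loss_mean * min(B, C) / B, loss_paper)
```

Since the two versions differ only by a constant scale factor B / C, the gradient direction is unchanged and the difference could in principle be absorbed into the loss weight, but I would like to confirm whether this was intentional.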