To reproduce the baseline result on my machine (KD from rn34 to rn18), I would like to know the hyperparameter settings you used for knowledge distillation on ImageNet.
In particular, the weights for the cross-entropy loss and the KL-divergence loss, the temperature, and the `batch_size`. For reference, a sketch of the loss I have in mind is below.
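For context, this is the standard KD objective I am assuming; `alpha` and `T` here are placeholder values I made up, not numbers from this repo, which is exactly what I am asking about:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.9, T=4.0):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    # alpha balances the two terms (placeholder value, not from this repo).
    return alpha * soft + (1.0 - alpha) * hard
```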
Thanks.