
Thank you for your help many times! I re-set the seeds, but the results are still not satisfactory. If it is convenient, could you tell me which specific seeds are suitable? I also ran the experiment on an RTX 3090.
Originally posted by @guaguaxyf in #9
The results being lower than those in the paper may be caused by parameters that differ from the reported Implementation Details: "We set 0.5 for both α and β", which is not reflected in the code.
You could try modifying ViT_Network.py as follows:
```python
def knowledge_boosting(self, lang_embed, word_embed, query_info, label):
    P_head = query_info['proto'].clone().cuda()
    T = 2.  # distillation temperature
    lang_logit = F.linear(lang_embed, P_head)
    loss_seman = F.cross_entropy(lang_logit, label)
    loss_kd = F.kl_div(F.log_softmax(lang_embed / T, dim=1),
                       F.softmax(word_embed[label] / T, dim=1),
                       reduction='batchmean')
    loss = loss_kd + 0.1 * loss_seman
    return 0.5 * loss  # 0.5 matches the paper's reported α/β
    # return 0.1*loss
```
and set
```python
parser.add_argument('-ED_hp', type=float, default=0.5)
```
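For reference, here is a minimal NumPy sketch of what the distillation term above computes: the KL divergence from the teacher's temperature-softened distribution to the student's, summed over classes and averaged over the batch, mirroring `F.kl_div(F.log_softmax(student/T), F.softmax(teacher/T), reduction='batchmean')`. The logit values are made-up toy numbers, not from the repo:

```python
import numpy as np

def log_softmax(x, axis=-1):
    # numerically stable log-softmax
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    summed over classes, averaged over the batch ('batchmean')."""
    log_q = log_softmax(student_logits / T)   # student log-probs
    log_p = log_softmax(teacher_logits / T)   # teacher log-probs
    p = np.exp(log_p)
    return (p * (log_p - log_q)).sum(axis=1).mean()

# toy logits (made-up numbers)
student = np.array([[1.0, 2.0, 0.5]])
teacher = np.array([[1.2, 1.8, 0.4]])
print(kd_loss(student, student))      # 0.0: identical distributions
print(kd_loss(student, teacher) > 0)  # True: any mismatch gives positive KL
```

Note that a larger `T` flattens both distributions, so the word-embedding "teacher" constrains the full class distribution rather than just the argmax.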
But I wonder why the accuracy is still about one point lower (seed=1): [81.502, 80.115, 79.478, 76.566, 76.649, 74.65, 74.251, 74.718, 73.821, 73.85, 74.165]
Here's my config:
```shell
python train.py \
    -gpu 7 \
    -project base \
    -dataset cub200 \
    -shot 5 \
    -base_mode ft_dot \
    -new_mode avg_cos \
    -gamma 0.1 \
    -lr_base 2e-4 \
    -lr_new 2e-4 \
    -decay 0.0005 \
    -epochs_base 5 \
    -epochs_new 3 \
    -schedule Cosine \
    -milestones 20 30 45 \
    -temperature 16 \
    -start_session 0 \
    -batch_size_base 128 \
    -seed 1 \
    -vit \
    -comp_out 1 \
    -ED \
    -SKD \
    -LT \
    -out PriViLege
```
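Separately, if the remaining gap is run-to-run variance rather than the α/β values, it is worth checking that every RNG in the pipeline is seeded, not only whatever `-seed` covers. A minimal sketch (NumPy/stdlib only; the usual PyTorch calls are noted in comments but omitted so the sketch stays dependency-free):

```python
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Seed every RNG in use; a partially seeded run still varies."""
    random.seed(seed)
    np.random.seed(seed)
    # A PyTorch run would additionally need:
    #   torch.manual_seed(seed)
    #   torch.cuda.manual_seed_all(seed)
    #   torch.backends.cudnn.deterministic = True
    #   torch.backends.cudnn.benchmark = False

set_seed(1)
a = np.random.rand(3)
set_seed(1)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: re-seeding reproduces the same draws
```

Even with full seeding, some CUDA kernels are nondeterministic, so a residual fraction of a point between GPUs (e.g. the paper's hardware vs. an RTX 3090) is not unusual.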