Description
I am using fgd_retina_r101_fpn_2x_distill_retina_r50_fpn_2x_coco.py to distill a RetinaNet-R101 teacher (trained on my own COCO-formatted dataset) with mAP 0.75, but the distilled RetinaNet-R50 student's mAP drops significantly, to 0.58. I am using SGD with the same parameters as in the original config. Training plateaus after epoch 20 with loss_cls around 0.25:
"epoch": 20, "iter": 200, "lr": 0.001, "memory": 5598, "data_time": 0.03356, "loss_cls": 0.25744, "loss_bbox": 0.19824, "loss_fgd_fpn_4": 2.68374, "loss_fgd_fpn_3": 2.18955, "loss_fgd_fpn_2": 0.43481, "loss_fgd_fpn_1": 1.17425, "loss_fgd_fpn_0": 4.25011, "loss": 11.18815, "grad_norm": 53.40476, "time": 0.65314
"bbox_mAP": 0.561
"epoch": 21, "iter": 200, "lr": 0.001, "memory": 5598, "data_time": 0.02153, "loss_cls": 0.26974, "loss_bbox": 0.21168, "loss_fgd_fpn_4": 2.03119, "loss_fgd_fpn_3": 1.96431, "loss_fgd_fpn_2": 0.43961, "loss_fgd_fpn_1": 1.17302, "loss_fgd_fpn_0": 4.24643, "loss": 10.33597, "grad_norm": 43.16615, "time": 0.60769}
"bbox_mAP": 0.571,
"epoch": 22, "iter": 200, "lr": 0.001, "memory": 5598, "data_time": 0.02091, "loss_cls": 0.25263, "loss_bbox": 0.19929, "loss_fgd_fpn_4": 1.99079, "loss_fgd_fpn_3": 1.83311, "loss_fgd_fpn_2": 0.42802, "loss_fgd_fpn_1": 1.14707, "loss_fgd_fpn_0": 4.15834, "loss": 10.00925, "grad_norm": 46.78779, "time": 0.62421
bbox_mAP": 0.578,
"epoch": 23, "iter": 200, "lr": 0.0001, "memory": 5598, "data_time": 0.01811, "loss_cls": 0.25804, "loss_bbox": 0.20361, "loss_fgd_fpn_4": 1.42493, "loss_fgd_fpn_3": 1.68207, "loss_fgd_fpn_2": 0.41495, "loss_fgd_fpn_1": 1.12626, "loss_fgd_fpn_0": 4.09925, "loss": 9.20911, "grad_norm": 20.09985, "time": 0.60161
"bbox_mAP": 0.577,
"epoch": 24, "iter": 200, "lr": 0.0001, "memory": 5598, "data_time": 0.01983, "loss_cls": 0.25139, "loss_bbox": 0.1968, "loss_fgd_fpn_4": 1.32875, "loss_fgd_fpn_3": 1.57256, "loss_fgd_fpn_2": 0.41188, "loss_fgd_fpn_1": 1.11039, "loss_fgd_fpn_0": 4.06505, "loss": 8.93682, "grad_norm": 19.73, "time": 0.62296}
"bbox_mAP": 0.581,
Is this happening because I am distilling using only my fine-tuning data? If not, what could the problem be? I notice the FGD feature losses dominate the total loss (e.g. loss_fgd_fpn_0 stays around 4.2 while loss_cls is around 0.25).
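
If the default loss weights are simply too strong for a smaller fine-tuning dataset, would scaling them down be a reasonable thing to try? For example (illustrative values only, not from the repo):

```python
# Hypothetical override: scale all FGD loss weights down by 10x for the
# smaller fine-tuning set. These values are illustrative, not defaults.
alpha_fgd = 0.0001
beta_fgd = 0.00005
gamma_fgd = 0.0001
lambda_fgd = 0.0000005
```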