Description
./examples/train_fp16.sh aognet_s configs/aognet_imagenet_12M.yaml first_try
We use 4 P40 GPUs, so we changed "train_fp16.sh" as follows:
# Change accordingly
GPUS=0,1,2,3
NUM_GPUS=4
NUM_WORKERS=4
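The GPU list and the GPU count above have to agree. A minimal sketch of one way to keep them in sync, assuming the script only needs these three variables set (deriving NUM_GPUS from GPUS is an illustration here, not something the original script is known to do):

```shell
# Hypothetical variant of the edited block in train_fp16.sh:
# derive NUM_GPUS from GPUS instead of setting both by hand.
GPUS=0,1,2,3

# Count the comma-separated device IDs in GPUS.
# The trailing tr strips the padding some wc implementations add.
NUM_GPUS=$(echo "$GPUS" | tr ',' '\n' | wc -l | tr -d ' ')

NUM_WORKERS=4

echo "GPUS=$GPUS NUM_GPUS=$NUM_GPUS NUM_WORKERS=$NUM_WORKERS"
```

With this, changing only the GPUS line (for example to 0,1) would update NUM_GPUS as well.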
The log is below:
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([206])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([824, 206, 1, 1])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000, 824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000, 824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
Epoch: [0][0/1252] Time 12.503 (12.503) Speed 81.903 (81.903) Data 0.984 (0.984) Loss 6.9218750000 (6.9219) Prec@1 0.195 (0.195) Prec@5 0.586 (0.586) lr 0.000064
Epoch: [0][100/1252] Time 2.100 (2.206) Speed 487.515 (464.191) Data 0.001 (0.011) Loss nan (nan) Prec@1 0.000 (0.003) Prec@5 0.000 (0.009) lr 0.006454
Epoch: [0][200/1252] Time 2.116 (2.155) Speed 483.939 (475.125) Data 0.001 (0.006) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.004) lr 0.012843
Epoch: [0][300/1252] Time 2.103 (2.137) Speed 487.036 (479.160) Data 0.001 (0.004) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.003) lr 0.019233
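To narrow down where the loss first turned into nan, the log can be searched directly. A rough sketch, assuming the training output was saved to a file; here a temporary file is filled with the four progress lines from the log above so the example runs on its own:

```shell
# Stand-in for the saved training log (hypothetical location).
# The lines are copied verbatim from the log above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Epoch: [0][0/1252] Time 12.503 (12.503) Speed 81.903 (81.903) Data 0.984 (0.984) Loss 6.9218750000 (6.9219) Prec@1 0.195 (0.195) Prec@5 0.586 (0.586) lr 0.000064
Epoch: [0][100/1252] Time 2.100 (2.206) Speed 487.515 (464.191) Data 0.001 (0.011) Loss nan (nan) Prec@1 0.000 (0.003) Prec@5 0.000 (0.009) lr 0.006454
Epoch: [0][200/1252] Time 2.116 (2.155) Speed 483.939 (475.125) Data 0.001 (0.006) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.004) lr 0.012843
Epoch: [0][300/1252] Time 2.103 (2.137) Speed 487.036 (479.160) Data 0.001 (0.004) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.003) lr 0.019233
EOF

# Last progress line that still reports a numeric loss.
LAST_OK=$(grep 'Loss [0-9]' "$LOG" | tail -n 1)

# First progress line that reports a nan loss.
FIRST_NAN=$(grep -m 1 'Loss nan' "$LOG")

echo "Last finite loss: $LAST_OK"
echo "First nan loss:   $FIRST_NAN"

rm -f "$LOG"
```

Because progress is printed only every 100 iterations, this shows that the loss was still finite at iteration 0 and was already nan by iteration 100; the exact iteration in between is not visible in this log.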