Training does not converge #4

We launched training with:

```shell
./examples/train_fp16.sh aognet_s configs/aognet_imagenet_12M.yaml first_try
```

We use 4 P40 GPUs, so we changed `train_fp16.sh` as follows:

```shell
### Change accordingly
GPUS=0,1,2,3
NUM_GPUS=4
NUM_WORKERS=4
```
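With this setting the log prints four batches of shape `torch.Size([256, 3, 224, 224])`, so each step processes 4 × 256 = 1,024 images. The quick check below is a sketch that assumes the full ILSVRC-2012 training split (1,281,167 images); the variable names other than `NUM_GPUS` are illustrative, not taken from `train_fp16.sh`. It reproduces the 1252 iterations per epoch shown in the log:

```shell
#!/usr/bin/env bash
# Sketch: global batch size and iterations per epoch for this run.
NUM_GPUS=4
PER_GPU_BATCH=256        # illustrative name; value from the torch.Size([256, 3, 224, 224]) log lines
IMAGENET_TRAIN=1281167   # assumption: full ILSVRC-2012 training split
GLOBAL_BATCH=$(( NUM_GPUS * PER_GPU_BATCH ))
# Ceiling division: the last, partial batch still counts as one iteration.
ITERS_PER_EPOCH=$(( (IMAGENET_TRAIN + GLOBAL_BATCH - 1) / GLOBAL_BATCH ))
echo "global batch: ${GLOBAL_BATCH}, iterations/epoch: ${ITERS_PER_EPOCH}"
```

This prints `global batch: 1024, iterations/epoch: 1252`, matching the `[0/1252]` counter in the log.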

The log is below; the loss is already NaN at iteration 100:

```
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([206])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([824, 206, 1, 1])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000, 824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.FloatTensor with torch.Size([824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000, 824])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000])
FP16_Optimizer received torch.cuda.HalfTensor with torch.Size([1000])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
torch.Size([256, 3, 224, 224])
Epoch: [0][0/1252]	Time 12.503 (12.503)	Speed 81.903 (81.903)	Data 0.984 (0.984)	Loss 6.9218750000 (6.9219)	Prec@1 0.195 (0.195)	Prec@5 0.586 (0.586)	lr 0.000064
Epoch: [0][100/1252]	Time 2.100 (2.206)	Speed 487.515 (464.191)	Data 0.001 (0.011)	Loss nan (nan)	Prec@1 0.000 (0.003)	Prec@5 0.000 (0.009)	lr 0.006454
Epoch: [0][200/1252]	Time 2.116 (2.155)	Speed 483.939 (475.125)	Data 0.001 (0.006)	Loss nan (nan)	Prec@1 0.000 (0.001)	Prec@5 0.000 (0.004)	lr 0.012843
Epoch: [0][300/1252]	Time 2.103 (2.137)	Speed 487.036 (479.160)	Data 0.001 (0.004)	Loss nan (nan)	Prec@1 0.000 (0.001)	Prec@5 0.000 (0.003)	lr 0.019233
```
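To narrow down when training diverges, the progress lines can be summarized with a short awk sketch. It assumes the `Label value (average)` layout of the `Epoch:` lines above and embeds those four lines verbatim (tabs collapsed to spaces). It checks that Speed × Time stays at about 1,024 images per iteration, finds the first logged step with a NaN loss, and computes the lr change per iteration; the near-constant slope of about 6.39e-05 is consistent with a linear warm-up, which is an inference from the logged values rather than something read from the config file:

```shell
#!/usr/bin/env bash
# Sketch: summarize the "Epoch:" progress lines from the log.
# Assumption: each metric is printed as "Label value (average)".
LOG='Epoch: [0][0/1252] Time 12.503 (12.503) Speed 81.903 (81.903) Data 0.984 (0.984) Loss 6.9218750000 (6.9219) Prec@1 0.195 (0.195) Prec@5 0.586 (0.586) lr 0.000064
Epoch: [0][100/1252] Time 2.100 (2.206) Speed 487.515 (464.191) Data 0.001 (0.011) Loss nan (nan) Prec@1 0.000 (0.003) Prec@5 0.000 (0.009) lr 0.006454
Epoch: [0][200/1252] Time 2.116 (2.155) Speed 483.939 (475.125) Data 0.001 (0.006) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.004) lr 0.012843
Epoch: [0][300/1252] Time 2.103 (2.137) Speed 487.036 (479.160) Data 0.001 (0.004) Loss nan (nan) Prec@1 0.000 (0.001) Prec@5 0.000 (0.003) lr 0.019233'

# One row per logged step: iteration, round(Speed * Time), loss, lr.
# Speed is images/s and Time is s/iteration, so the product is images/iteration.
TABLE=$(printf '%s\n' "$LOG" | awk '
/^Epoch:/ {
  it = $2
  sub(/^\[[0-9]+\]\[/, "", it)   # "[0][100/1252]" -> "100/1252]"
  sub(/\/.*/, "", it)             # "100/1252]"     -> "100"
  for (i = 1; i < NF; i++) {
    if ($i == "Time")  t    = $(i + 1)
    if ($i == "Speed") s    = $(i + 1)
    if ($i == "Loss")  loss = $(i + 1)
    if ($i == "lr")    lr   = $(i + 1)
  }
  printf "%d %d %s %s\n", it, int(s * t + 0.5), loss, lr
}')
echo "$TABLE"

# First logged step whose loss is NaN (divergence happened at or before it).
FIRST_NAN=$(printf '%s\n' "$TABLE" | awk '$3 == "nan" { print $1; exit }')
echo "first NaN loss logged at iteration ${FIRST_NAN}"

# lr change per iteration between consecutive logged steps.
SLOPES=$(printf '%s\n' "$TABLE" | awk 'NR > 1 { printf "%.2e\n", ($4 - lr) / ($1 - it) } { it = $1; lr = $4 }')
echo "$SLOPES"
```

Because progress is printed only every 100 iterations, the NaN may have appeared anywhere in iterations 1 to 100; printing the loss every iteration would pin it down further.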

