Error training the model on Nuscenes #19
Description
I am training the model on the nuScenes dataset and I get the error below:
Traceback (most recent call last):
File "projects/FastPillars/tools/train.py", line 168, in <module>
main()
File "projects/FastPillars/tools/train.py", line 157, in main
train_detector(
File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/apis/train.py", line 326, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 542, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 408, in train
outputs = self.batch_processor_inline(
File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 367, in batch_processor_inline
losses = model(example, return_loss=True)
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/beegfs/chandorkar/projects/FastPillars/det3d/models/detectors/fastpillar.py", line 46, in forward
return self.bbox_head.loss(example, preds, self.test_cfg)
File "/beegfs/chandorkar/projects/FastPillars/det3d/models/bbox_heads/center_head.py", line 703, in loss
iou_loss = self.crit_iou(preds_dict['iou'], example['mask'][task_id], example['ind'][task_id],
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/beegfs/chandorkar/projects/FastPillars/det3d/models/losses/centernet_loss.py", line 76, in forward
target = boxes_aligned_iou3d_gpu(pred_box[mask], box_gt[mask]) # pred_box gt_box-> iou (2iou-1)->target todo: +rdiou loss
File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/_tensor.py", line 970, in __array__
return self.numpy()
TypeError: can't convert cuda:1 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I compared the code with PillarNet's centernet_loss.py and it is exactly the same. I cannot figure out what the problem might be. Can you help, please?
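For reference, the last two frames show PyTorch's `Tensor.__array__` being called, which is what happens when something tries to turn a tensor into a NumPy array implicitly. That fails for tensors living on a GPU. Below is a minimal sketch, independent of FastPillars, that reproduces the same class of error and shows the explicit host copy the message asks for. The helper names `to_numpy` and `describe` are hypothetical, and the random tensor is only a stand-in for `pred_box[mask]`. I have not verified what `boxes_aligned_iou3d_gpu` expects as input, so this may help with diagnosis rather than being the actual fix.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Copy a tensor to host memory and return it as a NumPy array."""
    # detach() drops autograd history; cpu() is a no-op for CPU tensors.
    return t.detach().cpu().numpy()


def describe(name: str, t: torch.Tensor) -> str:
    """One-line summary to log just before a suspect call."""
    return f"{name}: device={t.device}, dtype={t.dtype}, shape={tuple(t.shape)}"


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    boxes = torch.rand(4, 7, device=device)  # stand-in for pred_box[mask]
    print(describe("boxes", boxes))

    if boxes.is_cuda:
        try:
            # Implicit conversion goes through Tensor.__array__ and raises
            # TypeError for CUDA tensors.
            np.asarray(boxes)
        except TypeError as err:
            print("implicit conversion failed:", err)

    # Explicit host copy works on both CPU and CUDA tensors.
    print(to_numpy(boxes).shape)
```

Logging `describe("pred_box", pred_box)` and `describe("box_gt", box_gt)` right before line 76 of centernet_loss.py would show which devices the inputs are on. If `boxes_aligned_iou3d_gpu` is meant to run on GPU tensors, an implicit NumPy conversion at that call may instead point to a different implementation of the function being imported than in PillarNet, but that is an assumption to check.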