Error training the model on Nuscenes #19

@cadip92

I am training the model on the nuScenes dataset and get the following error:

Traceback (most recent call last):
  File "projects/FastPillars/tools/train.py", line 168, in <module>
    main()
  File "projects/FastPillars/tools/train.py", line 157, in main
    train_detector(
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/apis/train.py", line 326, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 542, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 408, in train
    outputs = self.batch_processor_inline(
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 367, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/detectors/fastpillar.py", line 46, in forward
    return self.bbox_head.loss(example, preds, self.test_cfg)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/bbox_heads/center_head.py", line 703, in loss
    iou_loss = self.crit_iou(preds_dict['iou'], example['mask'][task_id], example['ind'][task_id],
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/losses/centernet_loss.py", line 76, in forward
    target = boxes_aligned_iou3d_gpu(pred_box[mask], box_gt[mask])  # pred_box gt_box-> iou (2iou-1)->target todo: +rdiou loss
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
TypeError: can't convert cuda:1 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
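The last line is PyTorch's standard error for converting a CUDA tensor to NumPy: NumPy calls `Tensor.__array__`, which calls `Tensor.numpy()`, and `numpy()` only works on CPU tensors. The message's own suggested remedy is to copy the tensor to host memory first. Below is a minimal sketch of that remedy as a duck-typed helper; `to_host_numpy` is a hypothetical name, not a FastPillars or PyTorch API. It is only appropriate where a NumPy conversion is actually intended. For a function named `boxes_aligned_iou3d_gpu`, copying its inputs to the CPU would probably hide the real problem, which is why a NumPy conversion happens on that line at all.

```python
import numpy as np


def to_host_numpy(x):
    """Return a NumPy array for `x`, copying tensors to host memory first.

    Hypothetical helper. Duck-typed so it needs no torch import: anything
    with `.detach()` and `.cpu()` (such as a torch.Tensor) is detached from
    autograd and copied to the CPU before conversion; NumPy arrays and
    plain Python sequences pass straight through to np.asarray.
    """
    if hasattr(x, "detach"):
        # Drop the autograd graph; Tensor.numpy() refuses tensors that require grad.
        x = x.detach()
    if hasattr(x, "cpu"):
        # Device-to-host copy; returns the same tensor if it is already on the CPU.
        x = x.cpu()
    return np.asarray(x)
```

Calling `np.asarray(t)` directly on a CUDA tensor `t` reproduces the `TypeError` above, while `to_host_numpy(t)` returns an ordinary `ndarray`.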

I compared the code with PillarNet's centernet_loss.py and it is exactly the same, so I cannot figure out what the problem might be. Can you help, please?
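One detail of the traceback may help narrow this down. It goes straight from line 76 of centernet_loss.py into `Tensor.__array__`, with no frame inside `boxes_aligned_iou3d_gpu`. Python tracebacks list a frame for every Python-level function on the stack, so the NumPy conversion probably happens in non-Python code on that line. Two possibilities fit this, and neither is confirmed. `boxes_aligned_iou3d_gpu` may resolve to something compiled (for example an extension built differently from PillarNet's) whose argument handling goes through NumPy. Alternatively, `box_gt` may be a NumPy array rather than a tensor, in which case `box_gt[mask]` makes NumPy convert the CUDA `mask`. The sketch below is a hypothetical diagnostic helper, `describe`, that reports what an object is and, for callables, where it is defined.

```python
import inspect


def describe(obj):
    """Summarize what `obj` is, to check the assumptions a line of code makes.

    Hypothetical diagnostic helper. Reports the object's type, its `device`
    attribute if it has one (torch tensors do), and, for non-class callables,
    the module it claims to come from, whether it is a plain Python function,
    and its source file (None for built-in or compiled functions).
    """
    kind = type(obj)
    info = {"type": f"{kind.__module__}.{kind.__qualname__}"}
    device = getattr(obj, "device", None)
    if device is not None:
        info["device"] = str(device)
    if callable(obj) and not inspect.isclass(obj):
        info["defined_in_module"] = getattr(obj, "__module__", None)
        info["is_python_function"] = inspect.isfunction(obj)
        try:
            info["source_file"] = inspect.getsourcefile(obj)
        except TypeError:
            # Built-ins and compiled extension functions have no Python source.
            info["source_file"] = None
    return info
```

As a temporary check, printing `describe(boxes_aligned_iou3d_gpu)`, `describe(pred_box)`, `describe(box_gt)` and `describe(mask)` just before line 76 of centernet_loss.py, on both the FastPillars and PillarNet setups, should show where they differ. A `numpy.ndarray` among the operands would point at the indexing; `is_python_function: False` for the IoU function would point at how that function is built or imported.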
