Error training the model on nuScenes

I am training the model on the nuScenes dataset and I get the error below:

Traceback (most recent call last):
  File "projects/FastPillars/tools/train.py", line 168, in <module>
    main()
  File "projects/FastPillars/tools/train.py", line 157, in main
    train_detector(
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/apis/train.py", line 326, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 542, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 408, in train
    outputs = self.batch_processor_inline(
  File "/beegfs/chandorkar/projects/FastPillars/det3d/torchie/trainer/trainer.py", line 367, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/detectors/fastpillar.py", line 46, in forward
    return self.bbox_head.loss(example, preds, self.test_cfg)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/bbox_heads/center_head.py", line 703, in loss
    iou_loss = self.crit_iou(preds_dict['iou'], example['mask'][task_id], example['ind'][task_id],
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/chandorkar/projects/FastPillars/det3d/models/losses/centernet_loss.py", line 76, in forward
    target = boxes_aligned_iou3d_gpu(pred_box[mask], box_gt[mask]) # pred_box gt_box-> iou  (2iou-1)->target  todo: +rdiou loss
  File "/beegfs/chandorkar/miniforge3/envs/det3d/lib/python3.8/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
TypeError: can't convert cuda:1 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I compared the code with PillarNet's `centernet_loss.py` and it is exactly the same, so I cannot figure out what the problem might be. Could you please help?
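
One way to narrow this down might be to check the type and device of each operand on line 76 of `centernet_loss.py`. The traceback jumps straight from that line into `Tensor.__array__`, which suggests that something on that line is implicitly converting a CUDA tensor to NumPy (for example a NumPy array being indexed with a CUDA mask, though that is only a guess). Below is a minimal sketch of such a check. `describe_operand` is a hypothetical helper, not part of Det3D or FastPillars, and it assumes only that tensors expose `.device` and `.shape`:

```python
def describe_operand(name, obj):
    """Return a one-line summary of an operand's type, device and shape.

    Hypothetical debugging helper. It relies on duck typing only, so it
    accepts torch tensors, NumPy arrays and plain Python objects alike.
    """
    type_name = f"{type(obj).__module__}.{type(obj).__qualname__}"
    # Objects without a .device attribute are reported as "n/a".
    device = getattr(obj, "device", "n/a")
    shape = getattr(obj, "shape", None)
    shape = tuple(shape) if shape is not None else "n/a"
    return f"{name}: type={type_name}, device={device}, shape={shape}"


# Possible use just before the failing call (variable names taken from
# the traceback line; this part is left commented out on purpose):
# for name, obj in [("pred_box", pred_box), ("box_gt", box_gt), ("mask", mask)]:
#     print(describe_operand(name, obj))
```

If one operand turns out to be a NumPy array, or to live on a different device than the others, that would be the first place to look.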

Error training the model on Nuscenes #19

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Error training the model on Nuscenes #19

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions