This repository was archived by the owner on Oct 2, 2025. It is now read-only.

module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1 #73

@RuojiWang

Description


I'm running into the following error:

ssh://root@10.4.208.56:10001/root/anaconda3/bin/python3.8 -u /workspace/project/huafeng/big_transfer-master/train.py --name huaweishengteng --model BiT-M-R50x1 --logdir ./logs --dataset cifar10 --datadir ./cifar
2022-09-14 11:42:29,707 [INFO] bit_common: Namespace(base_lr=0.003, batch=512, batch_split=1, bit_pretrained_dir='.', datadir='./cifar', dataset='cifar10', eval_every=None, examples_per_class=None, examples_per_class_seed=0, logdir='./logs', model='BiT-M-R50x1', name='huaweishengteng', save=True, workers=8)
2022-09-14 11:42:29,707 [INFO] bit_common: Going to train on cuda:1
Files already downloaded and verified
Files already downloaded and verified
2022-09-14 11:42:31,373 [INFO] bit_common: Using a training set with 50000 images.
2022-09-14 11:42:31,373 [INFO] bit_common: Using a validation set with 10000 images.
2022-09-14 11:42:31,373 [INFO] bit_common: Loading model from BiT-M-R50x1.npz
2022-09-14 11:42:31,893 [INFO] bit_common: Moving model onto all GPUs
2022-09-14 11:42:31,908 [INFO] bit_common: Model will be saved in './logs/huaweishengteng/bit.pth.tar'
2022-09-14 11:42:31,908 [INFO] bit_common: Fine-tuning from BiT
2022-09-14 11:42:34,812 [INFO] bit_common: Starting training!
Traceback (most recent call last):
File "/workspace/project/huafeng/big_transfer-master/train.py", line 296, in
main(parser.parse_args())
File "/workspace/project/huafeng/big_transfer-master/train.py", line 237, in main
logits = model(x)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1

Process finished with exit code 1

cuda:0 is occupied by my lab; I can only use one of the other GPUs, e.g. cuda:7. How can I solve this problem?
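Not part of the original report, but the usual workaround for this DataParallel constraint is to remap the visible GPUs before torch is imported, so the one free GPU shows up inside the process as cuda:0. A minimal sketch, assuming the training script wraps the model in torch.nn.DataParallel with default device_ids:

```python
import os

# Remap GPUs *before* importing torch: the process then only sees physical
# GPU 7, which torch exposes as cuda:0. DataParallel requires the module's
# parameters and buffers to live on device_ids[0] (cuda:0 by default), so
# remapping resolves the error without changing the training code.
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

# The PyTorch side (sketch only — requires a CUDA build of torch):
# import torch
# model = torch.nn.DataParallel(model)  # device_ids defaults to all visible GPUs
# model = model.to("cuda:0")            # "cuda:0" is now physical GPU 7
```

Equivalently, you can set the variable on the command line instead of in code, e.g. `CUDA_VISIBLE_DEVICES=7 python3 train.py ...`, which avoids any import-order concerns.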
