
Multi-GPUs Runtime Error #3

@RyanG41

Description


Hi,
Thanks for the wonderful work.
I encountered an error, probably caused by distributed training. I ran the code on multiple GPUs and got the error below:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, ...

In train.py I see the code for multi-processing, but I don't know how to fix it there. Alternatively, can I force the code to run on only one GPU?
Thanks for any help you can provide.
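For reference, a minimal sketch of the two workarounds the error message and the question suggest. This assumes a standard PyTorch DDP setup; the actual model wrapping lives somewhere in train.py, and `model` / `local_rank` below are placeholder names, not identifiers from this repository:

```python
import os

# Option 1 (illustrative): wherever train.py wraps the model in DDP,
# pass find_unused_parameters=True so DDP tolerates parameters that do
# not contribute to the loss in a given iteration:
#
#   model = torch.nn.parallel.DistributedDataParallel(
#       model, device_ids=[local_rank], find_unused_parameters=True)

# Option 2: force a single-GPU run by exposing only one device before
# torch initializes CUDA. Equivalently, from the shell:
#   CUDA_VISIBLE_DEVICES=0 python train.py
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Note that option 1 adds some communication overhead per iteration, while option 2 gives up data parallelism entirely, so option 1 is usually preferable when multiple GPUs are available.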
