dist_c10d is not defined training error - distributed_utils #9

@NikhilCherian

Description

@rgcottrell @tianfeichen @cqlijingwei Hey again. Thanks for all the earlier replies. I was able to preprocess, train, and test everything in Google Colab. But recently I switched to training on my gaming laptop and got this error:

dist_c10d is not defined.
Can you explain has_c10d and the related logic to me in more detail?
In Google Colab, these were the parameters:
ddp_backend='c10d'
distributed_backend='nccl',
distributed_init_method=None,
distributed_port=-1,
distributed_rank=0, distributed_world_size=1
But in distributed_utils.py, the import torch.distributed as dist_c10d never succeeds; it always falls back to the dist_no_c10d path instead. Can you guide me here?
And when I use init_fn = dist_no_c10d.init_process_group, it does start importing the data and everything.
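For context, my understanding of the try/except import guard that sets a flag like has_c10d is something like the following sketch. This is my own illustration, not fairseq's actual code; the probe helper and the module name I pass to it are assumptions:

```python
import importlib

def probe(module_name):
    """Return True if module_name imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# fairseq-style guard (sketch): prefer the c10d backend when its module
# imports; otherwise fall back to the legacy (no_c10d) code path.
has_c10d = probe("torch.distributed")
backend = "c10d" if has_c10d else "no_c10d"
```

So if the c10d import raises ImportError on my laptop, the flag stays False and everything is routed to the no_c10d fallback, which would match what I am seeing.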

Any help would be appreciated. Thanks in advance.
