TCP port collision when running two different tests on the same machine #15

@jamesETsmith

🔨Work Item

When two separate test runs execute on the same machine, they can crash because both try to use the same hard-coded TCP port. See:

 rg "tcp://127.0.0.1:12345"
...
tests/python/pytorch/graphbolt/test_item_sampler.py
886:        else "tcp://127.0.0.1:12345"

tests/python/pytorch/graphbolt/test_dataloader.py
92:            else "tcp://127.0.0.1:12345"

tests/python/pytorch/cuda/test_nccl.py
17:        init_method="tcp://127.0.0.1:12345",
40:        init_method="tcp://127.0.0.1:12345",
63:        init_method="tcp://127.0.0.1:12345",
89:        init_method="tcp://127.0.0.1:12345",
...

Description

The TCP port should be chosen at random from a fixed range instead of being hard-coded, so that a collision between concurrent test runs is very unlikely.
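
One possible shape for this, as a minimal sketch rather than a proposed patch: the helper names `pick_tcp_port` and `make_init_method` and the 20000-60000 range are hypothetical and not taken from the repository. The sketch draws a random port from the range and briefly binds it to check that it was free at that moment. Another process could still grab the port between the check and the actual use, so this only makes collisions unlikely, not impossible.

```python
import random
import socket


def pick_tcp_port(low=20000, high=60000, max_tries=50):
    """Return a random port in [low, high] that was free when checked.

    The range and retry count are illustrative assumptions.
    """
    for _ in range(max_tries):
        port = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # Binding succeeds only if nothing else holds the port right now.
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use, try another one
            return port  # the socket is closed when the with-block exits
    raise RuntimeError("no free TCP port found in the requested range")


def make_init_method(host="127.0.0.1"):
    """Build a tcp:// init string to use instead of the hard-coded :12345."""
    return f"tcp://{host}:{pick_tcp_port()}"
```

In a multi-process test the port would need to be chosen once in the parent and passed to every rank, since each rank calling `make_init_method()` on its own would get a different port.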
