ProcessPoolExecutor fork vs. spawn #153

Terra resource management is not compatible with the `spawn` start method for `ProcessPoolExecutor`. The current workaround is to use a `ThreadPoolExecutor`.

torch appears to require the `spawn` start method for `ProcessPoolExecutor` when CUDA is used in the child processes:
https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing

The start method can be set globally using the following torch code:
```
import torch.multiprocessing

torch.multiprocessing.set_start_method("spawn", force=True)
```

Or by setting `mp_context` when initializing the `ProcessPoolExecutor`:
```
from multiprocessing import get_context
from terra.executor.process import ProcessPoolExecutor

# Request the spawn start method for this executor only
mp_context = get_context('spawn')
executor = ProcessPoolExecutor(max_workers=3, mp_context=mp_context)
```

Unfortunately, a spawned `ProcessPoolExecutor` re-imports Python modules in each child process. Because the resource lock directory depends on `os.getpid()`, each child process ends up with a different lock directory:
https://github.com/VisionSystemsInc/terra/blob/e24792b8d0ec91f7c054c21930564ab3c586115e/terra/executor/resources.py#L126-L129

Because each child process uses a different lock directory, no child is aware of the resource locks held by the others. Each child is therefore able to claim the first resource, which results in processing failure.
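The divergence can be illustrated with a minimal sketch. The helper `lock_dir_for`, the base directory `/tmp/locks`, the resource name `gpu`, and the pid values are all hypothetical and chosen only for illustration; the path layout mirrors the pid-keyed directory described above, and the fork case assumes the path is computed once in the parent before the workers are forked.

```python
import os
import platform


def lock_dir_for(base_dir, resource_name, pid=None):
    """Hypothetical helper mirroring the pid-keyed lock directory layout."""
    if pid is None:
        pid = os.getpid()
    return os.path.join(base_dir, platform.node(), str(pid), resource_name)


# Illustrative pid values only; no real processes are started here.
parent_pid = 1000
child_pids = (1001, 1002, 1003)

# fork (assumed): the path was computed in the parent and inherited,
# so every worker looks in the same directory.
fork_dirs = {lock_dir_for("/tmp/locks", "gpu", pid=parent_pid)
             for _ in child_pids}

# spawn: each child re-imports the module and computes the path
# from its own pid, so every worker looks in a different directory.
spawn_dirs = {lock_dir_for("/tmp/locks", "gpu", pid=pid)
              for pid in child_pids}

print(len(fork_dirs))   # number of distinct directories seen under fork
print(len(spawn_dirs))  # number of distinct directories seen under spawn
```

With one shared directory, a lock file written by one worker is visible to the rest; with three separate directories, each worker sees an empty directory and concludes that the first resource is free.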

The spawn start method can be tested by adding the code below to `test_executor_resources.py` after `TestResourceProcess`. However, this change currently produces a different error: the `data` dictionary is empty because each spawned child re-imports the test module (e.g., `simple_acquire` is unable to find `data[name]`).

https://github.com/VisionSystemsInc/terra/blob/e24792b8d0ec91f7c054c21930564ab3c586115e/terra/tests/test_executor_resources.py

```
from multiprocessing import get_context  # if not already imported by the test module

class ProcessPoolExecutorSpawn(ProcessPoolExecutor):
  # ProcessPoolExecutor that always uses the spawn start method
  def __init__(self, *args, **kwargs):
    kwargs['mp_context'] = get_context('spawn')
    super().__init__(*args, **kwargs)

class TestResourceProcessSpawn(TestResourceProcess):
  # Test for multiprocess spawn case
  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.Executor = ProcessPoolExecutorSpawn
```

Issue discovered by @decrispell during terra_real3d development, while attempting to run multiple torch tasks, each with a single assigned GPU.

The lines of `resources.py` referenced above:

    self.lock_dir = os.path.join(settings.terra.lock_dir,
                                 platform.node(),
                                 str(os.getpid()),
                                 resource_name)
