Description
While using mpire, I occasionally ran into a peculiar situation:
- The program would hang indefinitely on the last few tasks.
- Pressing Ctrl+C failed to terminate the program.
I collected worker insights for the hanging state, which showed that some workers were not running any tasks and had been accumulating waiting_time for a long time. Meanwhile, by monitoring the main thread with debugpy, I found that it was waiting on these workers (either for them to start or to finish), and the worker processes did exist.
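For reference, this is roughly how the insights were collected (a minimal sketch; it assumes mpire's enable_insights option and get_insights() method, and the worker function and sizes are purely illustrative):

```python
from mpire import WorkerPool

def work(x):
    return x * x

# enable_insights=True asks mpire to record per-worker statistics such as
# start-up, waiting and working times.
with WorkerPool(n_jobs=4, enable_insights=True) as pool:
    pool.map(work, range(1000))
    insights = pool.get_insights()  # dict with the per-worker timings
    print(insights)
```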
I then tried to attach debugpy to the hanging worker processes, but the attach failed (I later learned that a blocking call was preventing the connection). Eventually I captured their stack traces with py-spy and found that they were stuck in _flush_std_streams, which multiprocessing calls when starting a process:
```python
def _flush_std_streams():
    try:
        sys.stdout.flush()
    except (AttributeError, ValueError):
        pass
    try:
        sys.stderr.flush()
    except (AttributeError, ValueError):
        pass
```

(located in multiprocessing.util)
I inspected sys.stdout and found that it had never been replaced with another stream, so a hang inside the flush call seemed unlikely.
Later, I found a related issue on Python's GitHub: python/cpython#91776. It is still unresolved and is believed to be caused by the streams' internal locks being held during writes. In some cases a user may have other threads running while a WorkerPool executes, and those threads may be writing to sys.stdout or sys.stderr. If a fork happens at that moment, the forked child, controlled by mpire and multiprocessing, executes _flush_std_streams; if the internal lock of stdout or stderr was held by a writing thread at the time of the fork, the flush waits forever and the child hangs.
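To make the failure mode concrete, here is a minimal sketch of the kind of setup that can trigger it (the worker function, the thread body and the sizes are purely illustrative, and it assumes the default fork start method on Linux):

```python
import sys
import threading
from mpire import WorkerPool

def writer():
    # Background thread that keeps writing to stdout. If a fork happens while
    # this thread holds stdout's internal lock, the forked child can block
    # forever in multiprocessing's _flush_std_streams.
    while True:
        sys.stdout.write("logging from another thread\n")

threading.Thread(target=writer, daemon=True).start()

def work(x):
    return x * x

# With the fork start method, a run like this can occasionally hang on the
# last few tasks, as described above.
with WorkerPool(n_jobs=4) as pool:
    pool.map(work, range(1000))
```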
Even without extra threads there is a small probability of hanging (much lower; your code already hints at this where it waits on the task_queue with a timeout), possibly due to operating-system-level file descriptor delays; the exact cause is unknown.
Although this issue is not directly caused by mpire, fork is a primary start method for mpire, and writing to stdout/stderr from other threads while a WorkerPool is running is a common pattern, so I'm recording it here in case someone needs it:
Reopening stdout and stderr in the forked child:

```python
import os
import sys

def reopen_std_streams():
    # Give the forked child fresh stream objects on file descriptors 1 and 2,
    # so it never touches the parent's possibly locked stdout/stderr.
    sys.stdout = os.fdopen(1, "w")
    sys.stderr = os.fdopen(2, "w")

os.register_at_fork(after_in_child=reopen_std_streams)
```

(This only works on Linux & Python >= 3.7, where os.register_at_fork is available.)
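Note that the streams created by os.fdopen share the underlying file descriptors 1 and 2 with the originals, so the child's output still goes to the same place; the point is only that the child gets stream objects whose internal locks are freshly created and therefore cannot be stuck in the locked state inherited from the parent.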
Another approach would be for mpire to take a lock around the fork in _start_worker that other logging threads could also acquire, so that no thread is mid-write to stdout/stderr at the moment of fork, together with documentation of the caveat.
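A rough sketch of that idea from the user side, to illustrate the locking discipline (stdio_lock and safe_log are hypothetical names; a built-in version would presumably take the same kind of lock inside _start_worker, which would be much finer-grained than locking around the whole map call as done here):

```python
import threading
from mpire import WorkerPool

# Hypothetical lock shared by every thread that writes to stdout/stderr and
# by the code path that forks the workers.
stdio_lock = threading.Lock()

def safe_log(msg):
    # Logging threads write through this wrapper, so they never hold the
    # stream's internal lock without also holding stdio_lock.
    with stdio_lock:
        print(msg, flush=True)

def work(x):
    return x * x

# mpire typically forks its workers when the map call starts, so the lock is
# held across the whole call here; a lock taken only around _start_worker
# inside mpire would avoid blocking the logging threads for the entire map.
with stdio_lock, WorkerPool(n_jobs=4) as pool:
    results = pool.map(work, range(100))
```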
If you could look into this issue further, I would greatly appreciate it.