
worker died unexpectedly #138

@tfriedel

Description


I have been getting this error frequently.
In a batch processing pipeline that downloads images from S3 and processes them with rasterio, it happened regularly in AWS Batch containers.
When I ran the same container locally, I couldn't reproduce it. I have tried both the fork and spawn start methods.
Almost the same code using Dask didn't cause these issues.
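For anyone trying to reproduce this, the fork and spawn start methods mentioned above can be compared side by side. The following is a minimal sketch that uses the standard library's multiprocessing.get_context rather than MPIRE's own start-method option. The square function and run_with_start_method helper are hypothetical stand-ins for the real per-item work.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Hypothetical stand-in for the real per-item work (e.g. image processing).
    return x * x


def run_with_start_method(method: str, items: list[int]) -> list[int]:
    """Map `square` over `items` in a 2-process pool using `method`."""
    # get_context returns a context whose Pool starts workers with the given
    # method ("fork", "spawn" or "forkserver"; "fork" is POSIX-only).
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, items)


if __name__ == "__main__":
    for method in ("fork", "spawn"):
        # Skip methods this platform does not support.
        if method in mp.get_all_start_methods():
            print(method, run_with_start_method(method, [1, 2, 3]))
```

With spawn, workers re-import the main module, so pool creation must stay under the `if __name__ == "__main__":` guard. With fork, workers inherit the parent's memory, including any state set up by native libraries.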

Now I wanted to compare Dask and MPIRE in a benchmark, and I got the same issue!
I think there may be a bug in MPIRE.

This is the benchmark:
https://github.com/sybrenjansen/multiprocessing_benchmarks

I ran it with Python 3.12 on Linux.

❯ python benchmark_mpire.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738766355.878362 1560537 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738766355.882581 1560537 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
=====
MPIRE
=====
Setting up benchmark 1 ...
Benchmark #1 started
Running with 1 job
Numerical computation (MPIRE) took 59.898664474487305 seconds.
Numerical computation (MPIRE) took 60.894070625305176 seconds.
Numerical computation (MPIRE) took 60.4944064617157 seconds.
Numerical computation (MPIRE) took 60.799951791763306 seconds.
Numerical computation (MPIRE) took 60.616430044174194 seconds.
Running with 2 jobs
Numerical computation (MPIRE) took 41.30088138580322 seconds.
Numerical computation (MPIRE) took 35.43703866004944 seconds.
Numerical computation (MPIRE) took 32.693188190460205 seconds.
Numerical computation (MPIRE) took 32.87881588935852 seconds.
Numerical computation (MPIRE) took 31.104052543640137 seconds.
Running with 4 jobs
Numerical computation (MPIRE) took 18.82200002670288 seconds.
Numerical computation (MPIRE) took 17.475537300109863 seconds.
Numerical computation (MPIRE) took 16.806360006332397 seconds.
Numerical computation (MPIRE) took 16.586058139801025 seconds.
Numerical computation (MPIRE) took 17.376100063323975 seconds.
Results:
- Numerical computation:
  - 1 jobs, runtime mean: 60.54070467948914, std: 0.34990571231388484, total mean: 60.54070467948914, std: 0.34990571231388484
  - 2 jobs, runtime mean: 34.6827953338623, std: 3.5885435587666694, total mean: 34.6827953338623, std: 3.5885435587666694
  - 4 jobs, runtime mean: 17.41321110725403, std: 0.7800510616255131, total mean: 17.41321110725403, std: 0.7800510616255131
Setting up benchmark 2 ...
Creating benchmark #2 documents data ...
Benchmark #2 started
Running with 1 job
Traceback (most recent call last):
  File "/home/thomas/multiprocessing_benchmarks/benchmark_mpire.py", line 155, in <module>
    main()
  File "/home/thomas/multiprocessing_benchmarks/benchmark_mpire.py", line 142, in main
    run_trials(partial(benchmark_2, documents), "Stateful computation",
  File "/home/thomas/multiprocessing_benchmarks/util.py", line 146, in run_trials
    benchmark_function(n_jobs)
  File "/home/thomas/multiprocessing_benchmarks/benchmark_mpire.py", line 65, in benchmark_2
    pool.map_unordered(streaming_actor.add_document,
  File "/home/thomas/multiprocessing_benchmarks/venv/lib/python3.12/site-packages/mpire/pool.py", line 534, in map_unordered
    return list(self.imap_unordered(func, iterable_of_args, iterable_len, max_tasks_active, chunk_size,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/multiprocessing_benchmarks/venv/lib/python3.12/site-packages/mpire/pool.py", line 788, in imap_unordered
    self._handle_exception()
  File "/home/thomas/multiprocessing_benchmarks/venv/lib/python3.12/site-packages/mpire/pool.py", line 926, in _handle_exception
    raise exception
RuntimeError: Worker-0 died unexpectedly
