Oktoberfest hangs in case of child worker OOM #372

@gritukan

Describe the bug

This line will hang for three hours if a child process is killed due to OOM or for any other reason. Because the child dies instantly, it cannot notify the parent of the failure, so the parent keeps waiting for it until the timeout expires. Once the timeout expires, Oktoberfest simply fails with a generic timeout error that gives no clear indication of what happened.

To Reproduce

Launch a multi-threaded Oktoberfest run that consumes too much memory. Manually killing a worker process with SIGKILL will probably also be sufficient.
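The suspected mechanism can be sketched with the standard library alone, without Oktoberfest. This is a minimal illustration, not the project's code: the worker function `die_abruptly`, the helper `wait_on_killed_worker`, and the 2-second timeout are hypothetical stand-ins (the real wait is reported above as three hours), and the `fork` start method assumes a POSIX system.

```python
import multiprocessing
import os
import signal


def die_abruptly(_):
    """Hypothetical worker task: die instantly, the way an OOM kill would."""
    os.kill(os.getpid(), signal.SIGKILL)


def wait_on_killed_worker(timeout_s=2.0):
    """Submit a task whose worker is SIGKILLed and report how the wait ends."""
    # Assumption: "fork" start method (POSIX only) keeps the sketch self-contained.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=1) as pool:
        result = pool.apply_async(die_abruptly, (None,))
        try:
            result.get(timeout=timeout_s)
        except multiprocessing.TimeoutError:
            # The pool replaces the dead worker, but the lost task is never
            # completed, so the parent only finds out through the timeout.
            return "timeout"
    return "completed"


if __name__ == "__main__":
    print(wait_on_killed_worker())
```

With a short timeout the wait ends quickly; with a multi-hour timeout the parent would sit idle for the whole duration, which matches the zero CPU usage observed.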

Expected behavior

Oktoberfest fails with a clear error message.
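One possible way to get this behavior, shown here only as a sketch and not as what Oktoberfest does, is `concurrent.futures.ProcessPoolExecutor`, which raises `BrokenProcessPool` when a worker terminates abruptly. The names `kill_self` and `run_with_clear_error` are hypothetical, the error message is illustrative, and the `fork` start method again assumes POSIX.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def kill_self(_):
    """Hypothetical worker task that simulates an OOM kill."""
    os.kill(os.getpid(), signal.SIGKILL)


def run_with_clear_error(task, arg, timeout_s=30.0):
    """Run one task in a child process and fail loudly if the child dies."""
    # Assumption: "fork" start method (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        future = executor.submit(task, arg)
        try:
            return future.result(timeout=timeout_s)
        except BrokenProcessPool as exc:
            # Raised as soon as the executor notices the dead worker,
            # well before the timeout would expire.
            raise RuntimeError(
                "A worker process died unexpectedly "
                "(possibly killed by the OOM killer)."
            ) from exc


if __name__ == "__main__":
    try:
        run_with_clear_error(kill_self, None)
    except RuntimeError as err:
        print(err)
```

The design point is that the parent learns about the worker's death from the executor itself rather than from an expired timeout, so the message can name the likely cause.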

System [please complete the following information]:

Ubuntu 22.04, Python 3.12.3 with oktoberfest==0.10.0

Additional context

In my case Oktoberfest got stuck after

2026-03-02 23:20:14,675 - INFO - oktoberfest.predict.predictor::from_config Using model Prosit_2020_intensity_HCD via Koina
Prosit_2020_intensity_HCD:: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 628/628 [01:05<00:00,  9.63it/s]

with zero CPU usage. After I manually sent SIGINT to the parent process, the error was

Waiting for tasks to complete:   0%|                                                                                                                                                                                             | 0/1 [03:08<?, ?it/s]
2026-03-02 23:23:21,814 - ERROR - oktoberfest.utils.multiprocessing_pool::check_pool Caught KeyboardInterrupt, terminating workers

suggesting that the freeze was indeed inside the .get() method.

Labels

bug (Something isn't working)
