
[BUG] concurrency/race condition in unimol_tools/data/conformer.py causes prediction to hang #19

@newtontech

Description


Describe the bug

When multi_process=True, calling MolPredict.predict() can hang intermittently inside the multiprocessing code in unimol_tools/data/conformer.py. The process blocks at pool.imap() with no error output. Memory usage looks normal and the CPU is idle, so the process has to be terminated manually.

Root cause analysis:

  1. Missing pool.join() call: In the two transform() methods in conformer.py (around lines 191 and 466), the code calls pool.close() but never calls pool.join(). This can allow the main process to continue before worker processes have finished, causing a race condition.
  2. Insufficient exception handling: A bare except: clause at line 279 catches all exceptions (including KeyboardInterrupt) and reports the error with print instead of a logger.
  3. No context manager usage: The process pool should be used with a context manager (with Pool() as pool:) to ensure proper cleanup.
  4. No timeout mechanism: pool.imap() is used without any timeout handling; if a worker blocks, the main process cannot recover.

unimol_tools Version

0.1.5

Expected behavior

When multi_process=True, multiprocessing should operate reliably and not hang. Specifically:

  1. All worker processes should finish before the main process continues.
  2. Exceptions should be logged and handled appropriately.
  3. Processes should be terminable with Ctrl+C (SIGINT, raised in Python as KeyboardInterrupt).
  4. Long-running or blocking tasks should be handled with timeouts.
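A sketch of expected behaviors 2 and 3 under similar assumptions: safe_worker and predict_all are hypothetical names, and len() stands in for real conformer generation. The worker catches Exception rather than using a bare except:, so KeyboardInterrupt and SystemExit still propagate, and failures are recorded with logger.exception, which includes the traceback. In the main process, Ctrl+C is logged and re-raised, and leaving the with block terminates the workers.

```python
import logging
import multiprocessing as mp
from typing import List, Optional

logger = logging.getLogger(__name__)


def safe_worker(smiles: str) -> Optional[int]:
    """Hypothetical per-molecule worker with narrow exception handling."""
    try:
        if not smiles:
            raise ValueError("empty SMILES")
        return len(smiles)  # stand-in for real conformer generation
    except Exception:
        # Ordinary errors only; KeyboardInterrupt and SystemExit still propagate.
        logger.exception("Conformer generation failed for %r", smiles)
        return None


def predict_all(smiles_list: List[str], processes: int = 2) -> List[Optional[int]]:
    with mp.Pool(processes=processes) as pool:
        try:
            results = list(pool.imap(safe_worker, smiles_list))
        except KeyboardInterrupt:
            # Re-raise so Ctrl+C actually stops the run; __exit__ terminates workers.
            logger.warning("Interrupted by user; terminating worker processes")
            raise
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(predict_all(["CCO", "", "CC"]))  # [3, None, 2]
```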

To Reproduce

Steps to reproduce:

  1. Run with multi_process=True (the default setting).
  2. Call MolPredict.predict() repeatedly.
  3. Observe the process state. The call intermittently hangs; sometimes pressing Ctrl+C ends the hung process, and running it again then produces the correct result.
import logging
from pathlib import Path
from typing import List

from unimol_tools import MolPredict

logger = logging.getLogger(__name__)

def UniMolPredict(model_dir: Path, csv_path: Path) -> List[float]:
    logger.info(f"   Start predicting: {csv_path}")
    clf = MolPredict(load_model=model_dir)
    logger.info("   Prediction model: clf is a MolPredict object")
    y_pred = clf.predict(str(csv_path))  # Intermittently hangs here
    logger.info(f"   Prediction result: {y_pred}")
    return y_pred

Environment

No response

Additional Context

No response

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
