@BlankCheng
Collaborator

Problem

Training hangs during the math reward computation phase when a reward computation hits its timeout.

Cause

The suspected cause is hanging worker processes within the math reward function. For example, time-consuming calls such as sympy.simplify() may run indefinitely on certain inputs unless the worker process is explicitly killed.

The existing asyncio timeout mechanism cancels the top-level task but does not terminate the orphaned worker process that is still running sympy. This may leave the ProcessPoolExecutor unable to shut down cleanly, leading to a deadlock. In addition, reward calculation is CPU-bound, so the asyncio event loop likely brings little benefit here.
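The failure mode described above can be illustrated with a minimal sketch. This is not the project's code: `slow_reward`, `score_with_asyncio_timeout`, and `demo` are hypothetical names, a bounded `time.sleep` stands in for a sympy call that would otherwise never return, and the `fork` start method is assumed (POSIX only).

```python
import asyncio
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor


def slow_reward(seconds):
    # Hypothetical stand-in for a reward function stuck in a long sympy call.
    time.sleep(seconds)
    return 1.0


async def score_with_asyncio_timeout(pool, seconds, timeout):
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(pool, slow_reward, seconds)
    try:
        return await asyncio.wait_for(fut, timeout)
    except asyncio.TimeoutError:
        # The awaiting task is released, but the worker process keeps running.
        return None


def demo(work_seconds=2.0, timeout=0.5):
    # Assumption: "fork" start method, so this sketch targets POSIX systems.
    pool = ProcessPoolExecutor(max_workers=1, mp_context=mp.get_context("fork"))
    start = time.monotonic()
    result = asyncio.run(score_with_asyncio_timeout(pool, work_seconds, timeout))
    after_timeout = time.monotonic() - start
    # shutdown(wait=True) blocks until the still-running worker finishes.
    pool.shutdown(wait=True)
    after_shutdown = time.monotonic() - start
    return result, after_timeout, after_shutdown
```

In this sketch the timeout fires quickly, yet `pool.shutdown(wait=True)` only returns once the worker has finished sleeping. If the worker never finished, as with a truly hanging call, the shutdown would block indefinitely.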

Solution

This PR implements mp_reward_manager.py, a multiprocessing reward manager that replaces the previous asyncio implementation.

  • Removal of asyncio
  • Explicit Process Timeouts: A timeout is now enforced on a per-process level. This guarantees that any single hanging reward process is terminated, preventing it from blocking the ProcessPoolExecutor shutdown.
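A per-process timeout of the kind described above might look roughly like the following sketch. It is not the contents of mp_reward_manager.py: `run_with_timeout`, `_worker`, `fast_reward`, and `hanging_reward` are hypothetical names, the fallback score is an assumption, and the `fork` start method is assumed (POSIX only).

```python
import multiprocessing as mp
import time


def _worker(fn, args, conn):
    # Runs in the child process and reports the outcome through the pipe.
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(fn, args=(), timeout=1.0, default=0.0):
    """Run fn(*args) in its own process; return `default` on timeout or error."""
    ctx = mp.get_context("fork")  # assumption: POSIX system
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(fn, args, child_conn), daemon=True)
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if parent_conn.poll(timeout):
            status, value = parent_conn.recv()
            return value if status == "ok" else default
        return default  # timed out
    except EOFError:
        return default  # child exited without sending a result
    finally:
        if proc.is_alive():
            proc.terminate()  # SIGTERM first
            proc.join(1.0)
            if proc.is_alive():
                proc.kill()  # escalate to SIGKILL if still alive
        proc.join()
        parent_conn.close()


def fast_reward(pred, gold):
    # Hypothetical reward: exact string match.
    return 1.0 if pred.strip() == gold.strip() else 0.0


def hanging_reward(pred, gold):
    # Hypothetical stand-in for a call that never returns in practice.
    time.sleep(3600)
    return 1.0
```

With these definitions, `run_with_timeout(hanging_reward, ("x", "y"), timeout=0.5)` returns the default score after about half a second and leaves no worker behind. Starting one process per call trades throughput for the ability to kill exactly the process that is stuck.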

The implementation is currently undergoing testing to validate the fix.

@BlankCheng BlankCheng requested a review from AndreasXie July 7, 2025 02:43

2 participants