[fix] Fix math reward hanging [WIP] #109

BlankCheng · 2025-07-07T02:43:15Z

Problem

Training hangs during the math reward computation phase when a reward computation hits its timeout.

Cause

The suspected cause is hanging worker processes inside the math reward function. For example, time-consuming calls such as sympy.simplify() may run indefinitely on certain inputs unless the worker process is explicitly killed.

The existing asyncio timeout mechanism times out the top-level task but does not terminate the orphaned worker process that is still running sympy. This may leave the ProcessPoolExecutor unable to shut down cleanly, leading to a deadlock. Also, since reward calculation is CPU-bound, the asyncio event loop likely brings little benefit.
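The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from this repository: slow_reward, score_with_asyncio_timeout, and demo are hypothetical names, a finite time.sleep stands in for a stuck sympy call, and the "fork" start method is assumed (POSIX only) to keep the script self-contained.

```python
import asyncio
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor


def slow_reward(seconds: float) -> float:
    """Hypothetical stand-in for a reward call that runs too long."""
    time.sleep(seconds)
    return 1.0


async def score_with_asyncio_timeout(pool, seconds, timeout):
    """Await the pooled call, giving up after `timeout` seconds."""
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(pool, slow_reward, seconds)
    try:
        return await asyncio.wait_for(fut, timeout=timeout)
    except asyncio.TimeoutError:
        # Only the awaiting side gives up here; the worker process
        # keeps executing slow_reward.
        return None


def demo(work_seconds: float = 1.0, timeout: float = 0.2):
    # Assumption: "fork" start method, so no __main__ guard is needed.
    pool = ProcessPoolExecutor(max_workers=1, mp_context=mp.get_context("fork"))
    start = time.monotonic()
    result = asyncio.run(score_with_asyncio_timeout(pool, work_seconds, timeout))
    after_timeout = time.monotonic() - start
    # shutdown() has to wait for the worker that is still busy.
    pool.shutdown(wait=True)
    after_shutdown = time.monotonic() - start
    return result, after_timeout, after_shutdown
```

Calling demo() returns None as the result, yet pool.shutdown() only returns once the worker has finished its full sleep. If the worker never finished, as with a stuck simplification, the shutdown would block indefinitely.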

Solution

This PR implements mp_reward_manager.py, a multiprocessing reward manager that replaces the previous asyncio implementation.

Removal of asyncio
Explicit Process Timeouts: A timeout is now enforced at the per-process level, so any single hanging reward process is terminated and cannot block the ProcessPoolExecutor shutdown.
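A per-process timeout of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of mp_reward_manager.py: run_with_timeout, _worker, quick_reward, and hang_forever are hypothetical names, the fallback score of 0.0 is an assumption, and the "fork" start method is assumed (POSIX only).

```python
import multiprocessing as mp
import time


def _worker(conn, fn, args):
    """Run fn(*args) in the child and send the outcome to the parent."""
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(fn, args=(), timeout=1.0, default=0.0):
    """Run fn in its own process; kill it and return `default` on timeout."""
    ctx = mp.get_context("fork")  # assumption: POSIX fork start method
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(child_conn, fn, args), daemon=True)
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if parent_conn.poll(timeout):
            status, value = parent_conn.recv()
            return value if status == "ok" else default
        return default  # no answer within the timeout
    except EOFError:
        return default  # child exited without sending a result
    finally:
        if proc.is_alive():
            proc.terminate()  # SIGTERM first
            proc.join(1.0)
            if proc.is_alive():
                proc.kill()  # escalate to SIGKILL if still running
        proc.join()
        parent_conn.close()


def quick_reward(pred: str, truth: str) -> float:
    """Hypothetical exact-match reward."""
    return 1.0 if pred.strip() == truth.strip() else 0.0


def hang_forever() -> float:
    """Hypothetical CPU-bound call that never returns."""
    while True:
        pass
```

For example, run_with_timeout(quick_reward, ("42", "42"), timeout=5.0) returns the computed score, while run_with_timeout(hang_forever, timeout=0.2) returns the default after the child has been terminated. Starting one process per call is simple but has overhead, so a real manager would probably reuse or batch workers.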

The implementation is currently undergoing testing to validate the fix.

BlankCheng added 2 commits July 7, 2025 02:24

Implement multiprocess reward manager without async

a640e0c

Remove unused timeout class

a9c1cf7

BlankCheng requested a review from AndreasXie July 7, 2025 02:43
