
@yurekami
Contributor

Summary

  • Add a barrier between closing remote IPC handles and freeing local buffer
  • Prevents undefined behavior per CUDA docs for cudaIpcOpenMemHandle
  • Ensures all ranks close remote handles before any rank frees its buffer

Background

According to CUDA documentation, it is undefined behavior to cudaFree an exported memory region before cudaIpcCloseMemHandle is called on all processes that opened the handle.

Without this fix, in a multi-GPU scenario:

  1. Rank 1 finishes closing its remote handles and proceeds to free its local buffer
  2. Rank 2 is still closing its handle to Rank 1's buffer
  3. Rank 2's cudaIpcCloseMemHandle now operates on memory that has already been freed, which is undefined behavior

Test plan

  • Verify Buffer::destroy completes without hangs on multi-GPU systems
  • Run stress tests with explicit buffer destruction

Fixes #497

🤖 Generated with Claude Code

…troy

According to CUDA documentation, it is undefined behavior to cudaFree an
exported memory region before cudaIpcCloseMemHandle is called on all
processes that opened the handle.

This adds a barrier to ensure all ranks have closed their remote IPC
handles before any rank frees its local buffer, preventing the race
condition where rank A frees its buffer while rank B is still closing
the handle to that buffer.

Fixes deepseek-ai#497

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


Linked issue: [Question] Is there undefined behavior between the calls to cudaFree and cudaIpcCloseMemHandle?