
@yurekami
Contributor

Summary

  • Add a barrier between closing remote IPC handles and freeing local buffer
  • Prevents undefined behavior per CUDA docs for cudaIpcOpenMemHandle
  • Ensures all ranks close remote handles before any rank frees its buffer

Background

According to CUDA documentation, it is undefined behavior to cudaFree an exported memory region before cudaIpcCloseMemHandle is called on all processes that opened the handle.

Without this fix, in a multi-GPU scenario:

  1. Rank 1 finishes closing its remote handles and proceeds to free its local buffer
  2. Rank 2 is still closing its handle to Rank 1's buffer
  3. Rank 2's cudaIpcCloseMemHandle now operates on memory that has already been freed, which is undefined behavior

Test plan

  • Verify Buffer::destroy completes without hangs on multi-GPU systems
  • Run stress tests with explicit buffer destruction

Fixes #497

🤖 Generated with Claude Code

…troy

According to CUDA documentation, it is undefined behavior to cudaFree an
exported memory region before cudaIpcCloseMemHandle is called on all
processes that opened the handle.

This adds a barrier to ensure all ranks have closed their remote IPC
handles before any rank frees its local buffer, preventing the race
condition where rank A frees its buffer while rank B is still closing
the handle to that buffer.

Fixes deepseek-ai#497

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


Linked issue: [Question] Is there undefined behavior between the calls to cudaFree and cudaIpcCloseMemHandle?