@Xinyu-Kang
Contributor

  • Mirrors CUDA_VISIBLE_DEVICES into HIP_VISIBLE_DEVICES for ROCm during proc mesh setup.
  • Fixes a ROCm multinode training crash (IndexError: tuple index out of range in torch.cuda.default_generators) caused by missing HIP device visibility.
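
To illustrate the first bullet, here is a minimal sketch of mirroring one visibility variable into the other. It is not the code from this PR: the function name `mirror_visible_devices` and its `env` and `is_rocm` parameters are hypothetical, and the choice to overwrite any existing `HIP_VISIBLE_DEVICES` value is an assumption. It also assumes the mirroring has to run before the process initializes the GPU runtime, since that is when the visibility variables are read.

```python
import os
from typing import MutableMapping, Optional


def mirror_visible_devices(
    env: Optional[MutableMapping[str, str]] = None,
    is_rocm: bool = True,
) -> Optional[str]:
    """Copy CUDA_VISIBLE_DEVICES into HIP_VISIBLE_DEVICES on ROCm.

    Hypothetical helper for illustration only. Returns the mirrored
    value, or None if nothing was changed.
    """
    if env is None:
        env = os.environ

    # Assumption: the caller decides whether this is a ROCm build,
    # for example by checking `torch.version.hip is not None`.
    if not is_rocm:
        return None

    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Nothing to mirror; leave HIP_VISIBLE_DEVICES untouched.
        return None

    # Assumption: overwrite any existing HIP value so both variables agree.
    env["HIP_VISIBLE_DEVICES"] = visible
    return visible
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real process environment; called with no arguments it operates on `os.environ`.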

@meta-cla (bot) added the CLA Signed label on Jan 23, 2026

Labels

CLA Signed
module: rocm

2 participants