[Feature]: Improved Broadcom NIC support #69

@Bellk17

Suggestion Description

Currently, AMD’s official ROCm container images provide limited support for Broadcom's bnxt modules. This has been a friction point for customers who are used to base images working out of the box on the NVIDIA side with Mellanox NICs.

Current setup docs:
https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/system-setup/multi-node-setup.html

This limits the usability of ROCm base containers for distributed workloads: users must manually install and configure bnxt drivers, or rebuild container images, to support RoCE. Either path can require additional CI work to support different deployments.

We understand that supporting multiple RoCE implementations (Broadcom and others) may be non-trivial. However, even partial out-of-the-box support, or the ability to select the target bnxt version via an environment variable, would significantly improve the usability and portability of ROCm containers for multi-node workloads.
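
To make the environment-variable idea concrete, here is a minimal sketch of how a container entrypoint could resolve which bundled bnxt userspace version to use. Everything named here is an assumption for illustration: the variable `ROCM_BNXT_VERSION`, the `/opt/bnxt/<version>` directory layout, and the version strings are hypothetical and do not exist in ROCm images today.

```python
import os

# Hypothetical names: neither this variable nor this layout exists in ROCm images.
ENV_VAR = "ROCM_BNXT_VERSION"
DEFAULT_ROOT = "/opt/bnxt"  # assumed layout: /opt/bnxt/<version>/


def version_key(name):
    """Sort key that orders dotted versions numerically ("1.10" after "1.9")."""
    parts = []
    for piece in name.split("."):
        # Numeric pieces sort before non-numeric ones and compare as integers.
        parts.append((0, int(piece)) if piece.isdigit() else (1, piece))
    return parts


def available_versions(root=DEFAULT_ROOT):
    """List the version directories bundled under root (empty if root is missing)."""
    if not os.path.isdir(root):
        return []
    return [n for n in os.listdir(root) if os.path.isdir(os.path.join(root, n))]


def select_version(environ, versions):
    """Return the requested version, or the newest bundled one if none is requested."""
    if not versions:
        raise RuntimeError("no bnxt userspace libraries are bundled in this image")
    ordered = sorted(versions, key=version_key)
    requested = environ.get(ENV_VAR)
    if not requested:
        return ordered[-1]
    if requested not in ordered:
        raise RuntimeError(
            f"{ENV_VAR}={requested} is not bundled; available: {', '.join(ordered)}"
        )
    return requested
```

An entrypoint could call `select_version(os.environ, available_versions())` and add the resulting directory to the library search path before launching the workload. Failing loudly on an unknown version, instead of silently falling back, keeps a mismatch between the host driver and the container library visible at startup.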

Operating System

No response

GPU

No response

ROCm Component

No response
