mcharles-brcm commented Jan 26, 2025

As part of this feature,

  1. Added BRCM devices to the amd-smi framework
  2. The affected command is 'amd-smi list'
  3. BRCM NIC and SWITCH devices are discovered and included in the list
  4. Discovery relies on sysfs entries created by the BRCM driver. Hence, without the BRCM driver loaded, the NIC and SWITCH devices will not appear in the list command output
  5. For now, only the BDF and UUID elements are reported for each device
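The sysfs-based discovery described above can be sketched in Python as a scan of the standard Linux PCI sysfs tree that keeps devices with a Broadcom vendor ID and a bound driver. This is an illustrative sketch, not the amd-smi code: the vendor-ID set (0x14e4 for Broadcom NICs; 0x1000 and 0x10b5 assumed for Broadcom/PLX PCIe switches) and the helper names `is_brcm_vendor` and `discover_brcm_devices` are assumptions.

```python
import os
from typing import List

# Assumed vendor IDs: 0x14e4 is Broadcom (NICs); 0x1000 and 0x10b5 are
# assumptions for Broadcom/PLX PCIe switches. Adjust to the real list.
BRCM_VENDOR_IDS = {0x14E4, 0x1000, 0x10B5}

PCI_DEVICES = "/sys/bus/pci/devices"


def is_brcm_vendor(vendor_text: str) -> bool:
    """Parse the body of a sysfs 'vendor' file, e.g. '0x14e4'."""
    try:
        return int(vendor_text.strip(), 16) in BRCM_VENDOR_IDS
    except ValueError:
        return False


def discover_brcm_devices(root: str = PCI_DEVICES) -> List[str]:
    """Return BDF strings of Broadcom PCI devices that have a driver bound."""
    found: List[str] = []
    if not os.path.isdir(root):
        return found
    for bdf in sorted(os.listdir(root)):
        dev = os.path.join(root, bdf)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read()
        except OSError:
            continue
        # Skip devices with no bound driver, mirroring the PR's note that
        # nothing is listed when the BRCM driver is absent.
        if is_brcm_vendor(vendor) and os.path.exists(os.path.join(dev, "driver")):
            found.append(bdf)
    return found
```

Checking the `driver` symlink is one way to model item 4: with no BRCM driver bound, the device is skipped rather than listed.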

Updated (1/27)

  1. Added BRCM devices to the 'amd-smi monitor' command
  2. Only a limited set of monitor attributes is supported in this feature
  3. The affected command is 'amd-smi monitor'
  4. If no hwmon sysfs entry is created for a BRCM NIC device, that controller is omitted from the monitor command output
  5. NIC's Monitor Attributes:
    i) NIC_TEMP_CURRENT
    ii) NIC_TEMP_CRIT_ALARM
    iii) NIC_TEMP_EMERGENCY_ALARM
    iv) NIC_TEMP_SHUTDOWN_ALARM
    v) NIC_TEMP_MAX_ALARM
  6. SWITCH's Monitor Attributes:
    i) CURRENT_LINK_SPEED
    ii) MAX_LINK_SPEED
    iii) CURRENT_LINK_WIDTH
    iv) MAX_LINK_WIDTH
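A minimal Python sketch of how the NIC attributes above could be read from a device's hwmon directory, assuming `temp1_input` (millidegrees Celsius) and the standard hwmon `temp1_crit_alarm`, `temp1_emergency_alarm`, and `temp1_max_alarm` files. The shutdown-alarm file name is a hypothetical placeholder, and the function names are illustrative. Returning `None` when no `hwmon` directory exists models item 4: such NICs are dropped from the monitor output.

```python
import glob
import os
from typing import Dict, Optional, Union

# Monitor attribute -> hwmon file. The temp1_* names follow the Linux hwmon
# sysfs ABI; "temp1_shutdown_alarm" is a hypothetical placeholder.
NIC_HWMON_FILES = {
    "NIC_TEMP_CURRENT": "temp1_input",
    "NIC_TEMP_CRIT_ALARM": "temp1_crit_alarm",
    "NIC_TEMP_EMERGENCY_ALARM": "temp1_emergency_alarm",
    "NIC_TEMP_SHUTDOWN_ALARM": "temp1_shutdown_alarm",
    "NIC_TEMP_MAX_ALARM": "temp1_max_alarm",
}


def parse_hwmon_value(attr: str, text: str) -> Union[float, bool]:
    """temp1_input is in millidegrees Celsius; *_alarm files hold 0 or 1."""
    raw = int(text.strip())
    if attr == "NIC_TEMP_CURRENT":
        return raw / 1000.0  # degrees Celsius
    return bool(raw)


def read_nic_monitor(device_dir: str) -> Optional[Dict[str, object]]:
    """Return NIC monitor values, or None if the device has no hwmon entry."""
    hwmon_dirs = sorted(glob.glob(os.path.join(device_dir, "hwmon", "hwmon*")))
    if not hwmon_dirs:
        return None  # caller leaves this NIC out of the monitor output
    values: Dict[str, object] = {}
    for attr, fname in NIC_HWMON_FILES.items():
        try:
            with open(os.path.join(hwmon_dirs[0], fname)) as f:
                values[attr] = parse_hwmon_value(attr, f.read())
        except (OSError, ValueError):
            values[attr] = "N/A"  # attribute not exposed by the driver
    return values
```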

amd-josnarlo and others added 10 commits January 23, 2025 11:34
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
1. Added Broadcom's NIC and SWITCH devices to the amd-smi framework.
2. The affected command is 'amd-smi list'
3. Filter NIC and SWITCH devices by Broadcom's vendor IDs
4. These changes depend on sysfs entries created by the Broadcom drivers; if no BRCM driver is present on the system, the corresponding entries will not appear in the list command output.
…MD-ROCm-Internal/amdsmi into SWDEV-504389/Synch_Comment_In_Linux_BM
NIC/Switch LIST
mcharles-brcm and others added 8 commits January 26, 2025 14:17
As part of this feature,
1. Added monitor attributes for BRCM devices - NIC and SWITCH
2. The affected amd-smi command is 'amd-smi monitor'
3. The monitor attributes are read from the corresponding BRCM device's sysfs path
BRCM Monitor
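The SWITCH attributes map naturally onto the standard Linux PCI sysfs files `current_link_speed`, `max_link_speed`, `current_link_width`, and `max_link_width`. The Python sketch below assumes those files are the source, which the PR text does not state explicitly; the helper names are hypothetical.

```python
import os
from typing import Dict, Optional

# Standard PCI sysfs attributes, present on PCIe devices including switch ports.
LINK_FILES = {
    "CURRENT_LINK_SPEED": "current_link_speed",
    "MAX_LINK_SPEED": "max_link_speed",
    "CURRENT_LINK_WIDTH": "current_link_width",
    "MAX_LINK_WIDTH": "max_link_width",
}


def parse_link_value(attr: str, text: str) -> Optional[float]:
    """'16.0 GT/s PCIe' -> 16.0 (GT/s); '16' -> 16 (lanes); 'Unknown' -> None."""
    stripped = text.strip()
    token = stripped.split()[0] if stripped else ""
    try:
        return float(token) if attr.endswith("SPEED") else int(token)
    except ValueError:
        return None


def read_switch_link(device_dir: str) -> Dict[str, Optional[float]]:
    """Read link attributes for one switch port; missing files become None."""
    result: Dict[str, Optional[float]] = {}
    for attr, fname in LINK_FILES.items():
        try:
            with open(os.path.join(device_dir, fname)) as f:
                result[attr] = parse_link_value(attr, f.read())
        except OSError:
            result[attr] = None
    return result
```

Kernel versions differ in the exact speed string (newer kernels append a `PCIe` suffix); taking only the first token is meant to tolerate both forms.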
…imestamp from firmware. (ROCm#61)

* SWDEV-511296 - update violation_status->violation_timestamp to read values from firmware.

Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com>

rahulc-gh pushed a commit that referenced this pull request Jan 30, 2025
Fix ordering of RHEL 8 build process

Signed-off-by: Williams, Justin <Justin.Williams@amd.com>
@dmitrii-galantsev
Collaborator

dmitrii-galantsev commented Feb 21, 2025

ohhh that's cool, I'll ask some teammates to see if they want to take on broadcom things.

continue internally

Affected commands:
1. amd-smi topology -nic
   Display NIC and GPU connectivity

2. amd-smi topology -show_numa
   Display the NUMA node and CPU affinity of NICs and GPUs
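The two topology views can be approximated from standard PCI sysfs data: `numa_node` and `local_cpulist` give a device's NUMA node and CPU affinity, and comparing the resolved sysfs paths of a NIC and a GPU shows how many PCIe hops (root complex, then bridges) they share. This Python sketch is an assumption about the approach, not the classification 'amd-smi topology' actually uses; all names are hypothetical.

```python
import os
import re
from typing import List, Tuple

BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def parse_cpulist(text: str) -> List[int]:
    """Expand a sysfs cpulist such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pci_path(sysfs_path: str) -> List[str]:
    """Keep only the root-complex ('pciDDDD:BB') and BDF components of a
    resolved path such as /sys/devices/pci0000:00/0000:00:01.1/0000:03:00.0."""
    parts = [p for p in sysfs_path.split("/") if p]
    return [p for p in parts if p.startswith("pci") or BDF_RE.match(p)]


def shared_hops(path_a: str, path_b: str) -> int:
    """Count leading PCIe hops two devices share (0 = different root complex)."""
    count = 0
    for a, b in zip(pci_path(path_a), pci_path(path_b)):
        if a != b:
            break
        count += 1
    return count


def numa_and_affinity(device_dir: str) -> Tuple[int, List[int]]:
    """Read numa_node (-1 if unknown) and local_cpulist for one PCI device."""
    with open(os.path.join(device_dir, "numa_node")) as f:
        node = int(f.read().strip())
    with open(os.path.join(device_dir, "local_cpulist")) as f:
        cpus = parse_cpulist(f.read())
    return node, cpus
```

Paths would normally be obtained with `os.path.realpath("/sys/bus/pci/devices/<BDF>")`; a result of 0 from `shared_hops` means the two devices sit under different root complexes.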
1. Remove duplicate declaration
2. Resolve Alignment issue
@mcharles-brcm
Author

Hi All,

We initiated this pull request on January 26th and have delivered most of the changes over the past six months.

The recent suggestions focus on reorganizing and modernizing the delivered code, which amounts to reworking our previous deliveries. It would have been more efficient if we had received this feedback earlier, as we could have incorporated it into our regular delivery process.

Our goal is to upstream some of the initial deliveries to the mainline and gather feedback from BRCM customers on the integrated feature. In parallel, we can certainly implement your suggested changes, though this will require additional effort and time.

Here are some examples of our initial deliveries that cannot be moved out of the core (except by guarding them with preprocessor conditionals):

  1. Discovering Broadcom devices in the rocm_smi module.
  2. Maintaining the device cache in the amd-smi core module.

Please let me know if you have any questions.

mcharles-brcm requested a review from a team as a code owner September 30, 2025 09:27
Make brcm-smi a separate module and include it in the amd-smi framework on demand.

ENABLE_BRCM_SMI ON / OFF
Test ENABLE_BRCM_SMI with OFF option
Guarded with ENABLE_BRCM_SMI flag
@mcharles-brcm
Author

Hi bill-shuzhou-liu,
Based on your comment, we introduced a new CMake option, ENABLE_BRCM_SMI, which can be set ON or OFF as required.
Detailed documentation is available at:
amdsmi/brcm-smi/BRCM_SMI_DOCUMENTATION.MD

Below is the sample output when ENABLE_BRCM_SMI is OFF.

amd-smi monitor

GPU  XCP  POWER  GPU_T  MEM_T  GFX_CLK  GFX%  MEM%  ENC%  DEC%  VRAM_USAGE
  0    0   47 W  81 °C  80 °C  800 MHz   0 %   0 %   N/A   0 %  0.0/64.0 GB

amd-smi monitor -nic

ERROR | 2025-09-30 18:10:07.923 | amdsmi_commands.py:7588 | NIC monitoring requires BRCM SMI support. Please rebuild with -DENABLE_BRCM_SMI=ON
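The ON/OFF behavior can be illustrated with a small Python guard that reproduces the error shown above. This is a hypothetical sketch: the constant `BRCM_SMI_ENABLED`, and the idea that the build generates it from ENABLE_BRCM_SMI, are assumptions rather than details of amd-smi's actual implementation.

```python
import sys
from typing import List

# Hypothetical: a build step could generate this constant from the CMake
# option ENABLE_BRCM_SMI. It is not taken from the amd-smi sources.
BRCM_SMI_ENABLED = False

NIC_DISABLED_MSG = (
    "NIC monitoring requires BRCM SMI support. "
    "Please rebuild with -DENABLE_BRCM_SMI=ON"
)


def check_nic_support(enabled: bool = BRCM_SMI_ENABLED) -> None:
    """Raise if a NIC option is used in a build without BRCM SMI support."""
    if not enabled:
        raise RuntimeError(NIC_DISABLED_MSG)


def monitor(args: List[str]) -> int:
    """Toy 'amd-smi monitor' dispatcher showing only the -nic guard."""
    if "-nic" in args:
        try:
            check_nic_support()
        except RuntimeError as err:
            print(f"ERROR | {err}", file=sys.stderr)
            return 1
    # GPU-only monitoring would proceed here unchanged.
    return 0
```

Keeping the check at the CLI boundary means GPU monitoring behaves identically in both builds, and only the NIC-specific options report the missing support.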

Please let us know if you have any questions about this implementation.

@mcharles-brcm
Author

Hi bill-shuzhou-liu and oliveiradan,
Any update on this? This review has been open for more than two months.
Thanks,

7 participants