[Feature]: Expose multiple network interfaces from SMD in cloud-init metadata #102

@Katakam-Rakesh

Describe the feature

We define multiple network interfaces (management + InfiniBand) in nodes.yaml and they are correctly stored in SMD, but cloud-init metadata only exposes the primary IP (local_ipv4). Secondary interfaces (InfiniBand) are not accessible to nodes during boot via cloud-init query.

Current Behavior
Step 1: We define multiple interfaces in nodes.yaml

nodes:    
- name: slurmcontrol    
  xname: x1000c0s0b0n0    
  nid: 1    
  group: slurm_control_node_x86_64    
  interfaces:    
  - mac_addr: c4:cb:e1:cb:26:f2    
    ip_addrs:    
    - name: management    
      ip_addr: 172.16.255.10    
  - mac_addr: aa:bb:cc:dd:ee:10    
    ip_addrs:    
    - name: ib    
      ip_addr: 192.168.100.10    
     
- name: slurmnode    
  xname: x1000c0s0b1n0    
  nid: 2    
  group: slurm_node_x86_64    
  interfaces:    
  - mac_addr: 70:b5:e8:f0:54:78    
    ip_addrs:    
    - name: management    
      ip_addr: 172.16.255.12    
  - mac_addr: aa:bb:cc:dd:ee:12    
    ip_addrs:    
    - name: ib    
      ip_addr: 192.168.100.12    

Step 2: Discovery correctly stores both interfaces in SMD
$ ochami smd iface get --comp-id x1000c0s0b0n0

Output:

[    
  {    
    "ComponentID": "x1000c0s0b0n0",    
    "Description": "Interface 0 for slurmcontrol",    
    "ID": "c4cbe1cb26f2",    
    "MACAddress": "c4:cb:e1:cb:26:f2",    
    "IPAddresses": [    
      {"IPAddress": "172.16.255.10"}    
    ],    
    "Type": "Node"    
  },    
  {    
    "ComponentID": "x1000c0s0b0n0",    
    "Description": "Interface 1 for slurmcontrol",    
    "ID": "aabbccddee10",    
    "MACAddress": "aa:bb:cc:dd:ee:10",    
    "IPAddresses": [    
      {"IPAddress": "192.168.100.10"}    
    ],    
    "Type": "Node"    
  }    
]    

Both management (172.16.255.10) and InfiniBand (192.168.100.10) interfaces are stored in SMD

Step 3: cloud-init query on node doesn't show InfiniBand IP

On compute node (x1000c0s0b0n0):
[root@nid001 ~]# cloud-init query ds.meta_data
{
  "instance-id": "i-280e516c",
  "local-hostname": "x1000c0s0b0n0",
  "local_ipv4": "172.16.255.10"
}

Cannot access InfiniBand IP (192.168.100.10) via cloud-init query

Expected Behavior
What we want:
Cloud-init metadata should include all interfaces from SMD:

instance-id: i-280e516c
local-hostname: x1000c0s0b0n0
local_ipv4: 172.16.255.10
network:
  interfaces:
  - mac: c4:cb:e1:cb:26:f2
    ipv4: 172.16.255.10
    network: management
  - mac: aa:bb:cc:dd:ee:10
    ipv4: 192.168.100.10
    network: ib
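
One way the cloud-init server could derive this list from the SMD records shown in Step 2 is sketched below. This is only an illustration under stated assumptions, not the project's implementation: the function name `smd_to_metadata_interfaces` is hypothetical, and the optional `Network` key on each IP entry is an assumption (the SMD output in Step 2 does not show one, so the sketch leaves `network` out when it is absent).

```python
import json


def smd_to_metadata_interfaces(smd_ifaces):
    """Flatten SMD interface records into the proposed
    `network.interfaces` metadata list (hypothetical helper)."""
    result = []
    for iface in smd_ifaces:
        for addr in iface.get("IPAddresses", []):
            entry = {"mac": iface["MACAddress"], "ipv4": addr["IPAddress"]}
            # Assumption: a network label, if stored, appears as "Network".
            if addr.get("Network"):
                entry["network"] = addr["Network"]
            result.append(entry)
    return result


# Records trimmed from the `ochami smd iface get` output in Step 2.
SMD_IFACES = json.loads("""
[
  {"ComponentID": "x1000c0s0b0n0", "MACAddress": "c4:cb:e1:cb:26:f2",
   "IPAddresses": [{"IPAddress": "172.16.255.10"}]},
  {"ComponentID": "x1000c0s0b0n0", "MACAddress": "aa:bb:cc:dd:ee:10",
   "IPAddresses": [{"IPAddress": "192.168.100.10"}]}
]
""")

if __name__ == "__main__":
    print(json.dumps(smd_to_metadata_interfaces(SMD_IFACES), indent=2))
```

Because the SMD output in Step 2 carries no network name, the `management` and `ib` labels from nodes.yaml would have to be stored or passed along separately for the `network` field to be populated.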

Then on compute node, we can query it:

[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces    
[    
  {    
    "mac": "c4:cb:e1:cb:26:f2",    
    "ipv4": "172.16.255.10",    
    "network": "management"    
  },    
  {    
    "mac": "aa:bb:cc:dd:ee:10",    
    "ipv4": "192.168.100.10",    
    "network": "ib"    
  }    
]    
     
[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces[1].ipv4    
192.168.100.10    
     
[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces[1].mac    
aa:bb:cc:dd:ee:10    

Why do you want this feature?

Context

We are deploying an HPC cluster with dual networks:

  • Management Network (Ethernet): Configured automatically via DHCP/cloud-init
  • InfiniBand Network: Must be configured manually with static IPs during node boot

The Problem

To manually configure InfiniBand interfaces, we need the IP address for each node. This information is:

  • ✅ Defined in nodes.yaml
  • ✅ Stored in SMD by OpenCHAMI discovery
  • ❌ NOT accessible to nodes during boot via cloud-init

What We Need

During node boot, we need to run:

# Get the InfiniBand IP assigned to this node
# (the path is quoted so the shell does not treat [1] as a glob pattern)
IB_IP=$(cloud-init query 'ds.meta_data.network.interfaces[1].ipv4')

# Manually configure InfiniBand interface with that IP
nmcli con add type infiniband con-name ib0 ifname ib0 \
  ipv4.method manual ipv4.addresses "$IB_IP/24"
nmcli con up ib0

Currently this fails because the InfiniBand IP is not in cloud-init metadata.
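
Indexing with `[1]` assumes the InfiniBand interface is always listed second. A minimal sketch of selecting by network name instead is shown below, assuming the metadata shape proposed under Expected Behavior. The helper names `find_ipv4` and `nmcli_add_args` and the default `/24` prefix are illustrative only, and the sample data stands in for the JSON that `cloud-init query ds.meta_data` would print once the feature exists.

```python
import json


def find_ipv4(meta_data, network="ib"):
    """Return the ipv4 of the first interface on `network`, or None."""
    interfaces = meta_data.get("network", {}).get("interfaces", [])
    for iface in interfaces:
        if iface.get("network") == network:
            return iface.get("ipv4")
    return None


def nmcli_add_args(ip, prefix=24, ifname="ib0"):
    """Build the argv for the `nmcli con add` call used in the script above."""
    return ["nmcli", "con", "add", "type", "infiniband",
            "con-name", ifname, "ifname", ifname,
            "ipv4.method", "manual", "ipv4.addresses", f"{ip}/{prefix}"]


# Sample metadata in the proposed (not yet implemented) shape.
SAMPLE = json.loads("""
{
  "instance-id": "i-280e516c",
  "local-hostname": "x1000c0s0b0n0",
  "local_ipv4": "172.16.255.10",
  "network": {
    "interfaces": [
      {"mac": "c4:cb:e1:cb:26:f2", "ipv4": "172.16.255.10", "network": "management"},
      {"mac": "aa:bb:cc:dd:ee:10", "ipv4": "192.168.100.10", "network": "ib"}
    ]
  }
}
""")

if __name__ == "__main__":
    ip = find_ipv4(SAMPLE)
    if ip is None:
        raise SystemExit("no 'ib' interface found in metadata")
    print(" ".join(nmcli_add_args(ip)))
```

Looking the address up by name keeps the boot script working if the interface order in SMD or nodes.yaml changes.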

Current Workaround (Not Sustainable)

We have to query the SMD API with authentication:

TOKEN=$(cat /root/ochami.token)
IB_IP=$(curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://heliumcp.omnia.test:8443/hsm/v2/Inventory/EthernetInterfaces?ComponentID=$(hostname)" | \
  jq -r '.[] | select(.MACAddress=="aa:bb:cc:dd:ee:10") | .IPAddresses[0].IPAddress')

Problems:

  • Requires token management (expiry, distribution)
  • Network dependency during boot
  • Complex and fragile

Why This Feature Solves It

If cloud-init metadata includes all interfaces from SMD, we can simply query it locally:

IB_IP=$(cloud-init query 'ds.meta_data.network.interfaces[1].ipv4')

  • ✅ No tokens needed
  • ✅ No API calls
  • ✅ Works offline
  • ✅ Simple and reliable

Use Case Summary

Goal: Manually assign InfiniBand static IPs to compute nodes during boot

Requirement: Access to the InfiniBand IP address assigned to each node in nodes.yaml

Solution: Expose SMD interface data in cloud-init metadata so nodes can query it with cloud-init query


Alternatives you've considered

Querying the SMD API directly with a bearer token, as described under "Current Workaround (Not Sustainable)" above. It works, but it requires token management (expiry, distribution), depends on the network during boot, and is complex and fragile.
