Description
Describe the feature
We define multiple network interfaces (management + InfiniBand) in nodes.yaml and they are correctly stored in SMD, but cloud-init metadata only exposes the primary IP (local_ipv4). Secondary interfaces (InfiniBand) are not accessible to nodes during boot via cloud-init query.
Current Behavior
Step 1: We define multiple interfaces in nodes.yaml
nodes:
- name: slurmcontrol
  xname: x1000c0s0b0n0
  nid: 1
  group: slurm_control_node_x86_64
  interfaces:
  - mac_addr: c4:cb:e1:cb:26:f2
    ip_addrs:
    - name: management
      ip_addr: 172.16.255.10
  - mac_addr: aa:bb:cc:dd:ee:10
    ip_addrs:
    - name: ib
      ip_addr: 192.168.100.10
- name: slurmnode
  xname: x1000c0s0b1n0
  nid: 2
  group: slurm_node_x86_64
  interfaces:
  - mac_addr: 70:b5:e8:f0:54:78
    ip_addrs:
    - name: management
      ip_addr: 172.16.255.12
  - mac_addr: aa:bb:cc:dd:ee:12
    ip_addrs:
    - name: ib
      ip_addr: 192.168.100.12
Step 2: Discovery correctly stores both interfaces in SMD
$ ochami smd iface get --comp-id x1000c0s0b0n0
Output:
[
{
"ComponentID": "x1000c0s0b0n0",
"Description": "Interface 0 for slurmcontrol",
"ID": "c4cbe1cb26f2",
"MACAddress": "c4:cb:e1:cb:26:f2",
"IPAddresses": [
{"IPAddress": "172.16.255.10"}
],
"Type": "Node"
},
{
"ComponentID": "x1000c0s0b0n0",
"Description": "Interface 1 for slurmcontrol",
"ID": "aabbccddee10",
"MACAddress": "aa:bb:cc:dd:ee:10",
"IPAddresses": [
{"IPAddress": "192.168.100.10"}
],
"Type": "Node"
}
]
Both management (172.16.255.10) and InfiniBand (192.168.100.10) interfaces are stored in SMD
Step 3: cloud-init query on node doesn't show InfiniBand IP
On compute node (x1000c0s0b0n0):
[root@nid001 ~]# cloud-init query ds.meta_data
{
"instance-id": "i-280e516c",
"local-hostname": "x1000c0s0b0n0",
"local_ipv4": "172.16.255.10"
}
Cannot access InfiniBand IP (192.168.100.10) via cloud-init query
Expected Behavior
What we want:
Cloud-init metadata should include all interfaces from SMD:
instance-id: i-280e516c
local-hostname: x1000c0s0b0n0
local_ipv4: 172.16.255.10
network:
  interfaces:
  - mac: c4:cb:e1:cb:26:f2
    ipv4: 172.16.255.10
    network: management
  - mac: aa:bb:cc:dd:ee:10
    ipv4: 192.168.100.10
    network: ib
Then on compute node, we can query it:
[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces
[
{
"mac": "c4:cb:e1:cb:26:f2",
"ipv4": "172.16.255.10",
"network": "management"
},
{
"mac": "aa:bb:cc:dd:ee:10",
"ipv4": "192.168.100.10",
"network": "ib"
}
]
[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces[1].ipv4
192.168.100.10
[root@nid001 ~]# cloud-init query ds.meta_data.network.interfaces[1].mac
aa:bb:cc:dd:ee:10
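To make the requested mapping concrete, here is a minimal Python sketch of how SMD interface records could be flattened into the proposed `network.interfaces` list. It is illustrative only and not the cloud-init server's actual code. The function name `build_network_metadata` is hypothetical, and reading the network name from a `Network` field on each IP entry is an assumption, since the SMD output shown above contains only `IPAddress`.

```python
def build_network_metadata(smd_interfaces):
    """Flatten SMD EthernetInterfaces records into the proposed
    meta-data layout: one entry per (MAC, IP) pair."""
    interfaces = []
    for iface in smd_interfaces:
        for addr in iface.get("IPAddresses", []):
            entry = {"mac": iface["MACAddress"], "ipv4": addr["IPAddress"]}
            # Assumption: when SMD knows the network name, it is stored
            # in a "Network" field next to "IPAddress". Omit it otherwise.
            if addr.get("Network"):
                entry["network"] = addr["Network"]
            interfaces.append(entry)
    return {"network": {"interfaces": interfaces}}
```

If SMD does not carry the network name for an address, the entry would lack the `network` key, so that name would have to come from somewhere else (for example the `name` given in nodes.yaml).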
Why do you want this feature?
Context
We are deploying an HPC cluster with dual networks:
- Management Network (Ethernet): Configured automatically via DHCP/cloud-init
- InfiniBand Network: Must be configured manually with static IPs during node boot
The Problem
To manually configure InfiniBand interfaces, we need the IP address for each node. This information is:
- ✅ Defined in nodes.yaml
- ✅ Stored in SMD by OpenCHAMI discovery
- ❌ NOT accessible to nodes during boot via cloud-init
What We Need
During node boot, we need to run:
# Get the InfiniBand IP assigned to this node
IB_IP=$(cloud-init query ds.meta_data.network.interfaces[1].ipv4)
# Manually configure InfiniBand interface with that IP
nmcli con add type infiniband con-name ib0 ifname ib0 \
  ipv4.method manual ipv4.addresses $IB_IP/24
nmcli con up ib0
Currently this fails because the InfiniBand IP is not in cloud-init metadata.
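Selecting the interface by position (`interfaces[1]`) depends on list ordering. If the proposed metadata carried the `network` name, a node-side helper could select by name instead. The sketch below assumes the `ds.meta_data.network.interfaces` key from this request exists and that `cloud-init query` prints it as JSON. Both helper names are hypothetical.

```python
import json
import subprocess


def ip_for_network(interfaces, network):
    """Return the IPv4 address of the first interface on `network`."""
    for iface in interfaces:
        if iface.get("network") == network:
            return iface["ipv4"]
    raise LookupError(f"no interface on network {network!r}")


def query_interfaces():
    """Read the proposed interface list from local cloud-init instance data.

    Assumption: the key only exists once this feature is implemented.
    """
    result = subprocess.run(
        ["cloud-init", "query", "ds.meta_data.network.interfaces"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On a node, `ip_for_network(query_interfaces(), "ib")` would then return the address to pass to `nmcli`.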
Current Workaround (Not Sustainable)
We have to query the SMD API with authentication:
TOKEN=$(cat /root/ochami.token)
IB_IP=$(curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://heliumcp.omnia.test:8443/hsm/v2/Inventory/EthernetInterfaces?ComponentID=$(hostname)" | \
  jq -r '.[] | select(.MACAddress=="aa:bb:cc:dd:ee:10") | .IPAddresses[0].IPAddress')
Problems:
- Requires token management (expiry, distribution)
- Network dependency during boot
- Complex and fragile
Why This Feature Solves It
If cloud-init metadata includes all interfaces from SMD, we can simply query it locally:
IB_IP=$(cloud-init query ds.meta_data.network.interfaces[1].ipv4)
- ✅ No tokens needed
- ✅ No API calls
- ✅ Works offline
- ✅ Simple and reliable
Use Case Summary
Goal: Manually assign InfiniBand static IPs to compute nodes during boot
Requirement: Access to the InfiniBand IP address assigned to each node in nodes.yaml
Solution: Expose SMD interface data in cloud-init metadata so nodes can query it with cloud-init query
Alternatives you've considered
Current Workaround (Not Sustainable)
We have to query the SMD API with authentication:
TOKEN=$(cat /root/ochami.token)
IB_IP=$(curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://heliumcp.omnia.test:8443/hsm/v2/Inventory/EthernetInterfaces?ComponentID=$(hostname)" | \
  jq -r '.[] | select(.MACAddress=="aa:bb:cc:dd:ee:10") | .IPAddresses[0].IPAddress')
Problems:
- Requires token management (expiry, distribution)
- Network dependency during boot
- Complex and fragile