Update CPU topology discovery to support ARM #106
k8s-ci-robot merged 6 commits into kubernetes-sigs:main
Conversation
c0fa408 to 523caf5
@pravk03: GitHub didn't allow me to request PR reviews from the following users: catblade. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@pravk03 I see that in the new code, if CoreID or SocketID cannot be read, the default value of -1 is still retained. The old code made a check and skipped the CPU if the value was less than 0. Should we also retain this boundary check?
Good point @AutuSnow. Updated to add the boundary check back. We currently just log errors and exclude the CPU from the ResourceSlice.
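The log-and-exclude behavior described above could look roughly like this; the `cpuInfo` struct, its fields, and the `filterValidCPUs` helper are illustrative stand-ins, not the driver's actual code:

```go
package main

import "fmt"

// cpuInfo is a simplified stand-in for the driver's per-CPU record;
// -1 marks topology fields that could not be discovered.
type cpuInfo struct {
	CPUID    int
	CoreID   int
	SocketID int
}

// filterValidCPUs keeps only CPUs whose CoreID and SocketID were
// discovered, logging and excluding the rest so they never end up
// in the published ResourceSlice.
func filterValidCPUs(cpus []cpuInfo) []cpuInfo {
	valid := make([]cpuInfo, 0, len(cpus))
	for _, c := range cpus {
		if c.CoreID < 0 || c.SocketID < 0 {
			fmt.Printf("skipping cpu%d: incomplete topology (coreID=%d, socketID=%d)\n",
				c.CPUID, c.CoreID, c.SocketID)
			continue
		}
		valid = append(valid, c)
	}
	return valid
}

func main() {
	cpus := []cpuInfo{
		{CPUID: 0, CoreID: 0, SocketID: 0},
		{CPUID: 1, CoreID: -1, SocketID: 0}, // e.g. core_id missing in sysfs
	}
	fmt.Println(len(filterValidCPUs(cpus)))
}
```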
    clusterID, _ := strconv.Atoi(strings.TrimSpace(clusterStr))
    cpuInfo.ClusterID = clusterID
} else {
    cpuInfo.ClusterID = 0 // Default to 0 if not present
I remember that on x86 systems the ClusterID of all CPUs is set to 0. Will this lead to false sharing or incorrect core counts? Should we use the default value of -1 to indicate unknown?
Yeah. -1 is a better default. Updated.
On x86, the cluster_id file is present but reports 65535 to indicate unknown. I am also setting that to -1.
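A minimal sketch of that normalization, assuming the raw value has already been parsed out of `cluster_id` (the constant and function names are hypothetical):

```go
package main

import "fmt"

// noClusterSentinel is what the kernel reports in cluster_id on many
// x86 systems to mean "no cluster" (an all-ones 16-bit value).
const noClusterSentinel = 65535

// normalizeClusterID maps both the kernel sentinel and negative raw
// values to -1, the driver's "unknown / not applicable" marker.
func normalizeClusterID(raw int) int {
	if raw < 0 || raw == noClusterSentinel {
		return -1
	}
	return raw
}

func main() {
	for _, raw := range []int{0, 3, 65535, -1} {
		fmt.Println(raw, "->", normalizeClusterID(raw))
	}
}
```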
Maybe we discussed this in the past, but still: my main concern is to have a reliable testing pipeline in place which exercises the code. Are there good enough (or any at all?) ARM machines we can leverage from GitHub Actions and/or from prow? I understand the requirements and I agree it is nice to support ARM, but without CI in place the support itself will be brittle and unreliable, which is not great for users.
One option is to request ARM machines from the CNCF service desk, since Oracle Cloud has made a donation to CNCF.
And then we could even set up self-hosted GitHub runners or things like that.
I am thinking about how we handle this in kubelet today and whether we can follow the same approach here. I believe kubelet relies on cAdvisor for all topology information. For the long term, would it make sense for us to adopt the same mechanism here?
Once kubernetes/test-infra#36640 merges, I think we can have ARM e2e tests on prow (example)
ffromani
left a comment
Initial review, will need another pass. Nothing concerning so far and some nice improvements worth merging anyway.
It seems we have a bit of a chicken-and-egg problem: we can't enable ARM CI without this PR, but we can't test this PR without CI, or without developers getting access to ARM boxes, which is impractical.
But I'm positive about this PR, so we can probably bootstrap by merging this code first.
// On many x86 systems, the kernel reports 65535 to indicate "No Cluster".
// We treat this as -1 to mean unknown/not applicable.
if clusterID == 65535 {
    cpuInfo.ClusterID = -1
Is this behavior documented somewhere in the kernel docs? If so, please add a link.
It would also be nice to move the magic numbers into module-private constants.
Updated the comment and added a link to the kernel documentation.
return nil
}

// TODO: Handle more complex sibling relationships (e.g. 4-way SMT) if needed in the future. For now we only handle 2-way hyperthreading, which is the most common case.
I agree: let's note this may happen, but let's not implement it if we can't test it on real HW.
|
From a quick test run on my Pi, everything looked correct except I noticed that the NUMA ID is missing.

ownerReferences:
- apiVersion: v1
controller: true
kind: Node
name: dra-driver-cpu-control-plane
uid: 4806a758-aa47-4075-8088-a63328e3d1fb
resourceVersion: "1421"
uid: e3d80168-473d-4b01-bcd3-adf7cd910253
spec:
devices:
- attributes:
dra.cpu/cacheL3ID:
int: -1
dra.cpu/coreID:
int: 0
dra.cpu/coreType:
string: standard
dra.cpu/cpuID:
int: 0
dra.cpu/numaNodeID:
int: 0
dra.cpu/smtEnabled:
bool: false
dra.cpu/socketID:
int: 0
dra.net/numaNode:
int: 0
name: cpudev000
- attributes:
dra.cpu/cacheL3ID:
int: -1
dra.cpu/coreID:
int: 1
dra.cpu/coreType:
string: standard
dra.cpu/cpuID:
int: 1
dra.cpu/numaNodeID:
int: 0
dra.cpu/smtEnabled:
bool: false
dra.cpu/socketID:
int: 0
dra.net/numaNode:
int: 0
name: cpudev001
- attributes:
dra.cpu/cacheL3ID:
int: -1
dra.cpu/coreID:
int: 2
dra.cpu/coreType:
string: standard
dra.cpu/cpuID:
int: 2
dra.cpu/numaNodeID:
int: 0
dra.cpu/smtEnabled:
bool: false
dra.cpu/socketID:
int: 0
dra.net/numaNode:
int: 0
name: cpudev002
- attributes:
dra.cpu/cacheL3ID:
int: -1
dra.cpu/coreID:
int: 3
dra.cpu/coreType:
string: standard
dra.cpu/cpuID:
int: 3
dra.cpu/numaNodeID:
int: 0
dra.cpu/smtEnabled:
bool: false
dra.cpu/socketID:
int: 0
dra.net/numaNode:
int: 0
name: cpudev003
driver: dra.cpu
nodeName: dra-driver-cpu-control-plane

Defaulted container "dracpu" out of: dracpu, enable-nri-and-cdi (init)
I0409 13:49:05.294122 2405 app.go:236] dracpu go go1.25.9 build: fa54c505a210940fdbc0290a77e8bb85cef48826 time: 2026-04-09T12:51:01Z
I0409 13:49:05.294362 2405 app.go:118] FLAG: --add_dir_header="false"
I0409 13:49:05.294399 2405 app.go:118] FLAG: --alsologtostderr="false"
I0409 13:49:05.294408 2405 app.go:118] FLAG: --bind-address=":8080"
I0409 13:49:05.294420 2405 app.go:118] FLAG: --cpu-device-mode="individual"
I0409 13:49:05.294433 2405 app.go:118] FLAG: --group-by="numanode"
I0409 13:49:05.294444 2405 app.go:118] FLAG: --hostname-override=""
I0409 13:49:05.294453 2405 app.go:118] FLAG: --kubeconfig=""
I0409 13:49:05.294460 2405 app.go:118] FLAG: --log_backtrace_at=":0"
I0409 13:49:05.294484 2405 app.go:118] FLAG: --log_dir=""
I0409 13:49:05.294493 2405 app.go:118] FLAG: --log_file=""
I0409 13:49:05.294500 2405 app.go:118] FLAG: --log_file_max_size="1800"
I0409 13:49:05.294516 2405 app.go:118] FLAG: --logtostderr="true"
I0409 13:49:05.294523 2405 app.go:118] FLAG: --one_output="false"
I0409 13:49:05.294530 2405 app.go:118] FLAG: --reserved-cpus=""
I0409 13:49:05.294537 2405 app.go:118] FLAG: --skip_headers="false"
I0409 13:49:05.294544 2405 app.go:118] FLAG: --skip_log_headers="false"
I0409 13:49:05.294552 2405 app.go:118] FLAG: --stderrthreshold="2"
I0409 13:49:05.294562 2405 app.go:118] FLAG: --v="4"
I0409 13:49:05.294579 2405 app.go:118] FLAG: --vmodule=""
I0409 13:49:05.296637 2405 envvar.go:172] "Feature gate default state" feature="InOrderInformers" enabled=true
I0409 13:49:05.296717 2405 envvar.go:172] "Feature gate default state" feature="InOrderInformersBatchProcess" enabled=true
I0409 13:49:05.296772 2405 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=true
I0409 13:49:05.296793 2405 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=true
I0409 13:49:05.296807 2405 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
I0409 13:49:05.296832 2405 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
I0409 13:49:05.302406 2405 draplugin.go:602] "Starting"
I0409 13:49:05.303261 2405 nonblockinggrpcserver.go:90] "GRPC server started" logger="dra" endpoint="/var/lib/kubelet/plugins/dra.cpu/dra.sock"
I0409 13:49:05.303719 2405 nonblockinggrpcserver.go:90] "GRPC server started" logger="registrar" endpoint="/var/lib/kubelet/plugins_registry/dra.cpu-reg.sock"
I0409 13:49:07.304262 2405 cdi.go:76] Initialized CDI file manager for "/var/run/cdi/dra.cpu.json"
time="2026-04-09T13:49:07Z" level=info msg="Created plugin 00-dra.cpu (dracpu, handles CreateContainer,StopContainer,RemoveContainer)"
I0409 13:49:07.304877 2405 app.go:204] driver started
I0409 13:49:07.305027 2405 dra_hooks.go:223] Publishing resources
I0409 13:49:07.305879 2405 resourceslicecontroller.go:538] "Starting ResourceSlice informer and waiting for it to sync" logger="ResourceSlice controller"
I0409 13:49:07.306187 2405 reflector.go:367] "Starting reflector" logger="ResourceSlice controller" type="*v1.ResourceSlice" resyncPeriod="0s" reflector="pkg/mod/k8s.io/client-go@v0.35.0/tools/cache/reflector.go:289"
I0409 13:49:07.306252 2405 reflector.go:411] "Listing and watching" logger="ResourceSlice controller" type="*v1.ResourceSlice" reflector="pkg/mod/k8s.io/client-go@v0.35.0/tools/cache/reflector.go:289"
time="2026-04-09T13:49:07Z" level=info msg="Registering plugin 00-dra.cpu..."
time="2026-04-09T13:49:07Z" level=info msg="Configuring plugin 00-dra.cpu for runtime containerd/v2.2.0..."
time="2026-04-09T13:49:07Z" level=info msg="Started plugin 00-dra.cpu..."
I0409 13:49:07.360822 2405 reflector.go:978] "Exiting watch because received the bookmark that marks the end of initial events stream" logger="ResourceSlice controller" reflector="pkg/mod/k8s.io/client-go@v0.35.0/tools/cache/reflector.go:289" totalItems=1 duration="54.187257ms"
I0409 13:49:07.361135 2405 reflector.go:446] "Caches populated" logger="ResourceSlice controller" type="*v1.ResourceSlice" reflector="pkg/mod/k8s.io/client-go@v0.35.0/tools/cache/reflector.go:289"
I0409 13:49:07.387422 2405 nri_hooks.go:33] Synchronized state with the runtime (10 pods, 10 containers)...
I0409 13:49:07.387791 2405 nri_hooks.go:41] Synchronize pod kube-system/kindnet-gc4wj UID 93c7fefd-9cb7-4688-9457-c13e5671ba19
I0409 13:49:07.387889 2405 nri_hooks.go:41] Synchronize pod kube-system/coredns-7d764666f9-j8ldv UID b3521ec7-baa2-4d34-b630-1fd7ba302d30
I0409 13:49:07.387908 2405 nri_hooks.go:41] Synchronize pod kube-system/coredns-7d764666f9-8grpj UID 5fdb8242-071d-4a7d-84e3-d074fd4be08f
I0409 13:49:07.387923 2405 nri_hooks.go:41] Synchronize pod kube-system/dracpu-2nxh6 UID b8c35ec3-4666-40e6-9fd6-eca33a9038c6
I0409 13:49:07.387936 2405 nri_hooks.go:41] Synchronize pod kube-system/kube-apiserver-dra-driver-cpu-control-plane UID 37ed9f0908e60821052c00fd207c2940
I0409 13:49:07.387950 2405 nri_hooks.go:41] Synchronize pod local-path-storage/local-path-provisioner-67b8995b4b-drswn UID fb74fb34-6064-43a4-ad28-bf74b59ac871
I0409 13:49:07.387961 2405 nri_hooks.go:41] Synchronize pod kube-system/etcd-dra-driver-cpu-control-plane UID 1f74daaa1d7133221e2c745d85d4a2dc
I0409 13:49:07.387972 2405 nri_hooks.go:41] Synchronize pod kube-system/kube-proxy-q8gwb UID a809b7a5-298d-48b7-8bd3-1c1d61dbbf1c
I0409 13:49:07.387984 2405 nri_hooks.go:41] Synchronize pod kube-system/kube-controller-manager-dra-driver-cpu-control-plane UID e6452df7fa0fae1030da6ad72e966c17
I0409 13:49:07.388004 2405 nri_hooks.go:41] Synchronize pod kube-system/kube-scheduler-dra-driver-cpu-control-plane UID 0f01c9d9e97c323e3aa548ef8e0e1a1f
I0409 13:49:08.306950 2405 resourceslicecontroller.go:553] "ResourceSlice informer has synced" logger="ResourceSlice controller"
I0409 13:49:08.307061    2405 resourceslicecontroller.go:184] "Starting" logger="ResourceSlice controller"
Thanks for testing the changes locally @fmuyassarov. Taking a look.
To add here as well, as we discussed with @pravk03 offline: I realized from the output of numactl that the same CPU belongs to more than one NUMA node. This isn't a real NUMA setup, I would say. Normally each CPU belongs to exactly one NUMA node, but here all 4 cores appear in both nodes, which shouldn't happen. Although it is weird, it is common for the Raspberry Pi 4 (BCM2711 SoC), which exposes two regions in its physical address map: a low memory window and a high memory window. Basically, it was not the best idea to try out the changes on the Pi :D. Sorry for the confusion @pravk03
/retest |
The flaky presubmit should be fixed by #116 |
The current mechanism of parsing `/proc/cpuinfo` does not work on ARM architectures, as SocketID and CoreID are missing.
On ARM, the `id` file is missing from `devices/system/cpu/cpuX/cache/index3`. We fall back to using the smallest CPU ID in the `shared_cpu_list` at that cache level.
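The fallback could be sketched as follows; `fallbackCacheID` is a hypothetical helper name, and the cpulist format (e.g. "0-3,8-11") matches what sysfs emits in `shared_cpu_list`:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fallbackCacheID derives a cache ID from a sysfs shared_cpu_list
// string (e.g. "0-3,8-11") by taking the smallest CPU number it
// mentions. Every CPU sharing the cache sees the same list, so the
// smallest member works as a stable substitute for the missing id
// file.
func fallbackCacheID(sharedCPUList string) (int, error) {
	min := -1
	for _, part := range strings.Split(strings.TrimSpace(sharedCPUList), ",") {
		// For a range like "8-11", only the start can be the minimum.
		first := part
		if i := strings.IndexByte(part, '-'); i >= 0 {
			first = part[:i]
		}
		n, err := strconv.Atoi(strings.TrimSpace(first))
		if err != nil {
			return -1, fmt.Errorf("bad shared_cpu_list entry %q: %w", part, err)
		}
		if min < 0 || n < min {
			min = n
		}
	}
	return min, nil
}

func main() {
	id, _ := fallbackCacheID("4-7,12")
	fmt.Println(id) // 4
}
```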
ffromani
left a comment
Besides a terminology thingy, which we can fix later, LGTM.
This is a nice improvement in how we extract the topology data from the system; I'd merge it anyway even without ARM support.
/approve

@pravk03 your call if you want to address the terminology item here, later (or at all?). Unhold when you prefer.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, pravk03

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/lgtm |
/unhold |
Moved CPU topology discovery from `/proc/cpuinfo` to a sysfs-based approach (`/sys/devices/system/cpu`).