Refactor CPU Topology Representation for Asymmetric/Heterogeneous Systems #59

Closed
AutuSnow wants to merge 1 commit into kubernetes-sigs:main from AutuSnow:fix/Topology/CPU_Topology_Representation

@AutuSnow
Contributor

@AutuSnow AutuSnow commented Feb 9, 2026

Refactor CPU topology representation to eliminate symmetric/uniform assumptions in CPU allocation logic, as discussed in #16 (comment).

Motivation
The previous CPUTopology methods (CPUsPerCore, CPUsPerSocket, CPUsPerUncore) computed global averages via simple division (e.g., NumCPUs / NumCores). This is incorrect when CPUs are offlined asymmetrically or on heterogeneous systems: all the math goes haywire as

Not fully resolved:

  1. takePartialUncore still uses an average within the uncore cache scope. The cpusPerCore calculation (cpusInUncore.Size() / numCores, rounded up) is now a per-uncore-cache average rather than the old global average. This is significantly more accurate but still an approximation. However, the case where it matters is extremely rare on actual hardware: cores within a single uncore cache are usually symmetric.
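
As a rough illustration of the rounded-up per-uncore average described above, here is a minimal sketch. It is not the code from this PR: the helper name `cpusPerCoreInUncore` and its plain-integer signature are hypothetical, standing in for `cpusInUncore.Size()` and `numCores`.

```go
package main

import "fmt"

// cpusPerCoreInUncore is a hypothetical helper (not taken from the PR).
// It returns the average number of CPUs per core inside ONE uncore cache,
// rounded up, using integer ceiling division.
func cpusPerCoreInUncore(cpusInUncore, numCores int) int {
	if numCores <= 0 {
		return 0 // no cores visible in this uncore cache
	}
	return (cpusInUncore + numCores - 1) / numCores
}

func main() {
	// Symmetric uncore cache: 8 CPUs on 4 cores, every core has 2 CPUs.
	fmt.Println(cpusPerCoreInUncore(8, 4))

	// One sibling thread offlined: 7 CPUs on 4 cores. The rounded-up
	// average still reports 2, which overestimates the core that lost
	// a thread. This is the remaining approximation.
	fmt.Println(cpusPerCoreInUncore(7, 4))
}
```

The second call shows why this is still an approximation: the average is exact only when every core in the uncore cache exposes the same number of online CPUs.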

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AutuSnow
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 9, 2026
@AutuSnow AutuSnow force-pushed the fix/Topology/CPU_Topology_Representation branch from 2b0f858 to f16a624 Compare February 9, 2026 15:58
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 9, 2026
@AutuSnow
Contributor Author

AutuSnow commented Feb 9, 2026

/assign @ffromani @pravk03

@ffromani
Contributor

This PR changes the cpumanager code we imported. At a glance the changes seem sensible, but I need to review them carefully. So, the first question here is: is it already time to drift from the cpumanager reference code we imported?

@AutuSnow
Contributor Author

> This PR changes the cpumanager code we imported. At a glance the changes seem sensible, but I need to review them carefully. So, the first question here is: is it already time to drift from the cpumanager reference code we imported?


I believe it's the right time for this specific drift. The project README already states that strict cpumanager compatibility is not a goal, and the codebase has already diverged with the UncoreCache, CoreType, and NUMA distribution features. The removed methods (CPUsPerCore, CPUsPerSocket, CPUsPerUncore) had an explicit TODO (added during the PR #16 review) requesting exactly this refactor. The change is narrowly scoped: it replaces global-average division with pre-computed per-entity maps while keeping the overall structure of the allocation algorithm intact. The upstream cpumanager is unlikely to address this since it's being superseded by DRA. The backport path remains unaffected since no files were restructured.
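
To make the difference concrete, here is a minimal sketch contrasting global-average division with a pre-computed per-core map. It is an illustration under stated assumptions, not the code from this PR: the `cpuInfo` type and the `globalCPUsPerCore` and `cpusPerCoreMap` helpers are hypothetical and much simpler than the real topology types.

```go
package main

import "fmt"

// cpuInfo is a hypothetical, simplified per-CPU record (assumption:
// the real topology type carries more fields, such as socket and NUMA IDs).
type cpuInfo struct {
	CoreID int
}

// globalCPUsPerCore mimics the old style: one global average obtained
// by integer division, which silently assumes every core is identical.
func globalCPUsPerCore(numCPUs, numCores int) int {
	if numCores == 0 {
		return 0
	}
	return numCPUs / numCores
}

// cpusPerCoreMap pre-computes the actual number of online CPUs for each
// core ID, so callers can look up the exact value per entity.
func cpusPerCoreMap(cpus map[int]cpuInfo) map[int]int {
	counts := make(map[int]int)
	for _, info := range cpus {
		counts[info.CoreID]++
	}
	return counts
}

func main() {
	// Cores 0 and 1 have two online threads each; core 2 has one
	// sibling offlined, so only a single CPU is visible there.
	cpus := map[int]cpuInfo{
		0: {CoreID: 0}, 1: {CoreID: 0},
		2: {CoreID: 1}, 3: {CoreID: 1},
		4: {CoreID: 2},
	}
	perCore := cpusPerCoreMap(cpus)

	// Global average: 5 CPUs / 3 cores truncates to 1, wrong for cores 0 and 1.
	fmt.Println("global average:", globalCPUsPerCore(len(cpus), len(perCore)))

	// Per-entity lookup reports the real count for each core.
	fmt.Println("core 0:", perCore[0], "core 2:", perCore[2])
}
```

The same per-entity idea would apply to sockets and uncore caches by keying the map on the corresponding ID.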


@k8s-ci-robot
Contributor

@AutuSnow: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-dra-driver-cpu-e2e-device-mode-individual-amd64 | f16a624 | link | true | /test pull-dra-driver-cpu-e2e-device-mode-individual-amd64 |
| pull-dra-driver-cpu-e2e-device-mode-grouped-amd64 | f16a624 | link | true | /test pull-dra-driver-cpu-e2e-device-mode-grouped-amd64 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2026
@k8s-ci-robot
Contributor

PR needs rebase.


@AutuSnow
Contributor Author

/cc @ffromani I would like to hear your thoughts. Is it necessary to keep waiting on the current PR?

@AutuSnow
Contributor Author

/close

@k8s-ci-robot
Contributor

@AutuSnow: Closed this PR.

Details

In response to this:

/close

