
[WIP][FG:InPlacePodVerticalScaling] Consider exist CPU in topology manager when pod resize#10

Open
Chunxia202410 wants to merge 9 commits into esotsal:policy_static from Chunxia202410:topology_hints

Conversation


@Chunxia202410 Chunxia202410 commented Apr 25, 2025

What type of PR is this?

/kind bug
/kind api-change

What this PR does / why we need it:

When a Guaranteed QoS class Pod scales up or down under the static CPU management policy with InPlacePodVerticalScaling enabled, the additional CPUs should be placed near the already-allocated CPUs.

This proposal makes the topology manager consider the existing CPUs: before allocating additional CPUs, we can select the preferred NUMA nodes near the existing (already allocated) CPUs.

There are two scopes: container scope and pod scope.

For container scope:

  • When a container scales up:

    If allocated CPUs + reusable CPUs > the container's requested CPUs, the reusable CPUs are enough to cover the additional allocation, so topology hints are generated from the reusable CPUs, keeping the allocated (existing) CPUs.

    If allocated CPUs + reusable CPUs < the container's requested CPUs, all of the reusable CPUs go to this container and the remainder is taken from the available CPUs, so topology hints are generated from the available CPUs, keeping the allocated CPUs + reusable CPUs.

Note: the reusable CPUs are allocated first to the resized container in kubernetes#131966.

  • When a container scales down, topology hints are generated from the allocated CPUs and the promised CPUs.
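The container-scope branching above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `cpuSet` is a toy stand-in for Kubernetes' `cpuset.CPUSet`, and `hintInputs` is a hypothetical helper name.

```go
package main

import "fmt"

// cpuSet is a toy stand-in for cpuset.CPUSet.
type cpuSet map[int]bool

func union(a, b cpuSet) cpuSet {
	out := cpuSet{}
	for c := range a {
		out[c] = true
	}
	for c := range b {
		out[c] = true
	}
	return out
}

// hintInputs returns the (candidate, mustKeep) CPU sets that hint
// generation should consider for a container resize, following the
// three cases described above.
func hintInputs(allocated, reusable, available, promised cpuSet, request int, scaleUp bool) (cpuSet, cpuSet) {
	if !scaleUp {
		// Scale down: hints come from the allocated and promised CPUs.
		return allocated, promised
	}
	if len(allocated)+len(reusable) > request {
		// Reusable CPUs alone can cover the growth.
		return reusable, allocated
	}
	// All reusable CPUs are consumed; the rest comes from available CPUs.
	return available, union(allocated, reusable)
}

func main() {
	allocated := cpuSet{1: true, 2: true}
	reusable := cpuSet{3: true}
	available := cpuSet{4: true, 5: true, 6: true}
	// allocated(2) + reusable(1) < request(5): fall through to available CPUs.
	cand, keep := hintInputs(allocated, reusable, available, nil, 5, true)
	fmt.Println(len(cand), len(keep))
}
```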

For pod scope:
Many containers may scale up and down at the same time, and the order of the resizes is uncertain, so topology hints are generated from the available CPUs, keeping the pod's assigned CPUs + reusable CPUs, regardless of whether containers scale up or down.

Similar in purpose to #3, but there the logic was handled in the CPU manager; this proposal is simpler, as discussed in kubernetes#129719 (comment).

This implementation is based on the code of kubernetes#129719 and kubernetes#131966.

Further Thinking
Should we consider cases like the following?
The reserved CPUs are {0,10}; the available CPUs are {9,19}.
One pod has 2 containers, each with 8 CPUs.
When container 0 scales up from 8 → 14 and container 1 scales down from 8 → 2 at the same time, the resize fails: container 0 is processed first, there are not enough CPUs for it, and container 1 is then never resized.

The first proposal: for every resource (CPU and memory), handle scale-down first and scale-up later.
Detailed design:
In the Admit function of the None / Container / Pod scopes, the existing code uses a single loop over all containers, processed in the order defined in the pod YAML.
Add a second container loop:

  1. In the first loop, process all device, CPU, and memory allocations for pod creation; process CPU and memory scale-down; and process no-scaling cases such as kubelet restart.
  2. In the second loop, process CPU and memory scale-up.
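The two-loop ordering can be illustrated with a toy sketch. The `container` type and the scale-up/scale-down classification are hypothetical simplifications; real code would compare a container's allocated resources against its desired resources inside Admit.

```go
package main

import "fmt"

// container is a toy model: current and desired CPU counts.
type container struct {
	name    string
	current int
	desired int
}

// admitOrder returns the processing order proposed above: the first loop
// handles creation, scale-down, and no-change cases (freeing CPUs early),
// and the second loop handles scale-up.
func admitOrder(containers []container) []string {
	var order []string
	for _, c := range containers { // first loop: release resources first
		if c.desired <= c.current {
			order = append(order, c.name)
		}
	}
	for _, c := range containers { // second loop: grow after frees
		if c.desired > c.current {
			order = append(order, c.name)
		}
	}
	return order
}

func main() {
	// The failure scenario above: with this ordering, container 1's
	// scale-down (8 → 2) frees CPUs before container 0's scale-up (8 → 14).
	fmt.Println(admitOrder([]container{
		{"container0", 8, 14},
		{"container1", 8, 2},
	}))
}
```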

[Update]
The second proposal: if one container fails to allocate resources, continue allocating resources to the other containers.
==> With the second proposal, the case above resolves as follows: in the first resize round, container 0 fails and container 1 succeeds; in the second resize round, container 0 succeeds.

If these cases can be solved, the pod-scope topology hint steps can be:

  • When the pod's total requested CPUs (sum of all containers' requests) > the pod's assigned CPUs (sum of all containers' assigned CPUs), and reusableCPUs + assignedCPUs of the pod >= the total request: generate hints from reusableCPUs and assignedCPUs of the pod. ==> generateCPUTopologyHints(reusableCPUs, assignedCPUs of pod, total requested CPUs of pod)
  • When the pod's total requested CPUs <= the pod's assigned CPUs: generate hints only from assignedCPUs of the pod. ==> generateCPUTopologyHints(assignedCPUs of pod, assignedCPUs of pod, total requested CPUs of pod)
  • For other cases: generate hints from availableCPUs and reusableCPUs + assignedCPUs of the pod. ==> generateCPUTopologyHints(availableCPUs, reusableCPUs + assignedCPUs of pod, total requested CPUs of pod)
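The three-case dispatch can be sketched with set sizes standing in for actual cpusets. `podScopeHintCase` is a hypothetical helper, and the real generateCPUTopologyHints call is stubbed to a string naming its inputs.

```go
package main

import "fmt"

// podScopeHintCase reports which inputs generateCPUTopologyHints would
// receive for the pod, following the three cases above. Sizes stand in
// for the actual CPU sets.
func podScopeHintCase(totalRequest, assigned, reusable int) string {
	switch {
	case totalRequest > assigned && reusable+assigned >= totalRequest:
		// Reusable + assigned CPUs cover the pod's growth.
		return "hints(reusableCPUs, assignedCPUs)"
	case totalRequest <= assigned:
		// The pod is shrinking (or unchanged): stay on assigned CPUs.
		return "hints(assignedCPUs, assignedCPUs)"
	default:
		// Growth exceeds reusable CPUs: pull from available CPUs.
		return "hints(availableCPUs, reusableCPUs+assignedCPUs)"
	}
}

func main() {
	fmt.Println(podScopeHintCase(10, 8, 4)) // reusable covers growth
	fmt.Println(podScopeHintCase(6, 8, 0))  // shrink
	fmt.Println(podScopeHintCase(10, 8, 1)) // need available CPUs
}
```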

@esotsal esotsal force-pushed the policy_static branch 9 times, most recently from dc25a5a to dcd3846 Compare May 5, 2025 15:12
@esotsal esotsal force-pushed the policy_static branch 3 times, most recently from cb7ed3f to d9af3c8 Compare May 16, 2025 13:53
@esotsal esotsal force-pushed the policy_static branch 5 times, most recently from 8e7e1ec to f61ff7b Compare May 24, 2025 11:59
@esotsal esotsal force-pushed the policy_static branch 9 times, most recently from 0153b60 to 09b2d95 Compare June 1, 2025 09:02
@esotsal esotsal force-pushed the policy_static branch 2 times, most recently from 4ce5b01 to e626b44 Compare June 4, 2025 13:30
@Chunxia202410 Chunxia202410 force-pushed the topology_hints branch 2 times, most recently from 67db105 to bc0c6b1 Compare July 1, 2025 10:43
@esotsal esotsal force-pushed the policy_static branch 5 times, most recently from 6796c9f to d4b3faf Compare July 15, 2025 15:46
@esotsal esotsal force-pushed the policy_static branch 6 times, most recently from 5208b8c to 2f0571c Compare July 30, 2025 07:52
@esotsal esotsal force-pushed the policy_static branch 4 times, most recently from 5c67c1c to 74ca59e Compare July 31, 2025 18:02
esotsal and others added 9 commits September 8, 2025 16:59
Use new topology.Allocation struct (a CPU set plus
alignment metadata) instead of CPU set, due to rebase.

Remove duplicate, unnecessary SetDefaultCPUSet call as per
review comment.
- Revert introduction of API env mustKeepCPUs
- Replace mustKeepCPUs with local checkpoint "promised"
- Introduce "promised" in CPUManagerCheckpointV3 format
- Add logic, refactor with Beta candidate
- Fix lint issues
- Fail if mustKeepCPUs are not subset of resulted CPUs
- Fail if reusableCPUsForResize, mustKeepCPUs are not a subset
  of aligned CPUs
- Fail if mustKeepCPUs are not a subset of reusable CPUs
- TODO improve align resize tests, go through testing, corner cases
       refactor using cpumanager_test.go
- TODO improve CPUManagerCheckpointV3 tests
- TODO address code review/feedback to try different approach to allocate
       stepwise instead of once off when resizing
- TODO check init-containers
- TODO check migration from v2 to v3 CPU Manager checkpoint
- TODO check kubectl failure when prohibited can this be done earlier?
- WIP  update CPU Manager tests to use refactored cpu_manager_test
- TODO update topologymanager,cpumanager,memorymanager documentation
# Conflicts:
#	pkg/kubelet/cm/cpumanager/policy_static.go

AutuSnow commented Mar 3, 2026

Is this the new KEP? It looks very interesting, and I have some questions after reading some of the code:

  1. Each layer of allocation logic has an added *ForResize replica, and the core logic of these ForResize methods is the same as the original methods, the only difference being the sorting preference. Could the existing sorting methods be reused by accepting a "preference set" parameter instead of copying the entire set?
  2. When preemptAlignByUncoreCache is enabled, resizing does not prioritize replenishing CPUs within the already-allocated uncore cache domain. This means that CPUs newly added during scale-up may be allocated across uncore caches, which goes against the original intention of uncore alignment, right?
