diff --git a/content/en/blog/_posts/2026/pod-level-resource-managers.md b/content/en/blog/_posts/2026/pod-level-resource-managers.md
new file mode 100644
index 0000000000000..6d9c50195666b
--- /dev/null
+++ b/content/en/blog/_posts/2026/pod-level-resource-managers.md
@@ -0,0 +1,165 @@
---
layout: blog
title: "Kubernetes 1.36: Pod-Level Resource Managers (Alpha)"
date: 2026-03-31
slug: kubernetes-1-36-feature-pod-level-resource-managers-alpha
author: Kevin Torres (Google)
---

This blog post describes Pod-Level Resource Managers, a new alpha feature
introduced in Kubernetes v1.36. This enhancement extends the kubelet's Topology,
CPU, and Memory Managers to support pod-level resource specifications.

This feature evolves the resource managers from a strictly per-container
allocation model to a pod-centric one. It enables them to use
`pod.spec.resources` to perform NUMA alignment for the pod as a whole, and
introduces a partitioning scheme to manage resources for containers within that
pod-level grouping. The result is a more flexible and powerful resource
management model, particularly for performance-sensitive workloads: you can
define hybrid allocation models in which some containers receive exclusive
resources while others share what remains from a pod-level shared pool.

This blog post covers:

1. [Why do we need Pod-Level Resource Managers?](#why-do-we-need-pod-level-resource-managers)
2. [Glossary](#glossary)
3. [How do Pod-Level Resource Managers work?](#how-do-pod-level-resource-managers-work)
4. [Current limitations and caveats](#current-limitations-and-caveats)

## Why do we need Pod-Level Resource Managers?

When working with performance-critical workloads (such as AI/ML or
high-performance computing), you often need exclusive, NUMA-aligned resources
for your primary application containers.
However, modern Kubernetes pods frequently
include sidecar containers (for example, for logging, monitoring, or data
ingestion).

Historically, you either had to allocate exclusive, NUMA-aligned resources to
every container in a Guaranteed pod (which is wasteful for lightweight sidecars)
or forfeit the pod-level Guaranteed QoS class entirely.

By enabling the `PodLevelResourceManagers` feature gate (which also requires the
`PodLevelResources` feature gate), the kubelet can create hybrid resource
allocation models, bringing flexibility and efficiency to high-performance
workloads without sacrificing NUMA alignment.

## Glossary

To fully understand this new feature, it helps to define a few key terms:

- **Pod Level Resources**: The resource budget defined at the pod level in
  `pod.spec.resources`, which specifies the collective requests and limits for
  the entire pod.
- **Guaranteed Container**: Within the context of this feature, a container is
  considered `Guaranteed` if its resource requests equal its limits for both
  CPU and memory (for exclusive CPU allocation, the CPU value must also be a
  positive integer). This status makes the container eligible for exclusive
  resource allocation from the resource managers.
- **Pod Shared Pool**: The subset of a pod's allocated resources that remains
  after all exclusive slices have been reserved. These resources are shared by
  all containers in the pod that do not receive an exclusive allocation. While
  containers in this pool share resources with each other, they are strictly
  isolated from the exclusive slices and from the general node-wide shared
  pool.
- **Exclusive Slice**: A dedicated portion of resources (for example, specific
  CPUs or memory pages) allocated solely to a single container, ensuring
  isolation from other containers.

## How do Pod-Level Resource Managers work?
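
As a quick refresher before diving in, pod-level resources are declared
directly under `spec.resources` in the pod manifest. A minimal sketch (the
name, image, and values here are purely illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-demo
spec:
  # Pod-level budget: the collective requests and limits for all containers.
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 8Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```

How this budget is NUMA-aligned and partitioned is what the sections below
describe.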

The resource managers operate differently depending on the configured Topology
Manager scope:

### Pod Scope

When the Topology Manager scope is set to `pod`, the kubelet performs a single
NUMA alignment for the entire pod based on the resource budget defined in
`pod.spec.resources`.

The resulting NUMA-aligned resource pool is then partitioned:

1. **Exclusive Slices:** Containers that specify `Guaranteed` resources are
   allocated exclusive slices from the pod's total allocation.
2. **Pod Shared Pool:** The remaining resources form a pool shared among all
   other, non-Guaranteed containers in the pod, strictly isolated from the
   exclusive slices and from the general node-wide shared pool.

Note that when standard (non-sidecar) init containers run to completion, their
resources are added to a per-pod reusable set rather than being returned to the
node's resource pool. Because init containers run sequentially, these resources
become reusable by subsequent app containers (either for their own exclusive
slices or for the shared pool).

This allows you to co-locate containers that require exclusive resources with
those that do not, all within a single NUMA-aligned pod.

**Important Pod Scope considerations:**

- Empty Shared Pool Rejection: If the sum of all exclusive container requests
  exactly matches the pod's total budget, but another container still needs the
  shared pool, the pod is rejected at admission. For example, consider a pod
  with a pod-level budget of 4 CPUs, where `container-1` requires an exclusive
  1 CPU and `container-2` requires an exclusive 3 CPUs: because 0 CPUs are left
  in the shared pool for `container-3`, the pod is rejected.

### Container Scope

When the Topology Manager scope is set to `container`, the kubelet evaluates
each container individually for exclusive allocation.
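
To make this concrete, here is a sketch of a pod that mixes both kinds of
containers (the names, images, and values are illustrative): `main` specifies
requests equal to limits with an integer CPU count, so it is eligible for an
exclusive slice, while `sidecar` specifies no container-level resources and
draws from a shared pool.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mixed-pod
spec:
  # Pod-level budget; requests == limits makes the pod Guaranteed.
  resources:
    requests: {cpu: "6", memory: 12Gi}
    limits:   {cpu: "6", memory: 12Gi}
  containers:
  - name: main          # requests == limits, integer CPU: exclusive slice
    image: registry.k8s.io/pause:3.9
    resources:
      requests: {cpu: "4", memory: 8Gi}
      limits:   {cpu: "4", memory: 8Gi}
  - name: sidecar       # no exclusive allocation: runs from a shared pool
    image: registry.k8s.io/pause:3.9
```

Which shared pool the sidecar lands in depends on the configured scope: the
pod shared pool under the `pod` scope, or the node's general shared pool under
the `container` scope.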

If the overall pod achieves the `Guaranteed` QoS class via `pod.spec.resources`,
you can mix and match containers:

- Containers with their own `Guaranteed` requests receive exclusive,
  NUMA-aligned resources.
- Other, non-Guaranteed containers in the pod run in the node's general shared
  pool.
- The collective resource consumption of all containers is still enforced by
  the pod's `pod.spec.resources` limits.

This scope is particularly useful when an infrastructure sidecar needs to be
aligned to a specific NUMA node for device access, while the main workload can
run in the general node shared pool.

### Under the hood: CPU quotas (CFS)

When running mixed workloads within a pod, isolation is enforced differently
depending on the allocation:

- **Exclusive containers:** Containers granted exclusive CPU slices have CPU
  CFS quota enforcement disabled (`ResourceIsolationContainer`), allowing them
  to run without being throttled by the Linux scheduler.
- **Pod shared pool containers:** Containers falling into the pod shared pool
  have CPU CFS quotas enabled (`ResourceIsolationPod`), ensuring they do not
  consume more than the leftover pod budget.

## Current limitations and caveats

- The functionality is currently implemented only for the `static` CPU Manager
  policy and the `Static` Memory Manager policy.
- This feature is only supported on Linux nodes. On Windows nodes, the
  resource managers act as a no-op for pod-level allocations.
- As a fundamental requirement of using `pod.spec.resources`, the sum of all
  container-level resource requests must not exceed the pod-level resource
  budget.
- If you downgrade the kubelet to a version that does not support this
  feature, the older kubelet will fail to read the newer checkpoint files.
  This incompatibility occurs because the newer schema introduces new
  top-level fields to store pod-level allocations, which older kubelet
  versions cannot parse.
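
Putting these requirements together, enabling the feature on a node might look
like the following `KubeletConfiguration` sketch. This is an assumption-laden
example, not a recommended configuration: the Topology Manager policy and the
memory reservation values are illustrative, and the `Static` Memory Manager
policy requires `reservedMemory` entries that match your node's actual
reservations (including the eviction threshold).

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true            # prerequisite feature gate
  PodLevelResourceManagers: true     # this feature's gate
cpuManagerPolicy: static             # required: static CPU Manager policy
memoryManagerPolicy: Static          # required: Static Memory Manager policy
topologyManagerScope: pod            # or "container", as described above
topologyManagerPolicy: single-numa-node   # illustrative choice
# Illustrative reservations: reservedMemory must add up to the node's
# reserved memory (here, 1Gi kubeReserved + the default 100Mi eviction
# threshold).
kubeReserved:
  memory: 1Gi
reservedMemory:
- numaNode: 0
  limits:
    memory: 1124Mi
```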

## Getting started and providing feedback

You can read the
[Assign Pod-level CPU and memory resources](/docs/tasks/configure-pod-container/assign-pod-level-resources/)
task to understand how to use the overall pod-level resources feature, and the
[Use Pod-level Resources with Resource Managers](/docs/tasks/administer-cluster/pod-level-resource-managers/)
documentation to learn more about how to use this feature!

As this feature moves through alpha, your feedback is invaluable. Please report
any issues or share your experiences via the standard Kubernetes communication
channels:

* Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)