---
layout: blog
title: "Kubernetes 1.36: Pod-Level Resource Managers (Alpha)"
date: 2026-03-31
slug: kubernetes-1-36-feature-pod-level-resource-managers-alpha
author: Kevin Torres (Google)
---

This blog post describes the Pod-Level Resource Managers, a new alpha feature
introduced in Kubernetes v1.36. This enhancement extends the Kubelet's Topology,
CPU, and Memory Managers to support pod-level resource specifications.

This feature evolves the resource managers from a strictly per-container
allocation model to a pod-centric one. It enables them to use
`pod.spec.resources` to perform NUMA alignment for the pod as a whole, and
introduces a partitioning scheme to manage resources for containers within that
pod-level grouping. The result is a more flexible and powerful resource
management model, particularly for performance-sensitive workloads: you can
define hybrid allocations in which some containers receive exclusive resources
while others share what remains in a pod-level shared pool.

This blog post covers:

1. [Why do we need Pod-Level Resource Managers?](#why-do-we-need-pod-level-resource-managers)
2. [Glossary](#glossary)
3. [How do Pod-Level Resource Managers work?](#how-do-pod-level-resource-managers-work)
4. [Current limitations and caveats](#current-limitations-and-caveats)

## Why do we need Pod-Level Resource Managers?

When working with performance-critical workloads (like AI/ML, High-Performance
Computing, or others), you often need exclusive, NUMA-aligned resources for your
primary application containers. However, modern Kubernetes pods frequently
include sidecar containers (e.g., for logging, monitoring, or data ingestion).

Historically, you either had to allocate exclusive, NUMA-aligned resources to
every container in a Guaranteed pod (which is wasteful for lightweight sidecars)
or forfeit the pod-level Guaranteed QoS class entirely.

By enabling the `PodLevelResourceManagers` feature (which also requires the
`PodLevelResources` feature gate), the kubelet can create hybrid resource
allocation models, bringing flexibility and efficiency to high-performance
workloads without sacrificing NUMA alignment.
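As a sketch, a kubelet configuration that enables this feature might look like
the following (the policy and scope values shown are illustrative choices, and
the `Static` Memory Manager policy may require additional settings such as
`reservedMemory` that are omitted here):

```yaml
# Sketch of a KubeletConfiguration enabling Pod-Level Resource Managers.
# Both feature gates are required; the static CPU Manager and Static
# Memory Manager policies are needed for exclusive allocation.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true
  PodLevelResourceManagers: true
cpuManagerPolicy: static
memoryManagerPolicy: Static
topologyManagerScope: pod  # or "container"; see the sections below
```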

## Glossary

To fully understand this new feature, it helps to define a few key terms:

- **Pod Level Resources**: The resource budget defined at the pod level in
`pod.spec.resources`, which specifies the collective requests and limits for
the entire pod.
- **Guaranteed Container**: Within the context of this feature, a container is
  considered `Guaranteed` if its resource requests equal its limits for both
  CPU and memory (and exclusive CPU allocation additionally requires an
  integer CPU value). This status makes it eligible for exclusive resource
  allocation from the resource managers.
- **Pod Shared Pool**: The subset of a pod's allocated resources that remains
after all exclusive slices have been reserved. These resources are shared by
all containers in the pod that do not receive an exclusive allocation. While
containers in this pool share resources with each other, they are strictly
isolated from the exclusive slices and the general node-wide shared pool.
- **Exclusive Slice**: A dedicated portion of resources (e.g., specific CPUs
or memory pages) allocated solely to a single container, ensuring isolation
from other containers.
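To make these terms concrete, here is a hypothetical pod spec (the names and
image are illustrative) in which `main` qualifies as a Guaranteed container
and `sidecar` falls into the pod shared pool:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hybrid-pod            # hypothetical name
spec:
  resources:                  # Pod Level Resources: the collective budget
    requests:
      cpu: "4"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 4Gi
  containers:
  - name: main                # Guaranteed: integer CPU, requests == limits
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "3"
        memory: 3Gi
      limits:
        cpu: "3"
        memory: 3Gi
  - name: sidecar             # no requests/limits: runs in the shared pool
    image: registry.k8s.io/pause:3.9
```

Here `main` receives an exclusive slice of 3 CPUs and 3Gi of memory, leaving
1 CPU and 1Gi in the pod shared pool for `sidecar`.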

## How do Pod-Level Resource Managers work?

The resource managers operate differently depending on the configured Topology
Manager scope:

### Pod Scope

When the Topology Manager scope is set to `pod`, the Kubelet performs a single
NUMA alignment for the entire pod based on the resource budget defined in
`pod.spec.resources`.

The resulting NUMA-aligned resource pool is then partitioned:

1. **Exclusive Slices:** Containers that specify `Guaranteed` resources are
allocated exclusive slices from the pod's total allocation.
2. **Pod Shared Pool:** The remaining resources form a pool shared among all
   other, non-Guaranteed containers in the pod; as in the glossary definition,
   these containers are strictly isolated from both the exclusive slices and
   the general node-wide shared pool.

Note that when standard init containers run to completion, their resources are
added to a per-pod reusable set, rather than being returned to the node's
resource pool. Because they run sequentially, these resources are made reusable
for subsequent app containers (either for their own exclusive slices or for the
shared pool).
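As an illustrative fragment (hypothetical names, CPU only for brevity), the
reuse behavior looks like this:

```yaml
spec:
  resources:
    requests:
      cpu: "4"                # pod-level budget: 4 CPUs
    limits:
      cpu: "4"
  initContainers:
  - name: setup               # holds 2 exclusive CPUs while it runs
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"
  containers:
  - name: main                # once "setup" completes, its 2 CPUs become
    image: registry.k8s.io/pause:3.9  # reusable for this exclusive slice
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"
```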

This allows you to co-locate containers that require exclusive resources with
those that do not, all within a single NUMA-aligned pod.

**Important Pod Scope considerations:**

- **Empty shared pool rejection:** If the exclusive container requests consume
  the pod's entire budget while another container still needs the shared pool,
  the pod is rejected at admission. For example, consider a pod with a
  pod-level budget of 4 CPUs, where `container-1` requests an exclusive 1 CPU
  and `container-2` requests an exclusive 3 CPUs. Because 0 CPUs remain in the
  shared pool for `container-3`, the pod is rejected.
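The rejected pod from the example above would look roughly like this
(illustrative fragment, container names as in the example):

```yaml
spec:
  resources:
    requests:
      cpu: "4"                # pod-level budget: 4 CPUs
    limits:
      cpu: "4"
  containers:
  - name: container-1         # exclusive slice: 1 CPU
    resources:
      requests:
        cpu: "1"
      limits:
        cpu: "1"
  - name: container-2         # exclusive slice: 3 CPUs
    resources:
      requests:
        cpu: "3"
      limits:
        cpu: "3"
  - name: container-3         # needs the shared pool, but 0 CPUs remain,
                              # so the pod is rejected at admission
```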

### Container Scope

When the Topology Manager scope is set to `container`, the Kubelet evaluates
each container individually for exclusive allocation.

If the overall pod achieves a `Guaranteed` QoS class via `pod.spec.resources`,
you can mix and match containers:

- Containers with their own `Guaranteed` requests receive exclusive
NUMA-aligned resources.
- Other non-Guaranteed containers in the pod run in the node's general shared
pool.
- The collective resource consumption of all containers is still enforced by
the pod's `pod.spec.resources` limits.

This scope is particularly useful when an infrastructure sidecar must be
aligned to a specific NUMA node for device access, while the main workload can
run in the general node shared pool.
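Under the `container` scope, that mix could look like this hypothetical
fragment, where only the sidecar is pinned:

```yaml
spec:
  resources:                  # pod-level requests == limits: Guaranteed QoS
    requests:
      cpu: "6"
      memory: 6Gi
    limits:
      cpu: "6"
      memory: 6Gi
  containers:
  - name: device-sidecar      # Guaranteed: exclusive, NUMA-aligned resources
    resources:
      requests:
        cpu: "2"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi
  - name: main-workload       # runs in the node's general shared pool,
                              # still capped by the pod-level limits
```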

### Under-the-hood: CPU Quotas (CFS)

When running mixed workloads within a pod, isolation is enforced differently
depending on the allocation:

- **Exclusive Containers:** Containers granted exclusive CPU slices have CPU
  CFS quota enforcement disabled, allowing them to run without being throttled
  by the Linux scheduler.
- **Pod Shared Pool Containers:** Containers falling into the pod shared pool
  run with CPU CFS quotas enforced at the pod level, ensuring they do not
  consume more than the leftover pod budget.
## Current limitations and caveats

- The functionality is currently implemented only for the `static` CPU Manager
policy and the `Static` Memory Manager policy.
- This feature is only supported on Linux nodes. On Windows nodes, the
resource managers will act as a no-op for pod-level allocations.
- As a fundamental requirement of using `pod.spec.resources`, the sum of all
  container-level resource requests must not exceed the pod-level resource
  budget.
- Downgrading the Kubelet to a version that does not support this feature is
  not supported: the older Kubelet will fail to read the state checkpoint
  files written by the newer version.

## Getting started and providing feedback

Read the
[Assign Pod-level CPU and memory resources](/docs/tasks/configure-pod-container/assign-pod-level-resources/)
task to understand how to use the overall Pod-Level Resources feature, and the
[Use Pod-level Resources with Resource Managers](/docs/tasks/administer-cluster/pod-level-resource-managers/)
documentation to learn more about how to use this feature!

As this feature moves through Alpha, your feedback is invaluable. Please report
any issues or share your experiences via the standard Kubernetes communication
channels:

* Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)