Blog post for DRA updates in 1.36 #54567

Open

mortent wants to merge 2 commits into kubernetes:main from mortent:DRABlog136

Member

mortent commented Feb 20, 2026

Description

This is a PR for the blog post covering DRA updates for 1.36. We plan a single blog post covering all DRA updates rather than individual blog posts for each feature.

Issue

k8s-ci-robot added this to the 1.36 milestone

k8s-ci-robot added do-not-merge/work-in-progress size/XS cncf-cla: yes labels

netlify bot commented Feb 20, 2026

✅ Pull request preview available for checking

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`b0eea65`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-io-main-staging/deploys/69cc6945194c660007b864a8
😎 Deploy Preview	https://deploy-preview-54567--kubernetes-io-main-staging.netlify.app

Member

lmktfy commented Feb 21, 2026

/area blog

k8s-ci-robot added the area/blog label

Member

lmktfy commented Feb 21, 2026

This PR should target main (all PRs that add blog articles should target main)

mortent mentioned this pull request

DRA: Handle extended resource requests via DRA Driver kubernetes/enhancements#5004

Open

12 tasks

Member

nmn3m commented Feb 25, 2026

k8s-ci-robot requested a review from nmn3m

February 25, 2026 00:38

nmn3m mentioned this pull request

DRA: Resource Availability Visibility kubernetes/enhancements#5677

Open

6 tasks

This was referenced Mar 7, 2026

[WIP] Add HPA fallback external metrics blog #54649

Draft

Document DRA Device Binding Conditions in v1.36 #54541

Open

harche mentioned this pull request

Blog: KEP-4680 Resource Health Status reaches Beta in v1.36 #54534

Closed

Contributor

harche commented Mar 9, 2026

Hi @mortent, we're planning to fold our Resource Health Status feature (KEP-4680) into this umbrella blog post instead of maintaining a separate one (#54534).

KEP-4680 is reaching Beta in v1.36. It exposes device health information from Device Plugin and DRA in Pod Status. Let us know if you'd like us to contribute a section or provide any input for the post.

mortent force-pushed the DRABlog136 branch from d44bbd2 to 9b73102 Compare

March 17, 2026 22:26

k8s-ci-robot added area/localization language/en language/ja language/ko language/pl size/XXL language/zh sig/docs and removed size/XS labels

mortent changed the base branch from dev-1.36 to main

March 17, 2026 22:26

k8s-ci-robot added size/XS and removed size/XXL labels

Member Author

mortent commented Mar 17, 2026

/wg device-management

github-project-automation bot moved this to 🆕 New in Dynamic Resource Allocation

pohly moved this from 🆕 New to 🏗 In progress in Dynamic Resource Allocation

mortent force-pushed the DRABlog136 branch from 9b73102 to 521cbc6 Compare

March 26, 2026 17:06

Contributor

k8s-ci-robot commented Mar 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lmktfy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

content/en/blog/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added size/L and removed size/XS labels

everpeace mentioned this pull request

[dev-1.36] KEP-5491: DRA: List Types for Attributes #54561

Open


          Blog post for DRA updates in 1.36

e3a6289

mortent force-pushed the DRABlog136 branch from 521cbc6 to e3a6289 Compare

March 26, 2026 17:22

mortent changed the title ~~[WIP] Blog post for DRA updates in 1.36~~ Blog post for DRA updates in 1.36

k8s-ci-robot removed the do-not-merge/work-in-progress label

yliaog reviewed

content/en/blog/_posts/2026/dra-136-update.md Outdated

SergeyKanzhelev reviewed

content/en/blog/_posts/2026/dra-136-update.md

SergeyKanzhelev reviewed

content/en/blog/_posts/2026/dra-136-update.md

+              author: >
+                The DRA team
+              ---

Member

SergeyKanzhelev Mar 26, 2026

It would be great to include some information on adoption and the gaps still left compared to Device Plugin. Maybe a couple of words on available DRA drivers, so end users can make sense of this blog post.

Member Author

mortent Apr 1, 2026

Added a little section about the availability of drivers. I'm a little worried that by mentioning some drivers here, we might be forgetting others that also should be included. But I can ask in the device management chat if someone knows about other drivers that should be included.

I need to think a bit more about the gaps vs Device Plugin.

everpeace reviewed

content/en/blog/_posts/2026/dra-136-update.md

+              more optimal scheduling decisions. To support this capability, the ResourceSlice
+              controller toolkit now automatically generates names that reflect the exact device
+              ordering specified by the driver author.

Contributor

everpeace Mar 26, 2026

I want to include kubernetes/enhancements#5491 if it's worth putting in the feature blog.

ref: docs PR is #54561

Suggested change

      
+ **List Types for Attributes**
+ With
+ [List Types for Attributes](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#list-type-attributes),
+ DRA can represent device attributes as typed lists (int, bool, string, and
+ version), not just scalar values. This helps model real hardware topology, such
+ as devices that belong to multiple PCIe roots or NUMA domains.
+ This feature also extends `ResourceClaim` constraint behavior to work naturally
+ with both scalar and list values: `matchAttribute` now checks for a non-empty
+ intersection, and `distinctAttribute` checks for pairwise disjoint values.
+ It also introduces the `includes()` function in DRA CEL, which lets device selectors keep working
+ more easily when an attribute changes between scalar and list representations.

Member Author

mortent Apr 1, 2026

Sorry I forgot this one, it is definitely worth including. Added your suggestion.

Member Author

mortent Apr 1, 2026

Similar to my comment on #54567 (comment), do you think we could make it a bit more focused on just the benefits of the feature and leave some of the details to the DRA documentation? And see if we can keep it to a single paragraph?
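To make the constraint semantics in the List Types for Attributes suggestion concrete, here is a minimal Go sketch of what "non-empty intersection" for `matchAttribute` and "pairwise disjoint" for `distinctAttribute` mean. It is not the scheduler's implementation: both functions are hypothetical, and attribute values are simplified to string slices, with a scalar represented as a one-element slice.

```go
package main

import "fmt"

// Attribute values are modeled as string slices; a scalar attribute is a
// one-element slice. This is an illustrative simplification: real DRA
// attributes are typed (ints, bools, strings, versions).

// matchAttribute reports whether the values of all selected devices share at
// least one common element (a non-empty intersection). An empty selection
// trivially matches.
func matchAttribute(devices [][]string) bool {
	if len(devices) == 0 {
		return true
	}
	common := make(map[string]bool)
	for _, v := range devices[0] {
		common[v] = true
	}
	// Narrow the candidate set device by device.
	for _, dev := range devices[1:] {
		next := make(map[string]bool)
		for _, v := range dev {
			if common[v] {
				next[v] = true
			}
		}
		common = next
	}
	return len(common) > 0
}

// distinctAttribute reports whether no value appears on more than one device
// (the devices' value sets are pairwise disjoint).
func distinctAttribute(devices [][]string) bool {
	owner := make(map[string]int) // value -> index of the first device listing it
	for i, dev := range devices {
		for _, v := range dev {
			if j, ok := owner[v]; ok && j != i {
				return false
			}
			owner[v] = i
		}
	}
	return true
}

func main() {
	// Hypothetical NUMA nodes: a GPU on nodes {0,1} and a NIC on node {1} share node 1.
	fmt.Println(matchAttribute([][]string{{"0", "1"}, {"1"}}))
	// Hypothetical PCIe roots: two devices on different roots are disjoint.
	fmt.Println(distinctAttribute([][]string{{"pci-a"}, {"pci-b"}}))
}
```

Treating a scalar as a one-element list lets the same two checks cover both representations, which is the convenience the suggestion describes.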

lmktfy reviewed

content/en/blog/_posts/2026/dra-136-update.md Outdated

@@ @@ -0,0 +1,134 @@ @@
+              ---
+              layout: blog
+              title: "Kubernetes v1.36: DRA has graduated to GA"

Member

lmktfy Mar 26, 2026

Isn't it already GA?

Member Author

mortent Apr 1, 2026

Yeah, just forgot to update this when I used the template from a previous post. I've updated the title now, but open to better alternatives.

harche reviewed

content/en/blog/_posts/2026/dra-136-update.md

+              devices or FPGAs—are fully prepared. By explicitly modeling resource readiness, this
+              prevents premature assignments that can lead to Pod failures, ensuring a much more robust
+              and predictable deployment process.

Contributor

harche Mar 27, 2026

I want to include kubernetes/enhancements#4680 in the feature blog.

ref: docs PR is #54420

Suggested change

      
+ **Resource Health Status (Beta)**
+ Knowing when a device has failed or become unhealthy is critical for
+ workloads running on specialized hardware. With
+ [Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
+ Kubernetes now exposes device health information directly in the Pod
+ Status through the `allocatedResourcesStatus` field. When a DRA driver
+ detects that an allocated device has become unhealthy, it reports this
+ back to the kubelet, which surfaces it in each container's status.
+ In 1.36, the feature graduates to beta (enabled by default) and adds
+ an optional `message` field providing human-readable context about the
+ health status, such as error details or failure reasons. DRA drivers
+ can also configure per-device health check timeouts, allowing different
+ hardware types to use appropriate timeout values based on their
+ health reporting characteristics. This gives users and controllers
+ crucial visibility to quickly identify and react to hardware failures.

Member Author

mortent Apr 1, 2026

So I've added your proposal for now, but do you think we can shorten it a bit and make it just one paragraph? There is a large number of features and we don't want the blog post to be too long. Focus just on the benefits of this feature and what it enables, and leave the details to the DRA docs, which we link to. Also, mentioning that it graduates to beta in 1.36 is unnecessary, since that is already clear from the context.

Member Author

mortent Apr 1, 2026

Sorry I forgot to add this in the first draft, it is of course something we should include in the blog.
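As a rough illustration of how a controller might consume the `allocatedResourcesStatus` data described in the Resource Health Status suggestion, the Go sketch below scans container statuses for devices reported as unhealthy. The struct types are simplified stand-ins loosely modeled on the Pod status shape, not the real `k8s.io/api/core/v1` types; the placement of the `message` field is an assumption, and the names and message in `main` are made-up example data.

```go
package main

import "fmt"

// ResourceHealth is a simplified, hypothetical stand-in for one device's health entry.
type ResourceHealth struct {
	ResourceID string
	Health     string // "Healthy", "Unhealthy", or "Unknown"
	Message    string // optional human-readable context (placement assumed)
}

// ResourceStatus groups health entries for one allocated resource.
type ResourceStatus struct {
	Name      string
	Resources []ResourceHealth
}

// ContainerStatus keeps only the fields this sketch needs.
type ContainerStatus struct {
	Name                     string
	AllocatedResourcesStatus []ResourceStatus
}

// unhealthyDevices returns a "container/resourceID: message" entry for every
// device reported as Unhealthy, which a controller could use as a signal for
// remediation.
func unhealthyDevices(statuses []ContainerStatus) []string {
	var out []string
	for _, cs := range statuses {
		for _, rs := range cs.AllocatedResourcesStatus {
			for _, r := range rs.Resources {
				if r.Health == "Unhealthy" {
					out = append(out, fmt.Sprintf("%s/%s: %s", cs.Name, r.ResourceID, r.Message))
				}
			}
		}
	}
	return out
}

func main() {
	// Illustrative example data only.
	statuses := []ContainerStatus{{
		Name: "trainer",
		AllocatedResourcesStatus: []ResourceStatus{{
			Name: "example-claim",
			Resources: []ResourceHealth{
				{ResourceID: "gpu-0", Health: "Healthy"},
				{ResourceID: "gpu-1", Health: "Unhealthy", Message: "example error detail"},
			},
		}},
	}}
	for _, s := range unhealthyDevices(statuses) {
		fmt.Println(s)
	}
}
```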

nmn3m reviewed

content/en/blog/_posts/2026/dra-136-update.md


		DRA Resource Availability Visibility

		One of the most requested features from cluster administrators has been better visibility

Member

nmn3m Mar 28, 2026

I'd like to improve this section to mention the actual API name (ResourcePoolStatusRequest), the feature gate, and the alpha status — consistent with how other features in this post reference their API objects and maturity level.

ref: docs PR is #54456

Suggested change

      
- One of the most requested features from cluster administrators has been better visibility
+ One of the most requested features from cluster administrators has been better visibility
+ into hardware capacity. The new
+ [ResourcePoolStatusRequest](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resource-pool-status)
+ API (alpha, behind the `DRAResourcePoolStatus` feature gate) allows you to query
+ the availability of devices in DRA resource pools. By creating a
+ ResourcePoolStatusRequest object, you get a point-in-time snapshot of device counts
+ — total, allocated, available, and unavailable — for each pool managed by a given
+ driver. This enables better integration with dashboards and capacity planning tools.

Member Author

mortent Apr 1, 2026

I added your suggestion, but made some small adjustments to make it similar to the other features mentioned. Let me know if you want to make some changes to it.
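The ResourcePoolStatusRequest suggestion mentions per-pool counts (total, allocated, available, unavailable) that feed dashboards and capacity-planning tools. The Go sketch below shows the kind of arithmetic such a tool might perform on those counts. `PoolCounts` and its field names are hypothetical; the actual status fields of the alpha API may differ, and the pool data in `main` is invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// PoolCounts is a hypothetical, simplified view of one pool's snapshot.
type PoolCounts struct {
	Pool        string
	Total       int
	Allocated   int
	Available   int
	Unavailable int
}

// utilization returns the percentage of a pool's devices that are allocated.
func utilization(p PoolCounts) float64 {
	if p.Total == 0 {
		return 0
	}
	return 100 * float64(p.Allocated) / float64(p.Total)
}

// exhaustedPools lists pools with no available devices, sorted by name,
// which is the kind of signal a capacity-planning tool might alert on.
func exhaustedPools(pools []PoolCounts) []string {
	var out []string
	for _, p := range pools {
		if p.Available == 0 {
			out = append(out, p.Pool)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pools := []PoolCounts{
		{Pool: "node-a", Total: 8, Allocated: 6, Available: 2},
		{Pool: "node-b", Total: 4, Allocated: 3, Available: 0, Unavailable: 1},
	}
	for _, p := range pools {
		fmt.Printf("%s: %.1f%% allocated\n", p.Pool, utilization(p))
	}
	fmt.Println("exhausted:", exhaustedPools(pools))
}
```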

ttsuuubasa reviewed

content/en/blog/_posts/2026/dra-136-update.md

Comment on lines +65 to +72

+              **Device Binding Conditions (Beta)**
+              To improve scheduling reliability, the Kubernetes scheduler can now use the
+              [Binding Conditions](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-taints-and-tolerations)
+              feature to delay committing a Pod to a Node until its required external resources—such as attachable
+              devices or FPGAs—are fully prepared. By explicitly modeling resource readiness, this
+              prevents premature assignments that can lead to Pod failures, ensuring a much more robust
+              and predictable deployment process.

Contributor

ttsuuubasa Mar 31, 2026

The Device Binding Conditions part looks good to me.
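The Device Binding Conditions hunk describes the scheduler holding off on binding a Pod until external resources are ready. A simplified Go sketch of that kind of readiness gate follows, assuming a set of conditions that must all be true before binding and a set of failure conditions, any of which aborts it. The function, the `Decision` values, and the condition names are hypothetical; real scheduler behavior such as timeouts and rescheduling is omitted.

```go
package main

import "fmt"

// Decision is the outcome of a single readiness check (hypothetical type).
type Decision string

const (
	Bind Decision = "bind" // all required conditions are true
	Fail Decision = "fail" // a failure condition is true; binding is aborted
	Wait Decision = "wait" // keep waiting before committing the Pod to the Node
)

// bindingDecision checks device conditions before binding. conditions maps a
// condition type to whether it is currently true; missing entries count as false.
func bindingDecision(conditions map[string]bool, required, failure []string) Decision {
	// Any failure condition aborts the binding attempt.
	for _, c := range failure {
		if conditions[c] {
			return Fail
		}
	}
	// Every required condition must be true before binding.
	for _, c := range required {
		if !conditions[c] {
			return Wait
		}
	}
	return Bind
}

func main() {
	required := []string{"example.com/attached"}     // hypothetical condition type
	failure := []string{"example.com/attach-failed"} // hypothetical condition type

	fmt.Println(bindingDecision(map[string]bool{}, required, failure))
	fmt.Println(bindingDecision(map[string]bool{"example.com/attached": true}, required, failure))
	fmt.Println(bindingDecision(map[string]bool{"example.com/attach-failed": true}, required, failure))
}
```

Checking failure conditions first means a device that reports both readiness and failure is treated as failed, which is a conservative choice for this sketch.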

bart0sh reviewed

content/en/blog/_posts/2026/dra-136-update.md Outdated

Member

lmktfy commented Mar 31, 2026

/remove-area localization
/remove-language ja
/remove-language ko
/remove-language pl
/remove-language zh

k8s-ci-robot removed area/localization language/ja language/ko language/pl language/zh labels


          Addressed comments

b0eea65

mortent force-pushed the DRABlog136 branch from 2c6595c to b0eea65 Compare

April 1, 2026 00:39

nojnhuh reviewed

content/en/blog/_posts/2026/dra-136-update.md

+              **ResourceClaim Support for Workloads**
+              To optimize large-scale AI/ML workloads that rely on strict topological scheduling, the
+              [ResourceClaim Support for Workloads](add_link_here)

Contributor

nojnhuh Apr 1, 2026

This section looks good, thanks!

Right now I'm anticipating the link to be /docs/concepts/scheduling-eviction/dynamic-resource-allocation/#workload-resource-claims. I'll follow up once #54596 merges.

content/en/blog/_posts/2026/dra-136-update.md

+              Why should DRA only be for external accelerators? In v1.36, we are introducing the first
+              iterations of using the DRA API to manage Kubernetes native resources (like CPU and
+              Memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA

Contributor

nojnhuh Apr 1, 2026

nit: I don't think this needs to be capitalized.

Suggested change

      
- Memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA
+ memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA

content/en/blog/_posts/2026/dra-136-update.md

+              With
+              [List Types for Attributes](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#list-type-attributes),
+              DRA can represent device attributes as typed lists (int, bool, string, and version), not

Contributor

nojnhuh Apr 1, 2026

nit to match the names of the new fields:

Suggested change

      
- DRA can represent device attributes as typed lists (int, bool, string, and version), not
+ DRA can represent device attributes as typed lists (`ints`, `bools`, `strings`, and `versions`), not

content/en/blog/_posts/2026/dra-136-update.md

+              just scalar values. This helps model real hardware topology, such as devices that belong
+              to multiple PCIe roots or NUMA domains.
+              This feature also extends `ResourceClaim` constraint behavior to work naturally

Contributor

nojnhuh Apr 1, 2026

nit: This shouldn't be formatted as code:

Suggested change

      
- This feature also extends `ResourceClaim` constraint behavior to work naturally
+ This feature also extends ResourceClaim constraint behavior to work naturally

troychiu reviewed

content/en/blog/_posts/2026/dra-136-update.md

+              It also introduces `includes()` function in DRA CEL, which lets device selectors keep working
+              more easily when an attribute changes between scalar and list representations.
+              **Device Allocation Ordering through Lexicographical Ordering**

troychiu Apr 1, 2026

The section title feels a bit repetitive. Would something like 'Lexicographical Device Allocation' or 'Lexicographical Ordering for Device Allocation' be better?

The content looks good to me. Thank you!
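The Device Allocation Ordering section says the ResourceSlice controller toolkit generates names that reflect the driver's intended device order, so that lexicographical allocation follows it. The Go sketch below shows why that matters: plain numeric suffixes such as `gpu-10` sort before `gpu-2`, while fixed-width zero padding keeps string order and intended order aligned. `deviceName` is a hypothetical helper and is not the toolkit's actual naming scheme.

```go
package main

import (
	"fmt"
	"sort"
)

// deviceName is a hypothetical name generator that zero-pads the device's
// intended position to a fixed width, so string order matches that position.
func deviceName(prefix string, index, width int) string {
	return fmt.Sprintf("%s-%0*d", prefix, width, index)
}

// sortedMatchesOriginal reports whether sorting names lexicographically
// leaves them in the order they were generated.
func sortedMatchesOriginal(names []string) bool {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	for i := range names {
		if names[i] != sorted[i] {
			return false
		}
	}
	return true
}

func main() {
	var naive, padded []string
	for i := 0; i < 12; i++ {
		naive = append(naive, fmt.Sprintf("gpu-%d", i))
		padded = append(padded, deviceName("gpu", i, 2))
	}
	fmt.Println(sortedMatchesOriginal(naive))  // false: "gpu-10" sorts before "gpu-2"
	fmt.Println(sortedMatchesOriginal(padded)) // true
}
```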

Reviewers

lmktfy lmktfy left review comments

everpeace everpeace left review comments

SergeyKanzhelev SergeyKanzhelev left review comments

harche harche left review comments

bart0sh bart0sh left review comments

nojnhuh nojnhuh left review comments

yliaog yliaog left review comments

ttsuuubasa ttsuuubasa left review comments

nmn3m nmn3m left review comments

troychiu troychiu left review comments

Labels

area/blog cncf-cla: yes language/en sig/docs size/L wg/device-management