Document DRA Device Binding Conditions in v1.36 #54541

Open
ttsuuubasa wants to merge 2 commits into kubernetes:dev-1.36 from ttsuuubasa:dev-1.36-dra-device-binding-conditions

@ttsuuubasa
Contributor

@ttsuuubasa ttsuuubasa commented Feb 19, 2026

Description

k/k development PR: kubernetes/kubernetes#137795

Summary

Promotes Device Binding Conditions from alpha to beta status in Kubernetes v1.36.

Changes Made

  1. Documentation Structure Update (dynamic-resource-allocation.md)

    • Moved Device Binding Conditions section from "DRA alpha features" to "DRA beta features"
  2. Feature Gate Lifecycle Update (DRADeviceBindingConditions.md)

    • Updated feature gate stages:
      • Alpha: v1.34 - v1.35 (default: false)
      • Beta: v1.36+ (default: true)

Technical Context

Device Binding Conditions enable the Kubernetes scheduler to delay Pod binding until external resources (such as fabric-attached GPUs or reprogrammable FPGAs) are confirmed ready. This feature:

  • Improves scheduling reliability by avoiding premature binding
  • Enables coordination with external device controllers
  • Implements waiting behavior in the PreBind phase of the scheduling framework
  • Supports configurable timeout (default: 600 seconds)
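
The waiting behavior listed above can be sketched as a small decision function. This is only an illustration in Python, not the scheduler's actual implementation: `evaluate_binding`, its return labels, and the example condition names are hypothetical, and the conditions are assumed to be available as plain dictionaries with `type` and `status` keys. Checking failure conditions before the timeout is also an assumption made for the sketch.

```python
from typing import Dict, List

# Default timeout stated in this PR description.
DEFAULT_BINDING_TIMEOUT_SECONDS = 600


def evaluate_binding(
    conditions: List[Dict[str, str]],
    binding_conditions: List[str],
    binding_failure_conditions: List[str],
    elapsed_seconds: float,
    timeout_seconds: float = DEFAULT_BINDING_TIMEOUT_SECONDS,
) -> str:
    """Return "bind", "reschedule", or "wait" (hypothetical labels)."""
    # Collect the condition types that are currently reported as "True".
    true_types = {c["type"] for c in conditions if c.get("status") == "True"}

    # Any failure condition that is True ends this binding attempt.
    if any(t in true_types for t in binding_failure_conditions):
        return "reschedule"

    # Every binding condition is True: the Pod can be bound.
    if all(t in true_types for t in binding_conditions):
        return "bind"

    # Not ready yet: give up once the timeout has elapsed, else keep waiting.
    if elapsed_seconds >= timeout_seconds:
        return "reschedule"
    return "wait"
```

For example, with a hypothetical binding condition `dra.example.com/is-prepared` that is still `False` after 10 seconds, the function returns `"wait"`; once an external controller sets it to `True`, it returns `"bind"`.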

Impact

  • Users on v1.36+: Device Binding Conditions will be enabled by default
  • Feature stability: Reflects increased production readiness and API stability
  • Documentation accuracy: Ensures docs correctly categorize the feature's maturity level

Issue

k/enhancement issue: kubernetes/enhancements#5007

Closes: #

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2026
@k8s-ci-robot k8s-ci-robot added this to the 1.36 milestone Feb 19, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tengqm for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the language/en Issues or PRs related to English language label Feb 19, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 19, 2026
@netlify

netlify bot commented Feb 19, 2026

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 050e88b
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-io-main-staging/deploys/69cb7ac105ad6900082e3fc8
😎 Deploy Preview https://deploy-preview-54541--kubernetes-io-main-staging.netlify.app

@chadmcrowell
Contributor

Hi @ttsuuubasa 👋 v1.36 Communications team here,

@mariafromano-25 as author of #54709, I'd like you to be a writing buddy for @ttsuuubasa on this PR.

Please:

  • Review this PR, paying attention to the guidelines and review hints
  • Update your own PR based on any best practices you identify that should be applied
  • Remember to be compassionate with your fellow article author

@kernel-kun
Contributor

Hello @ttsuuubasa 👋, v1.36 Docs Team here again!

Please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before Tuesday 31st March 2026.

Please let us know once your PR is fully Ready for Review -- meaning all documentation updates are complete and it's awaiting reviewer feedback -- so we can update our tracking.

Thank you!

@ttsuuubasa
Contributor Author

/wg device-management

@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Mar 24, 2026
@ttsuuubasa ttsuuubasa force-pushed the dev-1.36-dra-device-binding-conditions branch from cf3fb15 to 4400fe0 Compare March 24, 2026 08:39
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 24, 2026
@ttsuuubasa ttsuuubasa changed the title Placeholder PR for KEP-5007: DRA Device Binding Conditions in v1.36 Document DRA Device Binding Conditions in v1.36 Mar 24, 2026
@ttsuuubasa ttsuuubasa marked this pull request as ready for review March 24, 2026 09:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2026
@k8s-ci-robot k8s-ci-robot requested a review from lmktfy March 24, 2026 09:05
@ttsuuubasa
Contributor Author

@pohly
I’ve pushed a documentation update as part of the beta promotion of Device Binding Conditions, and I’d appreciate your review. I’d like to start with a technical review.
The change simply moves the Device Binding Conditions content from the alpha section to the beta section.
Please let me know if there are any other changes needed or additional information that should be added.

Member

@lmktfy lmktfy left a comment

At beta, and especially for features that are enabled by default, we ask for docs that are GA quality.

The binding conditions explanation seems to me to belong mostly in a page that driver authors would read (we don't yet have that page - we should aim to have one).

Please look at the following feedback in that light.

This ensures that non-admin users cannot misuse the feature.
Starting with Kubernetes v1.34, this label has been updated to `resource.kubernetes.io/admin-access: "true"`.

### Device Binding Conditions {#device-binding-conditions}
Member

Suggested change
### Device Binding Conditions {#device-binding-conditions}
### Device binding conditions

Member

nit: when we document (sub)features for DRA, we should place them where they would belong if they were stable.

If we do that, then when features graduate, the docs remain easy to find and use.

Contributor Author

If we were to pursue this, I feel that the current sections such as “DRA beta features” and “DRA alpha features” would no longer be appropriate, and that we would need to reconsider the overall structure of this chapter.


{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
Member

Suggested change
Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
As the author of a DRA driver, you can use
_device binding conditions_ to defer Pod binding
until

Contributor Author

This sentence is now rewritten for DRA driver developers, and I’d like to discuss whether we should proceed this way.

This improves scheduling reliability by avoiding premature binding and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
Member

Suggested change
To use this feature, device drivers (typically managed by driver owners) must publish the
To use this ability to delay binding, the DRA driver that
you are writing needs to publish all of the

Contributor Author

In this sentence, “you” refers to DRA driver developers, which means this is also written with driver authors in mind.

with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
Member

@lmktfy lmktfy Mar 24, 2026

Suggested change
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
following fields in the `device` section of a ResourceSlice. Because this relies on a beta feature, you should also clearly document that cluster administrators

Contributor Author

In this sentence, “you” refers to DRA driver developers, which means this is also written with driver authors in mind.

inside the ResourceClaim, which external controllers can use to perform node-specific
operations such as device attachment or preparation.

All condition types listed in bindingConditions and bindingFailureConditions are evaluated
Member

Suggested change
All condition types listed in bindingConditions and bindingFailureConditions are evaluated
The control plane discovers all the binding conditions (from `bindingConditions` and `bindingFailureConditions`) and evaluates those against the list of observed conditions, taken

Contributor Author

Are you assuming that binding conditions in the ResourceSlice are compared against and evaluated together with the binding conditions in the ResourceClaim? In practice, I believe an external controller would evaluate only the binding conditions in the ResourceClaim.

In addition, based on our experience, the controller that sets binding conditions is not necessarily limited to the control plane. There are also designs where such controllers are distributed and run on each node. For this reason, rather than explicitly referring to the control plane, I thought it might be better to use a more general term such as an external controller.
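
To make the external-controller role concrete, here is a minimal sketch of how such a controller might maintain one binding condition using the standard condition fields (`type`, `status`, `reason`, `message`, `lastTransitionTime`). It works on a plain Python list rather than a real ResourceClaim object, and `set_condition` is a hypothetical helper, not part of any Kubernetes client library. Updating `lastTransitionTime` only when `status` changes follows the usual Kubernetes condition convention.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def set_condition(
    conditions: List[Dict[str, str]],
    cond_type: str,
    status: str,
    reason: str,
    message: str,
    now: Optional[datetime] = None,
) -> List[Dict[str, str]]:
    """Insert or update one condition in place and return the list."""
    timestamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")

    for cond in conditions:
        if cond["type"] == cond_type:
            # lastTransitionTime changes only when the status value flips.
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = timestamp
            cond["reason"] = reason
            cond["message"] = message
            return conditions

    # No condition of this type exists yet: append a new one.
    conditions.append(
        {
            "type": cond_type,
            "status": status,
            "reason": reason,
            "message": message,
            "lastTransitionTime": timestamp,
        }
    )
    return conditions
```

A real controller would then write the updated list back to the ResourceClaim status through the API server; that step is omitted here because it depends on the client library in use.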

The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
clears the allocation and reschedules the Pod.
This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
Member

(nit)

Suggested change
This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
A cluster administrator can configure this timeout duration by editing the kube-scheduler configuration file.
#### Example {#device-binding-conditions-example}
Here is an example of a ResourceSlice that you might see in a cluster where there's a DRA driver in use, and that driver supports binding conditions:

(if making this change, check if you need other headings as well so that the new content makes sense)

apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
Member

(nit)

Suggested change
name: gpu-slice
name: gpu-slice-1

- External controllers can use the node selector in the ResourceClaim to perform
node-specific setup on the selected node.

An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
Member

Consider moving this just after the place where we mention that cluster administrators can configure this.

All condition types listed in bindingConditions and bindingFailureConditions are evaluated
from the `status.conditions` field of the ResourceClaim.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
Member

Suggested change
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
If you are the driver author, you may prefer to
provide your own controller that is custom to the
hardware or other dynamic resource that the driver works with.

Contributor Author

Driver-author–focused text has also been added here, and I would like to discuss whether this is necessary.

@ttsuuubasa
Contributor Author

@lmktfy
Thank you for the prompt review comments.

The binding conditions explanation seems to me to belong mostly in a page that driver authors would read (we don't yet have that page - we should aim to have one).

I agree that we should have documentation targeted at DRA driver developers.
However, I would like to discuss whether developer‑focused content should be included in this document.

I made a similar suggestion before, and at that time there was a proposal that updating the DRA example driver could be useful. My understanding is that this would mean implementing the DRA example driver so that it publishes BindingConditions, and then letting developers try it out. In that case, the usage and guidance for developers would be explained in places like the README.
kubernetes/enhancements#5007 (comment)

I agree with most of your suggestions, but regarding the text aimed at DRA driver developers, I would like to respond with comments and discuss it further.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2026
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
@ttsuuubasa ttsuuubasa force-pushed the dev-1.36-dra-device-binding-conditions branch from 4400fe0 to 050e88b Compare March 31, 2026 07:41
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2026
@ttsuuubasa
Contributor Author

ttsuuubasa commented Mar 31, 2026

@pohly @lmktfy
Sorry to reach out right after KubeCon, but I’d appreciate your review of my comments above and the document content. I’ve already addressed several of the points that were raised.

@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Apr 1, 2026
@kernel-kun
Contributor

Hello @ttsuuubasa 👋, v1.36 Docs Team here again!

Just checking in as we approach Docs Freeze on Wednesday 8th April 2026 (AoE) / Thursday 9th April 2026, 12:00 UTC.

This documentation appears to still be under review. To meet the Docs Freeze, this PR must have a technical review as well as lgtm and approve labels applied, without any unaddressed comments or concerns from SIG Docs.

Thank you!


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: 👀 In review

6 participants