Document DRA Device Binding Conditions in v1.36 #54541
ttsuuubasa wants to merge 2 commits into kubernetes:dev-1.36
[APPROVALNOTIFIER] This PR is NOT APPROVED
Hi @ttsuuubasa 👋 v1.36 Communications team here, @mariafromano-25 as author of #54709, I'd like you to be a writing buddy for @ttsuuubasa on this PR. Please:
Hello @ttsuuubasa 👋, v1.36 Docs Team here again! Please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before Tuesday 31st March 2026. Please let us know once your PR is fully Thank you!
/wg device-management
cf3fb15 to 4400fe0
@pohly
lmktfy left a comment:
At beta, and especially for features that are enabled by default, we ask for docs that are GA quality.
The binding conditions explanation seems, to me, that it mostly belongs in a page that driver authors would read (we don't yet have that page - we should aim to have one).
Please look at the following feedback in that light.
| This ensures that non-admin users cannot misuse the feature.
| Starting with Kubernetes v1.34, this label has been updated to `resource.kubernetes.io/admin-access: "true"`.
|
| ### Device Binding Conditions {#device-binding-conditions}
Suggested change:
- ### Device Binding Conditions {#device-binding-conditions}
+ ### Device binding conditions
nit: when we document (sub)features for DRA, we should place them where they would belong if they were stable.
If we do that, then when features graduate, the docs remain easy to find and use.
If we were to pursue this, I feel that the current sections such as “DRA beta features” and “DRA alpha features” would no longer be appropriate, and that we would need to reconsider the overall structure of this chapter.
| {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
|
| Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
Suggested change:
- Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
+ As the author of a DRA driver, you can use
+ _device binding conditions_ to defer Pod binding
+ until
This sentence is now rewritten for DRA driver developers, and I’d like to discuss whether we should proceed this way.
| This improves scheduling reliability by avoiding premature binding and enables coordination
| with external device controllers.
|
| To use this feature, device drivers (typically managed by driver owners) must publish the
Suggested change:
- To use this feature, device drivers (typically managed by driver owners) must publish the
+ To use this ability to delay binding, the DRA driver that
+ you are writing needs to publish all of the
In this sentence, “you” refers to DRA driver developers, which means this is also written with driver authors in mind.
| with external device controllers.
|
| To use this feature, device drivers (typically managed by driver owners) must publish the
| following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
Suggested change:
- following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
+ following fields in the `device` section of a ResourceSlice. Because this relies on a beta feature, you should also clearly document that cluster administrators
In this sentence, “you” refers to DRA driver developers, which means this is also written with driver authors in mind.
| inside the ResourceClaim, which external controllers can use to perform node-specific
| operations such as device attachment or preparation.
|
| All condition types listed in bindingConditions and bindingFailureConditions are evaluated
Suggested change:
- All condition types listed in bindingConditions and bindingFailureConditions are evaluated
+ The control plane discovers all the binding conditions (from `bindingConditions` and `bindingFailureConditions`) and evaluates those against the list of observed conditions, taken
Are you assuming that binding conditions in the ResourceSlice are compared against and evaluated together with the binding conditions in the ResourceClaim? In practice, I believe an external controller would evaluate only the binding conditions in the ResourceClaim.
In addition, based on our experience, the controller that sets binding conditions is not necessarily limited to the control plane. There are also designs where such controllers are distributed and run on each node. For this reason, rather than explicitly referring to the control plane, I thought it might be better to use a more general term such as an external controller.
| The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
| If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
| clears the allocation and reschedules the Pod.
| This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
(nit)
Suggested change:
- This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
+ A cluster administrator can configure this timeout duration by editing the kube-scheduler configuration file.
+ #### Example {#device-binding-conditions-example}
+ Here is an example of a ResourceSlice that you might see in a cluster where there's a DRA driver in use, and that driver supports binding conditions:
(if making this change, check if you need other headings as well so that the new content makes sense)
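As a rough illustration of the wait, failure, and timeout behaviour quoted in this thread, the decision could be sketched in Python as follows. This is not the scheduler's implementation: the function name, the outcome labels, and the `example.com/...` condition types are hypothetical, and the order of the checks (failure first, then success, then timeout) is an assumption made for the sketch.

```python
DEFAULT_BINDING_TIMEOUT_SECONDS = 600  # default quoted in the text under review


def evaluate_binding(binding_conditions, binding_failure_conditions,
                     observed_conditions, elapsed_seconds,
                     timeout_seconds=DEFAULT_BINDING_TIMEOUT_SECONDS):
    """Return "bind", "reschedule", or "wait" (hypothetical outcome labels)."""
    # Index the observed conditions (ResourceClaim status.conditions) by type.
    status_by_type = {c["type"]: c["status"] for c in observed_conditions}

    # Assumption: any failure condition that is True ends the wait immediately.
    if any(status_by_type.get(t) == "True" for t in binding_failure_conditions):
        return "reschedule"

    # Binding proceeds only once every listed binding condition is True.
    if all(status_by_type.get(t) == "True" for t in binding_conditions):
        return "bind"

    # Otherwise keep waiting until the timeout expires.
    if elapsed_seconds >= timeout_seconds:
        return "reschedule"
    return "wait"


# Hypothetical condition types, for illustration only.
observed = [{"type": "example.com/attached", "status": "True"}]
print(evaluate_binding(["example.com/attached"],
                       ["example.com/attach-failed"],
                       observed, elapsed_seconds=30))  # prints: bind
```

In this sketch, "reschedule" stands for the quoted behaviour of clearing the allocation and rescheduling the Pod.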
| apiVersion: resource.k8s.io/v1
| kind: ResourceSlice
| metadata:
|   name: gpu-slice
(nit)
Suggested change:
-   name: gpu-slice
+   name: gpu-slice-1
| - External controllers can use the node selector in the ResourceClaim to perform
|   node-specific setup on the selected node.
|
| An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
Consider moving this just after the place where we mention that cluster administrators can configure this.
| All condition types listed in bindingConditions and bindingFailureConditions are evaluated
| from the `status.conditions` field of the ResourceClaim.
| External controllers are responsible for updating these conditions using standard Kubernetes
| condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
Suggested change:
- condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
+ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
+ If you are the driver author, you may prefer to
+ provide your own controller, that is custom to the
+ hardware or other dynamic resource that the driver works with.
Driver-author–focused text has also been added here, and I would like to discuss whether this is necessary.
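For readers unfamiliar with the condition fields named in this thread, here is a minimal Python sketch of how an external controller might maintain one entry in a `status.conditions`-style list. It only edits an in-memory list and makes no Kubernetes API calls; the helper name `set_condition` and the `example.com/attached` condition type are hypothetical. It follows the usual convention that `lastTransitionTime` changes only when `status` changes.

```python
from datetime import datetime, timezone


def set_condition(conditions, cond_type, status, reason, message, now=None):
    """Insert or update one condition in a status.conditions-style list."""
    # Assumes `now`, when supplied, is already expressed in UTC.
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    for cond in conditions:
        if cond["type"] == cond_type:
            # lastTransitionTime moves only when the status value changes.
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = timestamp
            cond["reason"] = reason
            cond["message"] = message
            return conditions

    # No entry of this type yet: append a new one.
    conditions.append({
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": timestamp,
    })
    return conditions


# Hypothetical condition type, for illustration only.
conditions = []
set_condition(conditions, "example.com/attached", "False",
              "Attaching", "device attachment in progress")
set_condition(conditions, "example.com/attached", "True",
              "Attached", "device attached to the selected node")
```

A real controller would then write the updated list back to the ResourceClaim status through the API server; that step is omitted here.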
@lmktfy
I agree that we should have documentation targeted at DRA driver developers. I made a similar suggestion before, and at that time there was a proposal that updating the DRA example driver could be useful. My understanding is that this would mean implementing the DRA example driver so that it publishes BindingConditions, and then letting developers try it out. In that case, the usage and guidance for developers would be explained in places like the README. I agree with most of your suggestions, but regarding the text aimed at DRA driver developers, I would like to respond with comments and discuss it further.
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
4400fe0 to 050e88b
@pohly @lmktfy
Hello @ttsuuubasa 👋, v1.36 Docs Team here again! Just checking in as we approach Docs Freeze on Wednesday 8th April 2026 (AoE) / Thursday 9th April 2026, 12:00 UTC. This documentation appears to still be under review. To meet the Docs Freeze, this PR must have a technical review as well as Thank you!
Description
k/k development PR: kubernetes/kubernetes#137795
Summary
Promotes Device Binding Conditions from alpha to beta status in Kubernetes v1.36.
Changes Made
Documentation Structure Update (dynamic-resource-allocation.md)
Feature Gate Lifecycle Update (DRADeviceBindingConditions.md)
Technical Context
Device Binding Conditions enable the Kubernetes scheduler to delay Pod binding until external resources (such as fabric-attached GPUs or reprogrammable FPGAs) are confirmed ready. This feature:
Impact
Issue
k/enhancement issue: kubernetes/enhancements#5007
Closes: #