diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
index 3166a86c7816b..5ca5a87bfec9a 100644
--- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
+++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
@@ -1042,6 +1042,109 @@ profiles:
       bindingTimeout: 60s
 ```
 
+### Node allocatable resources {#node-allocatable-resources}
+
+{{< feature-state feature_gate_name="DRANodeAllocatableResources" >}}
+
+Devices managed by DRA can have an underlying footprint composed of node
+allocatable resources, such as `cpu`, `memory`, `hugepages`, or `ephemeral-storage`.
+This feature integrates these DRA-based requests into the scheduler's standard
+accounting alongside regular Pod `spec` requests for these resources.
+
+DRA drivers declare this node allocatable resource footprint using the
+`nodeAllocatableResourceMappings` field on devices within a `ResourceSlice`.
+This mapping translates the requested DRA device or capacity into standard
+resources tracked in the Node's `status.allocatable` (note that extended
+resources are not included here). This is useful both for drivers that directly
+expose native resources (like a CPU or Memory DRA driver) and for devices that
+require auxiliary node dependencies (like an accelerator that needs host memory).
+
+This mapping defines the translation of the requested DRA device or capacity
+units to the corresponding quantity of the node-allocatable resource. The
+scheduler calculates the exact quantity using:
+
+* **Device-based scaling:** If `capacityKey` is NOT set, the
+  `allocationMultiplier` multiplies the device count allocated to the claim.
+  `allocationMultiplier` defaults to 1 if not specified.
+* **Capacity-based scaling:** If `capacityKey` IS set, it references a
+  capacity name defined in the device's `capacity` map. The scheduler looks
+  up the amount of that capacity consumed by the claim, and multiplies it by
+  the `allocationMultiplier`.
+
+#### Example: CPU DRA Driver (Capacity-based scaling)
+
+Here is an example where a CPU DRA driver exposes a CPU socket as a pool of 128
+CPUs using DRA consumable capacity. The `capacityKey` links the consumed
+`cpu.example.com/cpu` capacity directly to the node's standard `cpu`
+allocatable resource:
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceSlice
+metadata:
+  name: my-node-cpus
+spec:
+  driver: cpu.example.com
+  nodeName: my-node
+  pool:
+    name: socket-cpus
+    generation: 1
+    resourceSliceCount: 1
+  devices:
+  - name: socket0cpus
+    allowMultipleAllocations: true
+    capacity:
+      "cpu.example.com/cpu": "128"
+    nodeAllocatableResourceMappings:
+      cpu:
+        capacityKey: "cpu.example.com/cpu"
+        # allocationMultiplier defaults to 1 if omitted
+  - name: socket1cpus
+    allowMultipleAllocations: true
+    capacity:
+      "cpu.example.com/cpu": "128"
+    nodeAllocatableResourceMappings:
+      cpu:
+        capacityKey: "cpu.example.com/cpu"
+        # allocationMultiplier defaults to 1 if omitted
+```
+#### Example: Accelerator with Auxiliary Resources (Device-based scaling)
+
+Here is an example of a resource slice where an accelerator requires an
+additional 8Gi of memory per device instance to function:
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceSlice
+metadata:
+  name: my-node-xpus
+spec:
+  driver: xpu.example.com
+  nodeName: my-node
+  pool:
+    name: xpu-pool
+    generation: 1
+    resourceSliceCount: 1
+  devices:
+  - name: xpu-model-x-001
+    attributes:
+      example.com/model:
+        string: "model-x"
+    nodeAllocatableResourceMappings:
+      memory:
+        allocationMultiplier: "8Gi"
+```
+
+After a Pod is successfully bound to the node, the exact quantities of
+node allocatable resources allocated via DRA are included in the Pod's
+`status.nodeAllocatableResourceClaimStatuses` field.
+
+Node allocatable resources is an alpha feature and is enabled when the
+`DRANodeAllocatableResources` feature gate is enabled in the kube-apiserver,
+kube-scheduler, and kubelet. In the alpha phase, the kubelet does not account
+for these resources when determining QoS classes, configuring cgroups, or making
+eviction decisions.
+
 ## {{% heading "whatsnext" %}}
 
 - [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/DRANodeAllocatableResources.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/DRANodeAllocatableResources.md
new file mode 100644
index 0000000000000..6c4c38f1d824a
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/DRANodeAllocatableResources.md
@@ -0,0 +1,26 @@
+---
+title: DRANodeAllocatableResources
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.36"
+---
+Enables the kube-scheduler to incorporate Node Allocatable resources (such as
+CPU, memory, and hugepages) managed by Dynamic Resource Allocation (DRA) into
+its standard node resource accounting.
+
+When enabled, DRA drivers can use the `nodeAllocatableResourceMappings` field on
+`ResourceSlice` devices to specify how their devices consume node allocatable
+resources. This allows the scheduler to combine these DRA allocations with
+standard Pod requests.
+It also exposes the `status.nodeAllocatableResourceClaimStatuses` field on the
+Pod API to track the resulting resource allocations.
+
+For more information, see
+[Node Allocatable Resources](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#node-allocatable-resources)
+in the Dynamic Resource Allocation documentation.
\ No newline at end of file