The method that you use depends on your requirements, as follows:
separate, similarly-configured devices. Kubernetes generates ResourceClaims
from the specification in the ResourceClaimTemplate. The lifetime of each
generated ResourceClaim is bound to the lifetime of the corresponding Pod.
* [**PodGroup ResourceClaimTemplate**](#workload-resource-claims): you want
{{< glossary_tooltip text="PodGroups" term_id="podgroup" >}} to have
independent access to separate, similarly-configured devices that can be
shared by their Pods. Kubernetes generates one ResourceClaim for the PodGroup
from the specification in the ResourceClaimTemplate. The lifetime of each
generated ResourceClaim is bound to the lifetime of the corresponding
PodGroup. This requires the
[`DRAWorkloadResourceClaims`](/docs/reference/command-line-tools-reference/feature-gates/#DRAWorkloadResourceClaims)
feature to be enabled.

When you define a workload, you can use
{{< glossary_tooltip term_id="cel" text="Common Expression Language (CEL)" >}}
references it.

You can reference an auto-generated ResourceClaim in a Pod, but this isn't
recommended because auto-generated ResourceClaims are bound to the lifetime of
the Pod or PodGroup that triggered the generation.

To learn how to claim resources using one of these methods, see
[Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra/).
The decision is made on a per-Pod basis, so if the Pod is a member of a ReplicaSet or
similar grouping, you cannot rely on all the members of the group having the same subrequest
chosen. Your workload must be able to accommodate this.

#### Workload ResourceClaims {#workload-resource-claims}

{{< feature-state feature_gate_name="DRAWorkloadResourceClaims" >}}

When you organize Pods with the
[Workload API](/docs/concepts/workloads/workload-api/),
you can reserve ResourceClaims for entire
{{< glossary_tooltip text="PodGroups" term_id="podgroup" >}}
instead of individual Pods, and you can generate ResourceClaims from a
ResourceClaimTemplate for each PodGroup instead of for each Pod. This allows
the Pods within a PodGroup to share access to the devices allocated to the
generated ResourceClaim.

This feature targets two problems:

- The ResourceClaim API's `status.reservedFor` list can only contain 256 items.
Since kube-scheduler only records individual Pods in that list, only 256 Pods
can share a ResourceClaim. By allowing PodGroups to be recorded in
`status.reservedFor`, many more than 256 Pods can share a ResourceClaim.
- Pods can only share a ResourceClaim when its exact name is known. For complex
workloads that replicate _groups_ of Pods, ResourceClaims shared by the Pods
in each group need to be created and deleted explicitly when the set of
groups scales up and down. By generating ResourceClaims for each PodGroup, a
single ResourceClaimTemplate can form the basis for ResourceClaims that are
both replicated automatically and shareable among the Pods in a PodGroup.

The PodGroup API defines a `spec.resourceClaims` field with the same structure
and similar meaning as the `spec.resourceClaims` field in the Pod API:

```yaml
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-group
  namespace: some-ns
spec:
  ...
  resourceClaims:
  - name: pg-claim
    resourceClaimName: my-pg-claim
  - name: pg-claim-template
    resourceClaimTemplateName: my-pg-template
```

Like claims made by Pods, claims for PodGroups that define a `resourceClaimName`
refer to an existing ResourceClaim by name. Claims that define a
`resourceClaimTemplateName` refer to a ResourceClaimTemplate, from which
Kubernetes generates one ResourceClaim for the entire PodGroup that its Pods can
share.

When a Pod defines a claim with a `name`, `resourceClaimName`, and
`resourceClaimTemplateName` that all match one of its PodGroup's
`spec.resourceClaims`, kube-scheduler reserves the ResourceClaim for the
PodGroup instead of the Pod. If the Pod's claim does not match one made by its
PodGroup, kube-scheduler reserves the ResourceClaim for the Pod. In either
case, the reservation is recorded in the ResourceClaim's `status.reservedFor`.
PodGroup reservations persist in the ResourceClaim until the PodGroup is
deleted, even if the group no longer has any Pods.
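
As a sketch of what this looks like on the API, a ResourceClaim reserved for a
PodGroup records the group, rather than individual Pods, as a consumer in
`status.reservedFor`. The sketch below assumes the same consumer reference
shape used for Pods (`apiGroup`, `resource`, `name`, `uid`); the names and the
`uid` are illustrative placeholders:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: my-pg-claim
  namespace: some-ns
status:
  reservedFor:
  # A single entry covers the whole PodGroup, so the 256-item limit on
  # this list no longer caps the number of Pods that share the claim.
  - apiGroup: scheduling.k8s.io
    resource: podgroups
    name: training-group
    uid: 5f9c2b6e-0000-0000-0000-000000000000  # placeholder
```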

When a Pod claim that matches a PodGroup claim defines a
`resourceClaimTemplateName`, one ResourceClaim is generated for the
PodGroup. Other Pods in the group that define the same claim share that
generated ResourceClaim instead of triggering the generation of a new
ResourceClaim for each Pod. Whether or not a `resourceClaimTemplateName` claim
matches a PodGroup claim, the name of the generated ResourceClaim is recorded
in the Pod's `status.resourceClaimStatuses`.
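
For example, for the `pg-claim-template` claim above, a Pod's status might
record the generated ResourceClaim as in the following sketch; the generated
name shown is a placeholder, since Kubernetes assigns the actual name:

```yaml
# Sketch of a Pod status after scheduling; the generated name is illustrative.
status:
  resourceClaimStatuses:
  - name: pg-claim-template
    resourceClaimName: training-group-pg-claim-template-x7k2p  # placeholder
```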

ResourceClaims generated from a ResourceClaimTemplate for a
PodGroup follow the lifecycle of the PodGroup. The ResourceClaim is first
created when both the PodGroup and its ResourceClaimTemplate exist. The
ResourceClaim is deleted after the PodGroup has been deleted and the
ResourceClaim is no longer reserved.

Consider the following example:

```yaml
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-group
  namespace: some-ns
spec:
  ...
  resourceClaims:
  - name: pg-claim
    resourceClaimName: my-pg-claim
  - name: pg-claim-template
    resourceClaimTemplateName: my-pg-template
---
apiVersion: v1
kind: Pod
metadata:
  name: training-group-pod-1
  namespace: some-ns
spec:
  ...
  schedulingGroup:
    podGroupName: training-group
  resourceClaims:
  - name: pod-claim
    resourceClaimName: my-pod-claim
  - name: pod-claim-template
    resourceClaimTemplateName: my-pod-template
  - name: pg-claim
    resourceClaimName: my-pg-claim
  - name: pg-claim-template
    resourceClaimTemplateName: my-pg-template
```

In this example, the `training-group` PodGroup has one Pod named `training-group-pod-1`.
The Pod's `pod-claim` and `pod-claim-template` claims do not match
any claim made by the PodGroup, so those claims are not affected by the
PodGroup: ResourceClaim `my-pod-claim` becomes reserved for the Pod and a
ResourceClaim is generated from ResourceClaimTemplate `my-pod-template` and also
becomes reserved for the Pod. The Pod's `pg-claim` and `pg-claim-template`
claims do match claims made by the PodGroup. ResourceClaim `my-pg-claim` becomes reserved for
the PodGroup and a ResourceClaim is generated from ResourceClaimTemplate
`my-pg-template` and also becomes reserved for the PodGroup.

Associating ResourceClaims with Workload API resources is an *alpha feature* and
only enabled when the `DRAWorkloadResourceClaims`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled in the kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet.
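
As a sketch, with command-line flags the gate would be turned on for each
component as shown below. The flags are shown in isolation; each component
takes many more flags in a real cluster, elided here as `...`:

```shell
# Illustrative only: enable the alpha gate on every affected component.
kube-apiserver          --feature-gates=DRAWorkloadResourceClaims=true ...
kube-controller-manager --feature-gates=DRAWorkloadResourceClaims=true ...
kube-scheduler          --feature-gates=DRAWorkloadResourceClaims=true ...
kubelet                 --feature-gates=DRAWorkloadResourceClaims=true ...
```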

### ResourceSlice {#resourceslice}

Each ResourceSlice represents one or more
content/en/docs/concepts/workloads/workload-api/_index.md

The `controllerRef` field links the Workload back to the specific high-level object,
such as a [Job](/docs/concepts/workloads/controllers/job/) or a custom CRD. This is useful for observability and tooling.
This data is not used to schedule or manage the Workload.

### Requesting DRA devices for a PodGroup


{{< feature-state feature_gate_name="DRAWorkloadResourceClaims" >}}

{{< glossary_tooltip text="Devices" term_id="device" >}} available through
{{< glossary_tooltip text="Dynamic Resource Allocation (DRA)" term_id="dra" >}}
can be requested by a PodGroup through its `spec.resourceClaims` field:

```yaml
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-group
  namespace: some-ns
spec:
  ...
  resourceClaims:
  - name: pg-claim
    resourceClaimName: my-pg-claim
  - name: pg-claim-template
    resourceClaimTemplateName: my-pg-template
```

{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}
associated with PodGroups can be shared by more than 256 Pods.
ResourceClaims can also be generated from
{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}
for each PodGroup, allowing the devices allocated to each generated
ResourceClaim to be shared by the Pods in each PodGroup.

For more details and a more complete example, see the
[DRA documentation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#workload-resource-claims).

## {{% heading "whatsnext" %}}

* See how to [reference a Workload](/docs/concepts/workloads/pods/workload-reference/) in a Pod.
---
title: DRAWorkloadResourceClaims
content_type: feature_gate
_build:
  list: never
  render: false

stages:
- stage: alpha
  defaultValue: false
  fromVersion: "1.36"
---

Enables PodGroup resources from the
[Workload API](/docs/concepts/workloads/workload-api/) to request devices
through
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
so that those devices can be shared by a PodGroup's member Pods.
content/en/docs/reference/glossary/podgroup.md

---
title: PodGroup
id: podgroup
full_link: /docs/concepts/workloads/workload-api/#pod-groups
short_description: >
A PodGroup represents a set of Pods with common policy and configuration.

aka:
tags:
- core-object
- workload
---
A PodGroup is a runtime object that represents a group of Pods scheduled
together as a single unit. While the
[Workload API](/docs/concepts/workloads/workload-api/) defines scheduling policy
templates, PodGroups are the runtime counterparts that carry both the policy and
the scheduling status for a specific instance of that group.
content/en/docs/reference/glossary/resourceclaimtemplate.md

id: resourceclaimtemplate
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates
short_description: >
Defines a template for Kubernetes to create ResourceClaims. Used to provide
per-Pod or per-PodGroup access to separate, similar resources.

tags:
- workload
---
Defines a template that Kubernetes uses to create
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}.
ResourceClaimTemplates are used in
[dynamic resource allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
to provide _per-Pod or per-{{< glossary_tooltip text="PodGroup" term_id="podgroup" >}} access to separate, similar resources_.

<!--more-->

When a ResourceClaimTemplate is referenced in a workload specification,
Kubernetes automatically creates ResourceClaim objects based on the template.
Each ResourceClaim is bound to a specific Pod or PodGroup. When the Pod
terminates or the PodGroup is deleted, Kubernetes deletes the corresponding
ResourceClaim. PodGroup ResourceClaimTemplates require the
[`DRAWorkloadResourceClaims`](/docs/reference/command-line-tools-reference/feature-gates/#DRAWorkloadResourceClaims)
feature to be enabled.