Problem
When a pod is created with dra.cpu resource claims, there's currently no validation to ensure the pod's CPU requests match the claim allocation. This can lead to mismatched configurations that are only caught at container startup.
Proposal
Add validation in the driver's NodeStageVolume callback to check that:
- For non-PLR mode: sum of container CPU requests == sum of claim CPUs
- For PLR mode: pod-level CPU requests == sum of claim CPUs
This serves as the final line of defense when:
- The admission webhook is not deployed
- Claims are created asynchronously with pods
- Other validation layers fail
Key considerations
- Shared claims: Handle allocation.shared list correctly for pods using sharePolicy: "WhenNotScheduled"
- PLR compatibility: Check if PodLevelResources feature gate is enabled and adapt validation accordingly
- Error handling: Return clear error messages when validation fails
Related