Verify by creating hooks in the driver's container as the last line of defense #108

@AutuSnow

Problem

When a pod is created with dra.cpu resource claims, nothing currently validates that the pod's CPU requests match the CPUs allocated by its claims. A mismatched configuration is therefore only caught at container startup.

Proposal

Add validation in the driver's NodeStageVolume callback to check that:

  1. For non-PLR mode: sum of container CPU requests == sum of claim CPUs
  2. For PLR mode: pod-level CPU requests == sum of claim CPUs

This serves as the final line of defense when:

  • The admission webhook is not deployed
  • Claims are created asynchronously with pods
  • Other validation layers fail
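
A minimal sketch of the two checks, to make the proposed comparison concrete. It is not the driver's actual code: the `Pod`, `Container`, and `Claim` types and the `ValidatePodCPU` function are hypothetical simplifications, CPU quantities are plain millicores, and each claim is assumed to allocate whole CPUs. It also assumes that "PLR mode" means the PodLevelResources feature gate is enabled and the pod sets a pod-level CPU request.

```go
package main

import "fmt"

// Container is a hypothetical, simplified stand-in for a container spec.
type Container struct {
	Name            string
	CPURequestMilli int64 // CPU request in millicores
}

// Pod is a hypothetical, simplified stand-in for a pod spec.
type Pod struct {
	Name       string
	Containers []Container
	// PodLevelCPURequestMilli is nil when no pod-level CPU request is set.
	PodLevelCPURequestMilli *int64
}

// Claim is a hypothetical view of an allocated dra.cpu claim.
type Claim struct {
	Name string
	CPUs int64 // assumption: claims allocate whole CPUs
}

// ValidatePodCPU returns an error when the pod's CPU requests do not
// equal the total CPUs allocated by its claims.
func ValidatePodCPU(pod Pod, claims []Claim, plrEnabled bool) error {
	// Total CPUs across all claims, converted to millicores.
	var claimMilli int64
	for _, c := range claims {
		claimMilli += c.CPUs * 1000
	}

	var requested int64
	source := "sum of container CPU requests"
	if plrEnabled && pod.PodLevelCPURequestMilli != nil {
		// PLR mode: compare the pod-level request.
		requested = *pod.PodLevelCPURequestMilli
		source = "pod-level CPU request"
	} else {
		// Non-PLR mode: compare the sum over all containers.
		for _, c := range pod.Containers {
			requested += c.CPURequestMilli
		}
	}

	if requested != claimMilli {
		return fmt.Errorf("pod %q: %s is %dm, but claims allocate %dm",
			pod.Name, source, requested, claimMilli)
	}
	return nil
}
```

With this shape, a pod whose containers request 1500m and 500m passes against a single 2-CPU claim and fails against a 3-CPU claim, and the error message names which quantity was compared.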

Key considerations

  • Shared claims: Handle allocation.shared list correctly for pods using sharePolicy: "WhenNotScheduled"
  • PLR compatibility: Check if PodLevelResources feature gate is enabled and adapt validation accordingly
  • Error handling: Return clear error messages when validation fails
