
Add Webadmission Controller#61

Open
catblade wants to merge 13 commits into kubernetes-sigs:main from catblade:marlow/webadmission-validation-rebuild

Conversation

@catblade
Contributor

Add a webhook admission controller to make sure that pod CPU requests and claim CPU counts are consistent.
Change and expand the Makefile to make e2e testing easy.
Add easy install targets for individual/grouped modes in the Makefile.
Clean up bugs in NRI discovered while getting the tests running, including some code fixes.

- NRI parses CDI claim UIDs when the DRA env is absent.
- Update test-e2e-all to force a rebuild and use explicit reserved-CPU wiring.
- Resolve claim UIDs from CDI devices in NRI sync/create paths with stronger ownership checks.
- Harden e2e failure detection and accept admission rejection as a valid negative claim-sharing outcome.
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 10, 2026
@pravk03
Contributor

pravk03 commented Feb 10, 2026

/assign @ffromani @pravk03

Contributor

@pravk03 pravk03 left a comment

Leaving some initial comments; I will continue reviewing

}

var errs []string
for _, container := range pod.Spec.Containers {
Contributor

I am wondering if we should just enforce the restriction at the pod level instead of the container level:

pod-level requests >= sum of container requests (non-init containers) + claims.

Pros:

Cons:

  • This would require that CPU requests are explicitly defined at the pod level.

WDYT? @catblade @ffromani. Would this work for Slurm use-cases?
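The proposed pod-level rule could be sketched as follows, with simplified stand-ins for the Kubernetes pod types (PodCPU and its fields are illustrative names, not the actual API, and all values are millicores):

```go
package main

import "fmt"

// PodCPU is a simplified stand-in for the relevant parts of a pod spec.
type PodCPU struct {
	PodLevelMilli  int64   // pod.Spec.Resources CPU request
	ContainerMilli []int64 // non-init container CPU requests
	ClaimMilli     []int64 // CPU totals from dra.cpu claims
}

// validatePodLevel enforces: pod-level requests >= sum of container
// requests (non-init containers) + claims.
func validatePodLevel(p PodCPU) error {
	var need int64
	for _, m := range p.ContainerMilli {
		need += m
	}
	for _, m := range p.ClaimMilli {
		need += m
	}
	if p.PodLevelMilli < need {
		return fmt.Errorf("pod-level CPU %dm < containers+claims %dm", p.PodLevelMilli, need)
	}
	return nil
}

func main() {
	ok := PodCPU{PodLevelMilli: 6000, ContainerMilli: []int64{2000}, ClaimMilli: []int64{4000}}
	short := PodCPU{PodLevelMilli: 4000, ContainerMilli: []int64{2000}, ClaimMilli: []int64{4000}}
	fmt.Println(validatePodLevel(ok))    // <nil>
	fmt.Println(validatePodLevel(short)) // pod-level CPU 4000m < containers+claims 6000m
}
```

The real check would read `pod.Spec.Resources` and the container request lists from the admitted pod object.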

Contributor

I'm generally in favor, considering this is the latest trend in the community, but this would make us dependent on another feature gate, because IIRC pod-level resources (PLR) is not yet beta.

I don't think this is necessarily a problem, just stating the fact.

Contributor Author

Maybe we file an issue to update this when the other feature gate goes in, and for now we do container-level?

Contributor Author

(We can figure out the Slurm use cases regardless of the structure; we'll just map.)

Contributor

PLR is already in beta (since 1.34)

Contributor Author

Changing to pod-level validation.

Contributor

Looks like the recent changes still sum the container requests and not Pod-Level Resources (pod.Spec.Resources) ?


// Prefer allocated device info when available.
if total, err := h.claimCPUCountFromSlices(claim); err != nil || total > 0 {
return total, err
Contributor

I think we can just reject when a pod references a claim that is already allocated.

Contributor

+1, we don't support this scenario. But I'd still keep the validation also in the driver because it's the last line of defense.

Contributor Author

Changing to validate in the admission controller that the claim is not already allocated.

@ffromani
Contributor

I'll review ASAP, thanks for posting this.

if !ok {
return 0, false
}
value, ok := cpuQuantity.AsInt64()
Contributor

Fractional requests are rounded off here. Should we use MilliValue() ?

Contributor

Elsewhere we have logic to ensure the request is integral (the correct thing to do!).
I think we should at least use the same rounding logic everywhere, or do as the scheduler does: carry auxiliary data along with the rounded, validated data and base the logic on that.

Either works for me at this stage

Contributor Author

I intentionally forced integral. Thoughts?

Contributor

@ffromani ffromani left a comment

Thanks for posting this! It is surely a good contribution. Validation helps, and this can be further extended later with another PR to make it mutating.

The biggest open question here is whether we can assume that resource claims are available when a pod is being validated. I'm not sure this is the case, and if not, the webhooks would need to deal with resource claims possibly being created later on.

@@ -0,0 +1,461 @@
/*
Copyright 2025 The Kubernetes Authors.
Contributor

designwise: should this be a separate executable or should this be merged with the main plugin? If separate executable, I agree it should ship in the main (and only) container image.

Contributor Author

I would really prefer to keep things as modular as possible. There is potential for a user to want to do something specific with their admission controller, beyond (or short of) the capabilities this one provides.

Contributor Author

Answers on this?

Contributor

The reasoning is that if we decompose at the package level and push all the logic into the packages, both of which are desirable anyway, then the main.go code, or equivalent, becomes trivial orchestration and wiring, with perhaps some easy command-line flag processing.

If that holds, then there is an argument to merge into the main executable (the packages are shared anyway) for easier shipping and to actually reduce the trivial wiring.

That said, I'm fine keeping separate entry points and executables; we trend towards pushing logic into the packages anyway. If needed, we can always revisit the decision in the future.

Comment on lines -178 to -182
test-e2e: ## run e2e test against an existing configured cluster
env DRACPU_E2E_TEST_IMAGE=$(IMAGE_TEST) DRACPU_E2E_RESERVED_CPUS=$(DRACPU_E2E_RESERVED_CPUS) go test -v ./test/e2e/ --ginkgo.v

test-e2e-kind: ci-kind-setup test-e2e ## run e2e test against a purpose-built kind cluster

Contributor

The idea here was not to assume that e2e runs against the CI/development kind cluster; our e2e tests should run just fine against a real cluster. This reduces the chance of hidden/implicit assumptions sneaking into our own e2e tests, and it also enables using the e2e tests as a validation tool for real deployments, even if likely not production deployments (but we're quite far from production ready anyway... :\ )
If possible, we should keep this distinction.

Contributor Author

This is the last item; I'm going to think about the best way to do this. The rest should be addressed. I think Praveen had an outstanding item he told me about in Slack, which hopefully I can handle early next week.

Contributor Author

@ffromani I replaced all the e2e tests so that they run against a cluster that is already up if one exists, BUT if there is no cluster, they start one, run the tests, and then shut the cluster down. Is there a different flow you would like? I verified that I had this fully running with a different cluster.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: catblade
Once this PR has been reviewed and has the lgtm label, please ask for approval from ffromani. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@catblade catblade force-pushed the marlow/webadmission-validation-rebuild branch from 2ca2c64 to 0b97d3a on February 17, 2026 at 21:45
- Use same identifiers across the project
- Update message to use DRA Prioritized lists is not supported
- Move health check until after webhook is up
- Skip entries with ResourceClaimName == nil up front
- Add comments on error conditions
- Move pod validation into admission pkg
- Fix up various small items found by running tests
  - PinToNode handles cases where pinned pods may want to be scheduled on tainted nodes
  - Added control plane toleration for install-admission
- In cpu_assignment_test.go, pin exclusive-claim pods to the target node so they all run on the same node
@catblade
Contributor Author

Still owe some logic changes. Hopefully will finish those tomorrow. Feel free to review structure changes. Also have the outstanding question whether to make one big app or have admission be separate.

@ffromani @pravk03

- Move to pod rather than container totals
- Simplify some logic to better map to other parts of the project
- Requests have to match CPU requests
- Round up for fractional requests (maybe we should change this)
@ffromani
Contributor

Still owe some logic changes. Hopefully will finish those tomorrow. Feel free to review structure changes. Also have the outstanding question whether to make one big app or have admission be separate.

@ffromani @pravk03

We can keep the admission a separate executable (therefore with a separate entry point).

However, we should strive to push all the logic into packages, so the entry point (the code holding the main function) becomes very simple code orchestrating all the packages. I think this take is mostly uncontroversial, but I'm stating it only for completeness.

I'll have a review pass as soon as you are happy with the upload @catblade

@catblade
Contributor Author

@ffromani I moved functionality into the packages. Please let me know if you want any more changes.

@catblade
Contributor Author

@ffromani @pravk03 Just in case I wasn't clear, this is ready for re-review.

Contributor

@ffromani ffromani left a comment

Thanks for the updates. The biggest factor here is the claim wait logic. It's remarkably better than before, but I'm not sure yet; I need to check with DRA folks to double-check.

Please extract the many QoL changes you made from this PR into one or, better, several more targeted PRs. I think we can merge them quicker.

Misc comments about code organization; the biggest (and only) blocker is how we handle claims. Once we figure that out, I think we can quickly converge.

healthzPath string
claimGetRetryWait time.Duration
claimGetRetryTotal time.Duration
healthzStatus atomicBool
Contributor

I still think we don't need this, or do we? We now set it asynchronously, but just once, and just to cover a (hopefully small) window at startup. I'd rearrange the code not to need it and just hardcode true in the healthz page; we don't seem to need anything more complex than this anytime soon.

Or, from the other angle: what's the reason to keep it? If we HAD to keep it, can we at least use the stdlib type atomic.Bool?

Contributor Author

The healthzStatus flag is still needed, as /healthz returns 503 until the server is listening. Store is called only after the server is up, so readiness probes will fail until the webhook is actually ready. Will change the code to atomic.Bool.

Comment on lines +153 to +158
type admissionHandler struct {
driverName string
clientset kubernetes.Interface
claimGetRetryWait time.Duration
claimGetRetryTotal time.Duration
}
Contributor

I think we should move this type and all its methods and helpers in pkg/admission (or another package) and do only setup and wiring here in main.

Comment on lines +312 to +314
// Fetch the ResourceClaim and sum CPU requests for this driver.
// Claims can be created asynchronously (for example from Pod claim templates),
// so retry briefly on NotFound before treating the claim as not yet available.
Contributor

This is better, but I'm not sure it's robust enough, literally not sure. We should check with other DRA developers. I'll ask in the Slack channel.

Contributor Author

Added some more robustness... not sure what else we can add.

Comment on lines +98 to +99
claimGetRetryWait = durationFromEnv("DRACPU_ADMISSION_CLAIM_GET_RETRY_WAIT", claimGetRetryWait)
claimGetRetryTotal = durationFromEnv("DRACPU_ADMISSION_CLAIM_GET_RETRY_TOTAL", claimGetRetryTotal)
Contributor

Why do these, and only these, have an environment-variable override?

Contributor Author

Removing for simplicity. I was doing some testing and missed cleaning them up.


// TestAdmissionHandler_ValidatePodClaimsWiring ensures the handler implements
// ClaimCPUCountGetter and that admission.ValidatePodClaims works when called with it.
func TestAdmissionHandler_ValidatePodClaimsWiring(t *testing.T) {
Contributor

likewise, all tests should be pushed alongside the code in pkg/admission

Contributor Author

Moved.

@@ -0,0 +1,26 @@
apiVersion: resource.k8s.io/v1
Contributor

please embed using the embed package or move in testdata

Contributor Author

Moved.

@@ -0,0 +1,13 @@
apiVersion: v1
Contributor

ditto

Contributor Author

Moved.

- Move code out of the cmd package into the pkg directory
- Use atomic.Bool instead of writing our own
- Move files into the testdata directory
- Harden claimCPUCount to handle other potential issues
- Remove unneeded environmental overrides
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ffromani
Contributor

ffromani commented Mar 9, 2026

I'll have another look shortly. I'm inclined to move forward and to iterate in future PRs.

Comment on lines +74 to +96
func TestValidatePodClaims_CPURequestMatchesClaimCount(t *testing.T) {
getter := fakeClaimCPUCountGetter{"default/claim-4": 4}
pod := podWithClaims("default", "pod-ok", "claim-ref", "claim-4")
pod.Spec.Containers[0].Resources.Requests = corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("4"),
}

errs := ValidatePodClaims(context.Background(), pod, DefaultDriverName, getter)
if len(errs) != 0 {
t.Fatalf("expected no errors, got %v", errs)
}
}

// With pod-level validation, a pod with dra.cpu claims must have total CPU request equal to claim total.
func TestValidatePodClaims_NoCPURequestWithClaimRejected(t *testing.T) {
getter := fakeClaimCPUCountGetter{"default/claim-2": 2}
pod := podWithClaims("default", "pod-claim-only", "claim-ref", "claim-2")

errs := ValidatePodClaims(context.Background(), pod, DefaultDriverName, getter)
if len(errs) == 0 {
t.Fatal("expected error (pod CPU requests 0 < claim total 2), got none")
}
}
Contributor

Could we convert this, and perhaps the other new test files, to table-driven tests?
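The two tests above could collapse into the table-driven shape; a sketch with a stand-in `validate` function in place of ValidatePodClaims, shown without the testing package for brevity (in the real test this would be a `func TestX(t *testing.T)` with `t.Run` per case):

```go
package main

import "fmt"

// validate is a stand-in for ValidatePodClaims: it returns an error string
// when the pod CPU request does not match the claim total. Illustrative only.
func validate(requestCPUs, claimCPUs int64) []string {
	if requestCPUs != claimCPUs {
		return []string{fmt.Sprintf("pod CPU request %d != claim total %d", requestCPUs, claimCPUs)}
	}
	return nil
}

func main() {
	cases := []struct {
		name        string
		requestCPUs int64
		claimCPUs   int64
		wantErrs    bool
	}{
		{"request matches claim", 4, 4, false},
		{"no request with claim", 0, 2, true},
	}
	for _, tc := range cases {
		errs := validate(tc.requestCPUs, tc.claimCPUs)
		if (len(errs) > 0) != tc.wantErrs {
			fmt.Printf("%s: unexpected result %v\n", tc.name, errs)
			continue
		}
		fmt.Printf("%s: ok\n", tc.name)
	}
}
```

Adding a case then becomes a one-line change to the table rather than a new test function.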

Comment on lines +93 to +104
func CPURequestCount(cpuQuantity resource.Quantity) int64 {
millis := cpuQuantity.MilliValue()
if millis <= 0 {
return 0
}
value := cpuQuantity.Value()
if value*1000 == millis {
return value
}
// Round up to whole cores: 400m -> 1, 500m -> 1, 1500m -> 2
return (millis + 999) / 1000
}
Contributor

We should not be rounding off standard requests. If a pod spec requests 800m and the claim requests 1 CPU, we currently consider this valid, which is incorrect.
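A stricter variant that rejects fractional requests instead of rounding them up might look like this; a sketch, not the PR's code, working on raw millicores rather than resource.Quantity:

```go
package main

import "fmt"

// cpuRequestCount rejects fractional CPU requests outright instead of
// rounding them up, so an 800m request can never satisfy a 1-CPU claim.
func cpuRequestCount(millis int64) (int64, error) {
	if millis <= 0 {
		return 0, nil
	}
	if millis%1000 != 0 {
		return 0, fmt.Errorf("CPU request %dm is not an integral number of CPUs", millis)
	}
	return millis / 1000, nil
}

func main() {
	n, err := cpuRequestCount(2000)
	fmt.Println(n, err) // 2 <nil>
	_, err = cpuRequestCount(800)
	fmt.Println(err != nil) // true: fractional request rejected
}
```

With resource.Quantity the millicore value would come from `MilliValue()`, avoiding the silent rounding of `AsInt64()` noted earlier in the thread.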

// ValidatePodClaims enforces at pod level: when a pod has dra.cpu claims, the sum of
// non-init container CPU requests must equal the sum of CPUs from those claims.
// It returns a list of errors
func ValidatePodClaims(ctx context.Context, pod *corev1.Pod, driverName string, getter ClaimCPUCountGetter) []string {
Contributor

I propose an alternate logic for this function:

Case 1: pod.Spec.Resources not populated

  • For individual containers with claims, standard request == claim request
  • If multiple containers reference the same claim, sum of standard requests from containers referencing same claim == claim request

Case 2: pod.Spec.Resources is specified

  • Containers with claims should not specify standard CPU requests.
  • pod.Spec.Resources >= sum of claim requests from pod.Spec.ResourceClaims + sum of standard requests from non-init containers

Once we introduce a mutating webhook, we could always set pod.Spec.Resources which would make migration to KEP-5517 seamless.

cc: @ffromani
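A rough sketch of the two-case rule above, using simplified stand-ins for the pod types (this omits the shared-claim sub-case of Case 1 and models claims per container for brevity; all names are illustrative):

```go
package main

import "fmt"

// Simplified stand-ins; values are CPU millicores.
type Container struct {
	RequestMilli int64 // standard CPU request, 0 if unset
	ClaimMilli   int64 // CPU from a dra.cpu claim, 0 if none
}

type Pod struct {
	PodLevelMilli int64 // pod.Spec.Resources CPU, 0 if unset
	Containers    []Container
}

func validate(p Pod) error {
	if p.PodLevelMilli == 0 {
		// Case 1: no pod-level resources; each container with a claim
		// must have a standard request equal to its claim request.
		for i, c := range p.Containers {
			if c.ClaimMilli > 0 && c.RequestMilli != c.ClaimMilli {
				return fmt.Errorf("container %d: request %dm != claim %dm", i, c.RequestMilli, c.ClaimMilli)
			}
		}
		return nil
	}
	// Case 2: pod-level resources set; containers with claims must not set
	// standard requests, and the pod-level value must cover claim requests
	// plus standard requests of the remaining containers.
	var need int64
	for i, c := range p.Containers {
		if c.ClaimMilli > 0 {
			if c.RequestMilli > 0 {
				return fmt.Errorf("container %d: has both claim and standard request", i)
			}
			need += c.ClaimMilli
		} else {
			need += c.RequestMilli
		}
	}
	if p.PodLevelMilli < need {
		return fmt.Errorf("pod-level %dm < required %dm", p.PodLevelMilli, need)
	}
	return nil
}

func main() {
	// Case 1: container request equals its claim.
	fmt.Println(validate(Pod{Containers: []Container{{RequestMilli: 2000, ClaimMilli: 2000}}}))
	// Case 2: pod-level value covers claims plus other requests.
	fmt.Println(validate(Pod{PodLevelMilli: 6000, Containers: []Container{{ClaimMilli: 4000}, {RequestMilli: 2000}}}))
}
```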

if count < 1 {
count = 1
}
if req.Capacity != nil && len(req.Capacity.Requests) > 0 {
Contributor

@pravk03 pravk03 Mar 9, 2026

I realized there is an edge case with cpu-device-mode=grouped. In grouped mode, a count of 1 without req.Capacity doesn't mean 1 CPU: it means 1 device (like an entire NUMA node or socket, depending on --cpu-device-group-by) is allocated to the claim.

To address this, we could pass the --cpu-device-mode flag value to the webhook:

  • If mode == individual, we can keep the logic as-is and just read the count.
  • If mode == grouped, we require req.Capacity to also be set. We could temporarily throw a validation error if req.Capacity is nil, with a TODO to handle this later.
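The proposed mode-aware handling could be sketched like this (illustrative names only; the real code would read the count and capacity from the claim's device requests):

```go
package main

import "fmt"

// claimCPUs interprets a device request count depending on cpu-device-mode.
// In "individual" mode the count is a CPU count; in "grouped" mode one
// device may span many CPUs, so capacity information is required.
func claimCPUs(mode string, count, capacityCPUs int64, hasCapacity bool) (int64, error) {
	switch mode {
	case "individual":
		return count, nil
	case "grouped":
		if !hasCapacity {
			// TODO: derive CPU counts for grouped devices without capacity.
			return 0, fmt.Errorf("grouped mode requires capacity requests on the claim")
		}
		return capacityCPUs, nil
	default:
		return 0, fmt.Errorf("unknown cpu-device-mode %q", mode)
	}
}

func main() {
	n, _ := claimCPUs("individual", 2, 0, false)
	fmt.Println(n) // 2
	_, err := claimCPUs("grouped", 1, 0, false)
	fmt.Println(err != nil) // true: rejected until capacity handling lands
}
```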
