Update claim deletion to use foreground propagation #367

Open
vkryachko wants to merge 2 commits into kubernetes-sigs:main from vkryachko:vk.claim-foreground-delete

Conversation

@vkryachko

Use foreground propagation for claim deletion to ensure it remains in etcd until the sandbox and pod are stopped.

@netlify

netlify bot commented Mar 5, 2026

Deploy Preview for agent-sandbox canceled.

🔨 Latest commit: 6a3520e
🔍 Latest deploy log: https://app.netlify.com/projects/agent-sandbox/deploys/69a99652a29471000837cc51

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vkryachko
Once this PR has been reviewed and has the lgtm label, please assign barney-s for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) on Mar 5, 2026
@k8s-ci-robot
Contributor

Welcome @vkryachko!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @vkryachko. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) and size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) labels on Mar 5, 2026
@vicentefb
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label (Indicates a non-member PR verified by an org member that is safe to test.) and removed the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) on Mar 5, 2026
Member

@vicentefb left a comment

The approach in this PR looks correct from a Kubernetes point of view, but I want to call out a few architectural side effects:

  • terminationGracePeriodSeconds: Foreground deletion still relies on the kubelet's grace period. If your snapshot upload (or any other cleanup logic) takes longer than the pod's configured grace period, the kubelet will send SIGKILL, the pod will be removed, and the claim will disappear from etcd even if the upload didn't finish. You'll need to make sure your pod templates configure a long enough grace period.

  • Stuck termination: If a node gets partitioned or a volume fails to unmount, the pod will hang in Terminating. Because of foreground propagation, the SandboxClaim will also hang in Terminating indefinitely.

  • CLI UX change: For general users, running kubectl delete sandboxclaim <name> will now block and hang in the terminal until the pod fully terminates, rather than returning instantly. This is expected, but worth noting.
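The first two points can be turned into simple checks. The following is only a minimal sketch, assuming durations are tracked in whole seconds (the unit used by terminationGracePeriodSeconds). The function names and the slack value are hypothetical and are not part of agent-sandbox.

```go
package main

import "fmt"

// GraceCovers reports whether the pod's grace period is at least as long as
// the cleanup work (for example a snapshot upload) is expected to take.
// Both values are in seconds. Hypothetical helper for illustration only.
func GraceCovers(gracePeriodSeconds, expectedCleanupSeconds int64) bool {
	return gracePeriodSeconds >= expectedCleanupSeconds
}

// LikelyStuck reports whether an object that entered Terminating
// elapsedSeconds ago has outlived its grace period plus some slack.
// With foreground propagation, a stuck pod keeps the claim in Terminating
// too, so a check like this could feed an alert. Hypothetical helper.
func LikelyStuck(elapsedSeconds, gracePeriodSeconds, slackSeconds int64) bool {
	return elapsedSeconds > gracePeriodSeconds+slackSeconds
}

func main() {
	// A 30-second grace period (the Kubernetes default) against an
	// assumed 5-minute upload.
	fmt.Println("default grace covers upload:", GraceCovers(30, 300))
	// A 10-minute grace period against the same upload.
	fmt.Println("10m grace covers upload:", GraceCovers(600, 300))
	// Terminating for 700s with a 600s grace period and 60s of slack.
	fmt.Println("likely stuck:", LikelyStuck(700, 600, 60))
}
```

How much slack to allow before treating a Terminating claim as stuck is a judgment call that depends on the cluster and the workload.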

@vkryachko
Author

> CLI UX change: For general users, running kubectl delete sandboxclaim <name> will now block and hang in the terminal until the pod fully terminates, rather than returning instantly. This is expected, but worth noting.

@vicentefb can you explain why this would change? If I understand correctly, foreground deletion only happens when kubectl delete sandboxclaim <name> --cascade=foreground is called. My change only affects the claim expiry case, where the deletion is triggered by the controller rather than by kubectl.

// Important: We use foreground propagation to ensure the SandboxClaim remains in etcd until its underlying Sandbox and Pod are fully deleted.
// Because the Pod may have important cleanup logic, this allows external systems to query the Claim to determine if the sandbox is still
// shutting down (Claim exists with a deletion timestamp) or has completely stopped (Claim is 404 Not Found).
if err := r.Delete(ctx, claim, client.PropagationPolicy(metav1.DeletePropagationForeground)); err != nil {
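The code comment above describes how an external system could interpret what it sees when it looks up the claim. A minimal sketch of that interpretation follows, assuming the caller has already performed the GET and reduced the result to a small struct. The ClaimView type, the state names, and Classify are hypothetical and do not come from the agent-sandbox API.

```go
package main

import "fmt"

// ClaimView is a hypothetical, trimmed-down view of what an external system
// learns from a GET on the SandboxClaim.
type ClaimView struct {
	Found             bool   // false when the API server returned 404 Not Found
	DeletionTimestamp string // metadata.deletionTimestamp; empty if deletion has not started
}

// ShutdownState is the caller's interpretation of a ClaimView.
type ShutdownState string

const (
	StateActive       ShutdownState = "Active"
	StateShuttingDown ShutdownState = "ShuttingDown"
	StateStopped      ShutdownState = "Stopped"
)

// Classify maps a ClaimView to a ShutdownState following the rule in the
// code comment: a claim with a deletion timestamp is still shutting down,
// and a 404 means the sandbox has completely stopped.
func Classify(c ClaimView) ShutdownState {
	switch {
	case !c.Found:
		return StateStopped
	case c.DeletionTimestamp != "":
		return StateShuttingDown
	default:
		return StateActive
	}
}

func main() {
	fmt.Println(Classify(ClaimView{Found: true}))
	fmt.Println(Classify(ClaimView{Found: true, DeletionTimestamp: "2026-03-05T12:00:00Z"}))
	fmt.Println(Classify(ClaimView{Found: false}))
}
```

Note that a 404 is only meaningful as "stopped" if the caller knows the claim existed earlier. A claim that was never created looks the same.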

Thanks for the PR! However, I think this is better handled on the caller side. If the external system needs to observe shutdown, it has two options already:

  1. Use ShutdownPolicy=Retain — the claim stays around with an expired condition. The external system deletes it when ready.
  2. Delete with --cascade=foreground itself — the external system chooses to wait for full cleanup at the point of deletion.

Both keep the propagation policy as a caller-side decision, which is the standard k8s pattern (e.g. Deployment, Job, etc.). Hardcoding it in the controller's expiry path means the controller makes that choice on behalf of all users, and it applies only to expiry: a user-initiated delete would still behave differently.
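For the second option, the API-level equivalent of --cascade=foreground is a DELETE request whose DeleteOptions body sets propagationPolicy to Foreground. The sketch below builds such a request with only the Go standard library and does not send it. The host, API group, version, namespace, and claim name in the URL are placeholders rather than the real agent-sandbox values, and authentication is omitted.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// deleteOptions mirrors the subset of the Kubernetes DeleteOptions object
// needed to request foreground cascading deletion.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// ForegroundDeleteBody returns the JSON body for a foreground delete.
func ForegroundDeleteBody() ([]byte, error) {
	return json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: "Foreground",
	})
}

// NewForegroundDelete builds (but does not send) a DELETE request for the
// resource at resourceURL, asking the API server to keep the object until
// its dependents are gone.
func NewForegroundDelete(resourceURL string) (*http.Request, error) {
	body, err := ForegroundDeleteBody()
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodDelete, resourceURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder URL: the group "example.dev", version "v1", namespace
	// "default", and name "my-claim" are assumptions for illustration.
	const claimURL = "https://kube-apiserver.example/apis/example.dev/v1/namespaces/default/sandboxclaims/my-claim"

	req, err := NewForegroundDelete(claimURL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	body, _ := ForegroundDeleteBody()
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(string(body))
}
```

A caller using controller-runtime would more likely pass the propagation policy as a delete option on the client call, as the line under review does, but from its own code rather than from the controller's expiry path.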

Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.)

4 participants