Update claim deletion to use foreground propagation #367
vkryachko wants to merge 2 commits into kubernetes-sigs:main
Conversation
Use foreground propagation for claim deletion to ensure it remains in etcd until the sandbox and pod are stopped.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vkryachko. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Welcome @vkryachko!
Hi @vkryachko. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test |
The approach in the PR looks accurate from a k8s PoV, but I want to call out a few architectural side effects:

- `terminationGracePeriodSeconds`: Foreground deletion still relies on the kubelet's grace period. If your snapshot upload (or any other cleanup logic you have) takes longer than the pod's configured grace period, the kubelet will issue a SIGKILL, the pod will drop, and the claim will disappear from etcd even if the upload didn't finish. You'll need to make sure your pod templates have a long enough grace period configured.
- If a node gets partitioned or a volume fails to unmount, the pod will hang in `Terminating`. Because of foreground propagation, the `SandboxClaim` will also hang in `Terminating` indefinitely.
- CLI UX change: For general users, running `kubectl delete sandboxclaim <name>` will now block in the terminal until the pod fully terminates, rather than returning instantly. This is expected, but worth noting.
@vicentefb can you explain why this would change? iiuc foreground deletion will only happen when
```go
// Important: We use foreground propagation to ensure the SandboxClaim remains in etcd until its underlying Sandbox and Pod are fully deleted.
// Because the Pod may have important cleanup logic, this allows external systems to query the Claim to determine if the sandbox is still
// shutting down (Claim exists with a deletion timestamp) or has completely stopped (Claim is 404 Not Found).
if err := r.Delete(ctx, claim, client.PropagationPolicy(metav1.DeletePropagationForeground)); err != nil {
```
Thanks for the PR! However, I think this is better handled on the caller side. If the external system needs to observe shutdown, it has two options already:

- Use `ShutdownPolicy=Retain`: the claim stays around with an expired condition, and the external system deletes it when ready.
- Delete with `--cascade=foreground` itself: the external system chooses to wait for full cleanup at the point of deletion.

Both keep the propagation policy as a caller-side decision, which is the standard k8s pattern (e.g. Deployment, Job, etc.). Hardcoding it in the controller's expiry path means the controller makes that choice on behalf of all users, and it only applies to expiry: a user-initiated delete would still behave differently.