feat: support inplace CPU resize for warm pool sandboxes #228

Merged
furykerry merged 4 commits into openkruise:master from PersistentJZH:feat/support-inplace-resize-cpu on Apr 20, 2026
Conversation

@PersistentJZH (Member) commented Mar 29, 2026

Ⅰ. Describe what this PR does

  • Implement in-place pod resize logic when claiming a sandbox (through both the Kubernetes CRD API and the E2B API).
  • Introduce SandboxClaimInPlaceCPUResizeGate to control the feature.
  • Fix potentially unsafe JSON quoting flagged by the CodeQL scan.
  • CI: Update test/kind-conf.yaml to enable InPlacePodVerticalScaling.

Ⅱ. Does this pull request fix one issue?

#68

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

TODO

  1. Return the sandbox immediately once we are sure that the resize is feasible (this can be implemented in another PR; this one already contains too much).
  2. Add docs to openkruise.io.

Comment thread pkg/utils/inplaceupdate/inplace_update.go Fixed
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 2 times, most recently from b3b25d4 to 87510f7 Compare March 29, 2026 10:12
codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 88.70293% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.28%. Comparing base (da8890f) to head (dcb00ed).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ller/sandbox/core/common_inplace_update_handler.go | 48.00% | 39 Missing ⚠️ |
| pkg/sandbox-manager/infra/sandboxcr/claim.go | 82.45% | 5 Missing and 5 partials ⚠️ |
| pkg/servers/e2b/create.go | 66.66% | 3 Missing ⚠️ |
| pkg/utils/inplaceupdate/inplace_update.go | 99.27% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #228      +/-   ##
==========================================
+ Coverage   67.75%   69.28%   +1.52%     
==========================================
  Files         112      112              
  Lines        7689     8106     +417     
==========================================
+ Hits         5210     5616     +406     
- Misses       2185     2193       +8     
- Partials      294      297       +3     
Flag Coverage Δ
unittests 69.28% <88.70%> (+1.52%) ⬆️

Comment thread api/v1alpha1/sandboxclaim_types.go Outdated
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from 87510f7 to 9ececb9 Compare April 3, 2026 12:19
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from 9ececb9 to 73c494a Compare April 3, 2026 12:47
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 5 times, most recently from 8f0e5ed to 167d42c Compare April 6, 2026 16:58
@PersistentJZH PersistentJZH changed the title [WIP]feat: support inplace CPU resize for warm pool sandboxes feat: support inplace CPU resize for warm pool sandboxes Apr 6, 2026
@PersistentJZH (Member, Author)

@furykerry @zmberg ready to review, thanks!

@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from 167d42c to 81938f3 Compare April 10, 2026 02:16
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 2 times, most recently from c85baf0 to 6ecb3bf Compare April 10, 2026 02:54
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from 6ecb3bf to 026e3ab Compare April 13, 2026 14:14
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 2 times, most recently from 90d431e to 95e0097 Compare April 13, 2026 14:36
Comment thread config/sandbox-manager/rbac.yaml Outdated
Comment thread pkg/utils/inplaceupdate/inplace_update.go
Comment thread pkg/controller/sandbox/core/common_inplace_update_handler.go
Comment thread pkg/controller/sandbox/core/common_inplace_update_handler.go Outdated
Comment thread pkg/controller/sandbox/core/common_inplace_update_handler.go Outdated
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from 95ba4af to 1e0b47e Compare April 17, 2026 12:26
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 3 times, most recently from 189be43 to e61dcaa Compare April 17, 2026 13:41
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from d75c962 to 369649a Compare April 19, 2026 10:40
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch 3 times, most recently from 3c2d542 to db5c963 Compare April 19, 2026 11:32
Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

Add the ability to resize CPU requests/limits on warm pool sandboxes
during claim time without pod recreation, using the Kubernetes pod
resize sub-resource API.

- Introduce SandboxClaimInPlaceCPUResizeGate (default: true) to
  control the feature in both SandboxClaim controller and E2B server.
- Implement pod inplace resize logic when Claim sandbox.
- Fix json potentially unsafe quoting in CodeQL scan.
- Update test/kind-conf.yaml to enable InPlacePodVerticalScaling feature.

# Conflicts:
#	pkg/controller/sandboxclaim/core/common_control.go
Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

- Fail fast on CPU resize.
- Shorten the extension key names related to CPU resize.
- Add resize subresource fallback for K8s < 1.33.
- Watch pod.Status.Resize field changes in the pod event handler for K8s 1.27-1.32.
- Sync the Sandbox Ready condition with Pod Ready regardless of the inplace update outcome.
- Block checkSandboxReady during the InplaceUpdating state to avoid a premature ready signal before the resize completes.
- Enhance E2E tests: verify the actual pod spec resources after resize.
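
A minimal sketch of the "resize subresource fallback" idea from the list above: try the resize subresource first, and fall back to a plain spec patch only when the subresource is unavailable. This is an illustration under stated assumptions, not the PR's code. The `resizer` type, its two function fields, and the `errSubresourceUnavailable` sentinel are all hypothetical, and the real implementation may detect older clusters differently (for example by checking the server version).

```go
package main

import (
	"errors"
	"fmt"
)

// errSubresourceUnavailable is a hypothetical sentinel standing in for the
// error an API server without the resize subresource would return.
var errSubresourceUnavailable = errors.New("pods/resize subresource unavailable")

// resizer bundles the two hypothetical ways of applying a new CPU value.
type resizer struct {
	viaSubresource func(cpu string) error
	viaSpecPatch   func(cpu string) error
}

// resize tries the subresource first and falls back to a spec patch only
// when the subresource is unavailable. Any other error is returned as-is,
// so real failures are not hidden by the fallback.
func (r resizer) resize(cpu string) (string, error) {
	err := r.viaSubresource(cpu)
	if err == nil {
		return "subresource", nil
	}
	if !errors.Is(err, errSubresourceUnavailable) {
		return "", err
	}
	if err := r.viaSpecPatch(cpu); err != nil {
		return "", err
	}
	return "spec-patch", nil
}

func main() {
	// Simulate an older cluster where the subresource does not exist.
	older := resizer{
		viaSubresource: func(string) error { return errSubresourceUnavailable },
		viaSpecPatch:   func(string) error { return nil },
	}
	path, err := older.resize("2")
	fmt.Println(path, err)
}
```

Falling back only on the specific "unavailable" error keeps the fail-fast behavior mentioned in the same commit: a rejected resize still surfaces as an error instead of being retried through a second path.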
…antics

Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

- add RetryOnConflict for the pod resize update and regenerate the resize body with the latest resourceVersion
- set the Sandbox InplaceUpdate condition status to False on Failed, keep Succeeded as True
- avoid overwriting the Ready condition transition metadata when the pod ready status is unchanged
- move the CPU resize feature-gate check from buildClaimOptions to the EnsureClaimClaiming precondition
- remove the unused `pods/resize` permission from the sandbox-manager RBAC
- clean up and add some comments
- add e2e cases for failed CPU resize and failed image upgrade
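
The retry-on-conflict item above can be sketched as follows. In a real controller this would typically use `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`; the version below is a dependency-free illustration in which `fakeAPI`, `pod`, and `resizeWithRetry` are hypothetical names. The point it shows is that the request body is rebuilt from a fresh read on every attempt, so it always carries the latest resourceVersion.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: resourceVersion is stale")

// pod is a simplified stand-in for a Kubernetes Pod (hypothetical).
type pod struct {
	resourceVersion int
	cpuMilli        int64
}

// fakeAPI simulates an API server with optimistic concurrency.
// conflictsLeft simulates that many concurrent writers getting in first.
type fakeAPI struct {
	stored        pod
	conflictsLeft int
}

func (a *fakeAPI) get() pod { return a.stored }

func (a *fakeAPI) update(p pod) error {
	if a.conflictsLeft > 0 {
		a.conflictsLeft--
		a.stored.resourceVersion++ // another writer bumped the version
		return errConflict
	}
	if p.resourceVersion != a.stored.resourceVersion {
		return errConflict
	}
	p.resourceVersion++
	a.stored = p
	return nil
}

// resizeWithRetry re-reads the pod and rebuilds the update body on every
// attempt. It returns the number of attempts made.
func resizeWithRetry(api *fakeAPI, cpuMilli int64, maxAttempts int) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		latest := api.get()        // fresh read, latest resourceVersion
		latest.cpuMilli = cpuMilli // regenerate the body from that read
		err := api.update(latest)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errConflict) {
			return attempt, err // only conflicts are retried
		}
		lastErr = err
	}
	return maxAttempts, lastErr
}

func main() {
	api := &fakeAPI{stored: pod{resourceVersion: 1, cpuMilli: 500}, conflictsLeft: 2}
	attempts, err := resizeWithRetry(api, 2000, 5)
	fmt.Println(attempts, err, api.stored.cpuMilli)
}
```

Reusing a body built before the first attempt would keep failing with the same stale resourceVersion, which is why the body is regenerated inside the loop.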
Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

- mark inplace update in-progress per successful sub-step
- change feature gate name to SandboxInplaceUpdateReasonInplaceUpdating
@PersistentJZH PersistentJZH force-pushed the feat/support-inplace-resize-cpu branch from db5c963 to dcb00ed Compare April 19, 2026 11:47
	return infra.ClaimSandboxOptions{}, fmt.Errorf("resources must specify at least one of requests or limits")
}
for _, rl := range []corev1.ResourceList{res.Requests, res.Limits} {
	if cpu, ok := rl[corev1.ResourceCPU]; ok {
Review comment (Member):

This check should be generic rather than limited to CPU resources.
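
One way to read this suggestion, sketched under assumptions: validate every entry in both resource lists instead of looking up only the CPU key. The excerpt does not show what the CPU check actually verifies, so the "must be positive" rule below is an assumption, and `resourceList` is a simplified, hypothetical stand-in for `corev1.ResourceList`.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceList is a simplified stand-in for corev1.ResourceList:
// resource name -> quantity in the resource's base unit (hypothetical).
type resourceList map[string]int64

// validateResources applies the same rule to every resource in both lists.
// The positivity rule is assumed for illustration.
func validateResources(requests, limits resourceList) error {
	if len(requests) == 0 && len(limits) == 0 {
		return fmt.Errorf("resources must specify at least one of requests or limits")
	}
	for _, rl := range []resourceList{requests, limits} {
		// Sort names so the first reported error is deterministic.
		names := make([]string, 0, len(rl))
		for name := range rl {
			names = append(names, name)
		}
		sort.Strings(names)
		for _, name := range names {
			if rl[name] <= 0 {
				return fmt.Errorf("resource %q must be positive, got %d", name, rl[name])
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateResources(resourceList{"cpu": 500}, nil))
	fmt.Println(validateResources(resourceList{"cpu": 500, "memory": 0}, nil))
}
```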

if pod.Spec.Resources != nil {
	processResourceList(requests, pod.Spec.Resources.Requests)
	processResourceList(limits, pod.Spec.Resources.Limits)
	qosLimitResources := getQOSResources(pod.Spec.Resources.Limits)
Review comment (Member):

Rename `qosLimitResources` to `qosLimitsFound` so the name is consistent with the one in the else clause.

}
utils.SetSandboxCondition(newStatus, cond)

// Update ready condition to in-progress
Review comment (Member):

Can we avoid updating the sandbox Ready condition to false here?

	if tmpl, ok := templateContainers[afterPod.Spec.Containers[i].Name]; ok {
		afterPod.Spec.Containers[i].Resources = tmpl.Resources
	}
}
Review comment (Member):

Besides QoS changes, can we also check here that the request does not exceed the limit? This covers the case where the user specifies only a request, and that request is larger than the limit in the warm pool.
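
The case raised here can be sketched as a merge followed by a bounds check. This is a hedged illustration, not the PR's code: `cpuSpec`, `merge`, `validate`, and `milli` are hypothetical names, values are in millicores, and a nil pointer means the field was not set.

```go
package main

import "fmt"

// cpuSpec holds CPU request and limit in millicores; nil means "not set".
type cpuSpec struct {
	request *int64
	limit   *int64
}

// merge overlays the claim on the warm pool values: fields set in the
// claim win, unset fields keep the warm pool value.
func merge(pool, claim cpuSpec) cpuSpec {
	out := pool
	if claim.request != nil {
		out.request = claim.request
	}
	if claim.limit != nil {
		out.limit = claim.limit
	}
	return out
}

// validate rejects a merged spec whose request exceeds its limit, which
// Kubernetes would also reject.
func validate(s cpuSpec) error {
	if s.request != nil && s.limit != nil && *s.request > *s.limit {
		return fmt.Errorf("cpu request %dm exceeds limit %dm", *s.request, *s.limit)
	}
	return nil
}

func milli(v int64) *int64 { return &v }

func main() {
	pool := cpuSpec{request: milli(500), limit: milli(1000)}
	// The claim sets only a request, larger than the warm pool limit.
	claim := cpuSpec{request: milli(2000)}
	fmt.Println(validate(merge(pool, claim)))
}
```

Running the check on the merged result rather than on the claim alone is what catches this case, since the claim by itself carries no limit to compare against.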

@furykerry (Member) left a comment:

/lgtm
/approve

@kruise-bot commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: furykerry

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@furykerry furykerry merged commit cb1b8a4 into openkruise:master Apr 20, 2026
12 of 13 checks passed
5 participants