feat: Introduce controller concurrency and Kubernetes API flags #368

Open
tomergee wants to merge 5 commits into kubernetes-sigs:main from tomergee:concurrency-flags

tomergee (Contributor) commented Mar 5, 2026

API flag configuration options with corresponding documentation. Default values remain the same as before:

  • --concurrent-workers (default: 1): The maximum number of concurrent reconciles for the controllers. Increase this value to process multiple Sandbox and SandboxClaim events in parallel.
  • --kube-api-qps (default: 20): The maximum Queries Per Second (QPS) sent to the Kubernetes API server from the controller.
  • --kube-api-burst (default: 30): The maximum burst for throttle requests to the Kubernetes API server.
  • --sandbox-concurrent-workers (default: 1): The maximum number of concurrent reconciles for the Sandbox controller.
  • --sandbox-claim-concurrent-workers (default: 1): The maximum number of concurrent reconciles for the SandboxClaim controller.
  • --sandbox-warm-pool-concurrent-workers (default: 1): The maximum number of concurrent reconciles for the SandboxWarmPool controller.
  • --kube-api-qps (default: -1, meaning no rate limiting): The maximum Queries Per Second (QPS) sent to the Kubernetes API server from the controller.
  • --kube-api-burst (default: 10): The maximum burst for throttle requests to the Kubernetes API server.
The flags are defined in cmd/agent-sandbox-controller/main.go.

Plumbed the new limits into restConfig before setting up the controller manager. Updated SetupWithManager signatures across all controller files.
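
The flag registration and plumbing described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the PR's code: `restConfig` is a hypothetical stand-in for the client config whose `QPS` and `Burst` fields the PR sets, `options`, `parseOptions`, and `applyRateLimits` are illustrative names, and the defaults follow the later list in the description (-1 and 10).

```go
package main

import (
	"flag"
	"fmt"
)

// restConfig is a hypothetical stand-in for the two rate-limit fields that
// the PR sets on the client config (QPS and Burst), so the sketch has no
// dependencies outside the standard library.
type restConfig struct {
	QPS   float32
	Burst int
}

// options holds the parsed flag values; the struct itself is an assumption.
type options struct {
	sandboxWorkers  int
	claimWorkers    int
	warmPoolWorkers int
	kubeAPIQPS      int
	kubeAPIBurst    int
}

// parseOptions registers the flags on a private FlagSet and parses args.
// Defaults follow the later list in the PR description (-1 and 10).
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("agent-sandbox-controller", flag.ContinueOnError)
	fs.IntVar(&o.sandboxWorkers, "sandbox-concurrent-workers", 1,
		"Maximum number of concurrent reconciles for the Sandbox controller.")
	fs.IntVar(&o.claimWorkers, "sandbox-claim-concurrent-workers", 1,
		"Maximum number of concurrent reconciles for the SandboxClaim controller.")
	fs.IntVar(&o.warmPoolWorkers, "sandbox-warm-pool-concurrent-workers", 1,
		"Maximum number of concurrent reconciles for the SandboxWarmPool controller.")
	fs.IntVar(&o.kubeAPIQPS, "kube-api-qps", -1,
		"Maximum QPS sent to the Kubernetes API server; -1 means no rate limiting.")
	fs.IntVar(&o.kubeAPIBurst, "kube-api-burst", 10,
		"Maximum burst for throttle requests to the Kubernetes API server.")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

// applyRateLimits copies the limits onto the config; in the PR this happens
// before the controller manager is created from that config.
func applyRateLimits(cfg *restConfig, o options) {
	cfg.QPS = float32(o.kubeAPIQPS)
	cfg.Burst = o.kubeAPIBurst
}

func main() {
	o, err := parseOptions([]string{"--kube-api-qps=50", "--kube-api-burst=100"})
	if err != nil {
		panic(err)
	}
	cfg := &restConfig{}
	applyRateLimits(cfg, o)
	fmt.Printf("QPS=%v Burst=%d workers=%d/%d/%d\n",
		cfg.QPS, cfg.Burst, o.sandboxWorkers, o.claimWorkers, o.warmPoolWorkers)
}
```

Parsing into a private FlagSet rather than the global one keeps the sketch testable; the real binary may well register its flags differently.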

…onfiguration options with corresponding documentation.

default values remain the same values as before

--kube-api-qps (default 20)
--kube-api-burst (default 30)
--concurrent-workers (default 1)
in cmd/agent-sandbox-controller/main.go
Plumbed the new limits into restConfig before setting up the controller manager.
Updated SetupWithManager signatures across all controller files.

netlify bot commented Mar 5, 2026

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit 7aceee5
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69aa4e878267160008bef37f

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 5, 2026
k8s-ci-robot (Contributor) commented:

Hi @tomergee. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 5, 2026
@tomergee tomergee changed the title feat: Introduce controller concurrency and Kubernetes API QPS/burst c… feat: Introduce controller concurrency and Kubernetes API flags Mar 5, 2026
vicentefb (Member) commented:

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 5, 2026
tomergee (Contributor, Author) commented Mar 5, 2026

/ok-to-test

igooch (Contributor) left a comment

Should we keep the QPS default the same as controller-runtime? If yes, please also update the documentation as well.

tomergee added 3 commits March 6, 2026 03:35
…mPool controllers, and adjust kube API rate limit defaults

Added error and logical validation on flag values, updated configuration.md and referenced release manifests
… and SandboxWarmPool controllers, and adjust kube API rate limit defaults.
if kubeAPIQPS == 0 || kubeAPIQPS < -1 {
    setupLog.Error(nil, "kube-api-qps must be greater than 0, or -1 for no rate limiting")
    os.Exit(1)
}
Contributor commented:

Do we ever want it to be -1?
Wouldn't it be safer to require a value > 0?

tomergee (Contributor, Author) commented Mar 6, 2026:

-1 is the default value that comes with the controller (before the change)

Member commented:

-1 means disabling throttling (no limits).
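
The rule debated in this thread (accept -1 or any positive value, reject 0 and anything below -1) can be exercised in isolation. The following is a minimal sketch, not the PR's code: `validateQPS` and `errInvalidQPS` are hypothetical names, and the function returns an error instead of calling `os.Exit` so that it can be tested directly.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidQPS reuses the message from the snippet under review.
var errInvalidQPS = errors.New("kube-api-qps must be greater than 0, or -1 for no rate limiting")

// validateQPS applies the same condition as the reviewed snippet:
// 0 and values below -1 are rejected; -1 and positive values pass.
func validateQPS(qps int) error {
	if qps == 0 || qps < -1 {
		return errInvalidQPS
	}
	return nil
}

func main() {
	for _, qps := range []int{-1, 20, 0, -5} {
		fmt.Printf("qps=%d err=%v\n", qps, validateQPS(qps))
	}
}
```

Tightening the rule to "must be > 0", as the first comment suggests, would amount to replacing the condition with `qps <= 0`, at the cost of dropping the unthrottled setting the author describes as the prior default.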

k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aditya-shantanu, tomergee
Once this PR has been reviewed and has the lgtm label, please assign janetkuo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
restConfig := ctrl.GetConfigOrDie()
restConfig.QPS = float32(kubeAPIQPS)
Member commented:

We should define kubeAPIQPS as float instead of int to allow for fractional values like 0.5
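
One way to act on this suggestion is to register the flag as a float and adjust the validation, since a float also admits values such as -0.5 that the integer check never had to consider. This is a sketch under assumptions: `parseQPS` is a hypothetical helper, the -1 default is taken from the PR description, and treating -1 as the only accepted non-positive value is a choice made here, not something the PR states.

```go
package main

import (
	"flag"
	"fmt"
)

// parseQPS parses --kube-api-qps as a float so fractional rates such as
// 0.5 are accepted, then narrows it to the float32 used for the QPS field.
func parseQPS(args []string) (float32, error) {
	fs := flag.NewFlagSet("qps-sketch", flag.ContinueOnError)
	qps := fs.Float64("kube-api-qps", -1,
		"Maximum QPS sent to the Kubernetes API server; -1 means no rate limiting.")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Accept exactly -1 (no rate limiting) or any value greater than 0.
	valid := *qps == -1 || *qps > 0
	if !valid {
		return 0, fmt.Errorf("kube-api-qps must be greater than 0, or -1 for no rate limiting (got %v)", *qps)
	}
	return float32(*qps), nil
}

func main() {
	for _, arg := range []string{"--kube-api-qps=0.5", "--kube-api-qps=-0.5"} {
		qps, err := parseQPS([]string{arg})
		fmt.Printf("%s -> qps=%v err=%v\n", arg, qps, err)
	}
}
```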

}
// A logical maximum (too much will create unnecessary load on the API server)
totalWorkers := sandboxConcurrentWorkers + sandboxClaimConcurrentWorkers + sandboxWarmPoolConcurrentWorkers
if totalWorkers > 1000 {
Member commented:

How is 1000 decided? Do we need to enforce a max now? It might work in a very large cluster with a large master VM.

tomergee (Contributor, Author) commented:

What do you think is a logical maximum? Should we enforce something? Otherwise it's just wasting resources and hurting performance.
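
For reference while the ceiling is being discussed, the check itself can be written with the limit passed in rather than hard-coded. This is only a sketch: `checkTotalWorkers` and its parameter names are hypothetical, and the 1000 used in `main` is simply the value from the snippet under review, not a recommendation.

```go
package main

import "fmt"

// checkTotalWorkers sums the per-controller worker counts and reports an
// error when the sum exceeds maxTotal. The ceiling is a parameter so the
// sketch does not commit to any particular number.
func checkTotalWorkers(maxTotal, sandbox, claim, warmPool int) error {
	total := sandbox + claim + warmPool
	if total > maxTotal {
		return fmt.Errorf("total concurrent workers %d exceeds maximum %d", total, maxTotal)
	}
	return nil
}

func main() {
	// 400 + 400 + 300 = 1100, which is over a ceiling of 1000.
	fmt.Println(checkTotalWorkers(1000, 400, 400, 300))
	// The defaults (1 + 1 + 1) are well under it.
	fmt.Println(checkTotalWorkers(1000, 1, 1, 1))
}
```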

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants