Enable Go race detector in test scripts and update dev docs #335

Open
clarabennettdev wants to merge 3 commits into kubernetes-sigs:main from clarabennettdev:feat/enable-race-detector-331

Conversation

@clarabennettdev

Description

Fixes #331

This PR enables Go's -race flag in the test tooling to detect data races in concurrent controller code.

Changes

  • dev/tools/test-unit: Added -race flag to go test invocation
  • dev/tools/test-e2e: Added -race flag to Go e2e test invocation
  • docs/development.md: Added Testing section documenting race detection usage and runtime overhead caveats
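
For readers unfamiliar with what `-race` looks for, here is a minimal sketch of concurrent goroutines updating shared state. The function name `countConcurrently` is hypothetical and not part of this PR. The version shown is race-free because it uses `sync/atomic`; the comment marks where an unsynchronized variant would be reported by `go test -race`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently starts `workers` goroutines that each increment a shared
// counter once, then returns the final value. (Hypothetical helper for
// illustration only.)
func countConcurrently(workers int) int64 {
	var total atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Atomic increment: safe under concurrent access.
			// Replacing `total` with a plain int64 and writing `total++`
			// here is the kind of unsynchronized access -race reports.
			total.Add(1)
		}()
	}
	wg.Wait()
	return total.Load()
}

func main() {
	fmt.Println(countConcurrently(64))
}
```

The detector only reports races on code paths that actually execute, which is why enabling it in the test scripts matters.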

Fixes kubernetes-sigs#331

- Added -race flag to dev/tools/test-unit
- Added -race flag to dev/tools/test-e2e (Go e2e tests)
- Updated docs/development.md with Testing section documenting
  race detection usage and runtime overhead caveats

Signed-off-by: Clara Bennett <clarabennett2626@gmail.com>
@netlify

netlify bot commented Feb 20, 2026

Deploy Preview for agent-sandbox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 83fc8ab |
| 🔍 Latest deploy log | https://app.netlify.com/projects/agent-sandbox/deploys/69a33fed6d8d120008db0cf4 |

@linux-foundation-easycla

linux-foundation-easycla bot commented Feb 20, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @clarabennett2626!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Feb 20, 2026.
@k8s-ci-robot
Contributor

Hi @clarabennett2626. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/S (denotes a PR that changes 10-29 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels, and added then removed cncf-cla: no (indicates the PR's author has not signed the CNCF CLA), on Feb 20, 2026.
@janetkuo
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Feb 20, 2026.

Both `test-unit` and `test-e2e` run with Go's `-race` flag enabled by default to detect data races in concurrent code. This is especially important because the controllers (`SandboxReconciler`, `SandboxClaimReconciler`, `SandboxWarmPoolReconciler`) run concurrently via controller-runtime.

Note that enabling the race detector increases memory usage (5-10×) and execution time (2-20×). If you need to disable it for local development (e.g., resource-constrained environments), you can run `go test` directly without the `-race` flag.
Member

@justinsb will this be a concern?

Member

How about making -race available for local tests (e.g. add a flag to dev/tools/test-e2e and then have a new Make target test-e2e-race; same for unit tests) for now?

It might be fine to enable -race for unit tests in CI. We can run a separate periodic job for e2e with -race to avoid slowing down PRs.

@janetkuo janetkuo self-assigned this Feb 20, 2026
@janetkuo
Member

janetkuo commented Feb 21, 2026

The e2e test failures show data races in four tests (TestRunChromeSandbox, TestRunPythonRuntimeSandbox, TestRunPythonRuntimeSandboxClaim, TestRunPythonRuntimeSandboxWarmpool), which shows that the race detector is working.


```sh
./dev/tools/test-e2e
```

@yashasvimisra2798

@clarabennettdev Thanks for putting this up; the race detector surfacing these failures is really helpful. I’d be happy to coordinate, take a pass at stabilizing one of the failing e2e tests, and open a small follow-up PR with a fix.
Let me know if you’d prefer I start with a specific test.

@clarabennettdev
Author

thanks @yashasvimisra2798 - sounds good, i'm down to coordinate. the race detector's doing what it's supposed to, which is kinda the point anyway. ping me if you want a hand with the follow-up PR

Two race conditions were found by the race detector enabled in PR kubernetes-sigs#335:

1. PortForward (framework/client.go): bytes.Buffer was shared between
   the goroutine copying cmd stdout/stderr and the main goroutine
   polling the buffer via String(). Fixed by introducing a syncBuffer
   type that wraps bytes.Buffer with a mutex.

2. ChromeSandboxMetrics (chromesandbox_test.go): t.Logf with %+v used
   reflection to read AtomicTimeDuration fields directly, bypassing
   atomic accessors. Fixed by adding a String() method to
   ChromeSandboxMetrics that reads all fields through their atomic
   accessors, and using %s format instead of %+v.
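
The first fix can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the name `syncBuffer` and the idea of wrapping `bytes.Buffer` with a mutex come from the commit message; the exact method set and the `writeConcurrently` helper are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// syncBuffer wraps bytes.Buffer with a mutex so that one goroutine can write
// while another polls the contents. Sketch only; the PR's actual type may
// differ.
type syncBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write implements io.Writer, so a *syncBuffer can be assigned to the
// Stdout and Stderr fields of an exec.Cmd.
func (s *syncBuffer) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.Write(p)
}

// String returns a snapshot of everything written so far.
func (s *syncBuffer) String() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.String()
}

// writeConcurrently (hypothetical helper) has n goroutines each write one
// byte and read the buffer, then returns the final length.
func writeConcurrently(n int) int {
	var sb syncBuffer
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sb.Write([]byte("x"))
			_ = sb.String() // concurrent read, as a polling goroutine would do
		}()
	}
	wg.Wait()
	return len(sb.String())
}

func main() {
	fmt.Println(writeConcurrently(50))
}
```

With a bare `bytes.Buffer` in place of `syncBuffer`, the same access pattern is what the race detector would flag.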
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clarabennettdev
Once this PR has been reviewed and has the lgtm label, please ask for approval from janetkuo. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Feb 21, 2026.
@clarabennettdev
Author

thanks @yashasvimisra2798 for jumping in. i just pushed a fix for the race conditions

the race detector flagged two actual issues:

  1. PortForward in framework/client.go: we had a bytes.Buffer getting shared between the goroutine copying cmd stdout/stderr and the main goroutine polling it via String(). so reads/writes were racing. fixed it by adding a syncBuffer wrapper around bytes.Buffer with a mutex

  2. ChromeSandboxMetrics in chromesandbox_test.go: t.Logf with %+v ends up using reflection, which was reading AtomicTimeDuration fields directly and kinda sidestepping the atomic getters. fixed by giving ChromeSandboxMetrics a String() that reads everything via the atomic accessors

so yeah, race detector is doing its job — it caught real bugs!
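
A rough sketch of the second fix, under stated assumptions: `atomicDuration` is a hypothetical stand-in for `AtomicTimeDuration` (whose real API is not shown in this thread), and the struct and field names are illustrative. The point is that `String()` reads each field through its atomic accessor, so formatting with `%s` never reads the underlying memory directly the way reflection-based `%+v` formatting does.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// atomicDuration is a hypothetical stand-in for the AtomicTimeDuration type
// mentioned above.
type atomicDuration struct {
	v atomic.Int64
}

func (d *atomicDuration) Store(x time.Duration) { d.v.Store(int64(x)) }
func (d *atomicDuration) Load() time.Duration   { return time.Duration(d.v.Load()) }

// sandboxMetrics is illustrative; field names are assumptions.
type sandboxMetrics struct {
	Startup atomicDuration
	Ready   atomicDuration
}

// String reads every field through its atomic accessor, so logging the
// struct with %s stays safe while other goroutines update it.
func (m *sandboxMetrics) String() string {
	return fmt.Sprintf("startup=%s ready=%s", m.Startup.Load(), m.Ready.Load())
}

// exampleMetrics (hypothetical helper) builds a value and formats it via %s.
func exampleMetrics() string {
	m := &sandboxMetrics{}
	m.Startup.Store(1500 * time.Millisecond)
	m.Ready.Store(2 * time.Second)
	return fmt.Sprintf("%s", m)
}

func main() {
	fmt.Println(exampleMetrics())
}
```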

Member

@janetkuo left a comment

@clarabennettdev Thanks for the quick turnaround on fixing those data races in the 4 failed e2e tests!

To move this forward, could you please address a few remaining items from my earlier review:

  1. It might be fine to enable -race for unit tests in CI by default, but for e2e, we should likely move that to a separate periodic job later to avoid slowing down PRs. Let's default it to off in the script for test-e2e and allow it to be enabled via a flag in Make.
  2. Please add a new Make target (e.g., test-e2e-race) in Makefile to enable the optional -race flag when running e2e tests (it could be used to run e2e tests locally or in a periodic job).
  3. Please add doc changes in the existing docs/testing.md instead of docs/development.md, and mention the new make target you added in the previous step.
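
The opt-in behavior requested in items 1 and 2 could look roughly like the sketch below. It is written in Go purely for illustration: the actual `dev/tools/test-e2e` script is not shown in this thread and may be written in another language, and the function names and the `./test/e2e/...` package path are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// buildGoTestArgs returns the argument list for `go test`, adding -race only
// when the caller opts in. (Hypothetical helper.)
func buildGoTestArgs(race bool, pkgs ...string) []string {
	args := []string{"test"}
	if race {
		args = append(args, "-race")
	}
	return append(args, pkgs...)
}

// commandLine renders the full command for display or logging.
func commandLine(race bool, pkgs ...string) string {
	return "go " + strings.Join(buildGoTestArgs(race, pkgs...), " ")
}

func main() {
	// Race detection defaults to off; a Make target such as test-e2e-race
	// would pass --race to turn it on.
	race := flag.Bool("race", false, "run tests with the Go race detector")
	flag.Parse()
	// The package path below is an assumption for illustration.
	fmt.Println(commandLine(*race, "./test/e2e/..."))
}
```

Keeping the default off leaves presubmit runs unchanged, while a periodic job or a local run can enable the detector explicitly.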

Per reviewer feedback:
- Remove hardcoded -race from dev/tools/test-e2e; it is now opt-in via
  --race flag to avoid slowing down PR presubmits
- Add --race passthrough in dev/ci/presubmits/test-e2e
- Add Makefile target test-e2e-race for local use or periodic jobs
- Unit tests keep -race enabled by default (dev/tools/test-unit unchanged)
- Move race-detector documentation from docs/development.md to
  docs/testing.md and mention the new make target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Feb 28, 2026.
@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Feb 28, 2026.
@k8s-ci-robot
Contributor

@clarabennettdev: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmit-agent-sandbox-unit-test | 83fc8ab | link | true | /test presubmit-agent-sandbox-unit-test |
| presubmit-agent-sandbox-lint-go | 83fc8ab | link | true | /test presubmit-agent-sandbox-lint-go |
| presubmit-test-autogen-up-to-date | 83fc8ab | link | true | /test presubmit-test-autogen-up-to-date |
| presubmit-agent-sandbox-e2e-test | 83fc8ab | link | true | /test presubmit-agent-sandbox-e2e-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@aditya-shantanu
Contributor

This is awesome.
I would very much like to run the e2e tests in race mode as a blocking step.
@janetkuo - I don't want to make this optional. We can make it a separate target, but IMO this should be a presubmit test.

We can only enforce this after we have fixed the race issues but fixing them is a P0 imo. The fixing part can be outsourced to more ppl to help expedite.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Enable Go’s race detector in tests and CI

5 participants