Enable Go race detector in test scripts and update dev docs#335
Enable Go race detector in test scripts and update dev docs#335clarabennettdev wants to merge 3 commits intokubernetes-sigs:mainfrom
Conversation
Fixes kubernetes-sigs#331 - Added -race flag to dev/tools/test-unit - Added -race flag to dev/tools/test-e2e (Go e2e tests) - Updated docs/development.md with Testing section documenting race detection usage and runtime overhead caveats Signed-off-by: Clara Bennett <clarabennett2626@gmail.com>
✅ Deploy Preview for agent-sandbox canceled.
|
|
Welcome @clarabennett2626! |
|
Hi @clarabennett2626. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
docs/development.md
Outdated
|
|
||
| Both `test-unit` and `test-e2e` run with Go's `-race` flag enabled by default to detect data races in concurrent code. This is especially important because the controllers (`SandboxReconciler`, `SandboxClaimReconciler`, `SandboxWarmPoolReconciler`) run concurrently via controller-runtime. | ||
|
|
||
| Note that enabling the race detector increases memory usage (5-10×) and execution time (2-20×). If you need to disable it for local development (e.g., resource-constrained environments), you can run `go test` directly without the `-race` flag. |
There was a problem hiding this comment.
How about making -race available for local tests (e.g. add a flag to dev/tools/test-e2e and then have a new Make target test-e2e-race; same for unit tests) for now?
It might be fine to enable -race for unit tests in CI. We can run a separate periodic job for e2e with -race to avoid slowing down PRs.
|
e2e test failure shows data race (TestRunChromeSandbox, TestRunPythonRuntimeSandbox, TestRunPythonRuntimeSandboxClaim, TestRunPythonRuntimeSandboxWarmpool), which proves that the race detector is working. |
|
|
||
| ```sh | ||
| ./dev/tools/test-e2e | ||
| ``` |
There was a problem hiding this comment.
make is more user friendly. We also have https://github.com/kubernetes-sigs/agent-sandbox/blob/main/docs/testing.md
docs/development.md
Outdated
|
|
||
| Both `test-unit` and `test-e2e` run with Go's `-race` flag enabled by default to detect data races in concurrent code. This is especially important because the controllers (`SandboxReconciler`, `SandboxClaimReconciler`, `SandboxWarmPoolReconciler`) run concurrently via controller-runtime. | ||
|
|
||
| Note that enabling the race detector increases memory usage (5-10×) and execution time (2-20×). If you need to disable it for local development (e.g., resource-constrained environments), you can run `go test` directly without the `-race` flag. |
There was a problem hiding this comment.
How about making -race available for local tests (e.g. add a flag to dev/tools/test-e2e and then have a new Make target test-e2e-race; same for unit tests) for now?
It might be fine to enable -race for unit tests in CI. We can run a separate periodic job for e2e with -race to avoid slowing down PRs.
|
@clarabennettdev Thanks for putting this up, the race detector surfacing these failures is really helpful. I’d be happy to coordinate and take a pass at stabilizing one of the failing e2e tests and open a small follow-up PR with a fix. |
|
thanks @yashasvimisra2798 - sounds good, i'm down to coordinate. race detectors doing what it's supposed to, which is kinda the point anyway. ping me if you want a hand with the follow-up PR |
Two race conditions were found by the race detector enabled in PR kubernetes-sigs#335: 1. PortForward (framework/client.go): bytes.Buffer was shared between the goroutine copying cmd stdout/stderr and the main goroutine polling the buffer via String(). Fixed by introducing a syncBuffer type that wraps bytes.Buffer with a mutex. 2. ChromeSandboxMetrics (chromesandbox_test.go): t.Logf with %+v used reflection to read AtomicTimeDuration fields directly, bypassing atomic accessors. Fixed by adding a String() method to ChromeSandboxMetrics that reads all fields through their atomic accessors, and using %s format instead of %+v.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: clarabennettdev The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
thanks @yashasvimisra2798 for jumping in. i just pushed a fix for the race conditions the race detector flagged two actual issues:
so yeah, race detector is doing its job — it caught real bugs! |
janetkuo
left a comment
There was a problem hiding this comment.
@clarabennettdev Thanks for the quick turnaround on fixing those data races in the 4 failed e2e tests!
To move this forward, could you please address a few remaining items from my earlier review:
- It might be fine to enable
-racefor unit tests in CI by default, but for e2e, we should likely move that to a separate periodic job later to avoid slowing down PRs. Let's default it to off in the script for test-e2e and allow it to be enabled via a flag in Make. - Please add a new Make target (e.g., test-e2e-race) in
Makefileto enable the optional -race flag when running e2e tests (it could be used to run e2e tests locally or in a periodic job). - Please add doc changes in the existing
docs/testing.mdinstead ofdocs/development.md, and mention the new make target you added in the previous step.
Per reviewer feedback: - Remove hardcoded -race from dev/tools/test-e2e; it is now opt-in via --race flag to avoid slowing down PR presubmits - Add --race passthrough in dev/ci/presubmits/test-e2e - Add Makefile target test-e2e-race for local use or periodic jobs - Unit tests keep -race enabled by default (dev/tools/test-unit unchanged) - Move race-detector documentation from docs/development.md to docs/testing.md and mention the new make target Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@clarabennettdev: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
This is awesome. We can only enforce this after we have fixed the race issues but fixing them is a P0 imo. The fixing part can be outsourced to more ppl to help expedite. |
Description
Fixes #331
This PR enables Go's
-raceflag in the test tooling to detect data races in concurrent controller code.Changes
dev/tools/test-unit: Added-raceflag togo testinvocationdev/tools/test-e2e: Added-raceflag to Go e2e test invocationdocs/development.md: Added Testing section documenting race detection usage and runtime overhead caveats