feat: add local integration testing infrastructure (envtest + kind) by ian-flores · Pull Request #54 · posit-dev/team-operator

ian-flores · 2026-01-27T00:04:49Z

Summary

Implements #51: a two-tier local integration testing strategy so developers can validate operator behavior without deploying to staging clusters.

Before: the only way to test controller behavior against a real Kubernetes API was deploying to an internal staging cluster.
After: fast local alternatives that work offline, run in CI, and support an iterative dev loop.

What This Adds

Tier 1: Envtest (~5s, no cluster)

Tests CRD schema validation and API storage against a real embedded K8s API server — no kubelet, no containers, no Docker.

Enabled the previously-skipped envtest suite in suite_test.go, fixing CRD paths and registering all required schemes (keycloak, traefik, secrets-store-csi-driver)
4 Ginkgo specs covering Site, Connect, Workbench, and PackageManager CRDs (internal/controller/core/site_envtest_test.go)

Tier 2: Kind cluster (minutes, full stack)

Tests the full operator Helm deployment and Site CR reconciliation end-to-end against a real cluster.

hack/test-kind.sh — deploys operator via Helm, creates a Site CR, verifies Connect and Workbench child CRs are reconciled, checks operator logs for panics, cleans up
make test-kind runs helm-generate before deploying so the chart is always in sync with the current API

Persistent Dev Loop

Instead of creating and destroying a cluster on every run, developers can keep a cluster alive across iterations:

make kind-setup    # create cluster + deploy operator (run once, or after code changes)
make kind-test     # run tests against the live cluster
make kind-teardown # clean up when done

make kind-setup is idempotent — running it after a code change rebuilds the image, reloads it into kind, and does a helm upgrade, so it doubles as the "reload" step.

CI Integration

Envtest runs on every PR
Kind tests run on PRs that touch relevant paths (api/**, internal/**, cmd/**, Dockerfile, Makefile, go.mod/sum, dist/chart/**, hack/test-kind.sh) via a check-changes job using dorny/paths-filter
Kind tests also run on push to main, nightly, and manual dispatch

Bug Fix

cleanupLegacyHomeApp was using a direct type assertion (err.(*meta.NoKindMatchError)) that misses wrapped errors, causing reconciliation to abort on any cluster without the secrets-store CSI driver (e.g. kind). Changed to errors.As.

Documentation

README.md: new Testing section surfacing both the one-shot and dev loop workflows as the primary recommended approach
docs/testing.md: full reference — prerequisites, all targets, development loop, troubleshooting, best practices

Test Plan

Envtest suite passes (4 specs, ~5s)
All existing unit tests continue to pass
make go-test exits cleanly
Kind tests pass end-to-end locally (make test-kind)
Dev loop tested: kind-setup → kind-test → code change → kind-setup → kind-test → kind-teardown
CI workflow triggers correctly on path-matched PRs
Smoke tested API change workflow: added a field to SiteSpec, ran make test-kind, confirmed the new field appeared in the live CRD schema, reverted

- Enable envtest suite: remove Skip(), fix CRD paths, add required schemes
- Add Ginkgo-based envtest tests for Site, Connect, Workbench, PackageManager CRDs
- Add kind integration tests with make targets (test-kind, test-kind-full)
- Add GitHub Actions workflow for CI (envtest on PRs, kind on main/nightly)
- Add comprehensive testing documentation

Closes #51

claude · 2026-01-27T00:05:28Z

Claude finished @ian-flores's task

Reviewing PR #54

Review submitted with feedback on:

Test name/behavior mismatch in site_envtest_test.go
Missing cleanup in Site creation test
Ineffective validation test
Unused helper functions in suite_test.go
Unused GO_VERSION env var in workflow

claude

Review Summary

Good foundation for integration testing infrastructure. A few items to address:

Test Code Issues

internal/controller/core/site_envtest_test.go

Misleading test name (line 28): Test says "Should create child resources (Connect, Workbench, etc.)" but only verifies the Site CR was created - no child resources are checked. Either rename to match actual behavior or add child resource assertions (requires starting the controller).
Missing cleanup (line 67): The first Site test creates resources but doesn't clean them up, unlike other tests that properly delete their resources. Consider adding DeferCleanup or explicit cleanup.
Ineffective validation test (line 83): The "Should reject invalid Site specs" test accepts both success and failure outcomes. If validation behavior is defined, assert it; otherwise consider removing this test.

internal/controller/core/suite_test.go

Unused helper functions (lines 104-134): setupSiteControllerForEnvtest() and waitFor() are defined but never called. Remove dead code or add tests that use them.

Workflow Considerations

.github/workflows/integration-tests.yml

The GO_VERSION env var (line 43) is set to '1.22' but unused since go-version-file: go.mod is used instead. Consider removing the unused env var.

Documentation

The docs/testing.md is comprehensive and well-structured.

Security Notes

The hack/test-kind.sh script uses appropriate practices (set -euo pipefail, proper quoting)
Helm installation uses --wait and proper timeouts
No credential handling concerns identified

- Rename misleading test "Should create child resources" to accurately describe what it tests (Site CR creation and retrieval)
- Add DeferCleanup to first Site test to match cleanup pattern of others
- Remove ineffective validation test that accepted both success/failure
- Remove unused setupSiteControllerForEnvtest and waitFor helpers
- Remove unused GO_VERSION env var from integration-tests.yml

go test ./... with -covermode=atomic fails for packages with no test files in Go 1.25 due to covdata removal. Filter to only packages with test files using go list.

Also improves kind integration tests:

- Add docker-build and kind-load-image as test-kind prerequisites so the operator image is always present before running tests
- Add test_reconciliation to verify the controller creates Connect and Workbench child CRs after a Site CR is applied
- Add test_operator_logs to detect panics and confirm reconciliation activity in operator logs

ian-flores · 2026-02-23T19:48:15Z

@claude review this PR

claude · 2026-02-23T19:48:46Z

Claude finished @ian-flores's task

Reviewing PR #54

Read review guidelines and previous review
Read all changed files
Analyze code changes
Submit review feedback

Review

Solid infrastructure addition. The two-tier strategy is well-designed and the errors.As bugfix is correct. Since the last review, test names have been fixed, DeferCleanup was added to the Site test, the unused helper functions and invalid-spec test were removed, and the GO_VERSION env var was cleaned up. All prior review items appear addressed.

A few remaining observations:

Envtest Tests (`site_envtest_test.go`)

Tests only verify CRD schema, not controller behavior — The four tests create CRs and confirm they can be stored/retrieved, but no controller is started in the envtest suite (no manager, no reconciler registered). This means they test CRD registration, not reconciliation. The docs/testing.md and PR description claim these test "controller reconciliation logic" and "resource creation" — worth aligning the docs or adding a follow-up to wire up the controller. As-is, these are CRD smoke tests, which is still valuable.
Inconsistent cleanup patterns — The Site test uses DeferCleanup (line 68), but Connect/Workbench/PackageManager tests use explicit k8sClient.Delete at the end of the test (e.g. line 118). If the assertion fails mid-test, the explicit cleanup never runs, leaving resources behind. Consider using DeferCleanup consistently across all tests.
Eventually is unnecessary for synchronous creates — k8sClient.Create followed by k8sClient.Get against envtest's embedded API server is synchronous. The Eventually loops (e.g. lines 75-78, 109-112) add 30s timeout overhead on failure with no benefit. A direct Expect(k8sClient.Get(...)).To(Succeed()) would be simpler and fail faster.
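
The cleanup concern above can be shown without Ginkgo. The following is a minimal sketch in plain Go, where `defer` plays the role that `DeferCleanup` plays in a spec and an early return stands in for a failed assertion. `fakeCluster` and both functions are hypothetical names for illustration, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster is a hypothetical stand-in for the envtest API server:
// it only tracks which resource names currently exist.
type fakeCluster map[string]bool

// explicitCleanup deletes the resource on its last line, so an early
// failure leaves the resource behind.
func explicitCleanup(c fakeCluster, fail bool) error {
	c["connect-test"] = true
	if fail {
		return errors.New("assertion failed") // the delete below never runs
	}
	delete(c, "connect-test")
	return nil
}

// deferredCleanup registers the delete right after creation, so it runs
// on every exit path, including the failing one.
func deferredCleanup(c fakeCluster, fail bool) error {
	c["connect-test"] = true
	defer func() { delete(c, "connect-test") }()
	if fail {
		return errors.New("assertion failed")
	}
	return nil
}

func main() {
	a, b := fakeCluster{}, fakeCluster{}
	_ = explicitCleanup(a, true)
	_ = deferredCleanup(b, true)
	fmt.Println(len(a), len(b)) // prints "1 0"
}
```

Registering cleanup immediately after creation keeps the two steps adjacent, which is the same reason for preferring `DeferCleanup` over a trailing `k8sClient.Delete`.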

Kind Script (`hack/test-kind.sh`)

test_reconciliation cleanup may leave child CRs — Line 312 deletes the Site, but if reconciliation created Connect/Workbench child CRs, those aren't explicitly cleaned up. If the operator isn't running (already cleaned up by the EXIT trap ordering), the finalizers or owner references won't cascade. This could leave resources behind in some failure paths, though the namespace deletion in cleanup() should catch it.
test_create_site runs without the operator deployed when CHART_DIR is missing — In the main() function, if the chart directory doesn't exist, the script skips operator deployment but still runs test_create_site and test_reconciliation (lines 390-391). test_reconciliation will always fail without an operator, making the fallback path misleading. Consider skipping reconciliation tests when the operator isn't deployed.

CI Workflow (`integration-tests.yml`)

Redundant image build/load steps: The kind job builds the image (lines 117-119) and loads it (line 123), but make test-kind (line 127) already has docker-build as a dependency, so the image is built twice. Consider either removing the explicit build step or using ./hack/test-kind.sh directly instead of make test-kind.
codecov-action@v4 without a token — The codecov upload step (line 76) uses v4 which requires a token. Since fail_ci_if_error: false is set this won't break CI, but the upload will silently fail on PRs from forks. Either add a Codecov token or use codecov-action@v5.

Makefile

KIND_VERSION variable is unused — KIND_VERSION ?= 1.29.x is defined at line 132 but never referenced by any target. The workflow uses a separate KIND_VERSION env var for the kind binary version (v0.23.0), and ENVTEST_K8S_VERSION is what controls the K8s API version for envtest. Consider removing the Makefile KIND_VERSION to avoid confusion.

Bug Fix (`site_controller_home_cleanup.go`)

The errors.As change is correct and well-motivated. Direct type assertion misses wrapped errors — this is the right fix.

Documentation (`docs/testing.md`)

Thorough and well-structured. One minor inaccuracy: the example in the "Envtest Suite" section (line 144) shows It("Should create child resources"...) which doesn't match the actual test names anymore.

Overall this is a good PR. The main feedback is about test fidelity (CRD smoke tests vs. actual controller reconciliation tests) and minor cleanup items. None are blockers.

ian-flores · 2026-02-23T20:54:41Z

Addressed all findings in 464ef4d:

Item 1 — Updated docs/testing.md to accurately describe envtest tests as CRD schema/storage smoke tests, not controller reconciliation tests. Also fixed the stale example test name.
Item 2 — Replaced explicit k8sClient.Delete cleanup at end of Connect, Workbench, and PackageManager tests with DeferCleanup, matching the pattern already used in the Site test.
Item 3 — Removed all Eventually polling loops; replaced with direct Expect(k8sClient.Get(...)).To(Succeed()). Removed the now-unused timeout/interval constants and "time" import.
Item 4 — Added explicit kubectl delete connect and kubectl delete workbench after Site deletion in test_reconciliation cleanup.
Item 5 — Moved test_reconciliation inside the if [[ -d "${CHART_DIR}" ]] block so it only runs when the operator is actually deployed.
Item 6 — Removed the explicit "Build operator image" and "Load image into kind" steps from the CI workflow; make test-kind handles both via its docker-build dependency and the tagging/loading in hack/test-kind.sh.
Item 7 — Upgraded codecov/codecov-action from v4 to v5.
Item 8 — Removed unused KIND_VERSION ?= 1.29.x from the Makefile.

ian-flores · 2026-02-23T21:24:17Z

Local dev loop test ✔

Ran the persistent kind workflow end-to-end on the current branch:

$ make kind-setup
No kind clusters found.
Creating kind cluster 'team-operator-test'...
✓ Preparing nodes
✓ Starting control-plane
✓ Ready after 17s
[INFO] Deploying team-operator via Helm...
[INFO] Operator deployed successfully
[INFO] Waiting for operator to be ready...
[INFO] Success: operator deployment ready
[INFO] Kind cluster is ready. Run 'make kind-test' to execute tests.

$ make kind-test
[INFO] Testing: CRDs are installed...
[INFO]   CRD found: sites.core.posit.team
[INFO]   CRD found: connects.core.posit.team
[INFO]   CRD found: workbenches.core.posit.team
[INFO]   CRD found: packagemanagers.core.posit.team
[INFO] Test passed: All CRDs installed
[INFO] Testing: Operator logs...
[INFO] Test passed: Operator logs look healthy
[INFO] Testing: Site reconciliation...
[INFO] Site CR created, waiting for reconciliation...
[INFO] Child CRs created successfully
[INFO]   Connect CR found: test-site-reconcile
[INFO]   Workbench CR found: test-site-reconcile
[INFO] Test passed: Site reconciliation verified
[INFO] Testing: Create Site CR...
[INFO] Test passed: Site CR created
[INFO] ==========================================
[INFO] All integration tests passed!
[INFO] ==========================================

$ make kind-teardown
[INFO] Cleaning up...
release "team-operator" uninstalled
namespace "posit-team-system" deleted
namespace "posit-team" deleted
[INFO] Cleanup completed
Deleting cluster "team-operator-test" ...
Deleted nodes: ["team-operator-test-control-plane"]

Cluster creation and operator deployment only happen in kind-setup. kind-test is pure test execution — no setup overhead. Running kind-setup again after a code change will rebuild the image and helm upgrade in place without recreating the cluster.

ian-flores · 2026-02-23T21:27:37Z

How kind tests handle secrets without AWS credentials

The kind tests use type: test on both workloadSecret and mainDatabaseCredentialSecret in every Site CR. FetchSecret in api/product/secret.go switches on that type field and routes to SiteSecretTest, which reads from an in-memory GlobalTestSecretProvider map instead of hitting AWS Secrets Manager.

When the map is empty (as it is in kind tests — nothing pre-populates it), GetSecretWithFallback just returns the key name as the value. The controller gets a non-empty string, reconciliation proceeds, no AWS SDK call is made, no credentials required.
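
The fallback behavior described above can be sketched as follows. This is only an illustration under stated assumptions: `testSecretProvider` and the lowercase `getSecretWithFallback` are hypothetical names, and the real types and signatures in api/product/secret.go are not shown in this thread.

```go
package main

import "fmt"

// testSecretProvider is a hypothetical in-memory secret store, loosely
// modeled on the GlobalTestSecretProvider map described above.
type testSecretProvider struct {
	secrets map[string]string
}

// getSecretWithFallback returns the stored value for key, or the key
// itself when nothing has been stored under it.
func (p *testSecretProvider) getSecretWithFallback(key string) string {
	if v, ok := p.secrets[key]; ok {
		return v
	}
	return key
}

func main() {
	// Empty map, as in the kind tests: the key name comes back as the value.
	p := &testSecretProvider{secrets: map[string]string{}}
	fmt.Println(p.getSecretWithFallback("db-password")) // prints "db-password"

	// A pre-populated entry takes precedence over the fallback.
	p.secrets["db-password"] = "s3cret"
	fmt.Println(p.getSecretWithFallback("db-password")) // prints "s3cret"
}
```

Because the fallback always yields a non-empty string, a caller never sees a missing secret, which matches the description of reconciliation proceeding without any AWS call.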

type: test is a first-class escape hatch baked into the operator for local/CI use. One thing to be aware of: it ships in the production binary, so it's a surface worth keeping an eye on if production hardening becomes a concern down the line.

stevenolen

massive, massive fan of this.

a few stray notes:

thanks for including the comment about secrets manager. ultimately, i think we need to extract the cloud-specific implementation bits, and formalize/document those integrations -- along with some local replacements for first-class kind support.
notes/additional pr that shows how to actually interact with the products once they are running in kind would be super cool

# [1.15.0](v1.14.0...v1.15.0) (2026-02-24)

### Features

* add local integration testing infrastructure (envtest + kind) ([#54](#54)) ([a5b5751](a5b5751))

claude bot reviewed Jan 27, 2026

ian-flores added 5 commits February 19, 2026 11:53

Merge branch 'main' into issue-51-integration-tests

fb19366

fix: correct kind test infrastructure to run end-to-end

36f4f0a

fix: use errors.As for SecretProviderClass CRD absence check

1eb6555

ian-flores linked an issue Feb 23, 2026 that may be closed by this pull request

Add local integration testing infrastructure (envtest + kind) #51

Closed

4 tasks

ian-flores added 2 commits February 23, 2026 08:48

test: regenerate Helm chart before kind integration tests

5d3937c

Merge branch 'main' into issue-51-integration-tests

d7877f4

ian-flores added 2 commits February 23, 2026 12:50

ci: run kind tests on PRs touching relevant paths

7e0d94b

test: address review findings on integration test infrastructure

464ef4d

ian-flores added 2 commits February 23, 2026 12:56

Merge branch 'main' into issue-51-integration-tests

c291a06

feat: add persistent kind dev loop with kind-setup/kind-test/kind-tea…

193d70a

…rdown

ian-flores marked this pull request as ready for review February 23, 2026 21:27

ian-flores requested a review from stevenolen February 23, 2026 21:28

stevenolen approved these changes Feb 24, 2026

ian-flores added this pull request to the merge queue Feb 24, 2026

Merged via the queue into main with commit a5b5751 Feb 24, 2026
9 checks passed

ian-flores deleted the issue-51-integration-tests branch February 24, 2026 17:54

ian-flores pushed a commit that referenced this pull request Feb 24, 2026

chore(release): 1.15.0 [skip ci]

460ce0b

ian-flores merged 12 commits into main from issue-51-integration-tests
