Skip to content

[SANDBOX-1399][SANDBOX-1395] apply all cluster resources at once and read the prior state from the nstemplateset status instead of labels#700

Merged
metlos merged 16 commits intocodeready-toolchain:masterfrom
metlos:cluster-resources-at-once
Oct 21, 2025
Merged

[SANDBOX-1399][SANDBOX-1395] apply all cluster resources at once and read the prior state from the nstemplateset status instead of labels#700
metlos merged 16 commits intocodeready-toolchain:masterfrom
metlos:cluster-resources-at-once

Conversation

@metlos
Copy link
Contributor

@metlos metlos commented Sep 8, 2025

When applying cluster-scoped resourcest the NSTemplateSet controller currently does the following:

  1. It applies only a single resource from the list at a time. I.e. as soon as it detects that it has created or updated a cluster-scoped resource, it exits and reschedules itself to continue.
  2. It uses labels on the cluster-scoped resources to mark them as part of the nstemplateset and uses those labels to determine the list of resources to consider during a reconcile.

This PR changes those 2 aspects such that:

  1. All the cluster-scoped resources are applied in one go during a single reconciliation.
  2. The membership of the resources is determined by comparing the current template ref (spec.clusterresources.templateref) with the last-applied templateref (status.clusterresources.templateref).

Those 2 changes required quite massive changes it the unit tests. I took the opportunity to slightly clean those up, too (removing a couple of redundant tests).

This change is getting us closer to unifying the way we deploy the cluster- and namespace-scoped resources.

Note that the e2e tests are currently not passing, so I'm opening this just as a draft PR to gather some preliminary thoughts and suggestions.

Jira:

Summary by CodeRabbit

  • New Features

    • Single-pass, template-driven cluster resource apply/cleanup with centralized status handling and per-object application flow.
    • Feature-change detection added to drive gated resource application.
  • Bug Fixes

    • Streamlined provisioning/delete control flow to avoid unnecessary requeues and consolidate error handling.
    • Improved tier promotion/demotion behavior for resource creation/removal.
  • Tests

    • Large test refactor and expansion: provisioning, updates, deletions, feature-flag scenarios, and new status/assertion helpers.

@coderabbitai
Copy link

coderabbitai bot commented Sep 8, 2025

Walkthrough

Refactors cluster-resources reconciliation to a template-driven, single-pass apply/cleanup flow; simplifies ensure/delete signatures to return only error; centralizes per-object lifecycle via objectApplier; introduces feature-change detection and updates tests and assertion helpers accordingly.

Changes

Cohort / File(s) Summary
Core reconciliation & controller
controllers/nstemplateset/cluster_resources.go, controllers/nstemplateset/nstemplateset_controller.go
Replaces multi-kind, multi-pass logic with template-driven single-pass processing using processTierTemplate and objectApplier; ensure/delete now return only error; removes per-call requeue for cluster resources and centralizes error handling.
Feature & status logic
controllers/nstemplateset/features.go, controllers/nstemplateset/status.go
Adds featuresChanged(); refactors featureAnnotationNeedsUpdate to normalize/deduplicate annotations and use containment checks (adds slices import); simplifies clusterResourcesNeedsUpdate nil-aware checks.
Reconciler tests & controller tests
controllers/nstemplateset/cluster_resources_test.go, controllers/nstemplateset/nstemplateset_controller_test.go
Large-scale test rewrite: replaces detailed kind-suite with reconciler-focused scenarios, adds feature-flag driven resource evaluation, expands provisioning/promotion/demotion flows, and introduces new status helpers (withStatusFeatureToggles, withStatusClusterResourcesInTier).
Feature/status tests
controllers/nstemplateset/features_test.go, controllers/nstemplateset/status_test.go
Adds TestFeaturesChanged (table-driven) and TestFeatureAnnotationNeedsUpdate suites covering annotation/status sync and edge cases (note: tests duplicated in file).
Test assertion utilities
test/cluster_assertion.go, test/nstemplateset_assertion.go
Adds WithoutLabel(key string) ResourceOption; adds HasLastAppliedClusterResourcesTemplateRef and HasStatusClusterResourcesNil; tightens nil-checks in existing cluster-resources assertions.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Reconciler
    participant ClusterMgr as clusterResources.ensure()
    participant Processor as processTierTemplate()
    participant Applier as objectApplier
    participant K8s as Kubernetes API

    Reconciler->>ClusterMgr: ensure(ctx, NSTemplateSet)
    ClusterMgr->>Processor: resolve tier template -> newObjects, currentObjects
    Processor-->>ClusterMgr: newObjects, currentObjects
    ClusterMgr->>Applier: newObjectApplier(newObjects, currentObjects)
    Applier->>Applier: Apply each new object (create/update)
    Applier->>K8s: create/update resource
    K8s-->>Applier: success / error
    Applier->>Applier: update NSTemplateSet status (Provisioning/Provisioned/Updating)
    Applier->>Applier: Cleanup stale objects (delete)
    Applier->>K8s: delete resource
    K8s-->>Applier: deleted / error
    Applier-->>ClusterMgr: result (nil or error)
    ClusterMgr-->>Reconciler: returns error only (no requeue flag)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • xcoulon
  • alexeykazakov

Poem

🐰
I hopped through templates, tidy and spry,
One pass to apply, old stubs goodbye,
Features checked, statuses sing—
Cluster folk dance in a single swing!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "[SANDBOX-1399][SANDBOX-1395] apply all cluster resources at once and read the prior state from the nstemplateset status instead of labels" directly and clearly describes the two primary behavioral changes reflected in the changeset: (1) a shift from applying cluster resources one-at-a-time to applying them all in a single reconciliation pass, and (2) a shift from using labels for resource membership determination to using the NSTemplateSet status fields (comparing spec.clusterresources.templateref with status.clusterresources.templateref). These changes align with the PR objectives and are substantively reflected across the modified files. The title is specific, concise, and avoids vague terminology, allowing a teammate scanning the commit history to quickly understand the main intent of the changes.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link

openshift-ci bot commented Sep 8, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved label Sep 8, 2025
@metlos metlos changed the title apply all cluster resources at once and read the prior state from the nstemplateset status instead of labels [SANDBOX-1399][SANDBOX-1395] apply all cluster resources at once and read the prior state from the nstemplateset status instead of labels Sep 8, 2025
@codecov
Copy link

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 81.87135% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.21%. Comparing base (8e5e5fd) to head (cbdd7de).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
controllers/nstemplateset/cluster_resources.go 81.74% 17 Missing and 6 partials ⚠️
test/nstemplateset_assertion.go 41.66% 7 Missing ⚠️
...trollers/nstemplateset/nstemplateset_controller.go 75.00% 0 Missing and 1 partial ⚠️

❌ Your changes status has failed because you have indirect coverage changes. Learn more about Unexpected Coverage Changes and reasons for indirect coverage changes.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #700      +/-   ##
==========================================
+ Coverage   80.81%   81.21%   +0.39%     
==========================================
  Files          48       48              
  Lines        4478     4445      -33     
==========================================
- Hits         3619     3610       -9     
+ Misses        698      683      -15     
+ Partials      161      152       -9     
Files with missing lines Coverage Δ
controllers/nstemplateset/features.go 100.00% <100.00%> (ø)
controllers/nstemplateset/status.go 93.57% <100.00%> (+5.22%) ⬆️
test/cluster_assertion.go 100.00% <100.00%> (ø)
...trollers/nstemplateset/nstemplateset_controller.go 67.07% <75.00%> (+1.59%) ⬆️
test/nstemplateset_assertion.go 81.90% <41.66%> (+0.49%) ⬆️
controllers/nstemplateset/cluster_resources.go 82.80% <81.74%> (-1.07%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@MatousJobanek MatousJobanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first quick review - overall looks good 👍 haven't reviewed the tests yet

Comment on lines +92 to +105
changeStatusIfNeeded := func() error {
if !statusChanged {
if firstDeployment {
err = r.setStatusProvisioningIfNotUpdating(ctx, nsTmplSet)
} else {
err = r.setStatusUpdatingIfNotProvisioning(ctx, nsTmplSet)
}
if err != nil {
return false, r.wrapErrorWithStatusUpdateForClusterResourceFailure(gvkCtx, nsTmplSet, err,
"failed to process template for the cluster resources with the name '%s'", nsTmplSet.Spec.ClusterResources.TemplateRef)
return err
}
statusChanged = true
}
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to run this in the loop? I know that it's executed only once, but my point is why not running

if firstDeployment {
	err = r.setStatusProvisioningIfNotUpdating(ctx, nsTmplSet)
} else {
	err = r.setStatusUpdatingIfNotProvisioning(ctx, nsTmplSet)
}
if err != nil {
	return err
}

simply here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the list of newObjs might be empty or only contain resources that should not be created. We'd be setting the status prematurely in those cases. There are pre-existing unit tests for this, so I wanted to keep the behavior as is because I frankly still don't fully see all the consequences changes like this might trigger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the newObjs doesn't contain anything, then there will most like be still something to be deleted (the refs don't match so we expect that there has been some change).
Anyway, I don't think that there would be any consequences on anything in the whole chain (this is a very rare corner-case), which means that it would be completely OK to simplify this.

@metlos metlos marked this pull request as ready for review September 24, 2025 10:01
@openshift-ci openshift-ci bot requested review from rsoaresd and xcoulon September 24, 2025 10:01
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controllers/nstemplateset/nstemplateset_controller_test.go (1)

1175-1176: Bug: wrong namespace asserted as the “second” one

This test asserts deletion of the dev namespace twice. The “second” namespace should be stage.

Apply this diff:

-                secondtNSName := fmt.Sprintf("%s-dev", spacename)
+                secondtNSName := fmt.Sprintf("%s-stage", spacename)
🧹 Nitpick comments (8)
controllers/nstemplateset/nstemplateset_controller_test.go (1)

817-819: Nit: fix comment typo

“updgraded” → “upgraded”.

controllers/nstemplateset/nstemplateset_controller.go (1)

219-233: Finalizer removal after issuing cluster-resource deletes may leave residuals

If any cluster-scoped resource has its own finalizer, Client.Delete can succeed while the object remains terminating. Removing the NSTemplateSet finalizer immediately could leave orphaned cluster resources.

  • Verify whether any cluster resources you apply can carry finalizers (e.g., CRDs like Idler).
  • If yes, consider requeue-until-deleted semantics (similar to namespaces) or at least checking for DeletionTimestamp/NotFound before removing the finalizer.
controllers/nstemplateset/cluster_resources_test.go (6)

356-359: Avoid variable shadowing of shared fixtures (rename local emptyCrq).

Local emptyCrq shadows the fixture name used in templates, reducing clarity. Rename the local to avoid confusion.

- emptyCrq := newClusterResourceQuota("empty", "advanced")
+ emptyCRQObj := newClusterResourceQuota("empty", "advanced")
@@
- manager, cl := prepareClusterResourcesManager(t, nsTmplSet, crq, crb, codeNs, previousTierTemplate, emptyCrq)
+ manager, cl := prepareClusterResourcesManager(t, nsTmplSet, crq, crb, codeNs, previousTierTemplate, emptyCRQObj)

391-394: Avoid variable shadowing of shared fixtures (rename local emptyCrq).

Same shadowing issue here; rename for clarity.

- emptyCrq := newClusterResourceQuota(spaceName, "withemptycrq")
- emptyCrq.Name = "for-empty"
- manager, cl := prepareClusterResourcesManager(t, nsTmplSet, emptyCrq, crq, crb, codeNs, previousTierTemplate)
+ emptyCRQObj := newClusterResourceQuota(spaceName, "withemptycrq")
+ emptyCRQObj.Name = "for-empty"
+ manager, cl := prepareClusterResourcesManager(t, nsTmplSet, emptyCRQObj, crq, crb, codeNs, previousTierTemplate)

119-127: Guard empty feature string to avoid [""] slice edge case.

Splitting an empty string yields []string{""}. While harmless here, guarding improves intent and avoids future footguns if logic changes.

- enabledFeatures := strings.Split(testRun.enabledFeatures, ",")
+ var enabledFeatures []string
+ if testRun.enabledFeatures != "" {
+   enabledFeatures = strings.Split(testRun.enabledFeatures, ",")
+ }
 for _, f := range allTierFeatures {
-  if slices.Contains(enabledFeatures, f) {
+  if slices.Contains(enabledFeatures, f) {
     clusterAssertion.HasResource(fmt.Sprintf("%s-for-%s", f, spacename), &quotav1.ClusterResourceQuota{})
   } else {
     clusterAssertion.HasNoResource(fmt.Sprintf("%s-for-%s", f, spacename), &quotav1.ClusterResourceQuota{})
   }
 }

323-324: Fix misleading inline comment.

The comment contradicts the assertion. Either remove it or adjust to reflect that the finalizer is present.

-   HasFinalizer(). // finalizer was not added and nothing else was done
+   HasFinalizer(). // finalizer present; controller reported UnableToTerminate below

756-756: Polish test title for clarity.

Minor grammar fix.

- t.Run("update to abcde13 fails because it find the template", func(t *testing.T) {
+ t.Run("update to abcde13 fails because it cannot find the template", func(t *testing.T) {

10-12: Prefer stdlib slices (Go 1.21+) or alias k8s slices to avoid future conflicts.

If you target Go 1.21+, switch to the stdlib “slices” package. If not, consider aliasing the k8s import (e.g., kslices) to reduce confusion when migrating.

What Go version does this repo target? If 1.21+, I can propose a small diff to use stdlib slices; otherwise I can suggest the alias update across the file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cc4430 and 9b9314b.

📒 Files selected for processing (6)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
  • controllers/nstemplateset/nstemplateset_controller.go (2 hunks)
  • controllers/nstemplateset/nstemplateset_controller_test.go (13 hunks)
  • test/cluster_assertion.go (1 hunks)
  • test/nstemplateset_assertion.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
controllers/nstemplateset/nstemplateset_controller_test.go (3)
test/nstemplateset_assertion.go (5)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Updating (232-238)
  • Provisioned (216-222)
  • NewTierTemplateName (212-214)
test/cluster_assertion.go (2)
  • AssertThatCluster (20-25)
  • WithLabel (45-51)
test/namespace_assertion.go (1)
  • AssertThatNamespace (32-38)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • UnableToProvisionClusterResources (258-265)
  • Updating (232-238)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build & push operator bundles for e2e tests
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (6)
test/cluster_assertion.go (1)

53-58: LGTM: handy assertion for label absence

The WithoutLabel option is clear and useful for negative label checks.

controllers/nstemplateset/nstemplateset_controller_test.go (2)

615-715: Assertions align with the “apply-all-at-once” flow

Good end-to-end coverage for cluster resources first, then namespace internals; matches the new reconciliation behavior.


1477-1483: LGTM: helpful status helper for cross-tier clusterresources

withStatusClusterResourcesInTier neatly sets up cross-tier last-applied refs for upgrade tests.

test/nstemplateset_assertion.go (2)

160-166: LGTM: added nil-guard for Spec.ClusterResources

The extra NotNil check prevents panics in assertions.


168-174: LGTM: status-based clusterresources assertion

HasLastAppliedClusterResourcesTemplateRef complements the spec-based assertion well.

controllers/nstemplateset/cluster_resources.go (1)

5-7: OK — go.mod targets Go 1.23, slices import is safe

go.mod declares "go 1.23.0" (>=1.21), so the standard-library slices package is supported; no change needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
controllers/nstemplateset/cluster_resources.go (1)

115-120: Potential performance issue with O(n²) nested loop for object matching

The nested loop to find and remove matching objects has O(n²) time complexity. With many cluster resources, this could cause performance issues.

Consider using a map for O(1) lookups:

-		// by removing the new objects from the current objects, we are going to be left with the objects
-		// that should be removed from the cluster after we're done with this loop
-		for oldIdx, curObj := range currentObjects {
-			if commonclient.SameGVKandName(curObj, newObj) {
-				currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)
-				break
-			}
-		}
+		// by removing the new objects from the current objects, we are going to be left with the objects
+		// that should be removed from the cluster after we're done with this loop
+		currentObjectsMap := make(map[string]int)
+		for i, curObj := range currentObjects {
+			key := fmt.Sprintf("%s/%s", curObj.GetObjectKind().GroupVersionKind().String(), curObj.GetName())
+			currentObjectsMap[key] = i
+		}
+		
+		newObjKey := fmt.Sprintf("%s/%s", newObj.GetObjectKind().GroupVersionKind().String(), newObj.GetName())
+		if idx, found := currentObjectsMap[newObjKey]; found {
+			currentObjects = slices.Delete(currentObjects, idx, idx+1)
+			delete(currentObjectsMap, newObjKey)
+		}

However, note that this approach requires careful index management after deletions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b9314b and c04a460.

📒 Files selected for processing (3)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
  • make/podman.mk (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • UnableToProvisionClusterResources (258-265)
  • Updating (232-238)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (27)
make/podman.mk (1)

17-17: Good dependency: ensure auth before pushing.

Adding podman-logged-in as a prerequisite for podman-push is the right call to gate pushes on auth.

controllers/nstemplateset/cluster_resources_test.go (18)

10-12: LGTM - Import additions for new functionality.

The new imports k8s.io/client-go/kubernetes/scheme and k8s.io/utils/strings/slices are appropriately added to support the enhanced test functionality. These align with the PR's shift to template-driven resource handling.


41-56: LGTM - Test case correctly validates provisioning status.

This test case effectively validates that the new single-reconciliation approach correctly sets the NSTemplateSet status to "Provisioning" and creates all cluster resources in one pass. The assertions for all expected resources (ClusterResourceQuota, ClusterRoleBinding, and Idlers) are comprehensive.


103-128: LGTM - Feature-gated resource testing is well-structured.

The test correctly validates that only resources for enabled features are deployed while ensuring feature-gated resources are properly excluded. The use of slices.Contains for feature checking is appropriate and the parameterized approach covers all relevant scenarios.


138-138: LGTM - Consistent API usage.

The transition from the previous multi-step approach to calling manager.ensure(ctx, nsTmplSet) directly aligns with the PR's objective to apply all cluster resources in a single reconciliation.


150-155: LGTM - Status-based resource management test.

The test properly validates the new status-based approach by using withStatusClusterResources("abcde11") to indicate already-provisioned resources, which aligns with the PR's shift from label-based to status-based resource tracking.


195-195: LGTM - Consistent API usage throughout tests.

All test cases consistently use manager.ensure(ctx, nsTmplSet) which aligns with the single-reconciliation approach described in the PR objectives.

Also applies to: 214-214, 361-361, 396-396, 424-424, 456-456, 488-488, 517-517, 546-546, 584-584, 612-612, 648-648, 672-672, 699-699, 716-716, 743-743, 762-762, 809-809


199-199: LGTM - Updated error message reflects new template-based approach.

The error message correctly references "TierTemplate for the to-be-applied cluster resources" which aligns with the PR's shift to template-driven resource handling.


222-222: LGTM - Detailed error message for cluster resource application failures.

The error message provides comprehensive details including resource name, GVK, and failure context, which will aid in debugging cluster resource application issues.


231-232: LGTM - Test environment setup moved to appropriate location.

Moving the environment variable setup to the beginning of the test function ensures it's properly configured for all test cases in the TestDeleteClusterResources suite.


237-237: LGTM - Updated NSTemplateSet setup with status-based approach.

The test correctly uses both withClusterResources("abcde11") and withStatusClusterResources("abcde11") which reflects the new status-based resource tracking approach described in the PR.


244-244: LGTM - Consistent deletion API usage.

All deletion test cases consistently use manager.delete(ctx, nsTmplSet) which maintains API consistency across the test suite.

Also applies to: 264-264, 288-288, 301-301, 317-317


250-250: LGTM - Correct assertion for ClusterRoleBinding deletion.

The assertion correctly verifies the absence of the ClusterRoleBinding with the expected name format spacename+"-tekton-view".


255-255: LGTM - Status-based resource tracking in deletion tests.

The tests properly use withStatusClusterResources("abcde11") to indicate the current state of cluster resources, which aligns with the PR's shift from label-based to status-based tracking.

Also applies to: 281-281


336-336: LGTM - Comprehensive variable naming consistency.

The consistent use of spaceName instead of spacename throughout the promotion and update tests improves code readability and follows Go naming conventions. All resource assertions correctly reference the updated variable names.

Also applies to: 350-358, 365-370, 381-393, 400-410, 419-434, 445-467, 478-497, 513-533, 544-558, 571-594, 604-623, 635-639, 643-656, 661-679, 684-707, 711-724, 736-754, 758-775, 787-816


343-348: LGTM - Template creation for previous state simulation.

The creation of previousTierTemplate objects correctly simulates the previous state of cluster resources, which is essential for testing the status-based comparison logic introduced in this PR.

Also applies to: 375-380, 413-418, 438-443, 471-476, 537-542, 565-570, 598-603, 730-735, 790-795


353-353: LGTM - Status-based resource tracking implementation.

The use of withStatusClusterResourcesInTier and withStatusClusterResources properly implements the new status-based approach where the controller determines resource membership by comparing current and previous templateRef values stored in status.

Also applies to: 384-384, 448-448, 481-481, 543-543, 575-575, 604-604, 665-665, 688-688, 802-802


369-371: Wrong type asserted for non-existence; should check CRQ, not CRB.

This assertion verifies absence of a ClusterRoleBinding named "for-empty", but based on the test context and the previous review comment, this should be checking for a ClusterResourceQuota instead.

Apply this diff:

 AssertThatCluster(t, cl).
   HasResource("for-"+spaceName, &quotav1.ClusterResourceQuota{}).
-  HasNoResource("for-empty", &quotav1.ClusterResourceQuota{})
+  HasNoResource("for-empty", &quotav1.ClusterResourceQuota{})

590-590: LGTM - Updated error messages reflect new resource handling.

The error messages correctly reference the new resource naming conventions and provide detailed failure context that will help with troubleshooting cluster resource application failures.

Also applies to: 619-619, 749-749

controllers/nstemplateset/cluster_resources.go (8)

46-48: Already flagged: Don't short‑circuit when templateRef is unchanged

This early return prevents re-applying missing/modified cluster resources when templateRef hasn't changed, breaking drift healing. Since you don't watch cluster resources, periodic reconciles will no-op and won't self-heal.


188-209: Already flagged: Fix variable scope issue

The err variable from the short if declaration is not accessible outside the if-block, causing a compile error on line 195.


171-209: Good addition of explicit delete functionality

The new delete method provides a clean way to explicitly remove all cluster-scoped resources when an NSTemplateSet is deleted. The implementation correctly processes the existing template to determine what resources to remove and handles errors appropriately.


93-106: Status management approach looks reasonable

The changeStatusIfNeeded closure properly differentiates between first deployment and update scenarios, setting appropriate statuses only once. This addresses the previous review concern about premature status setting.


68-82: Template processing logic is well-structured

The approach of processing both new and old tier templates to determine the desired state and current state is clean and logical. Error handling includes appropriate context for debugging.


117-117: Fix slices.Delete usage - requires slice reassignment

The slices.Delete function "modifies the contents of the slice" and "operates in place", but you must reassign the result to update the slice length properly.

Apply this diff:

-			currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)
+			currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)

Actually, looking more carefully at the code, the assignment is already present. However, there's still a logic issue - after deleting an element, the indices of subsequent elements shift, but the loop continues with the original indices. This can cause elements to be skipped.

-		for oldIdx, curObj := range currentObjects {
-			if commonclient.SameGVKandName(curObj, newObj) {
-				currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)
-				break
-			}
-		}
+		for oldIdx := 0; oldIdx < len(currentObjects); oldIdx++ {
+			curObj := currentObjects[oldIdx]
+			if commonclient.SameGVKandName(curObj, newObj) {
+				currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)
+				break
+			}
+		}

Likely an incorrect or invalid review comment.


30-30: Signature change is safe — no callers depend on the removed boolean

Verified: the only production caller is the Reconciler which calls r.clusterResources.ensure(ctx, nsTmplSet) and handles only an error (controllers/nstemplateset/nstemplateset_controller.go:139); tests in controllers/nstemplateset/cluster_resources_test.go also call it as error-only. No callers require the old (bool, error) return.


141-143: Verified — deleteObsoleteObjects is compatible with bulk deletion. The function iterates currentObjs and deletes entries not present in newObjects; cluster_resources.ensure prunes currentObjects against newObjs before calling deleteObsoleteObjects(ctx, r.Client, currentObjects, nil), so passing nil is correct. See controllers/nstemplateset/nstemplateset_controller.go:238–258.

@metlos metlos force-pushed the cluster-resources-at-once branch from c04a460 to 013cff1 Compare September 24, 2025 11:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controllers/nstemplateset/cluster_resources_test.go (1)

89-103: Subtest state leak: reuse of nsTmplSet mutates annotations across iterations.

Create a fresh NSTemplateSet per subtest; otherwise “no enabled features” can inherit prior features.

Apply:

-                if testRun.enabledFeatures != "" {
-                    nsTmplSet.Annotations = map[string]string{
-                        toolchainv1alpha1.FeatureToggleNameAnnotationKey: testRun.enabledFeatures,
-                    }
-                }
-                manager, fakeClient := prepareClusterResourcesManager(t, nsTmplSet)
+                // fresh NSTemplateSet per subtest to avoid leakage from previous iterations
+                nsTmplSet := newNSTmplSet(namespaceName, spacename, "advanced",
+                    withNamespaces("abcde11", "dev"), withClusterResources("abcde11"))
+                if testRun.enabledFeatures != "" {
+                    nsTmplSet.Annotations = map[string]string{
+                        toolchainv1alpha1.FeatureToggleNameAnnotationKey: testRun.enabledFeatures,
+                    }
+                }
+                manager, fakeClient := prepareClusterResourcesManager(t, nsTmplSet)
🧹 Nitpick comments (3)
controllers/nstemplateset/cluster_resources.go (2)

24-26: Fix outdated method doc: signature no longer returns (bool, error).

Apply:

-// Returns `true, nil` if something was changed, `false, nil` if nothing changed, `false, err` if an error occurred
+// Returns nil on success, or an error on failure.

122-130: Avoid err shadowing in apply error path.

Shadowing err reduces clarity. Use a different local or reuse err.

Apply:

-        if _, err = r.apply(ctx, nsTmplSet, newTierTemplate, newObj); err != nil {
-            err := fmt.Errorf("failed to apply changes to the cluster resource %s, %s: %w", newObj.GetName(), newObj.GetObjectKind().GroupVersionKind().String(), err)
-            return r.wrapErrorWithStatusUpdate(ctx, nsTmplSet, failureStatusReason, err, "failure while syncing cluster resources")
-        }
+        if _, err = r.apply(ctx, nsTmplSet, newTierTemplate, newObj); err != nil {
+            wrapErr := fmt.Errorf("failed to apply changes to the cluster resource %s, %s: %w",
+                newObj.GetName(), newObj.GetObjectKind().GroupVersionKind().String(), err)
+            return r.wrapErrorWithStatusUpdate(ctx, nsTmplSet, failureStatusReason, wrapErr, "failure while syncing cluster resources")
+        }
controllers/nstemplateset/cluster_resources_test.go (1)

756-757: Fix test name grammar.

Apply:

-        t.Run("update to abcde13 fails because it find the template", func(t *testing.T) {
+        t.Run("update to abcde13 fails because it cannot find the template", func(t *testing.T) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c04a460 and 013cff1.

📒 Files selected for processing (2)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:00.999Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T10:52:00.999Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:00.999Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources_test.go
  • controllers/nstemplateset/cluster_resources.go
🧬 Code graph analysis (2)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • UnableToProvisionClusterResources (258-265)
  • Updating (232-238)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (2)
controllers/nstemplateset/cluster_resources.go (2)

46-48: Short-circuit skips drift healing when refs match. Confirm this is intended.

With oldTemplateRef == newTemplateRef, no reconcile occurs, so externally deleted/modified cluster resources won’t be re-applied. Given the immutable TierTemplate assumption, this may be OK, but confirm we don’t need self-healing here (or add a periodic resync/watch later).


5-7: Stdlib slices is fine — repo targets Go 1.23.0
go.mod sets go 1.23.0, so the standard-library slices package is supported; no change required.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
controllers/nstemplateset/cluster_resources.go (4)

22-24: Update function doc to match new signature (no bool return).

Comment still mentions a (bool, error) return. Please align the doc with the current signature.

Apply:

-// ensure ensures that the cluster resources exist.
-// Returns `true, nil` if something was changed, `false, nil` if nothing changed, `false, err` if an error occurred
+// ensure ensures that the cluster resources exist.
+// Returns an error on failure; nil on success.

112-118: Optional: avoid O(n²) scan by indexing currentObjects.

Not critical given small sets, but you can pre-index currentObjects by GVK+name for O(1) deletes to simplify and speed up the loop.

I can provide a small helper to build a map keyed by GroupVersionKind+Name and reconstruct the slice after deletions if you’d like.


121-129: Avoid err shadowing; improve readability.

The inner err := fmt.Errorf(...) shadows the outer err. Use a different name.

-        if _, err = r.apply(ctx, nsTmplSet, newTierTemplate, newObj); err != nil {
-            err := fmt.Errorf("failed to apply changes to the cluster resource %s, %s: %w", newObj.GetName(), newObj.GetObjectKind().GroupVersionKind().String(), err)
-            return r.wrapErrorWithStatusUpdate(ctx, nsTmplSet, failureStatusReason, err, "failure while syncing cluster resources")
+        if _, err = r.apply(ctx, nsTmplSet, newTierTemplate, newObj); err != nil {
+            wrapErr := fmt.Errorf("failed to apply changes to the cluster resource %s, %s: %w", newObj.GetName(), newObj.GetObjectKind().GroupVersionKind().String(), err)
+            return r.wrapErrorWithStatusUpdate(ctx, nsTmplSet, failureStatusReason, wrapErr, "failure while syncing cluster resources")
         }

181-181: Nit: use GetName() for consistency.

Minor consistency improvement only.

-    currentObjects, err := currentTierTemplate.process(r.Scheme, map[string]string{SpaceName: nsTmplSet.Name})
+    currentObjects, err := currentTierTemplate.process(r.Scheme, map[string]string{SpaceName: nsTmplSet.GetName()})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 013cff1 and 13ef4c3.

📒 Files selected for processing (1)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:00.999Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T10:52:00.999Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:00.999Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
📚 Learning: 2025-09-24T11:19:39.211Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.211Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
🧬 Code graph analysis (1)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (5)
controllers/nstemplateset/cluster_resources.go (5)

45-47: Early return disables drift healing when templateRef is unchanged — confirm intent.

Per your stated design (TierTemplates treated as immutable; content changes use new refs), this short-circuit is intentional. Note, though, it also prevents re-creating cluster resources if they are manually deleted or drifted without a ref change, even if a reconcile is triggered.

If that’s acceptable for now, keep as-is; otherwise, consider re-applying idempotently while deferring status changes until a real mutation is detected (using the createdOrModified flag from apply).

Do you want to keep this behavior until TierTemplateRevisions land, or should we enable self-healing now? I can propose a minimal refactor that avoids status churn.


140-142: Good reuse of deleteObsoleteObjects.

Switching to the shared helper aligns behavior with namespaces/space-roles and reduces duplication.


189-191: LGTM: use ObjectKeyFromObject for Get.

Correct for cluster-scoped resources and more future-proof than constructing NamespacedName manually.


5-7: Verified: go.mod uses Go 1.23.0 — stdlib slices (requires ≥1.21) is supported.


30-47: Confirmed: periodic resync used; no additional watch required.

SetupWithManager explicitly avoids watching cluster-scoped resources and relies on controller-runtime's periodic resync (manager.Options.Cache.SyncPeriod) — see controllers/nstemplateset/nstemplateset_controller.go lines 65–73; the early return in cluster_resources.ensure on identical TemplateRef is intentional.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13ef4c3 and 8f29cbe.

📒 Files selected for processing (6)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
  • controllers/nstemplateset/features.go (1 hunks)
  • controllers/nstemplateset/features_test.go (1 hunks)
  • controllers/nstemplateset/nstemplateset_controller_test.go (14 hunks)
  • controllers/nstemplateset/status.go (1 hunks)
👮 Files not reviewed due to content moderation or server errors (3)
  • controllers/nstemplateset/nstemplateset_controller_test.go
  • controllers/nstemplateset/features.go
  • controllers/nstemplateset/features_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T11:19:39.229Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.229Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources_test.go
  • controllers/nstemplateset/cluster_resources.go
  • controllers/nstemplateset/nstemplateset_controller_test.go
📚 Learning: 2025-09-24T10:52:01.009Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
  • controllers/nstemplateset/nstemplateset_controller_test.go
🧬 Code graph analysis (3)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • Updating (232-238)
  • UnableToProvisionClusterResources (258-265)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
controllers/nstemplateset/nstemplateset_controller_test.go (3)
test/nstemplateset_assertion.go (5)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Updating (232-238)
  • Provisioned (216-222)
  • NewTierTemplateName (212-214)
test/cluster_assertion.go (2)
  • AssertThatCluster (20-25)
  • WithLabel (45-51)
test/namespace_assertion.go (1)
  • AssertThatNamespace (32-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests

Comment on lines +45 to +47
if oldTemplateRef == newTemplateRef && !featuresChanged(nsTmplSet) {
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Restore drift healing when templateRef is unchanged

Short-circuiting when oldTemplateRef == newTemplateRef means we never even look at (let alone re-apply) the expected objects. If any cluster-scoped resource is deleted or mutated out-of-band, we won’t recreate or fix it because we bail out before the reconciliation logic runs. The previous implementation healed this drift; the new guard regresses that behaviour.

Please keep the “no work” fast-path only after you’ve proven that every object that should exist actually does (and no deletions are pending). One way to do that is to process the templates, detect whether any create/update/delete is required, and only then early-return; alternatively, remove the short-circuit and restructure the status transition so it’s triggered only when we truly perform work, avoiding spurious status flips. Either way, we must retain the ability to reconcile missing resources under the same templateRef.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f29cbe and b0f4327.

📒 Files selected for processing (4)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
  • controllers/nstemplateset/features_test.go (1 hunks)
  • controllers/nstemplateset/status.go (2 hunks)
  • controllers/nstemplateset/status_test.go (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T11:19:39.229Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.229Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources_test.go
🧬 Code graph analysis (1)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • Updating (232-238)
  • UnableToProvisionClusterResources (258-265)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (11)
controllers/nstemplateset/features_test.go (1)

101-158: LGTM! Good test coverage for feature change detection.

The new TestFeaturesChanged test provides comprehensive coverage for all feature change scenarios including edge cases with duplicates.

controllers/nstemplateset/status.go (2)

134-140: Clean implementation of cluster resources drift detection.

The simplified comparison logic using nil checks and TemplateRef comparison aligns well with the PR's goal of using status-based membership determination instead of labels. The logic correctly handles all four cases: both nil, both non-nil with same/different refs, and one nil/one non-nil.

This aligns with the learning from metlos about TierTemplates being immutable, where content changes result in new versions with new references.


109-131: Approve feature comparison logic

Sorted and deduplicated feature lists are compared bidirectionally and covered by existing tests (e.g., “should detect duplicates” in status_test.go), so no further changes needed.

controllers/nstemplateset/cluster_resources_test.go (8)

184-186: LGTM! Test correctly sets up the feature-1 CRQ.

The test correctly creates a ClusterResourceQuota with the name "feature-1-for-" + spacename to test cleanup of resources from disabled features. The resource is properly configured for the test scenario.


296-297: Verify correct resource type in assertion.

Good that this was already fixed based on past review comments - the test now correctly checks for absence of ClusterResourceQuota instead of ClusterRoleBinding for the "for-empty" resource.


36-56: LGTM! Test validates status transitions correctly.

The test properly validates that the controller sets the status to provisioning and creates the expected cluster resources (ClusterResourceQuota, ClusterRoleBinding, and Idler resources).


149-174: Test properly validates no-op when resources exist.

Good test for the scenario where all cluster resources are already created with matching status. The test correctly verifies that the status remains "Provisioned" and all expected resources still exist.


176-201: Excellent test for feature toggle cleanup.

This test effectively validates the cleanup of cluster resources when features are disabled. It correctly:

  1. Sets up a scenario where feature-1 was enabled (in status) but is now disabled (feature-2 in annotation)
  2. Creates a feature-1 resource that should be cleaned up
  3. Verifies the resource is deleted and status changes to "Updating"

806-843: Good test coverage for feature removal scenarios.

The test properly validates that when a feature annotation is removed from the NSTemplateSet, the corresponding cluster resource is deleted. This aligns well with the new status-based tracking approach.


370-374: Fix test parameter usage - pass both key and value.

The test.WithParams call appears to only pass the value without the key. Based on the codebase patterns, it should pass both the parameter name constant and the value.

Apply this fix consistently throughout the file:

-            test.WithParams(spacename),
+            test.WithParams(SpaceName, spacename),

500-502: Fix inconsistent parameter usage in test templates.

Multiple instances of test.WithParams(spacename) need to be fixed to include the parameter key.

-                    test.WithParams(spacename),
+                    test.WithParams(SpaceName, spacename),

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0f4327 and 43ee550.

📒 Files selected for processing (1)
  • openshift-ci/Dockerfile.tools (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests

@metlos metlos force-pushed the cluster-resources-at-once branch from 43ee550 to ef59420 Compare September 26, 2025 11:38
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controllers/nstemplateset/nstemplateset_controller_test.go (1)

1169-1176: Fix second namespace assertion to check 'stage', not 'dev'

The “second namespace” verification repeats dev; it should assert stage deletion.

Apply:

-                secondtNSName := fmt.Sprintf("%s-dev", spacename)
+                secondtNSName := fmt.Sprintf("%s-stage", spacename)
🧹 Nitpick comments (1)
controllers/nstemplateset/cluster_resources.go (1)

114-119: Potential inefficiency with repeated slice search

Using slices.Delete inside a loop can be O(n²) in the worst case, as each deletion requires shifting elements and the outer loop searches through the slice again for each new object.

Consider optimizing by building a map or using a different approach:

-		// by removing the new objects from the current objects, we are going to be left with the objects
-		// that should be removed from the cluster after we're done with this loop
-		for oldIdx, curObj := range currentObjects {
-			if commonclient.SameGVKandName(curObj, newObj) {
-				currentObjects = slices.Delete(currentObjects, oldIdx, oldIdx+1)
-				break
-			}
-		}
+		// Mark objects that should not be deleted by filtering them out later
+		toKeep := make(map[string]bool)
+		for _, newObj := range newObjs {
+			if shouldCreate(newObj, nsTmplSet) {
+				key := fmt.Sprintf("%s/%s", newObj.GetObjectKind().GroupVersionKind().String(), newObj.GetName())
+				toKeep[key] = true
+			}
+		}
+		
+		// Filter currentObjects to only those not in toKeep
+		filtered := currentObjects[:0]
+		for _, curObj := range currentObjects {
+			key := fmt.Sprintf("%s/%s", curObj.GetObjectKind().GroupVersionKind().String(), curObj.GetName())
+			if !toKeep[key] {
+				filtered = append(filtered, curObj)
+			}
+		}
+		currentObjects = filtered
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43ee550 and ef59420.

📒 Files selected for processing (10)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
  • controllers/nstemplateset/features.go (1 hunks)
  • controllers/nstemplateset/features_test.go (1 hunks)
  • controllers/nstemplateset/nstemplateset_controller.go (2 hunks)
  • controllers/nstemplateset/nstemplateset_controller_test.go (14 hunks)
  • controllers/nstemplateset/status.go (2 hunks)
  • controllers/nstemplateset/status_test.go (2 hunks)
  • test/cluster_assertion.go (1 hunks)
  • test/nstemplateset_assertion.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • controllers/nstemplateset/nstemplateset_controller.go
  • test/nstemplateset_assertion.go
  • controllers/nstemplateset/features_test.go
  • test/cluster_assertion.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T10:52:01.009Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
  • controllers/nstemplateset/nstemplateset_controller_test.go
📚 Learning: 2025-09-24T11:19:39.229Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.229Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
  • controllers/nstemplateset/nstemplateset_controller_test.go
  • controllers/nstemplateset/cluster_resources_test.go
🧬 Code graph analysis (3)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
controllers/nstemplateset/nstemplateset_controller_test.go (3)
test/nstemplateset_assertion.go (5)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Updating (232-238)
  • Provisioned (216-222)
  • NewTierTemplateName (212-214)
test/cluster_assertion.go (2)
  • AssertThatCluster (20-25)
  • WithLabel (45-51)
test/namespace_assertion.go (1)
  • AssertThatNamespace (32-38)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • Updating (232-238)
  • UnableToProvisionClusterResources (258-265)
  • UpdateFailed (240-247)
test/cluster_assertion.go (3)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (21)
controllers/nstemplateset/status.go (3)

112-131: Symmetric feature drift check looks correct

Set-style comparison in both directions with normalization is solid. Returning the cleaned, deduped list for status is a nice touch.


135-139: Nil-aware clusterResources drift check LGTM

Concise and correct for the intended “compare by TemplateRef only” design. Based on learnings.


5-5: stdlib ‘slices’ import is supported
go.mod specifies Go 1.23, satisfying the ≥1.21 requirement.

controllers/nstemplateset/features.go (1)

28-32: Thin wrapper is fine

Simple delegate to the status helper; no issues.

controllers/nstemplateset/status_test.go (1)

277-343: Good coverage for feature toggles sync (including duplicates)

Assertions match the normalized, deduped output and symmetric comparison behavior.

controllers/nstemplateset/nstemplateset_controller_test.go (2)

615-641: Provisioning expectations updated appropriately

Asserting CRQ/CRB/Idlers created in the first pass and withholding TemplateRef/Tier labels until inner resources are applied aligns with the new single-pass logic. Based on learnings.


1418-1422: Helper to seed Status.FeatureToggles is appropriate

This improves test setup clarity around status-vs-annotation scenarios.

controllers/nstemplateset/cluster_resources_test.go (6)

41-56: Status transitions and base objects creation assertions LGTM

Validates initial provisioning path and aligns with unified apply of cluster resources.


148-174: No-op path when already provisioned is correct

Ensures idempotency and preserves Provisioned condition with existing resources.


216-231: Error path for missing TierTemplate is asserted well

Message and UnableToProvisionClusterResources condition match the controller’s behavior.


246-250: Creation failure path assertion looks good

Checks error propagation and status condition; consistent with the new apply flow.


295-298: Correct resource type in absence assertion

Asserting HasNoResource for CRQ “for-empty” is now correct.


184-186: Seed CRQ with correct space and feature annotation

The fixture creates a CRQ with empty SpaceLabel and only overrides Name. It may not be associated with the user space for cleanup. Seed labels and feature annotation explicitly.

Apply:

-        crq := newClusterResourceQuota("", "advanced")
-        crq.Name = "feature-1-for-" + spacename // manually create the resource with the name matching the feature resource
+        crq := newClusterResourceQuota(spacename, "advanced",
+            withFeatureAnnotation("feature-1"),
+            withName("feature-1-for-"+spacename))
controllers/nstemplateset/cluster_resources.go (8)

5-6: LGTM!

Adding necessary imports for improved functionality.


29-47: LGTM! Template reference comparison with feature detection

The short-circuit logic correctly compares template references and includes feature change detection. Based on learnings, this approach assumes TierTemplates are immutable with versioning through new template references, which is the intended design pattern.


49-81: LGTM! Comprehensive template processing with proper error handling

The template retrieval and processing logic handles both new and old templates correctly, with contextual error wrapping for cluster resource failures.


83-105: LGTM! Well-structured status management

The status change detection and conditional updates properly differentiate between first deployment and updates, with appropriate failure status reasons.


107-130: LGTM! Efficient object processing with proper status updates

The object processing loop correctly applies new objects and removes them from the current objects list for later cleanup. The status change is properly handled before applying resources.


132-143: LGTM! Proper cleanup of obsolete objects

The logic correctly handles deletion of remaining objects with appropriate status updates before deletion.


171-208: LGTM! Comprehensive delete implementation with proper error handling

The delete method correctly processes the current template to determine objects to delete, handles missing resources gracefully, and includes proper logging and error handling.


187-206: Delete error handling is correct The else if ensures that a NotFound error handled in the first branch is never re‐evaluated in the second; no change required.

Likely an incorrect or invalid review comment.

@metlos
Copy link
Contributor Author

metlos commented Sep 30, 2025

/retest

I've seen TestIdlerAndPriorityClasses both fail and pass locally. Not sure if it a known flaky test? @rsoaresd, would you happen to know?

Not sure what might be causing the flakiness if it is introduced by this PR yet.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
controllers/nstemplateset/cluster_resources_test.go (1)

373-373: Fix template parameter usage: missing key in test.WithParams.

Multiple calls to test.WithParams(spacename) are missing the parameter key. The function expects key-value pairs, but only the value is provided. This would cause template parameter substitution to fail.

Based on previous review feedback, replace all occurrences:

-test.WithParams(spacename)
+test.WithParams(SpaceName, spacename)

This ensures the SpaceName constant is used as the key and spacename as the value.

Also applies to: 405-405, 443-443, 468-468, 501-501, 567-567, 595-595, 628-628, 760-760, 820-820

🧹 Nitpick comments (1)
controllers/nstemplateset/cluster_resources_test.go (1)

34-34: Inconsistent variable naming: spacename vs spaceName.

The file uses both spacename (line 34) and spaceName (line 363) to represent the space name. This inconsistency can lead to confusion and potential typos. Choose one convention and apply it consistently throughout the file.

Consider standardizing on spaceName (camelCase) as it follows Go naming conventions:

-	spacename := "johnsmith"
+	spaceName := "johnsmith"

Then update all references to use the standardized name.

Also applies to: 363-363

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef59420 and 1468340.

📒 Files selected for processing (1)
  • controllers/nstemplateset/cluster_resources_test.go (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (224-230)
  • Provisioned (216-222)
  • Updating (232-238)
  • UnableToProvisionClusterResources (258-265)
  • UpdateFailed (240-247)
test/cluster_assertion.go (4)
  • AssertThatCluster (20-25)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (2)
controllers/nstemplateset/cluster_resources_test.go (2)

148-174: LGTM: Idempotency test correctly validates no-op when resources are up-to-date.

The test properly verifies that when status.clusterresources.templateref matches spec.clusterresources.templateref, no changes are made. This aligns with the PR's shift from label-based to status-based membership tracking.


176-201: LGTM: Feature cleanup logic correctly removes resources for disabled features.

The test validates that resources for features no longer in the enabled list are properly cleaned up. The manual creation of the feature-gated CRQ (lines 184-186) correctly simulates a previously deployed feature resource, and the assertion confirms it's removed.

@metlos metlos force-pushed the cluster-resources-at-once branch from 2f42908 to 725c159 Compare October 16, 2025 11:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
controllers/nstemplateset/cluster_resources_test.go (1)

121-129: Guard against empty feature list to avoid sentinel ""

Minor clarity: avoid Split on empty string to prevent a [""] slice.

-                enabledFeatures := strings.Split(testRun.enabledFeatures, ",")
+                var enabledFeatures []string
+                if testRun.enabledFeatures != "" {
+                    enabledFeatures = strings.Split(testRun.enabledFeatures, ",")
+                }
controllers/nstemplateset/nstemplateset_controller.go (1)

235-239: Fix function comment: return type is error only, not (bool, error)

The comment describes boolean returns, but deleteObsoleteObjects returns only error.

-// deleteObsoleteObjects takes template objects of the current tier and of the new tier (provided as newObjects param),
-// compares their names and GVKs and deletes those ones that are in the current template but are not found in the new one.
-// return `true, nil` if an object was deleted, `false, nil`/`false, err` otherwise
+// deleteObsoleteObjects takes template objects of the current tier and of the new tier (provided as newObjects),
+// compares their names and GVKs, and deletes objects present in currentObjs but not found in newObjects.
+// Returns nil on success or an error if a deletion fails.
controllers/nstemplateset/nstemplateset_controller_test.go (1)

836-849: Remove duplicate withStatusClusterResources option

withStatusClusterResources("abcde11") is applied twice in this setup. It’s harmless but redundant.

 nsTmplSet := newNSTmplSet(namespaceName, spacename, "advanced",
     withNamespaces("abcde12", "dev"),
     withClusterResources("abcde12"),
-    withStatusClusterResources("abcde11"),
     withSpaceRoles(map[string][]string{
         "advanced-admin-abcde12":  {spacename},
         "advanced-viewer-abcde12": {"zorro", spacename, "viewer"},
     }),
     withStatusNamespaces("abcde11", "dev", "stage"),
     withStatusClusterResources("abcde11"),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f42908 and 725c159.

📒 Files selected for processing (10)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
  • controllers/nstemplateset/cluster_resources_test.go (11 hunks)
  • controllers/nstemplateset/features.go (1 hunks)
  • controllers/nstemplateset/features_test.go (1 hunks)
  • controllers/nstemplateset/nstemplateset_controller.go (2 hunks)
  • controllers/nstemplateset/nstemplateset_controller_test.go (14 hunks)
  • controllers/nstemplateset/status.go (2 hunks)
  • controllers/nstemplateset/status_test.go (2 hunks)
  • test/cluster_assertion.go (1 hunks)
  • test/nstemplateset_assertion.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/cluster_assertion.go
  • controllers/nstemplateset/features.go
  • controllers/nstemplateset/status_test.go
  • controllers/nstemplateset/status.go
  • controllers/nstemplateset/features_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.
📚 Learning: 2025-09-24T10:52:01.009Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
📚 Learning: 2025-09-24T11:19:39.229Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.229Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
🧬 Code graph analysis (3)
controllers/nstemplateset/nstemplateset_controller_test.go (3)
test/nstemplateset_assertion.go (5)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (231-237)
  • Updating (239-245)
  • Provisioned (223-229)
  • NewTierTemplateName (219-221)
test/cluster_assertion.go (2)
  • AssertThatCluster (20-25)
  • WithLabel (45-51)
test/namespace_assertion.go (1)
  • AssertThatNamespace (32-38)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
controllers/nstemplateset/cluster_resources_test.go (2)
test/nstemplateset_assertion.go (6)
  • AssertThatNSTemplateSet (35-41)
  • Provisioning (231-237)
  • Provisioned (223-229)
  • Updating (239-245)
  • UnableToProvisionClusterResources (265-272)
  • UpdateFailed (247-254)
test/cluster_assertion.go (5)
  • AssertThatCluster (20-25)
  • HasDeletionTimestamp (68-72)
  • Containing (60-66)
  • WithoutLabel (53-58)
  • WithLabel (45-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (2)
controllers/nstemplateset/cluster_resources.go (1)

34-37: Early return disables drift healing when templateRef/features are unchanged

With this guard, we skip reconciling cluster resources entirely if old/new refs and features are equal. Given we don’t watch cluster-scoped resources, periodic reconciles also no-op, so missing/modified resources won’t self‑heal.

If this is intentional (immutable TierTemplates and no drift-heal under same ref), please confirm. If drift-heal is desired, consider processing templates and only short‑circuit after detecting “no create/update/delete” needed, or idempotently re-apply expected objects while avoiding status flips when no actual changes occur.

Based on learnings

test/nstemplateset_assertion.go (1)

168-174: New status assertions look good

Both helpers are correct and improve test clarity.

Also applies to: 183-188

Copy link
Contributor

@MatousJobanek MatousJobanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, great job 👍 🚀

@openshift-ci
Copy link

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MatousJobanek, metlos

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [MatousJobanek,metlos]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud
Copy link

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
controllers/nstemplateset/cluster_resources.go (1)

236-252: LGTM! Clean obsolete resource cleanup.

The Cleanup method properly:

  • Defers status changes until work is needed (line 240)
  • Deletes obsolete resources via deleteObsoleteObjects (line 247)
  • Handles errors with appropriate status updates

Consider moving the comment from lines 245-246 closer to the deleteObsoleteObjects call for improved readability:

 func (oa *objectApplier) Cleanup(ctx context.Context) error {
 	if len(oa.currentObjects) > 0 {
-		// we'll be making changes to the cluster, because there are still some unprocessed objects.
-		// let's reflect that in the status, if it was not yet updated.
 		if err := oa.changeStatusIfNeeded(ctx); err != nil {
 			return err
 		}
 	}
 
-	// what we're left with here is the list of currently existing objects that are no longer present in the template.
-	// we need to delete them
+	// what we're left with in currentObjects is the list of existing objects that are no longer 
+	// present in the new template and need to be deleted
 	if err := deleteObsoleteObjects(ctx, oa.r.Client, oa.currentObjects, nil); err != nil {
 		return oa.r.wrapErrorWithStatusUpdate(ctx, oa.nstt, oa.failureStatusReason, err, "failure while syncing cluster resources")
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 725c159 and cbdd7de.

📒 Files selected for processing (1)
  • controllers/nstemplateset/cluster_resources.go (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T10:52:01.009Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:46-49
Timestamp: 2025-09-24T10:52:01.009Z
Learning: In the NSTemplateSet cluster resources controller, the current approach compares template references (oldTemplateRef vs newTemplateRef) rather than comparing actual cluster resource contents. This assumes TierTemplates are immutable - any content changes result in new TierTemplate versions with new references. The short-circuit when template refs are equal is intentional based on this design, as it cannot detect changes within the same TierTemplate.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
📚 Learning: 2025-09-24T11:19:39.229Z
Learnt from: metlos
PR: codeready-toolchain/member-operator#700
File: controllers/nstemplateset/cluster_resources.go:187-196
Timestamp: 2025-09-24T11:19:39.229Z
Learning: In controllers/nstemplateset/cluster_resources.go, the delete method specifically handles cluster-scoped resources only (not namespaced resources), so using types.NamespacedName{Name: toDelete.GetName()} in Client.Get calls is correct since cluster resources don't have namespaces. However, using runtimeclient.ObjectKeyFromObject() is preferred for future-proofing.

Applied to files:

  • controllers/nstemplateset/cluster_resources.go
🧬 Code graph analysis (1)
controllers/nstemplateset/cluster_resources.go (1)
controllers/nstemplateset/nstemplatetier.go (1)
  • SpaceName (100-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build & push operator bundles for e2e tests
🔇 Additional comments (8)
controllers/nstemplateset/cluster_resources.go (8)

3-20: LGTM!

The new imports (fmt, slices, commonclient) support the refactored all-at-once reconciliation flow.


22-60: Well-structured refactoring to single-pass reconciliation.

The shift from one-resource-per-reconcile to processing all cluster resources at once is cleanly implemented. The short-circuit logic (lines 35-37) correctly handles both template reference changes and feature toggle changes (line 70), addressing drift concerns while respecting TierTemplate immutability.


62-73: LGTM! Feature-aware change detection.

The inclusion of featuresChanged(nstt) on line 70 ensures that cluster resources are reconciled when feature toggles change, even if the templateRef remains the same. This properly addresses drift scenarios for feature-gated resources.

Based on learnings


75-95: Good extraction of tier template processing logic.

The processTierTemplate helper cleanly encapsulates template retrieval and processing, making it reusable for both spec and status comparisons.


97-118: LGTM!

The apply method correctly maintains labels for metadata while the reconciliation logic now relies on spec/status templateRef comparison for membership determination.


121-153: LGTM! Clean deletion flow.

The refactored delete method properly leverages processTierTemplate to derive the list of resources to remove and handles edge cases (not found, already deleting) correctly. The use of runtimeclient.ObjectKeyFromObject on line 134 is future-proof.

Based on learnings


155-191: Excellent objectApplier pattern introduction.

The objectApplier type cleanly encapsulates the apply/cleanup lifecycle with proper separation of concerns:

  • Deferred status changes (only when work is actually performed)
  • Clear distinction between first deployment vs update scenarios
  • Proper bookkeeping of current vs new objects

209-234: LGTM! Proper object application with feature-toggle awareness.

The Apply method correctly:

  • Respects feature toggles via shouldCreate (line 210)
  • Updates bookkeeping by removing matched objects from currentObjects (lines 216-221)
  • Uses commonclient.SameGVKandName for comparison (line 217)
  • Defers status changes until actually needed (line 223)

The use of slices.Delete with an immediate break (line 219) is safe for concurrent modification during iteration.

@metlos
Copy link
Contributor Author

metlos commented Oct 21, 2025

/retest

@metlos metlos merged commit 0568ed2 into codeready-toolchain:master Oct 21, 2025
13 of 16 checks passed
@metlos metlos deleted the cluster-resources-at-once branch March 18, 2026 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants