Skip to content

Conversation

@cloud-team-rebase-bot
Copy link

No description provided.

faiq and others added 12 commits December 22, 2025 16:36
…ot/cherry-pick-5793-to-release-2.10

[release-2.10] 🐛 fix: bumps golangci-lint to work with go 1.24+
…nd tests

This PR updates the default value for HostAffinity from `host` to `default` as that's also the AWS platform default,
and potentially a more sensible value to set if the user does not have a preference.

It also improves the API's go doc comments to further explain the
effects of the settings and adds a bunch more units to pinpoint the
exact behaviour described in the updated doc.
…ot/cherry-pick-5801-to-release-2.10

[release-2.10] 🐛 fix: change HostAffinity default 'host'->'default' improved API doc and tests
Relaxes the validation for ROSA NodePool autoscaling to allow users to
specify a minimum of 0 replicas, enabling scale-to-zero scenarios.
MaxReplicas remains with a minimum of 1.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
…ot/cherry-pick-5816-to-release-2.10

[release-2.10] 🌱 Allow ROSA NodePool autoscaling MinReplicas to be 0
Signed-off-by: serngawy <serngawy@gmail.com>
…ot/cherry-pick-5786-to-release-2.10

[release-2.10] ✨ ROSA Add logForward config AND ImageTypes
Signed-off-by: serngawy <serngawy@gmail.com>
…ot/cherry-pick-5842-to-release-2.10

[release-2.10] 🐛  Fix flaky test TestROSARoleConfigReconcileExist
the webhook server should use the tlsconfig specified in the manager
options, so users setting tls fields in the manager see their preference
honoured not only for the metrics server but also for the webhook
server.
…ot/cherry-pick-5848-to-release-2.10

[release-2.10] 🐛 fix: use tlsconfig from the manager options for the webhook server
@coderabbitai
Copy link

coderabbitai bot commented Jan 29, 2026

Walkthrough

This pull request introduces support for ROSA log forwarding capabilities, updates host affinity behavior and validation across machine and control plane types, adds image type support for ROSA machine pools, and updates multiple dependencies. Changes span workflow configurations, API type definitions, validation webhooks, CRD schemas, controller reconciliation logic, and generated code.

Changes

Cohort / File(s) Summary
Linter and Workflow Configuration
.github/workflows/pr-golangci-lint.yaml, hack/tools/.custom-gcl.yaml, .golangci.yml
Updated golangci-lint version from v2.1.0 to v2.7.0; expanded revive and staticcheck rules to enforce naming conventions and flag deprecated fields across additional Go modules and paths.
API Type Definitions - HostAffinity
api/v1beta1/types.go, api/v1beta2/awsmachine_types.go, api/v1beta2/types.go
Changed HostAffinity default value from "host" to "default"; expanded documentation describing behavior when HostAffinity is "host" vs "default" with detailed semantics for HostID presence and placement modes. Added comment clarifications in webhook validation.
API Type Definitions - ROSA Log Forwarders
controlplane/rosa/api/v1beta2/rosacontrolplane_types.go
Introduced two new structs: CloudWatchLogForwarderConfig and S3LogForwarderConfig; added corresponding optional fields to RosaControlPlaneSpec. Updated AutoScaling.MinReplicas validation minimum from 1 to 0.
API Type Definitions - ROSA Machine Pools
exp/api/v1beta2/rosamachinepool_types.go
Added new ImageType field to RosaMachinePoolSpec with enum validation for Windows and Default values.
Webhook Validation - HostAffinity
api/v1beta2/awsmachine_webhook.go, api/v1beta2/awsmachinetemplate_webhook.go
Added comment documenting mutual exclusivity of hostID and dynamicHostAllocation; added validation requiring at least one of hostID or dynamicHostAllocation when HostAffinity is "host".
Webhook Validation Tests - HostAffinity
api/v1beta2/awsmachine_webhook_test.go, api/v1beta2/awsmachinetemplate_webhook_test.go
Expanded test coverage for HostAffinity validation scenarios including invalid affinity values, mutual exclusivity checks, and combinations with hostID and dynamicHostAllocation. Removed prior test cases assuming different validation rules.
CRD Base Schemas - HostAffinity and Deprecations
config/crd/bases/infrastructure.cluster.x-k8s.io_awsmachines.yaml, config/crd/bases/infrastructure.cluster.x-k8s.io_awsmachinetemplates.yaml, config/crd/bases/infrastructure.cluster.x-k8s.io_awsclusters.yaml, config/crd/bases/controlplane.cluster.x-k8s.io_awsmanagedcontrolplanes.yaml
Updated HostAffinity default from "host" to "default"; expanded descriptions with detailed semantics for various HostAffinity and HostID combinations. Added deprecation notes to ARN fields in additionalSecurityGroups and subnet.
CRD Base Schemas - ROSA Features
config/crd/bases/controlplane.cluster.x-k8s.io_rosacontrolplanes.yaml, config/crd/bases/infrastructure.cluster.x-k8s.io_rosamachinepools.yaml
Added cloudWatchlogForwarder and s3LogForwarder fields to RosaControlPlane spec; added imageType field to ROSAMachinePool spec; changed AutoScaling.minReplicas minimum from 1 to 0.
Generated Deep Copy Methods
controlplane/rosa/api/v1beta2/zz_generated.deepcopy.go
Added DeepCopyInto and DeepCopy methods for CloudWatchLogForwarderConfig and S3LogForwarderConfig; updated RosaControlPlaneSpec.DeepCopyInto to handle deep copying of new forwarder fields.
Controller Implementation - Log Forwarders
controlplane/rosa/controllers/rosacontrolplane_controller.go
Implemented log forwarder reconciliation logic: added reconcileLogForwarders, reconcileLogForwarder, buildLogForwarders, findExistingLogForwarders, and buildGroups helper functions. Integrated log forwarder reconciliation into main cluster reconciliation flow and OCM cluster spec population.
Controller Implementation - Image Type
exp/controllers/rosamachinepool_controller.go, exp/utils/rosa_helper.go
Added conditional logic to propagate ImageType field from RosaMachinePoolSpec into NodePool configuration; updated NodePoolToRosaMachinePoolSpec to populate ImageType from nodePool object.
Controller Tests - Log Forwarders
controlplane/rosa/controllers/rosacontrolplane_controller_test.go
Added comprehensive tests for log forwarder construction and reconciliation: TestBuildLogForwarders covering various configurations, TestReconcileLogForwarders_CreateCloudWatchAndS3 for creation scenarios, and TestReconcileLogForwarders_UpdateCW_DeleteS3 for update/delete scenarios.
Controller Tests - ROSA Role Config
exp/controllers/rosaroleconfig_controller_test.go
Replaced synchronous reconciliation verification with asynchronous Gomega Eventually blocks using polling with 30s timeout; refactored test assertions to check status within eventual verification blocks.
Controller Tests - Image Type
exp/controllers/rosamachinepool_controller_test.go
Added ImageType field assignment in test setup for RosaMachinePoolSpec.
Webhook Server Configuration
main.go
Updated manager options retrieval to also fetch TLS options; propagated TLSOpts to WebhookServer configuration for TLS support.
OpenShift Infrastructure Configuration
openshift/infrastructure-components-openshift.yaml
Added multiple new CRD definitions for AWS infrastructure resources (AWSMachinePool, AWSManagedCluster, ROSACluster, ROSARoleConfig, ROSANetwork, ROSAControlPlane); added webhook configurations for mutation and validation; added ValidatingAdmissionPolicy for AWS cluster protection.
Dependency and Version Updates
go.mod, openshift/tools/go.mod, openshift/provider-version.mk
Updated multiple Go module dependencies including openshift-online/ocm-api-model/clientapi (v0.0.431→v0.0.440), openshift/rosa (v1.2.57→v1.99.9-testing), Prometheus client_golang (v1.23.0→v1.23.2), golang.org/x/* packages, and Kubernetes-related modules. Bumped provider version from v2.10.0 to v2.10.1.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 29, 2026
@openshift-ci
Copy link

openshift-ci bot commented Jan 29, 2026

Hi @cloud-team-rebase-bot[bot]. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link

openshift-ci bot commented Jan 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cloud-team-rebase-bot and others added 12 commits February 9, 2026 12:06
# Conflicts:
#	.github/dependabot.yml
#	.github/workflows/dependabot.yml
#	OWNERS_ALIASES

# Conflicts:
#	.github/PULL_REQUEST_TEMPLATE.md
#	.github/dependabot.yml

# Conflicts:
#	.github/workflows/codeql-analysis.yml
#	.github/workflows/dependabot.yml

# Conflicts:
#	.github/workflows/codeql-analysis.yml
#	.github/workflows/dependabot.yml
#	OWNERS_ALIASES
# Conflicts:
#	openshift/infrastructure-components.yaml
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@exp/controllers/rosamachinepool_controller.go`:
- Around line 545-549: The computeSpecDiff routine is causing reconcile churn
because an empty desiredSpec.ImageType (omitted in the RosaMachinePool CR) does
not match OCM's default "Default"; update computeSpecDiff to normalize
desiredSpec.ImageType before comparison by setting desiredSpec.ImageType =
string(cmv1.ImageTypeDefault) when it is empty so the diff treats omitted and
OCM-defaulted values as equal (alternatively, add the same defaulting to the
Default() webhook or ensure buildNodePoolFromSpec logic that sets ImageType for
Windows/Default aligns with computeSpecDiff normalization).

In `@main.go`:
- Around line 147-150: The code calls flags.GetManagerOptions and logs
setupLog.Error when err != nil but continues execution, which can lead to
dereferencing the returned metricsOptions and panics; update the error path in
main after the call to flags.GetManagerOptions (variables: tlsOptions,
metricsOptions, err) to fail fast by exiting the process (e.g., return from main
or call os.Exit(1)) immediately after logging the error so no later code (that
uses metricsOptions) runs with an invalid value.
🧹 Nitpick comments (2)
exp/controllers/rosaroleconfig_controller_test.go (2)

381-382: Avoid reusing outer-scope err variable inside Eventually closure.

Line 381 assigns to the err variable declared at line 341 (from CreateNamespace). This is confusing and inconsistent with line 604 which uses inline error checking. Use the same pattern:

-		err = reconciler.Client.Get(ctx, req.NamespacedName, updatedRoleConfig)
-		g.Expect(err).ToNot(HaveOccurred())
+		g.Expect(reconciler.Client.Get(ctx, req.NamespacedName, updatedRoleConfig)).ToNot(HaveOccurred())

396-396: Add explicit .WithPolling() for consistency.

This Eventually block is missing .WithPolling(500 * time.Millisecond) which is present in TestROSARoleConfigReconcileExist at line 630. Without it, the default polling interval (10ms) will cause many more Reconcile calls than necessary.

-	}).WithTimeout(30 * time.Second).Should(Succeed())
+	}).WithTimeout(30 * time.Second).WithPolling(500 * time.Millisecond).Should(Succeed())

Comment on lines +545 to +549
if rosaMachinePoolSpec.ImageType == string(cmv1.ImageTypeWindows) {
npBuilder = npBuilder.ImageType(cmv1.ImageTypeWindows)
} else if rosaMachinePoolSpec.ImageType == string(cmv1.ImageTypeDefault) {
npBuilder = npBuilder.ImageType(cmv1.ImageTypeDefault)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's examine the actual code at lines 545-549
cat -n exp/controllers/rosamachinepool_controller.go | sed -n '540,555p'

Repository: openshift/cluster-api-provider-aws

Length of output: 742


🏁 Script executed:

#!/bin/bash
# Find computeSpecDiff implementation
rg -n "func computeSpecDiff" -A 20 -g'*.go'

Repository: openshift/cluster-api-provider-aws

Length of output: 2079


🏁 Script executed:

#!/bin/bash
# Search for RosaMachinePool defaulting logic
rg -n "RosaMachinePool.*Default|func.*Default.*RosaMachinePool" -g'*.go' -i

Repository: openshift/cluster-api-provider-aws

Length of output: 834


🏁 Script executed:

#!/bin/bash
# Look for ImageType handling and defaults
rg -n "ImageType.*Default|DefaultImageType|ImageTypeDefault" -g'*.go' -B 2 -A 2

Repository: openshift/cluster-api-provider-aws

Length of output: 2431


🏁 Script executed:

#!/bin/bash
# Check for NodePoolToRosaMachinePoolSpec implementation
rg -n "NodePoolToRosaMachinePoolSpec" -A 15 -g'*.go'

Repository: openshift/cluster-api-provider-aws

Length of output: 7984


🏁 Script executed:

#!/bin/bash
# Check the full Default() webhook implementation
cat -n exp/api/v1beta2/rosamachinepool_webhook.go | sed -n '161,200p'

Repository: openshift/cluster-api-provider-aws

Length of output: 602


🏁 Script executed:

#!/bin/bash
# Check the RosaMachinePool spec definition for ImageType
cat -n exp/api/v1beta2/rosamachinepool_types.go | sed -n '75,95p'

Repository: openshift/cluster-api-provider-aws

Length of output: 1096


Normalize ImageType before computing spec diff to prevent reconcile churn.

When a RosaMachinePool CR omits ImageType, the field remains empty. However, OCM defaults ImageType to "Default" when creating the NodePool. On reconciliation, computeSpecDiff detects a mismatch (empty string vs "Default") and repeatedly triggers UpdateNodePool calls even when the CR has not changed. This causes unnecessary API traffic and reconcile churn.

The Default() webhook does not currently default ImageType, and buildNodePoolFromSpec (lines 545-549) skips empty strings. Add normalization in computeSpecDiff to align empty desiredSpec.ImageType with the OCM default before comparison, or default ImageType to "Default" in the webhook.

Example normalization in computeSpecDiff
 func computeSpecDiff(desiredSpec expinfrav1.RosaMachinePoolSpec, nodePool *cmv1.NodePool) string {
 	currentSpec := utils.NodePoolToRosaMachinePoolSpec(nodePool)
+	if desiredSpec.ImageType == "" {
+		desiredSpec.ImageType = currentSpec.ImageType
+	}
 
 	ignoredFields := []string{
🤖 Prompt for AI Agents
In `@exp/controllers/rosamachinepool_controller.go` around lines 545 - 549, The
computeSpecDiff routine is causing reconcile churn because an empty
desiredSpec.ImageType (omitted in the RosaMachinePool CR) does not match OCM's
default "Default"; update computeSpecDiff to normalize desiredSpec.ImageType
before comparison by setting desiredSpec.ImageType =
string(cmv1.ImageTypeDefault) when it is empty so the diff treats omitted and
OCM-defaulted values as equal (alternatively, add the same defaulting to the
Default() webhook or ensure buildNodePoolFromSpec logic that sets ImageType for
Windows/Default aligns with computeSpecDiff normalization).

Comment on lines +147 to 150
tlsOptions, metricsOptions, err := flags.GetManagerOptions(managerOptions)
if err != nil {
setupLog.Error(err, "Unable to start manager: invalid flags")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail fast on invalid manager flags.

If err is non-nil at Line 147-150, execution continues and *metricsOptions is dereferenced at Line 186, which can panic or run with invalid config. Exit after logging.

🛠️ Suggested fix
 if err != nil {
 	setupLog.Error(err, "Unable to start manager: invalid flags")
+	os.Exit(1)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
tlsOptions, metricsOptions, err := flags.GetManagerOptions(managerOptions)
if err != nil {
setupLog.Error(err, "Unable to start manager: invalid flags")
}
tlsOptions, metricsOptions, err := flags.GetManagerOptions(managerOptions)
if err != nil {
setupLog.Error(err, "Unable to start manager: invalid flags")
os.Exit(1)
}
🤖 Prompt for AI Agents
In `@main.go` around lines 147 - 150, The code calls flags.GetManagerOptions and
logs setupLog.Error when err != nil but continues execution, which can lead to
dereferencing the returned metricsOptions and panics; update the error path in
main after the call to flags.GetManagerOptions (variables: tlsOptions,
metricsOptions, err) to fail fast by exiting the process (e.g., return from main
or call os.Exit(1)) immediately after logging the error so no later code (that
uses metricsOptions) runs with an invalid value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants