gangwgr (Contributor) commented Nov 13, 2025

Added new OTE helper functions for common test scenarios:

  • WaitForClusterOperatorStatus - Core functionality for testing
  • WaitForClusterOperatorHealthy - Convenience wrapper
  • GetClusterOperatorConditionStatus - Status retrieval helper
  • Supporting helper functions for condition details and logging

openshift-ci bot (Contributor) commented Nov 13, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot added the do-not-merge/work-in-progress label Nov 13, 2025
gangwgr marked this pull request as ready for review November 13, 2025 14:25
openshift-ci bot removed the do-not-merge/work-in-progress label Nov 13, 2025
gangwgr changed the title from "Add OTE helper functions in test/library/ote" to "CNTRLPLANE-1724: Add OTE helper functions in test/library/ote" Nov 13, 2025
openshift-ci-robot added the jira/valid-reference label Nov 13, 2025
openshift-ci-robot commented Nov 13, 2025

@gangwgr: This pull request references CNTRLPLANE-1724 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.

In response to this:

Add two new OTE helper functions for common test scenarios:

  1. WaitForAPIServerRollout: Waits for all API server pods to be recreated after a configuration change. Unlike WaitForAPIServerToStabilizeOnTheSameRevision, this specifically waits for NEW pods to replace old ones.

  2. WaitForFeatureGateEnabled: Waits for a specific feature gate to be enabled in the cluster by polling the FeatureGate resource.

These functions are needed for testing configuration changes and feature gate
enablement in operator e2e tests, particularly for EventTTL configuration tests
in cluster-kube-apiserver-operator.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

wangke19 (Contributor) commented Nov 19, 2025

Discussion in Slack about this: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1762941938449059?thread_ts=1762421042.028719&cid=CC3CZCQHM
https://github.com/openshift/library-go/tree/master/test is where we would keep the test helpers we use to create test cases across our repositories.
https://github.com/openshift/library-go/blob/master/test/library/apiserver/apiserver.go already has a similar function.

gangwgr (Contributor, Author) commented Nov 19, 2025

Discussion in slack for this: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1762941938449059?thread_ts=1762421042.028719&cid=CC3CZCQHM https://github.com/openshift/library-go/tree/master/test would hold test helpers we use to create test cases across our repositories https://github.com/openshift/library-go/blob/master/test/library/apiserver/apiserver.go we have similar func.

Moved the functions to test. I checked, and the existing functions have different behaviour.

openshift-ci-robot commented Dec 19, 2025

@gangwgr: This pull request references CNTRLPLANE-1724 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

In response to this:

Added new OTE helper functions for common test scenarios:

  • WaitForClusterOperatorStatus - Core functionality for testing
  • WaitForClusterOperatorHealthy - Convenience wrapper
  • GetClusterOperatorConditionStatus - Status retrieval helper
  • Supporting helper functions for condition details and logging

test/ote/util.go Outdated
// map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"},
// 10*time.Minute, 1.0)
func WaitForClusterOperatorStatus(t library.LoggingT, coClient configv1client.ClusterOperatorInterface, coName string, expectedStatus map[string]string, timeout time.Duration, waitMultiplier float64) error {
stableDelay := 100 * time.Second
Contributor

Several hard-coded values should be constants.

stableDelay := 100 * time.Second // Line 23
maxConsecutiveErrors := 5        // Line 36
if attempt%3 == 0 {              // Line 91
if attempt%4 == 0 {              // Line 357

Suggestion:

const (
    DefaultStableDelay   = 100 * time.Second
    MaxConsecutiveErrors = 5
    StatusLogInterval    = 3 // log every 3rd attempt
    RolloutLogInterval   = 4 // log every 4th attempt
)
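
As a rough illustration of how those constants might read at the call sites, here is a minimal, self-contained sketch. The constant names come from the suggestion above; shouldLog is a hypothetical helper introduced only for this example and is not part of the PR.

```go
package main

import (
	"fmt"
	"time"
)

// Names follow the reviewer's suggestion; the values mirror the literals quoted above.
const (
	DefaultStableDelay   = 100 * time.Second
	MaxConsecutiveErrors = 5
	StatusLogInterval    = 3 // log every 3rd attempt
	RolloutLogInterval   = 4 // log every 4th attempt
)

// shouldLog is a hypothetical helper: it reports whether a poll attempt falls
// on the logging interval, replacing inline checks such as attempt%3 == 0.
func shouldLog(attempt, interval int) bool {
	return interval > 0 && attempt%interval == 0
}

func main() {
	for attempt := 1; attempt <= 6; attempt++ {
		if shouldLog(attempt, StatusLogInterval) {
			fmt.Printf("attempt %d: logging status\n", attempt)
		}
	}
}
```

Naming the intervals also makes it obvious that the status loop and the rollout loop log at different cadences, which the bare 3 and 4 do not.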

test/ote/util.go Outdated
// status, err := GetClusterOperatorConditionStatus(ctx, coClient, "kube-apiserver",
// map[string]string{"Available": "", "Progressing": "", "Degraded": ""})
// Returns: map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}
func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface, coName string, statusToCompare map[string]string) (map[string]string, error) {
Contributor

The statusToCompare parameter name is misleading - it's actually used as a template for which conditions to query, not for comparison. Consider renaming:

Suggested change
- func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface, coName string, statusToCompare map[string]string) (map[string]string, error) {
+ func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface,
+ coName string, conditionsToQuery map[string]string) (map[string]string, error)
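
To show why the new name fits, here is a small sketch of the template behaviour described in this comment: only the map keys select conditions, and the values are ignored. The condition type and the conditionStatus function are stand-ins written for this example; they are not the PR's implementation and do not use the real ClusterOperator client.

```go
package main

import "fmt"

// condition is a stand-in for a ClusterOperator status condition.
type condition struct {
	Type   string
	Status string
}

// conditionStatus returns the current status of each condition type named as a
// key in conditionsToQuery. The map values are ignored; only the keys matter.
func conditionStatus(conditions []condition, conditionsToQuery map[string]string) map[string]string {
	status := make(map[string]string, len(conditionsToQuery))
	for _, c := range conditions {
		if _, wanted := conditionsToQuery[c.Type]; wanted {
			status[c.Type] = c.Status
		}
	}
	return status
}

func main() {
	current := []condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "False"},
		{Type: "Upgradeable", Status: "True"},
	}
	got := conditionStatus(current, map[string]string{"Available": "", "Degraded": ""})
	fmt.Println(got)
}
```

If the values are never read, a []string of condition types might express the intent even more directly, though that would be a larger signature change than the rename proposed here.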

test/ote/util.go Outdated
// - All existing pods must be replaced by new pods created after this function is called
// - Supports both single-node and multi-node deployments
func WaitForAPIServerRollout(t library.LoggingT, podClient corev1client.PodInterface, labelSelector string, timeout time.Duration) error {
rolloutStartTime := time.Now()
Contributor

Lines 295-300: Getting initial pods outside the poll loop could create timing issues:

rolloutStartTime := time.Now() // Line 281
// ... get initial pods ...

If pods start rolling out between time.Now() and the first poll iteration, you might miss the transition.

Suggestion: Move rolloutStartTime to be captured atomically with initial pod state.
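
One possible way to close that window, sketched under assumptions: take the baseline in a single step and identify old pods by UID rather than by comparing timestamps. This is a swapped-in technique for illustration, not what the PR does. The pod and rolloutBaseline types and the list callback are hypothetical stand-ins for the real pod client.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a stand-in for the fields of a Kubernetes Pod that matter here.
type pod struct {
	Name string
	UID  string
}

// rolloutBaseline records which pods existed when the wait began.
type rolloutBaseline struct {
	TakenAt time.Time // kept for logging only; the comparison uses UIDs
	OldUIDs map[string]struct{}
}

// snapshotBaseline lists pods and builds the baseline in one step, so the
// recorded time and the recorded pod set describe the same observation.
func snapshotBaseline(list func() ([]pod, error)) (rolloutBaseline, error) {
	pods, err := list()
	if err != nil {
		return rolloutBaseline{}, fmt.Errorf("listing initial pods: %w", err)
	}
	b := rolloutBaseline{TakenAt: time.Now(), OldUIDs: make(map[string]struct{}, len(pods))}
	for _, p := range pods {
		b.OldUIDs[p.UID] = struct{}{}
	}
	return b, nil
}

// isReplaced reports whether none of the current pods were in the baseline.
func (b rolloutBaseline) isReplaced(current []pod) bool {
	if len(current) == 0 {
		return false
	}
	for _, p := range current {
		if _, old := b.OldUIDs[p.UID]; old {
			return false
		}
	}
	return true
}

func main() {
	initial := []pod{{Name: "kube-apiserver-a", UID: "uid-1"}, {Name: "kube-apiserver-b", UID: "uid-2"}}
	baseline, err := snapshotBaseline(func() ([]pod, error) { return initial, nil })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	partly := []pod{{Name: "kube-apiserver-a", UID: "uid-3"}, {Name: "kube-apiserver-b", UID: "uid-2"}}
	fully := []pod{{Name: "kube-apiserver-a", UID: "uid-3"}, {Name: "kube-apiserver-b", UID: "uid-4"}}
	fmt.Println(baseline.isReplaced(partly), baseline.isReplaced(fully))
}
```

Because a recreated pod always gets a new UID, this check does not depend on how the local clock relates to the pod creation timestamps.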

for _, condType := range conditionTypes {
if detail, ok := details[condType]; ok {
if detail.Status == "Unknown" {
t.Logf(" %s: %s (%s)", detail.Type, detail.Status, detail.Reason)
Contributor

Lines 166-169, 186-189, 204-207: Message truncation is repeated three times.

Suggestion: Extract to helper:

func truncateMessage(msg string, maxLen int) string {
    if len(msg) > maxLen {
        return msg[:maxLen-3] + "..."
    }
    return msg
}
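
Two edge cases may be worth considering if this helper is extracted: slicing by bytes can split a multi-byte UTF-8 character, and maxLen-3 goes negative when maxLen is below 3. The variant below is a hedged sketch that handles both; truncateRunes is a hypothetical name, not something from the PR.

```go
package main

import "fmt"

// truncateRunes is a hypothetical variant of the helper suggested above. It
// counts runes instead of bytes, so multi-byte characters are never split,
// and it does not panic when maxLen is smaller than the ellipsis.
func truncateRunes(msg string, maxLen int) string {
	const ellipsis = "..."
	if maxLen < 0 {
		maxLen = 0
	}
	runes := []rune(msg)
	if len(runes) <= maxLen {
		return msg
	}
	if maxLen <= len(ellipsis) {
		// Too short to fit an ellipsis: hard-cut instead.
		return string(runes[:maxLen])
	}
	return string(runes[:maxLen-len(ellipsis)]) + ellipsis
}

func main() {
	fmt.Println(truncateRunes("hello world", 8))
	fmt.Println(truncateRunes("héllo wörld", 8))
	fmt.Println(truncateRunes("short", 10))
}
```

If condition messages are known to be ASCII and maxLen is always a fixed value well above 3, the simpler byte-based version is probably sufficient.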

test/ote/util.go Outdated
var lastStatus map[string]string
var lastConditionDetails map[string]conditionDetail

errCo := wait.PollUntilContextTimeout(context.Background(), 20*time.Second, timeout, false, func(ctx context.Context) (bool, error) {
Contributor

Context propagation (lines 40, 119, 313): consider accepting a context.Context parameter in WaitForClusterOperatorStatus and WaitForAPIServerRollout instead of creating new background contexts, so that callers can control cancellation.
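
The effect of threading the caller's context through can be sketched with the standard library alone. pollUntil below is a simplified stand-in for a polling helper such as wait.PollUntilContextTimeout, written only for this example; the two demo functions are likewise hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil is a stdlib-only stand-in for a polling helper. It derives its
// deadline from the caller's context, so cancelling the parent ends the wait.
func pollUntil(ctx context.Context, interval, timeout time.Duration, cond func(context.Context) (bool, error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			done, err := cond(ctx)
			if err != nil {
				return err
			}
			if done {
				return nil
			}
		}
	}
}

// cancelledParentStopsWait shows that a cancelled caller context ends the wait
// even though the one-minute timeout has not elapsed.
func cancelledParentStopsWait() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // the caller gives up before the condition is ever met
	err := pollUntil(ctx, 10*time.Millisecond, time.Minute, func(context.Context) (bool, error) {
		return false, nil
	})
	return errors.Is(err, context.Canceled)
}

// attemptsUntilDone polls a condition that succeeds on the n-th call.
func attemptsUntilDone(n int) (int, error) {
	attempts := 0
	err := pollUntil(context.Background(), time.Millisecond, 5*time.Second, func(context.Context) (bool, error) {
		attempts++
		return attempts >= n, nil
	})
	return attempts, err
}

func main() {
	fmt.Println("cancelled parent stops wait:", cancelledParentStopsWait())
	attempts, err := attemptsUntilDone(3)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

With context.Background() hard-coded inside the helper, the first case would keep polling for the full timeout regardless of what the test does with its own context.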

wangke19 (Contributor) commented Jan 5, 2026

It would be better to raise a PR in another component repo that references this code, to prove that it works well.

test/ote/util.go Outdated
eq := reflect.DeepEqual(expectedStatus, gottenStatus)
if eq {
// Check if this is the stable healthy state
isHealthyState := reflect.DeepEqual(expectedStatus, map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"})
Contributor

Define a variable for map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}

Suggested change
- isHealthyState := reflect.DeepEqual(expectedStatus, map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"})
+ var HealthyConditions = map[string]string{
+     "Available": "True",
+     "Progressing": "False",
+     "Degraded": "False",
+ }
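
For reference, a minimal sketch of how the shared variable could then be used at the comparison site. HealthyConditions follows the suggestion above; isHealthyState as a function is an assumption made for this example (in the quoted code it is a local variable).

```go
package main

import (
	"fmt"
	"reflect"
)

// HealthyConditions is the stable healthy state. Maps are mutable in Go, so
// callers are expected to treat this variable as read-only.
var HealthyConditions = map[string]string{
	"Available":   "True",
	"Progressing": "False",
	"Degraded":    "False",
}

// isHealthyState reports whether status matches HealthyConditions exactly.
func isHealthyState(status map[string]string) bool {
	return reflect.DeepEqual(status, HealthyConditions)
}

func main() {
	fmt.Println(isHealthyState(map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}))
	fmt.Println(isHealthyState(map[string]string{"Available": "True", "Progressing": "True", "Degraded": "False"}))
}
```

The same variable could also be passed by WaitForClusterOperatorHealthy, so that the literal appears in only one place.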

test/ote/util.go Outdated
// err := WaitForClusterOperatorHealthy(ctx, t, coClient, "kube-apiserver", 10*time.Minute, 1.0)
func WaitForClusterOperatorHealthy(ctx context.Context, t library.LoggingT, coClient configv1client.ClusterOperatorInterface, coName string, timeout time.Duration, waitMultiplier float64) error {
return WaitForClusterOperatorStatus(ctx, t, coClient, coName,
map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"},
Contributor

test/ote/util.go Outdated
Comment on lines 144 to 145
statusMap := make(map[string]string)
detailsMap := make(map[string]conditionDetail)
wangke19 (Contributor) Jan 6, 2026

Variable naming inconsistency

Suggested change
- statusMap := make(map[string]string)
- detailsMap := make(map[string]conditionDetail)
+ status := make(map[string]string)
+ details := make(map[string]conditionDetail)

test/ote/util.go Outdated
return false, nil
}

eq := reflect.DeepEqual(expectedStatus, gottenStatus)
Contributor

The variable eq is only used once - inline it:

Suggested change
- eq := reflect.DeepEqual(expectedStatus, gottenStatus)
+ if reflect.DeepEqual(expectedStatus, gottenStatus) {
+     t.Logf("ClusterOperator %s is stably available/non-progressing/non-degraded", coName)
+     return true, nil
+ }

test/ote/util.go Outdated
// logConditionDetails logs detailed information about all conditions
func logConditionDetails(t library.LoggingT, details map[string]conditionDetail) {
// Sort condition types for consistent output
conditionTypes := []string{"Available", "Progressing", "Degraded"}
Contributor

This list never changes and could be declared once at package level for reuse (as a variable, since Go has no slice constants).
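
A short sketch of what hoisting the list could look like. Go does not permit slice constants, so the list becomes a package-level variable. The conditionDetail struct here only mirrors the three fields visible in the quoted code, and formatConditionDetails is a hypothetical helper that returns the lines instead of logging them, so the ordering is easy to see.

```go
package main

import "fmt"

// conditionTypes fixes the output order. It is a variable because Go has no
// slice constants; callers are expected to treat it as read-only.
var conditionTypes = []string{"Available", "Progressing", "Degraded"}

// conditionDetail mirrors only the fields used in the quoted logging code.
type conditionDetail struct {
	Type   string
	Status string
	Reason string
}

// formatConditionDetails renders the known conditions in the fixed order
// above, skipping any that are missing from details.
func formatConditionDetails(details map[string]conditionDetail) []string {
	lines := make([]string, 0, len(conditionTypes))
	for _, condType := range conditionTypes {
		if d, ok := details[condType]; ok {
			lines = append(lines, fmt.Sprintf("%s: %s (%s)", d.Type, d.Status, d.Reason))
		}
	}
	return lines
}

func main() {
	details := map[string]conditionDetail{
		"Degraded":  {Type: "Degraded", Status: "False", Reason: "AsExpected"},
		"Available": {Type: "Available", Status: "True", Reason: "AsExpected"},
	}
	for _, line := range formatConditionDetails(details) {
		fmt.Println(line)
	}
}
```

A fixed-size array ([3]string) would be an alternative if accidental appends are a concern.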

test/ote/util.go Outdated
return nil, fmt.Errorf("failed to get ClusterOperator %s: %w", coName, err)
}

statusMap := make(map[string]string)
Contributor

Suggested change
- statusMap := make(map[string]string)
+ status := make(map[string]string)

  Add two new OTE helper functions for common test scenarios:

  1. WaitForAPIServerRollout: Waits for all API server pods to be recreated
     after a configuration change. Unlike WaitForAPIServerToStabilizeOnTheSameRevision,
     this specifically waits for NEW pods to replace old ones.

  2. WaitForFeatureGateEnabled: Waits for a specific feature gate to be enabled
     in the cluster by polling the FeatureGate resource.

  These functions are needed for testing configuration changes and feature gate
  enablement in operator e2e tests, particularly for EventTTL configuration tests
  in cluster-kube-apiserver-operator.

wangke19 (Contributor) commented Jan 6, 2026

/lgtm

openshift-ci bot added the lgtm label Jan 6, 2026

openshift-ci bot (Contributor) commented Jan 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr, wangke19
Once this PR has been reviewed and has the lgtm label, please assign bertinatto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gangwgr (Contributor, Author) commented Jan 6, 2026

/test unit

openshift-ci bot (Contributor) commented Jan 6, 2026

@gangwgr: all tests passed!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

gangwgr (Contributor, Author) commented Jan 7, 2026

/assign @p0lyn0mial

@@ -0,0 +1,392 @@
package ote
Contributor

Did we copy this code from an existing repo, or is it brand-new code?

Contributor Author

It is new code, not copied. I created some new functions for use in the component repo.

@@ -0,0 +1,392 @@
package ote
Contributor

I have just realized that these functions could also be defined directly in the origin repo. Thoughts?

Contributor Author

I don't think they are needed there.

@@ -0,0 +1,392 @@
package ote
Contributor

Which tests will use these functions?

Contributor Author

They are needed for openshift/cluster-kube-apiserver-operator#1988 and for some QE test cases that we will later migrate into the component repo.

// Note:
// - All existing pods must be replaced by new pods created after this function is called
// - Supports both single-node and multi-node deployments
func WaitForAPIServerRollout(ctx context.Context, t library.LoggingT, podClient corev1client.PodInterface, labelSelector string, timeout time.Duration) error {
Contributor

Contributor Author

WaitForAPIServerToStabilizeOnTheSameRevision (existing function)

  • Polls until all API server pods are on the same revision (e.g., all on revision 10)
  • Does NOT care when pods were created; it only checks that they are all on the same revision

WaitForAPIServerRollout

  • Waits for ALL pods to be replaced by new pods
  • Captures the current pod creation times when called
  • Polls until ALL existing pods are replaced by new pods created after the function was called
  • Requires all new pods to be in the Running state
  • Cares about pod creation time; ensures every pod is new
  • Useful when we need to guarantee that ALL pods have been recreated with fresh configuration (e.g., after secret rotation, or config changes that require a pod restart)

Key difference:

WaitForAPIServerToStabilizeOnTheSameRevision:

  • Old pods on revision 10 → New pods on revision 10 (PASSES - same revision)

WaitForAPIServerRollout:

  • Old pods created at 10:00am → Old pods still on the same revision 10 (FAILS - pods not recreated)
  • Old pods created at 10:00am → New pods created at 10:15am (PASSES - all pods recreated)
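
The contrast above can be sketched as two predicates over the same pod list. Everything in this block is a simplified stand-in written for illustration: the pod struct, both predicates, and the scenario helper are hypothetical, and the real functions check more than is shown here.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a stand-in carrying only the fields this comparison needs.
type pod struct {
	Name      string
	Revision  int
	Phase     string
	CreatedAt time.Time
}

// allOnSameRevision mirrors the revision-based check: creation time is ignored.
func allOnSameRevision(pods []pod) bool {
	if len(pods) == 0 {
		return false
	}
	for _, p := range pods[1:] {
		if p.Revision != pods[0].Revision {
			return false
		}
	}
	return true
}

// allRecreatedSince mirrors the rollout check: every pod must be Running and
// must have been created after start.
func allRecreatedSince(pods []pod, start time.Time) bool {
	if len(pods) == 0 {
		return false
	}
	for _, p := range pods {
		if p.Phase != "Running" || !p.CreatedAt.After(start) {
			return false
		}
	}
	return true
}

// scenario builds two pods on revision 10, created either before (10:00) or
// after (10:15) a wait that began at 10:05, and evaluates both predicates.
func scenario(recreated bool) (sameRevision, rolledOut bool) {
	start := time.Date(2026, time.January, 6, 10, 5, 0, 0, time.UTC)
	created := start.Add(-5 * time.Minute)
	if recreated {
		created = start.Add(10 * time.Minute)
	}
	pods := []pod{
		{Name: "kube-apiserver-a", Revision: 10, Phase: "Running", CreatedAt: created},
		{Name: "kube-apiserver-b", Revision: 10, Phase: "Running", CreatedAt: created},
	}
	return allOnSameRevision(pods), allRecreatedSince(pods, start)
}

func main() {
	same, rolled := scenario(false)
	fmt.Println("old pods:       sameRevision =", same, "rolledOut =", rolled)
	same, rolled = scenario(true)
	fmt.Println("recreated pods: sameRevision =", same, "rolledOut =", rolled)
}
```

In the first scenario the revision check is satisfied while the rollout check is not, which is the case the new helper is meant to catch.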

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

4 participants