-
Notifications
You must be signed in to change notification settings - Fork 259
CNTRLPLANE-1724: Add OTE helper functions in test/library/ote #2050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
@gangwgr: This pull request references CNTRLPLANE-1724 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Discussion in slack for this: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1762941938449059?thread_ts=1762421042.028719&cid=CC3CZCQHM |
Moved funcs to test. I checked existing funcs have different behaviour |
|
@gangwgr: This pull request references CNTRLPLANE-1724 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
test/ote/util.go
Outdated
| // map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}, | ||
| // 10*time.Minute, 1.0) | ||
| func WaitForClusterOperatorStatus(t library.LoggingT, coClient configv1client.ClusterOperatorInterface, coName string, expectedStatus map[string]string, timeout time.Duration, waitMultiplier float64) error { | ||
| stableDelay := 100 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several hard-coded values should be constants.
stableDelay := 100 * time.Second // Line 23
maxConsecutiveErrors := 5 // Line 36
if attempt%3 == 0 { // Line 91
if attempt%4 == 0 { // Line 357
Suggestion:
const (
DefaultStableDelay = 100 * time.Second
MaxConsecutiveErrors = 5
StatusLogInterval = 3 // log every 3rd attempt
RolloutLogInterval = 4 // log every 4th attempt
)
test/ote/util.go
Outdated
| // status, err := GetClusterOperatorConditionStatus(ctx, coClient, "kube-apiserver", | ||
| // map[string]string{"Available": "", "Progressing": "", "Degraded": ""}) | ||
| // Returns: map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"} | ||
| func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface, coName string, statusToCompare map[string]string) (map[string]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statusToCompare parameter name is misleading - it's actually used as a template for which conditions to query, not for comparison. Consider renaming:
| func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface, coName string, statusToCompare map[string]string) (map[string]string, error) { | |
| func GetClusterOperatorConditionStatus(ctx context.Context, coClient configv1client.ClusterOperatorInterface, | |
| coName string, conditionsToQuery map[string]string) (map[string]string, error) |
test/ote/util.go
Outdated
| // - All existing pods must be replaced by new pods created after this function is called | ||
| // - Supports both single-node and multi-node deployments | ||
| func WaitForAPIServerRollout(t library.LoggingT, podClient corev1client.PodInterface, labelSelector string, timeout time.Duration) error { | ||
| rolloutStartTime := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 295-300: Getting initial pods outside the poll loop could create timing issues:
rolloutStartTime := time.Now() // Line 281
// ... get initial pods ...
If pods start rolling out between time.Now() and the first poll iteration, you might miss the transition.
Suggestion: Move rolloutStartTime to be captured atomically with initial pod state.
| for _, condType := range conditionTypes { | ||
| if detail, ok := details[condType]; ok { | ||
| if detail.Status == "Unknown" { | ||
| t.Logf(" %s: %s (%s)", detail.Type, detail.Status, detail.Reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 166-169, 186-189, 204-207: Message truncation is repeated three times.
Suggestion: Extract to helper:
func truncateMessage(msg string, maxLen int) string {
if len(msg) > maxLen {
return msg[:maxLen-3] + "..."
}
return msg
}
test/ote/util.go
Outdated
| var lastStatus map[string]string | ||
| var lastConditionDetails map[string]conditionDetail | ||
|
|
||
| errCo := wait.PollUntilContextTimeout(context.Background(), 20*time.Second, timeout, false, func(ctx context.Context) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context Propagation: Lines 40, 119, 313: Consider accepting context as a parameter to WaitForClusterOperatorStatus and WaitForAPIServerRollout instead of creating new background
contexts, allowing callers to control cancellation.
|
Better to raise one PR in other component repo to reference the code, prove it works very well. |
test/ote/util.go
Outdated
| eq := reflect.DeepEqual(expectedStatus, gottenStatus) | ||
| if eq { | ||
| // Check if this is the stable healthy state | ||
| isHealthyState := reflect.DeepEqual(expectedStatus, map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define a variable for map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}
| isHealthyState := reflect.DeepEqual(expectedStatus, map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}) | |
| var HealthyConditions = map[string]string{ | |
| "Available": "True", | |
| "Progressing": "False", | |
| "Degraded": "False", | |
| } | |
test/ote/util.go
Outdated
| // err := WaitForClusterOperatorHealthy(ctx, t, coClient, "kube-apiserver", 10*time.Minute, 1.0) | ||
| func WaitForClusterOperatorHealthy(ctx context.Context, t library.LoggingT, coClient configv1client.ClusterOperatorInterface, coName string, timeout time.Duration, waitMultiplier float64) error { | ||
| return WaitForClusterOperatorStatus(ctx, t, coClient, coName, | ||
| map[string]string{"Available": "True", "Progressing": "False", "Degraded": "False"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/ote/util.go
Outdated
| statusMap := make(map[string]string) | ||
| detailsMap := make(map[string]conditionDetail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable naming inconsistency
| statusMap := make(map[string]string) | |
| detailsMap := make(map[string]conditionDetail) | |
| status := make(map[string]string) | |
| details := make(map[string]conditionDetail) |
test/ote/util.go
Outdated
| return false, nil | ||
| } | ||
|
|
||
| eq := reflect.DeepEqual(expectedStatus, gottenStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable eq is only used once - inline it:
| eq := reflect.DeepEqual(expectedStatus, gottenStatus) | |
| if reflect.DeepEqual(expectedStatus, gottenStatus) { | |
| t.Logf("ClusterOperator %s is stably available/non-progressing/non-degraded", coName) | |
| return true, nil | |
| } | |
test/ote/util.go
Outdated
| // logConditionDetails logs detailed information about all conditions | ||
| func logConditionDetails(t library.LoggingT, details map[string]conditionDetail) { | ||
| // Sort condition types for consistent output | ||
| conditionTypes := []string{"Available", "Progressing", "Degraded"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a constant list and could be a package-level constant for reuse.
test/ote/util.go
Outdated
| return nil, fmt.Errorf("failed to get ClusterOperator %s: %w", coName, err) | ||
| } | ||
|
|
||
| statusMap := make(map[string]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| statusMap := make(map[string]string) | |
| status := make(map[string]string) |
Add two new OTE helper functions for common test scenarios:
1. WaitForAPIServerRollout: Waits for all API server pods to be recreated
after a configuration change. Unlike WaitForAPIServerToStabilizeOnTheSameRevision,
this specifically waits for NEW pods to replace old ones.
2. WaitForFeatureGateEnabled: Waits for a specific feature gate to be enabled
in the cluster by polling the FeatureGate resource.
These functions are needed for testing configuration changes and feature gate
enablement in operator e2e tests, particularly for EventTTL configuration tests
in cluster-kube-apiserver-operator.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: gangwgr, wangke19 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/test unit |
|
@gangwgr: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/assign @p0lyn0mial |
| @@ -0,0 +1,392 @@ | |||
| package ote | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we copy the code from existing repo or is it brand new code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is new code not existing. i created some new functions to use in component repo
| @@ -0,0 +1,392 @@ | |||
| package ote | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just realized that these function could be also defined directly in the origin repo. Thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel not needed there
| @@ -0,0 +1,392 @@ | |||
| package ote | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests will use these functions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed for this openshift/cluster-kube-apiserver-operator#1988 and some qe tests cases which we later migrate in component repo
| // Note: | ||
| // - All existing pods must be replaced by new pods created after this function is called | ||
| // - Supports both single-node and multi-node deployments | ||
| func WaitForAPIServerRollout(ctx context.Context, t library.LoggingT, podClient corev1client.PodInterface, labelSelector string, timeout time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this function differ from https://github.com/openshift/library-go/blob/master/test/library/apiserver/apiserver.go#L36
why do we need both ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WaitForAPIServerToStabilizeOnTheSameRevision (Existing Function)
- Polls until all API server pods are on the same revision (e.g., all on revision 10)
- Does NOT care when pods were created - just checks they're all on the same revision
WaitForAPIServerRollout
-
Waits for ALL pods to be RECREATED with new pods
- Captures the current pod creation times when called
- Polls until ALL existing pods are replaced by new pods created after the function was called
- All new pods must be in Running state
- Cares about pod creation time - ensures every pod is new
-When we need to guarantee that ALL pods have been recreated with fresh configuration (e.g., after secret rotation, config changes that require pod restart)
Key Difference:
WaitForAPIServerToStabilizeOnTheSameRevision:- Old pods on revision 10 → New pods on revision 10 (PASSES - same revision)
WaitForAPIServerRollout:
- Old pods created at 10:00am → Old pods on same revision 10 (FAILS - pods not recreated)
- Old pods created at 10:00am → New pods created at 10:15am (PASSES - all pods recreated)
Added new OTE helper functions for common test scenarios:
WaitForClusterOperatorStatus- Core functionality for testingWaitForClusterOperatorHealthy- Convenience wrapperGetClusterOperatorConditionStatus- Status retrieval helper