add new healthcheck eco-go-test #754
base: main
Conversation
klaskosk left a comment
a more general note on comments (since I know gemini has a tendency to add way too many 🙂): you only need to add them to explain why, not what. The linter will complain about missing comments on exported types/consts/vars/etc., so things like consts may have more obvious comments, but inside a function you shouldn't need that many.
This is a good example, where the comment explains why we can set clusterName := strings.Split(cluster.Server, ".")[1]. Without the comment, it's not clear why that works unless you know the format of the input.
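For illustration, a minimal sketch of the kind of "why" comment meant here (the server URL format is an assumption for the example, not something stated in the PR):

```go
// cluster.Server is assumed to look like "https://api.<clusterName>.<baseDomain>:6443",
// so splitting on "." leaves the cluster name at index 1.
clusterName := strings.Split(cluster.Server, ".")[1]
```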
this should be typed as glog.Level but it's unused so that didn't show up
as for the actual number, the convention so far is to leave 100 for eco-goinfra logs and then use lower numbers for eco-gotests. In cnf/ran we usually do 90 on the test suite level and 80 on the cnf/ran level, although we always set verbosity to 100 in CI
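A minimal sketch of how those numbers are used in practice (the constant name mirrors the one added in this PR; the wiring is illustrative):

```go
// LogLevel is the suite-level verbosity; eco-goinfra reserves 100, so suites use lower values.
const LogLevel glog.Level = 90

// Emitted only when the process runs with verbosity 90 or higher (CI sets 100).
glog.V(LogLevel).Infof("node %s is ready", nodeName)
```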
for Expect(err).ToNot(HaveOccurred(), the error will be printed regardless, so we don't need to specify it in the format string. Although it's good to be aware that you can format like that.
like above, you can just format directly in the matcher. No need to use fmt.Sprintf.
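A quick sketch of both points (the identifiers are illustrative):

```go
// Gomega prints the underlying error on failure, so no extra message is needed.
Expect(err).ToNot(HaveOccurred())

// Format arguments go straight into the matcher call instead of fmt.Sprintf.
Expect(isReady).To(BeTrue(), "node %s is not Ready", node.Name)
```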
the eco-goinfra/pkg/nodes.Builder type has an IsReady method, so it would be good to reuse that method if possible
this is a good implementation of it though
Jenkins console will get all log messages, so the biggest advantage is that GinkgoWriter will show up in the junit that gets exported. But unless you have a specific reason for doing that, it'd be preferred to use glog.V(tsparams.LogLevel).Info() or Infof. Plus glog will add the newline automatically.
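For example, a minimal sketch of the preferred form next to the GinkgoWriter one (identifiers are illustrative):

```go
// Preferred: shows up in the Jenkins console at the configured verbosity; newline added for you.
glog.V(tsparams.LogLevel).Infof("node %s is ready", nodeName)

// Only worth it when the message must also appear in the exported junit.
fmt.Fprintf(GinkgoWriter, "node %s is ready\n", nodeName)
```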
another thing, since we just moved the repo, is that you'll need to rebase (
Force-pushed from 7909402 to 40e4f89
Walkthrough
Adds two Ginkgo/Gomega test suites and supporting packages: a minimal "Cluster Health Check" healthcheck suite and a comprehensive "Node Health CNF RAN" suite. Introduces test params, initialization helpers, reporting hooks (JUnit/XML), multiple node-health validations, and documentation for the node-health tests.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Suite as Healthcheck Suite
    participant Config as RANConfig
    participant Spec as Healthcheck Spec
    participant API as Kubernetes API (Spoke1APIClient)
    Suite->>Config: GetJunitReportPath(currentFile)
    Config-->>Suite: JUnit path
    Suite->>Suite: RegisterFailHandler(Fail)\nRunSpecs(filter: "healthcheck")
    Spec->>API: List Nodes (context.TODO())
    API-->>Spec: NodeList or error
    alt error
        Spec->>Suite: Fail (error)
    else success
        Spec->>Spec: Expect nodeList non-empty
        loop each node
            Spec->>Spec: Check NodeReady == ConditionTrue
            alt not ready
                Spec->>Suite: Fail (node not ready)
            else ready
                Spec->>Spec: Log readiness
            end
        end
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Suite as NodeHealth Suite
    participant Init as InitTools/APIClient
    participant Reporter as Reporter/ReportXML
    participant API as Kubernetes API
    Note over Suite: Suite startup
    Suite->>Init: resolve currentFile, get APIClient
    Suite->>Reporter: configure JUnit/XML paths
    Suite->>Suite: RegisterFailHandler(Fail)\nRunSpecs(filter: nodehealthparams.Labels)
    Note over Suite,API: Test execution (high level)
    Suite->>API: BeforeAll: List Nodes -> nodesList
    Suite->>API: Ordered Contexts: Readiness, Pressure, Resources, Kubelet, Conditions, Monitoring
    API-->>Suite: Node and Pod data responses
    alt any spec failed
        Suite->>Reporter: JustAfterEach -> ReportIfFailed(CurrentSpecReport(), currentFile, namespaces, crds)
    end
    Note over Suite: AfterSuite
    Suite->>Reporter: reportxml.Create(report, GeneralConfig.GetReportPath(), GeneralConfig.TCPrefix)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 3
♻️ Duplicate comments (3)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)
7-7: Log-level convention ack (90 for suite-level; 100 reserved for eco-goinfra)
This aligns with the convention described earlier; keep as-is for the cnf/ran suite.
tests/cnf/ran/healthcheck/tests/healthcheck.go (2)
30-31: Drop the custom message for the err matcher; Gomega will print the error
Cleaner failure output; the previous review mentioned this as well.
```diff
- Expect(err).ToNot(HaveOccurred(), "Failed to list cluster nodes")
+ Expect(err).ToNot(HaveOccurred())
```
34-51: Optional: reuse eco-goinfra's Node builder IsReady helper
Reduces custom condition-walking and aligns with library semantics.
- By("Verifying all nodes are in Ready state") - // Iterate through each node retrieved from the cluster. - for _, node := range nodeList { - // Initialize a flag to track if the node is Ready. - isReady := false - // Iterate through the conditions reported by the node. - for _, condition := range node.Status.Conditions { - // Check for the "Ready" condition type and ensure its status is "True". - if condition.Type == corev1.NodeReady && condition.Status == corev1.ConditionTrue { - isReady = true - break // Found the Ready condition, no need to check further conditions for this node. - } - } - // Assert that the node is ready. If not, the test will fail and report the node's name. - Expect(isReady).To(BeTrue(), "Node %s is not in Ready state. Current conditions: %+v", node.Name, node.Status.Conditions) - // Print a message to the glog output for successful nodes. - glog.V(tsparams.LogLevel).Infof("Node %s is Ready.", node.Name) - } + By("Verifying all nodes are in Ready state") + for _, node := range nodeList { + ready, err := nodes.NewBuilder(Spoke1APIClient, node.Name).IsReady(context.TODO()) + Expect(err).ToNot(HaveOccurred()) + Expect(ready).To(BeTrue(), "Node %s is not in Ready state. Current conditions: %+v", node.Name, node.Status.Conditions) + glog.V(tsparams.LogLevel).Infof("Node %s is Ready.", node.Name) + }
🧹 Nitpick comments (2)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)
3-7: Polish exported comments to satisfy linters
Start comments with the identifier names.

```diff
-// This label will be used in the Jenkins job's TEST_TYPE parameter.
+// LabelHealthCheckTestCases is used in the Jenkins job's TEST_TYPE parameter.
 const LabelHealthCheckTestCases = "healthcheck"

-// defines the verbosity level for glog.
+// LogLevel defines the default verbosity level for glog.
 const LogLevel glog.Level = 90
```
tests/cnf/ran/healthcheck/tests/healthcheck.go (1)
25-31: Optional: use a timeout context for API calls
Prevents hangs on slow/unreachable clusters.

```diff
-    nodeList, err := nodes.List(Spoke1APIClient, context.TODO())
+    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+    defer cancel()
+    nodeList, err := nodes.List(Spoke1APIClient, ctx)
```
Add import if adopting:

```diff
-import (
+import (
     "context"
+    "time"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1 hunks)
- tests/cnf/ran/healthcheck/internal/tsparams/params.go (1 hunks)
- tests/cnf/ran/healthcheck/tests/healthcheck.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)
LabelHealthCheckTestCases(4-4)
tests/cnf/ran/healthcheck/tests/healthcheck.go (2)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (2)
LabelHealthCheckTestCases (4-4), LogLevel (7-7)
tests/cnf/ran/internal/raninittools/raninittools.go (1)
Spoke1APIClient(14-14)
🔇 Additional comments (2)
tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1)
21-22: RunSpecs invocation is consistent with repository conventions; no changes required
After reviewing the RunSpecs calls across all suite tests (e.g. vcore, spk, RDS, O-Cloud, RAN O-RAN, TALM, etc.), every test uses the pattern:
RunSpecs(t, "<Suite Name>", Label(...), reporterConfig)
Your Cluster Health Check suite does the same:
RunSpecs(t, "Cluster Health Check", Label(tsparams.LabelHealthCheckTestCases), reporterConfig)
This matches the established Ginkgo v2 usage in our codebase, so no update is necessary here.
tests/cnf/ran/healthcheck/tests/healthcheck.go (1)
53-53: LGTM: concise summary logging
Final readiness summary at the configured verbosity is helpful and consistent.
Actionable comments posted: 8
🧹 Nitpick comments (10)
tests/cnf/ran/node-health/README.md (1)
27-38: Add language specifiers to fenced code blocks (markdownlint MD040)
Add a language to the directory tree and log examples to satisfy linters.
````diff
-```
+```text
 ...
 ```

-```
+```text
 INFO: Found 3 nodes for health monitoring
 ...
 ```
````
Also applies to: 141-150
tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (2)
13-17: Avoid recommending dot-imports; export a stable API instead
Dot-imports reduce clarity and can cause identifier collisions. The comment suggests dot-importing this package, but only APIClient is exported anyway.

```diff
-// init loads all variables automatically when this package is imported. Once package is imported a user has full
-// access to all vars within init function. It is recommended to import this package using dot import.
+// init exposes shared clients when this package is imported.
+// Import this package normally and reference nodehealthinittools.APIClient explicitly.
```
8-17: Remove or consolidate the nodehealthinittools re-export
No references to nodehealthinittools.APIClient were found; tests still dot-import tests/internal/inittools. Either switch tests to import this package for APIClient or drop it to eliminate redundancy.
tests/cnf/ran/node-health/tests/node_health_validation.go (3)
156-181: Disk "usage" test doesn't assert usage; it only logs image sizes
Currently this only checks Capacity/Allocatable presence and logs image sizes. If thresholds are intended (per the README), add real assertions from the metrics or kubelet stats API.
I can wire this to metrics.k8s.io (if enabled) or node allocatable vs. requested via schedulable resources.
372-411: Long-running monitoring in an It block; parameterize or gate it
A hardcoded 2 minutes with 30s sleeps will slow suites and may hit CI timeouts.
- Read duration/interval from env (e.g., ECO_NODE_MONITOR_DURATION, ECO_NODE_MONITOR_INTERVAL) with sensible defaults.
- Skip or shorten when GeneralConfig.DryRun is true.
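A rough sketch of the env-driven approach (the variable names and helper are suggestions, not existing config keys; assumes os and time are imported):

```go
// envDuration reads a duration such as "90s" or "2m" from the environment,
// falling back to the given default when unset or unparsable.
func envDuration(name string, fallback time.Duration) time.Duration {
    if raw := os.Getenv(name); raw != "" {
        if parsed, err := time.ParseDuration(raw); err == nil {
            return parsed
        }
    }
    return fallback
}

// Inside the It block:
monitorDuration := envDuration("ECO_NODE_MONITOR_DURATION", 2*time.Minute)
checkInterval := envDuration("ECO_NODE_MONITOR_INTERVAL", 30*time.Second)
```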
3-20: Prefer avoiding dot-imports for inittools
Dot-importing inittools pulls many globals into this package's scope. Prefer a named import and explicit references for readability.

```diff
-    . "github.com/rh-ecosystem-edge/eco-gotests/tests/internal/inittools"
+    inittools "github.com/rh-ecosystem-edge/eco-gotests/tests/internal/inittools"
```
Then reference inittools.GeneralConfig and inittools.APIClient explicitly.
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (4)
42-50: Use time.Duration for timeouts/intervals to avoid unit ambiguity
Seconds as bare ints are error-prone; prefer typed durations.
Apply:
```diff
-    // KubeletHealthCheckTimeout is the timeout for kubelet health checks.
-    KubeletHealthCheckTimeout = 30
+    // KubeletHealthCheckTimeout is the timeout for kubelet health checks.
+    KubeletHealthCheckTimeout = 30 * time.Second
@@
-    // NodeConditionCheckTimeout is the timeout for node condition checks.
-    NodeConditionCheckTimeout = 60
+    // NodeConditionCheckTimeout is the timeout for node condition checks.
+    NodeConditionCheckTimeout = 60 * time.Second
@@
-    // ResourceCheckInterval is the interval between resource checks.
-    ResourceCheckInterval = 10
+    // ResourceCheckInterval is the interval between resource checks.
+    ResourceCheckInterval = 10 * time.Second
```
Also add the import:

```go
import (
    k8sreporter "github.com/rh-ecosystem-edge/eco-goinfra/pkg/k8sreporter"
    corev1 "k8s.io/api/core/v1"
    "time"
)
```
51-59: Mirror Kubernetes constants instead of hardcoding strings
Prevents drift and typos while keeping the same exported names.
Apply:
```diff
-    // ConditionTypeReadyString constant to fix linter warning.
-    ConditionTypeReadyString = "Ready"
-
-    // ConstantTrueString constant to fix linter warning.
-    ConstantTrueString = "True"
-
-    // ConstantFalseString constant to fix linter warning.
-    ConstantFalseString = "False"
+    // ConditionTypeReadyString mirrors corev1.NodeReady.
+    ConditionTypeReadyString = string(corev1.NodeReady)
+
+    // ConstantTrueString mirrors corev1.ConditionTrue.
+    ConstantTrueString = string(corev1.ConditionTrue)
+
+    // ConstantFalseString mirrors corev1.ConditionFalse.
+    ConstantFalseString = string(corev1.ConditionFalse)
```
1-2: Add a package doc comment for lint compliance
Apply:

```diff
+// Package nodehealthparams centralizes labels, thresholds, and reporter settings
+// used by the Node Health CNF tests.
 package nodehealthparams
```
83-95: Reporter defaults are sensible
Namespaces and CRDs chosen are useful for triage. Consider adding tuned (openshift-cluster-node-tuning-operator) if node tuning issues are in scope, but optional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- tests/cnf/ran/node-health/README.md (1 hunks)
- tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (1 hunks)
- tests/cnf/ran/node-health/internal/nodehealthparams/const.go (1 hunks)
- tests/cnf/ran/node-health/node_health_suite_test.go (1 hunks)
- tests/cnf/ran/node-health/tests/node_health_validation.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/cnf/ran/node-health/node_health_suite_test.go (3)
tests/internal/config/config.go (1)
GeneralConfig (21-42)
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (4)
Label (10-10), Labels (74-81), ReporterNamespacesToDump (84-88), ReporterCRDsToDump (91-95)
tests/internal/reporter/reporter.go (1)
ReportIfFailed(51-57)
tests/cnf/ran/node-health/tests/node_health_validation.go (2)
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (2)
KubeletNamespace (61-61), KubeletPodSelector (64-64)
tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (1)
APIClient(10-10)
🪛 LanguageTool
tests/cnf/ran/node-health/README.md
[grammar] ~9-~9: There might be a mistake here.
Context: ...Node Readiness**: Ensures all nodes are in Ready state for CNF workloads - **Press...
(QB_NEW_EN)
[grammar] ~42-~42: There might be a mistake here.
Context: ...gories ### 1. Node Readiness Validation - Test ID: node-health-001 - **Purpose...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ...ode Readiness Validation - Test ID: node-health-001 - Purpose: Verify all nodes are in Ready...
(QB_NEW_EN)
[grammar] ~44-~44: There might be a mistake here.
Context: ...01` - Purpose: Verify all nodes are in Ready state for CNF workloads - **Label...
(QB_NEW_EN)
[grammar] ~45-~45: There might be a mistake here.
Context: ...y state for CNF workloads - Labels: node-readiness-check - Description: Checks that each node has...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ... True ### 2. Node Pressure Validation - Test ID: node-health-003 to `node-he...
(QB_NEW_EN)
[grammar] ~49-~49: There might be a mistake here.
Context: ...ion - Test ID: node-health-003 to node-health-005 - Purpose: Monitor node pressure conditi...
(QB_NEW_EN)
[grammar] ~50-~50: There might be a mistake here.
Context: ... conditions critical for RAN performance - Labels: disk-pressure-check, `memory...
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...essure-check, memory-pressure-check, network-pressure-check` - Description: Validates that nodes are ...
(QB_NEW_EN)
[grammar] ~54-~54: There might be a mistake here.
Context: ...e ### 3. Node Resource Usage Validation - Test ID: node-health-006 to `node-he...
(QB_NEW_EN)
[grammar] ~55-~55: There might be a mistake here.
Context: ...ion - Test ID: node-health-006 to node-health-008 - Purpose: Monitor resource utilization ...
(QB_NEW_EN)
[grammar] ~56-~56: There might be a mistake here.
Context: ...ce utilization for CNF resource planning - Labels: disk-usage-check, `memory-us...
(QB_NEW_EN)
[grammar] ~57-~57: There might be a mistake here.
Context: ...isk-usage-check, memory-usage-check, cpu-usage-check` - Description: Verifies resource capacit...
(QB_NEW_EN)
[grammar] ~60-~60: There might be a mistake here.
Context: ...limits ### 4. Kubelet Status Validation - Test ID: node-health-009 to `node-he...
(QB_NEW_EN)
[grammar] ~61-~61: There might be a mistake here.
Context: ...ion - Test ID: node-health-009 to node-health-010 - Purpose: Ensure kubelet service health...
(QB_NEW_EN)
[grammar] ~63-~63: There might be a mistake here.
Context: ...ing - Labels: kubelet-pod-status, kubelet-service-check - Description: Verifies kubelet pods are...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...nodes ### 5. Node Conditions Validation - Test ID: node-health-011 to `node-he...
(QB_NEW_EN)
[grammar] ~68-~68: There might be a mistake here.
Context: ...e condition monitoring for RAN stability - Labels: node-conditions-check, `node...
(QB_NEW_EN)
[grammar] ~72-~72: There might be a mistake here.
Context: ...sition times ### 6. Resource Monitoring - Test ID: node-health-013 - **Purpose...
(QB_NEW_EN)
[grammar] ~73-~73: There might be a mistake here.
Context: ...# 6. Resource Monitoring - Test ID: node-health-013 - Purpose: Continuous monitoring over ti...
(QB_NEW_EN)
[grammar] ~74-~74: There might be a mistake here.
Context: ...g over time for CNF performance analysis - Labels: resource-monitoring - **Desc...
(QB_NEW_EN)
[grammar] ~75-~75: There might be a mistake here.
Context: ... CNF performance analysis - Labels: resource-monitoring - Description: Monitors node health cont...
(QB_NEW_EN)
[grammar] ~80-~80: There might be a mistake here.
Context: ...ration ## Configuration ### Thresholds The test suite uses configurable thresho...
(QB_NEW_EN)
[grammar] ~155-~155: There might be a mistake here.
Context: ...es the standard reporting infrastructure - Labels: Follows the established labeli...
(QB_NEW_EN)
[grammar] ~156-~156: There might be a mistake here.
Context: ...ows the established labeling conventions - Parameters: Uses the parameter managem...
(QB_NEW_EN)
[grammar] ~157-~157: There might be a mistake here.
Context: ...**: Uses the parameter management system - Error Handling: Integrates with the fa...
(QB_NEW_EN)
[grammar] ~162-~162: There might be a mistake here.
Context: ...ra**: For Kubernetes resource management - Ginkgo v2: For BDD-style testing - **G...
(QB_NEW_EN)
[grammar] ~163-~163: There might be a mistake here.
Context: ...t - Ginkgo v2: For BDD-style testing - Gomega: For assertions and matchers - ...
(QB_NEW_EN)
[grammar] ~164-~164: There might be a mistake here.
Context: ... Gomega: For assertions and matchers - Kubernetes client-go: For API interact...
(QB_NEW_EN)
[grammar] ~169-~169: There might be a mistake here.
Context: ...tomization ### Adding New Health Checks To add new health checks, extend the tes...
(QB_NEW_EN)
[grammar] ~181-~181: There might be a mistake here.
Context: ...ere }) ``` ### Modifying Thresholds Update the constants in `internal/nodehe...
(QB_NEW_EN)
[grammar] ~190-~190: There might be a mistake here.
Context: ...old ) ``` ### Adding New Resource Types Extend the resource validation tests to ...
(QB_NEW_EN)
[grammar] ~210-~210: There might be a mistake here.
Context: ...let is properly deployed ### Debug Mode Enable verbose logging by setting the lo...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
tests/cnf/ran/node-health/README.md
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (6)
tests/cnf/ran/node-health/README.md (1)
80-99: Documented thresholds/timeouts aren't enforced by the code
Threshold and timeout constants are mentioned but not used in node_health_validation.go. Either wire them into the tests or remove them from the README to avoid drift.
Do you want me to wire these into the tests (CPU/memory/disk checks and polling intervals)?
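As a rough sketch of what wiring one of those thresholds into an assertion could look like (the constant name and the capacity-vs-allocatable calculation are assumptions; true utilization would need the metrics or kubelet stats API):

```go
// Share of memory capacity reserved for system overhead (capacity minus allocatable).
capacity := nodeObj.Status.Capacity[corev1.ResourceMemory]
allocatable := nodeObj.Status.Allocatable[corev1.ResourceMemory]
reservedPercent := 100 * float64(capacity.Value()-allocatable.Value()) / float64(capacity.Value())

// Hypothetical threshold constant, e.g. nodehealthparams.MemoryReservedThresholdPercent = 20.
Expect(reservedPercent).To(BeNumerically("<", nodehealthparams.MemoryReservedThresholdPercent),
    "Node %s reserves %.1f%% of memory capacity", nodeObj.Name, reservedPercent)
```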
tests/cnf/ran/node-health/node_health_suite_test.go (2)
27-35: LGTM: failure diagnostics and XML report hook
The JustAfterEach failure dump and ReportAfterSuite XML generation align with existing eco-gotests patterns.
20-25: No action required: RunSpecs invocation is correct
Passing Label(...) and the mutated reporterConfig into RunSpecs is consistent with all other suites in this repo and aligns with Ginkgo v2's DSL (labels as []string, reporter overrides via ReporterConfig).
tests/cnf/ran/node-health/tests/node_health_validation.go (1)
32-40: LGTM: node discovery and preconditions
Listing nodes once in BeforeAll and asserting a non-empty cluster is fine for this suite.
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (2)
66-71: MCD constants look correct
Label and container name match typical MCO deployments.
60-65: Validate OpenShift kubelet selector
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go:60-65, KubeletPodSelector = "k8s-app=kubelet" won't match any pods on OpenShift (kubelet runs as a systemd service). Manually confirm with:

```bash
oc get pods -A -l k8s-app=kubelet
```
If no pods are returned, replace pod-based checks with node conditions (Ready) or query systemd status via the machine-config-daemon in the openshift-machine-config-operator namespace.
```go
import (
    "github.com/openshift-kni/k8sreporter"
    corev1 "k8s.io/api/core/v1"
)
```
Fix moved module import path (build will fail after repo move).
Update k8sreporter to the new module path per PR comment about the repo move.
Apply:
```diff
-import (
-    "github.com/openshift-kni/k8sreporter"
-    corev1 "k8s.io/api/core/v1"
-)
+import (
+    k8sreporter "github.com/rh-ecosystem-edge/eco-goinfra/pkg/k8sreporter"
+    corev1 "k8s.io/api/core/v1"
+)
```
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go around lines
3–6, the import path for k8sreporter needs to be updated to the repo's new
module path after the move; replace "github.com/openshift-kni/k8sreporter" with
the new module path specified in the PR (adjusting any trailing /vX versioned
path if required), then run go mod tidy to update dependencies and fix any
import usages accordingly.
```go
const (
    // Label is used to select tests for Node Health monitoring.
    Label = "node-health"

    // NodeHealthLogLevel configures logging level for Node Health related tests.
    NodeHealthLogLevel = 90
```
💡 Verification agent
🧩 Analysis chain
Confirm intended log level scale (90).
Value 90 is unusually high; ensure it matches your logging facade’s expectations (range, semantics). If unused, drop it.
Run:
🏁 Script executed:
```bash
#!/bin/bash
rg -n "NodeHealthLogLevel"
```
Length of output: 258
Drop the unused NodeHealthLogLevel constant. It’s declared at tests/cnf/ran/node-health/internal/nodehealthparams/const.go:13 but never referenced; remove it if there’s no intended usage.
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go around lines 8
to 14, the constant NodeHealthLogLevel is declared but unused; remove the
NodeHealthLogLevel constant definition from this file (and any trailing blank
line if needed) so only the used Label constant remains; if the log level was
intended to be configurable, instead add a TODO and a clear comment or implement
its usage where logging is configured.
```markdown
### 4. Kubelet Status Validation
- **Test ID**: `node-health-009` to `node-health-010`
- **Purpose**: Ensure kubelet service health essential for CNF pod scheduling
- **Labels**: `kubelet-pod-status`, `kubelet-service-check`
- **Description**: Verifies kubelet pods are running and ready on all nodes

### 5. Node Conditions Validation
```
Kubelet “pod” wording is incorrect for OpenShift; tests will fail as written
On OpenShift/OKD, kubelet runs as a systemd service on the node, not as a Pod in kube-system. The current description (and matching tests) expect kubelet Pods and will consistently find none.
Consider updating both docs and tests to validate kubelet health via Node conditions (Ready) and/or kubelet health endpoint on the node, rather than listing Pods. If you keep a kubelet status section, reword to avoid “kubelet pods.”
```bash
# Run tests with specific labels
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-readiness"
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-pressure"
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-kubelet"
```
🛠️ Refactor suggestion
Fix label names in run examples
Examples use non-existent labels. Replace them with labels actually present in the code.
Apply:
```diff
-# Run tests with specific labels
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-readiness"
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-pressure"
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-kubelet"
+# Run tests with specific labels
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-readiness"
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-pressure"
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="kubelet-status"
```
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/README.md around lines 121 to 126, the example go
test commands reference nonexistent labels; update those three example lines to
use the actual label names used in the test suite: inspect the test files
(ginkgo Describe/It labels or any label metadata) to determine the correct label
keys/values (e.g., readiness, pressure, kubelet may be named differently) and
replace "node-health-readiness", "node-health-pressure", and
"node-health-kubelet" with the exact labels found; ensure the updated commands
preserve the same flags and quoting and run a quick local go test with one
corrected label to verify correctness.
````markdown
### Debug Mode
Enable verbose logging by setting the log level:

```bash
export GLOMAXLEVEL=5
go test ./tests/cnf/ran/node-health/ -v
```
````
Wrong env var for verbosity
Use ECO_VERBOSE_LEVEL (per tests/internal/config/config.go), not GLOMAXLEVEL.
```diff
-export GLOMAXLEVEL=5
+export ECO_VERBOSE_LEVEL=5
 go test ./tests/cnf/ran/node-health/ -v
```
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/README.md around lines 210 to 216, the README shows
the wrong environment variable for test verbosity; update the instructions to
use ECO_VERBOSE_LEVEL (as defined in tests/internal/config/config.go) instead of
GLOMAXLEVEL and demonstrate exporting a numeric level (e.g., export
ECO_VERBOSE_LEVEL=5) before running the go test command to enable verbose
logging.
| Context("Kubelet Status Validation", Label("kubelet-status"), func() { | ||
| It("Verify kubelet pods are running on all nodes", | ||
| Label("kubelet-pod-status"), | ||
| reportxml.ID("node-health-009"), | ||
| func() { | ||
| for _, nodeBuilder := range nodesList { | ||
| nodeObj := nodeBuilder.Object | ||
| glog.Infof("Checking kubelet status on node: %s", nodeObj.Name) | ||
|
|
||
| // Check if kubelet pod is running on this node | ||
| podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List( | ||
| context.TODO(), | ||
| metav1.ListOptions{ | ||
| FieldSelector: fmt.Sprintf("spec.nodeName=%s", nodeObj.Name), | ||
| LabelSelector: nodehealthparams.KubeletPodSelector, | ||
| }, | ||
| ) | ||
| Expect(err).NotTo(HaveOccurred(), "Failed to list kubelet pods on node %s", nodeObj.Name) | ||
|
|
||
| // Verify at least one kubelet pod is running | ||
| hasRunningKubelet := false | ||
| for _, pod := range podList.Items { | ||
| if pod.Status.Phase == corev1.PodRunning { | ||
| hasRunningKubelet = true | ||
| break | ||
| } | ||
| } | ||
|
|
||
| Expect(hasRunningKubelet).To(BeTrue(), "No running kubelet pod found on node %s", nodeObj.Name) | ||
| glog.Infof("Kubelet pod is running on node %s", nodeObj.Name) | ||
| } | ||
| }) | ||
|
|
||
| It("Verify kubelet service is responding", | ||
| Label("kubelet-service-check"), | ||
| reportxml.ID("node-health-010"), | ||
| func() { | ||
| // This test would typically check kubelet health endpoint | ||
| // For now, we'll verify the kubelet pods are healthy | ||
| for _, nodeBuilder := range nodesList { | ||
| nodeObj := nodeBuilder.Object | ||
| glog.Infof("Checking kubelet service health on node: %s", nodeObj.Name) | ||
|
|
||
| // Check kubelet pod readiness | ||
| podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List( | ||
| context.TODO(), | ||
| metav1.ListOptions{ | ||
| FieldSelector: fmt.Sprintf("spec.nodeName=%s", nodeObj.Name), | ||
| LabelSelector: nodehealthparams.KubeletPodSelector, | ||
| }, | ||
| ) | ||
| Expect(err).NotTo(HaveOccurred(), "Failed to list kubelet pods on node %s", nodeObj.Name) | ||
|
|
||
| for _, pod := range podList.Items { | ||
| // Check if pod is ready | ||
| isReady := false | ||
| for _, condition := range pod.Status.Conditions { | ||
| if condition.Type == corev1.PodReady { | ||
| isReady = condition.Status == corev1.ConditionTrue | ||
| break | ||
| } | ||
| } | ||
|
|
||
| Expect(isReady).To(BeTrue(), "Kubelet pod %s on node %s is not ready", pod.Name, nodeObj.Name) | ||
| glog.Infof("Kubelet pod %s on node %s is ready", pod.Name, nodeObj.Name) | ||
| } | ||
| } | ||
| }) | ||
| }) |
Kubelet checks assume kubelet Pods exist; this will fail on OpenShift
Kubelet isn’t a Pod on OpenShift/OKD. Listing kubelet Pods in kube-system with label k8s-app=kubelet will return nothing and the tests will fail.
Replace both kubelet tests with condition-based checks (or node heartbeat), e.g.:
- It("Verify kubelet pods are running on all nodes",
- Label("kubelet-pod-status"),
- reportxml.ID("node-health-009"),
- func() {
- ...
- podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List(...)
- ...
- hasRunningKubelet := false
- for _, pod := range podList.Items {
- if pod.Status.Phase == corev1.PodRunning { hasRunningKubelet = true; break }
- }
- Expect(hasRunningKubelet).To(BeTrue(), "No running kubelet pod found on node %s", nodeObj.Name)
- })
+ It("Verify kubelet is reporting Ready on all nodes",
+ Label("kubelet-service-check"),
+ reportxml.ID("node-health-009"),
+ func() {
+ for _, nodeBuilder := range nodesList {
+ nodeObj := nodeBuilder.Object
+ var readyCond *corev1.NodeCondition
+ for i := range nodeObj.Status.Conditions {
+ if nodeObj.Status.Conditions[i].Type == corev1.NodeReady {
+ readyCond = &nodeObj.Status.Conditions[i]
+ break
+ }
+ }
+ Expect(readyCond).NotTo(BeNil(), "Node %s has no Ready condition", nodeObj.Name)
+ Expect(readyCond.Status).To(Equal(corev1.ConditionTrue), "Node %s kubelet is not Ready", nodeObj.Name)
+ }
+ })
- It("Verify kubelet service is responding",
- Label("kubelet-service-check"),
- reportxml.ID("node-health-010"),
- func() {
- // Lists kubelet pods and checks readiness
- ...
- })
+ // Optionally, add a heartbeat freshness check by inspecting LastHeartbeatTime if desired.📝 Committable suggestion
```go
if condition.Type == corev1.NodeNetworkUnavailable &&
    strings.Contains(nodeObj.Name, "master") {
    // Master nodes might have NetworkUnavailable=True during initial setup
    continue
}
```
🛠️ Refactor suggestion
Don’t infer control-plane nodes from name substrings
Checking strings.Contains(nodeObj.Name, "master") is brittle. Use role labels.
```diff
-    if condition.Type == corev1.NodeNetworkUnavailable &&
-        strings.Contains(nodeObj.Name, "master") {
+    if condition.Type == corev1.NodeNetworkUnavailable && isControlPlaneNode(nodeObj) {
         continue
     }
```
Add helper (outside this hunk):
```go
func isControlPlaneNode(n *corev1.Node) bool {
    if n == nil {
        return false
    }
    _, hasCP := n.Labels["node-role.kubernetes.io/control-plane"]
    _, hasMaster := n.Labels["node-role.kubernetes.io/master"]
    return hasCP || hasMaster
}
```
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/tests/node_health_validation.go around lines 316 to
320, the code infers control-plane nodes by checking if nodeObj.Name contains
"master", which is brittle; add a helper function (placed outside this hunk)
named isControlPlaneNode that returns true if the node has either
node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label,
then replace the strings.Contains(nodeObj.Name, "master") check with a call to
isControlPlaneNode(&nodeObj) so the NetworkUnavailable exception is based on
role labels rather than name substrings.
| It("Verify node last transition times are recent", | ||
| Label("node-transition-time-check"), | ||
| reportxml.ID("node-health-012"), | ||
| func() { | ||
| for _, nodeBuilder := range nodesList { | ||
| nodeObj := nodeBuilder.Object | ||
| glog.Infof("Checking transition times on node: %s", nodeObj.Name) | ||
|
|
||
| // Check if node conditions have recent transition times | ||
| for _, condition := range nodeObj.Status.Conditions { | ||
| if condition.LastTransitionTime.IsZero() { | ||
| continue // Skip conditions without transition time | ||
| } | ||
|
|
||
| // Check if transition time is within reasonable bounds (not too old) | ||
| timeSinceTransition := time.Since(condition.LastTransitionTime.Time) | ||
| Expect(timeSinceTransition).To(BeNumerically("<", 24*time.Hour), | ||
| "Node %s condition %s has very old transition time: %s", | ||
| nodeObj.Name, condition.Type, condition.LastTransitionTime) | ||
|
|
||
| glog.Infof("Node %s condition %s transition time: %s (age: %v)", | ||
| nodeObj.Name, condition.Type, condition.LastTransitionTime, timeSinceTransition) | ||
| } | ||
|
|
||
| glog.Infof("Transition times on node %s are recent", nodeObj.Name) | ||
| } | ||
| }) | ||
| }) |
🛠️ Refactor suggestion
Transition time < 24h will flake on stable clusters
Many clusters have not transitioned conditions for days/weeks. The 24h bound will cause false failures.
```diff
-    Expect(timeSinceTransition).To(BeNumerically("<", 24*time.Hour),
+    Expect(timeSinceTransition).To(BeNumerically("<", 30*24*time.Hour),
         "Node %s condition %s has very old transition time: %s",
         nodeObj.Name, condition.Type, condition.LastTransitionTime)
```
Optionally, make this threshold configurable (env or nodehealthparams) and only warn/log when exceeded.
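A possible shape for that, sketched under the assumption that a new nodehealthparams constant is added (name and value are illustrative):

```go
// Assumed addition to nodehealthparams:
// NodeConditionMaxAge = 30 * 24 * time.Hour

// In the spec, warn instead of failing hard when the threshold is exceeded.
if timeSinceTransition > nodehealthparams.NodeConditionMaxAge {
    glog.Warningf("Node %s condition %s last transitioned %v ago (threshold %v)",
        nodeObj.Name, condition.Type, timeSinceTransition, nodehealthparams.NodeConditionMaxAge)
}
```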
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/tests/node_health_validation.go around lines 343 to
370, the hardcoded 24-hour threshold for LastTransitionTime causes flakes
because many clusters don't transition conditions within 24h; replace the
hardcoded value with a configurable duration and stop failing the test on
exceedance—log or warn instead. Concretely: introduce a configurable threshold
(env var like NODE_HEALTH_TRANSITION_THRESHOLD or a nodeHealthParams field)
parsed as a time.Duration with a sensible default (e.g., 7*24*time.Hour),
replace the BeNumerically("<", 24*time.Hour) Expect with a conditional that if
timeSinceTransition > threshold then glog.Warningf (or Ginkgo Log/Warn) with the
node, condition, transition time and age (do not call Expect to fail), otherwise
keep logging the recent transition; ensure parsing the env var falls back to the
default and add a short comment explaining the change.
Summary by CodeRabbit