@bazem8 bazem8 commented Aug 20, 2025

Summary by CodeRabbit

  • Tests
    • Added BDD-style Cluster Health and comprehensive Node Health test suites validating node discovery, readiness, pressure, resource, kubelet, and condition checks with ordered, failure-aware flows.
    • Emits JUnit-compatible reports, supports label-based selective runs, and produces failure-time diagnostic dumps for CI troubleshooting.
  • Documentation
    • Added README documenting the Node Health test suite, configuration defaults, running instructions, and customization guidance.
  • Chores
    • Exposed configurable thresholds and log verbosity to aid debugging and reporting.


@klaskosk klaskosk left a comment


a more general note on comments (since I know gemini has a tendency to add way too many 🙂): you only need them to explain why, not what. The linter will complain about missing comments on exported types/consts/vars/etc., so those comments may state the obvious, but inside a function you shouldn't need that many.

This is a good example, where the comment explains why we can set clusterName := strings.Split(cluster.Server, ".")[1]. Without the comment, it's not clear why that works unless you know the format of the input.
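For illustration, the pattern looks something like this (the URL format here is an assumption for the example):

```go
// Server URLs look like https://api.<cluster-name>.<base-domain>:6443,
// so the second dot-separated segment is the cluster name.
clusterName := strings.Split(cluster.Server, ".")[1]
```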


this should be typed as glog.Level but it's unused so that didn't show up

as for the actual number, the convention so far is to leave 100 for eco-goinfra logs and then use lower numbers for eco-gotests. In cnf/ran we usually do 90 on the test suite level and 80 on the cnf/ran level, although we always set verbosity to 100 in CI
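For reference, a typed declaration under that convention might look like this (a sketch matching the params.go described in this PR):

```go
import "github.com/golang/glog"

// LogLevel is the glog verbosity for this suite: 90 at the test-suite
// level per the cnf/ran convention; 100 stays reserved for eco-goinfra.
const LogLevel glog.Level = 90
```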


for Expect(err).ToNot(HaveOccurred(), ...), the error will be printed regardless, so we don't need to specify it in the format string. Although it's good to be aware that you can format like that
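In other words, this is enough, since Gomega includes the error value in the failure output on its own:

```go
// No custom message needed; the error is printed automatically on failure.
Expect(err).ToNot(HaveOccurred())
```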


like above, you can format directly in the matcher; there's no need for fmt.Sprintf
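A small before/after sketch of that (the variable names are made up):

```go
// Pre-formatting the message is redundant:
Expect(ready).To(BeTrue(), fmt.Sprintf("node %s is not ready", nodeName))

// Gomega treats the optional description as a format string itself:
Expect(ready).To(BeTrue(), "node %s is not ready", nodeName)
```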

Comment on lines +37 to +46

the eco-goinfra/pkg/nodes.Builder type has an IsReady method, so it would be good to reuse that method if possible

this is a good implementation of it though

Comment on lines 49 to 53

The Jenkins console will get all log messages either way, so the biggest advantage of GinkgoWriter is that its output shows up in the exported JUnit. But unless you have a specific reason for that, it'd be preferred to use glog.V(tsparams.LogLevel).Info() or Infof. Plus, glog adds the newline automatically.
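For example, assuming the tsparams package from this PR:

```go
// Preferred: logs at the suite's configured verbosity; glog appends
// the trailing newline automatically.
glog.V(tsparams.LogLevel).Infof("Node %s is Ready", node.Name)

// Rather than writing to GinkgoWriter directly:
fmt.Fprintf(GinkgoWriter, "Node %s is Ready\n", node.Name)
```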

@klaskosk

Another thing, since we just moved the repo: you'll need to rebase these changes (git pull --rebase upstream main) and update the imports to github.com/rh-ecosystem-edge/...

@bazem8 force-pushed the b-ecogo-health-test branch from 7909402 to 40e4f89 on August 27, 2025 at 13:21

coderabbitai bot commented Aug 27, 2025

Walkthrough

Adds two Ginkgo/Gomega test suites and supporting packages: a minimal "Cluster Health Check" healthcheck suite and a comprehensive "Node Health CNF RAN" suite. Introduces test params, initialization helpers, reporting hooks (JUnit/XML), multiple node-health validations, and documentation for the node-health tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Healthcheck suite wiring**<br>`tests/cnf/ran/healthcheck/healthcheck_suite_test.go` | New Ginkgo suite entrypoint that sets the JUnit report path via RANConfig, registers the fail handler, and runs specs filtered by `tsparams.LabelHealthCheckTestCases`. |
| **Healthcheck params**<br>`tests/cnf/ran/healthcheck/internal/tsparams/params.go` | Adds `LabelHealthCheckTestCases = "healthcheck"` and `LogLevel glog.Level = 90` constants for labeling and glog verbosity. |
| **Healthcheck test case**<br>`tests/cnf/ran/healthcheck/tests/healthcheck.go` | Adds a spec that lists cluster nodes via Spoke1APIClient, asserts the list is non-empty, and verifies each node has `NodeReady == ConditionTrue`, logging readiness. |
| **Node Health suite wiring**<br>`tests/cnf/ran/node-health/node_health_suite_test.go` | New Ginkgo suite entrypoint for node-health: sets JUnit/XML reporting paths, registers the fail handler, applies label filters from `nodehealthparams.Labels`, adds a JustAfterEach reporter dump on failure, and creates the XML report in AfterSuite. |
| **Node Health params & reporter config**<br>`tests/cnf/ran/node-health/internal/nodehealthparams/const.go` | Adds many constants (labels, thresholds, timeouts, selectors), a `Labels` slice, a `ReporterNamespacesToDump` map, and a `ReporterCRDsToDump` list used by reporter hooks. |
| **Node Health init helper**<br>`tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go` | Exposes `APIClient *clients.Settings` initialized from internal `inittools.APIClient` for consumers of the package. |
| **Node Health tests (validation suite)**<br>`tests/cnf/ran/node-health/tests/node_health_validation.go` | Adds a comprehensive, ordered test suite that lists nodes in BeforeAll, then validates readiness, pressure, resource fields, kubelet pod status, and node conditions (including timestamps), and performs short continuous monitoring. Uses report IDs and logging. |
| **Documentation**<br>`tests/cnf/ran/node-health/README.md` | New README describing suite purpose, structure, categories, thresholds, running instructions, config guidance, dependencies, and contribution notes. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Suite as Healthcheck Suite
  participant Config as RANConfig
  participant Spec as Healthcheck Spec
  participant API as Kubernetes API (Spoke1APIClient)

  Suite->>Config: GetJunitReportPath(currentFile)
  Config-->>Suite: JUnit path
  Suite->>Suite: RegisterFailHandler(Fail)\nRunSpecs(filter: "healthcheck")
  Spec->>API: List Nodes (context.TODO())
  API-->>Spec: NodeList or error
  alt error
    Spec->>Suite: Fail (error)
  else success
    Spec->>Spec: Expect nodeList non-empty
    loop each node
      Spec->>Spec: Check NodeReady == ConditionTrue
      alt not ready
        Spec->>Suite: Fail (node not ready)
      else ready
        Spec->>Spec: Log readiness
      end
    end
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant Suite as NodeHealth Suite
  participant Init as InitTools/APIClient
  participant Reporter as Reporter/ReportXML
  participant API as Kubernetes API

  Note over Suite: Suite startup
  Suite->>Init: resolve currentFile, get APIClient
  Suite->>Reporter: configure JUnit/XML paths
  Suite->>Suite: RegisterFailHandler(Fail)\nRunSpecs(filter: nodehealthparams.Labels)

  Note over Suite,API: Test execution (high level)
  Suite->>API: BeforeAll: List Nodes -> nodesList
  Suite->>API: Ordered Contexts: Readiness, Pressure, Resources, Kubelet, Conditions, Monitoring
  API-->>Suite: Node and Pod data responses
  alt any spec failed
    Suite->>Reporter: JustAfterEach -> ReportIfFailed(CurrentSpecReport(), currentFile, namespaces, crds)
  end

  Note over Suite: AfterSuite
  Suite->>Reporter: reportxml.Create(report, GeneralConfig.GetReportPath(), GeneralConfig.TCPrefix)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)

7-7: Log-level convention ack (90 for suite-level; 100 reserved for eco-goinfra)

This aligns with the convention described earlier; keep as-is for cnf/ran suite.

tests/cnf/ran/healthcheck/tests/healthcheck.go (2)

30-31: Drop custom message for err matcher; Gomega will print the error

Cleaner failure output; previous review mentioned this as well.

-		Expect(err).ToNot(HaveOccurred(), "Failed to list cluster nodes")
+		Expect(err).ToNot(HaveOccurred())

34-51: Optional: reuse eco-goinfra’s Node builder IsReady helper

Reduces custom condition-walking and aligns with library semantics.

-		By("Verifying all nodes are in Ready state")
-		// Iterate through each node retrieved from the cluster.
-		for _, node := range nodeList {
-			// Initialize a flag to track if the node is Ready.
-			isReady := false
-			// Iterate through the conditions reported by the node.
-			for _, condition := range node.Status.Conditions {
-				// Check for the "Ready" condition type and ensure its status is "True".
-				if condition.Type == corev1.NodeReady && condition.Status == corev1.ConditionTrue {
-					isReady = true
-					break // Found the Ready condition, no need to check further conditions for this node.
-				}
-			}
-			// Assert that the node is ready. If not, the test will fail and report the node's name.
-			Expect(isReady).To(BeTrue(), "Node %s is not in Ready state. Current conditions: %+v", node.Name, node.Status.Conditions)
-			// Print a message to the glog output for successful nodes.
-			glog.V(tsparams.LogLevel).Infof("Node %s is Ready.", node.Name)
-		}
+		By("Verifying all nodes are in Ready state")
+		for _, node := range nodeList {
+			ready, err := nodes.NewBuilder(Spoke1APIClient, node.Name).IsReady(context.TODO())
+			Expect(err).ToNot(HaveOccurred())
+			Expect(ready).To(BeTrue(), "Node %s is not in Ready state. Current conditions: %+v", node.Name, node.Status.Conditions)
+			glog.V(tsparams.LogLevel).Infof("Node %s is Ready.", node.Name)
+		}
🧹 Nitpick comments (2)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)

3-7: Polish exported comments to satisfy linters

Start comments with the identifier names.

-// This label will be used in the Jenkins job's TEST_TYPE parameter.
+// LabelHealthCheckTestCases is used in the Jenkins job's TEST_TYPE parameter.
 const LabelHealthCheckTestCases = "healthcheck"
 
-// defines the verbosity level for glog.
+// LogLevel defines the default verbosity level for glog.
 const LogLevel glog.Level = 90
tests/cnf/ran/healthcheck/tests/healthcheck.go (1)

25-31: Optional: use a timeout context for API calls

Prevents hangs on slow/unreachable clusters.

-		nodeList, err := nodes.List(Spoke1APIClient, context.TODO())
+		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+		defer cancel()
+		nodeList, err := nodes.List(Spoke1APIClient, ctx)

Add import if adopting:

-import (
+import (
 	"context"
+	"time"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8f38185 and 40e4f89.

📒 Files selected for processing (3)
  • tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1 hunks)
  • tests/cnf/ran/healthcheck/internal/tsparams/params.go (1 hunks)
  • tests/cnf/ran/healthcheck/tests/healthcheck.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (1)
  • LabelHealthCheckTestCases (4-4)
tests/cnf/ran/healthcheck/tests/healthcheck.go (2)
tests/cnf/ran/healthcheck/internal/tsparams/params.go (2)
  • LabelHealthCheckTestCases (4-4)
  • LogLevel (7-7)
tests/cnf/ran/internal/raninittools/raninittools.go (1)
  • Spoke1APIClient (14-14)
🔇 Additional comments (2)
tests/cnf/ran/healthcheck/healthcheck_suite_test.go (1)

21-22: RunSpecs invocation is consistent with repository conventions; no changes required

After reviewing the RunSpecs calls across all suite tests (e.g. vcore, spk, RDS, O-Cloud, RAN O-RAN, TALM, etc.), every test uses the pattern:

RunSpecs(t, "<Suite Name>", Label(...), reporterConfig)

Your Cluster Health Check suite does the same:

RunSpecs(t, "Cluster Health Check", Label(tsparams.LabelHealthCheckTestCases), reporterConfig)

This matches the established Ginkgo v2 usage in our codebase, so no update is necessary here.

tests/cnf/ran/healthcheck/tests/healthcheck.go (1)

53-53: LGTM: concise summary logging

Final readiness summary at the configured verbosity is helpful and consistent.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (10)
tests/cnf/ran/node-health/README.md (1)

27-38: Add language specifiers to fenced code blocks (markdownlint MD040)

Add a language to the directory tree and log examples to satisfy linters.

-```
+```text
 ...
-```
+```
-```
+```text
 INFO: Found 3 nodes for health monitoring
 ...
-```
+```

Also applies to: 141-150

tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (2)

13-17: Avoid recommending dot-imports; export a stable API instead

Dot-imports reduce clarity and can cause identifier collisions. The comment suggests dot-importing this package, but only APIClient is exported anyway.

-// init loads all variables automatically when this package is imported. Once package is imported a user has full
-// access to all vars within init function. It is recommended to import this package using dot import.
+// init exposes shared clients when this package is imported.
+// Import this package normally and reference nodehealthinittools.APIClient explicitly.

8-17: Remove or consolidate the nodehealthinittools re-export
No references to nodehealthinittools.APIClient were found—tests still dot-import tests/internal/inittools. Either switch tests to import this package for APIClient or drop it to eliminate redundancy.

tests/cnf/ran/node-health/tests/node_health_validation.go (3)

156-181: Disk “usage” test doesn’t assert usage; it only logs image sizes

Currently this only checks Capacity/Allocatable presence and logs image sizes. If thresholds are intended (per README), add real assertions from metrics or kubelet stats API.

I can wire this to metrics.k8s.io (if enabled) or node allocatable vs. requested via schedulable resources.
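As a rough sketch of the allocatable-based direction (DiskUsageThreshold is a hypothetical constant here, and capacity minus allocatable is only a proxy for real usage):

```go
// Proxy check: how much ephemeral storage the node reserves away from
// workloads, until kubelet stats or metrics.k8s.io are wired in.
capacity := nodeObj.Status.Capacity[corev1.ResourceEphemeralStorage]
allocatable := nodeObj.Status.Allocatable[corev1.ResourceEphemeralStorage]
Expect(capacity.Value()).To(BeNumerically(">", 0),
	"node %s reports no ephemeral-storage capacity", nodeObj.Name)

reservedPercent := 100 * (1 - float64(allocatable.Value())/float64(capacity.Value()))
// DiskUsageThreshold is a hypothetical percentage constant in nodehealthparams.
Expect(reservedPercent).To(BeNumerically("<", nodehealthparams.DiskUsageThreshold),
	"node %s reserves %.1f%% of ephemeral storage", nodeObj.Name, reservedPercent)
```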


372-411: Long-running monitoring in an It block; parameterize or gate

A hardcoded 2 minutes with 30s sleeps will slow suites and may hit CI timeouts.

  • Read duration/interval from env (e.g., ECO_NODE_MONITOR_DURATION, ECO_NODE_MONITOR_INTERVAL) with sensible defaults, as sketched after this list.
  • Skip or shorten when GeneralConfig.DryRun is true.
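A minimal sketch of that pattern (the env var names above are suggestions, not existing config):

```go
import (
	"os"
	"time"
)

// envDuration reads a time.Duration from the environment, falling back
// to the default when the variable is unset or unparsable.
func envDuration(key string, fallback time.Duration) time.Duration {
	if raw := os.Getenv(key); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil {
			return d
		}
	}

	return fallback
}
```

The It block would then use monitorDuration := envDuration("ECO_NODE_MONITOR_DURATION", 2*time.Minute) instead of the hardcoded two minutes.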

3-20: Prefer avoiding dot-imports for inittools

Dot-importing inittools pulls many globals into this package’s scope. Prefer a named import and explicit references for readability.

- . "github.com/rh-ecosystem-edge/eco-gotests/tests/internal/inittools"
+ inittools "github.com/rh-ecosystem-edge/eco-gotests/tests/internal/inittools"

Then reference inittools.GeneralConfig and inittools.APIClient explicitly.

tests/cnf/ran/node-health/internal/nodehealthparams/const.go (4)

42-50: Use time.Duration for timeouts/intervals to avoid unit ambiguity.

Seconds as bare ints are error-prone; prefer typed durations.

Apply:

-	// KubeletHealthCheckTimeout is the timeout for kubelet health checks.
-	KubeletHealthCheckTimeout = 30
+	// KubeletHealthCheckTimeout is the timeout for kubelet health checks.
+	KubeletHealthCheckTimeout = 30 * time.Second
@@
-	// NodeConditionCheckTimeout is the timeout for node condition checks.
-	NodeConditionCheckTimeout = 60
+	// NodeConditionCheckTimeout is the timeout for node condition checks.
+	NodeConditionCheckTimeout = 60 * time.Second
@@
-	// ResourceCheckInterval is the interval between resource checks.
-	ResourceCheckInterval = 10
+	// ResourceCheckInterval is the interval between resource checks.
+	ResourceCheckInterval = 10 * time.Second

Also add the import:

import (
    k8sreporter "github.com/rh-ecosystem-edge/eco-goinfra/pkg/k8sreporter"
    corev1 "k8s.io/api/core/v1"
    "time"
)

51-59: Mirror Kubernetes constants instead of hardcoding strings.

Prevents drift and typos while keeping the same exported names.

Apply:

-	// ConditionTypeReadyString constant to fix linter warning.
-	ConditionTypeReadyString = "Ready"
-
-	// ConstantTrueString constant to fix linter warning.
-	ConstantTrueString = "True"
-
-	// ConstantFalseString constant to fix linter warning.
-	ConstantFalseString = "False"
+	// ConditionTypeReadyString mirrors corev1.NodeReady.
+	ConditionTypeReadyString = string(corev1.NodeReady)
+
+	// ConstantTrueString mirrors corev1.ConditionTrue.
+	ConstantTrueString = string(corev1.ConditionTrue)
+
+	// ConstantFalseString mirrors corev1.ConditionFalse.
+	ConstantFalseString = string(corev1.ConditionFalse)

1-2: Add a package doc comment for lint compliance.

Apply:

+// Package nodehealthparams centralizes labels, thresholds, and reporter settings
+// used by the Node Health CNF tests.
 package nodehealthparams

83-95: Reporter defaults are sensible.

Namespaces and CRDs chosen are useful for triage. Consider adding tuned (openshift-cluster-node-tuning-operator) if node tuning issues are in scope, but optional.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 896a4af and 3955b21.

📒 Files selected for processing (5)
  • tests/cnf/ran/node-health/README.md (1 hunks)
  • tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (1 hunks)
  • tests/cnf/ran/node-health/internal/nodehealthparams/const.go (1 hunks)
  • tests/cnf/ran/node-health/node_health_suite_test.go (1 hunks)
  • tests/cnf/ran/node-health/tests/node_health_validation.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/cnf/ran/node-health/node_health_suite_test.go (3)
tests/internal/config/config.go (1)
  • GeneralConfig (21-42)
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (4)
  • Label (10-10)
  • Labels (74-81)
  • ReporterNamespacesToDump (84-88)
  • ReporterCRDsToDump (91-95)
tests/internal/reporter/reporter.go (1)
  • ReportIfFailed (51-57)
tests/cnf/ran/node-health/tests/node_health_validation.go (2)
tests/cnf/ran/node-health/internal/nodehealthparams/const.go (2)
  • KubeletNamespace (61-61)
  • KubeletPodSelector (64-64)
tests/cnf/ran/node-health/internal/nodehealthinittools/nodehealthinittools.go (1)
  • APIClient (10-10)
🪛 markdownlint-cli2 (0.17.2)
tests/cnf/ran/node-health/README.md

27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (6)
tests/cnf/ran/node-health/README.md (1)

80-99: Documented thresholds/timeouts aren’t enforced by the code

Threshold and timeout constants are mentioned but not used in node_health_validation.go. Either wire them into the tests or remove from README to avoid drift.

Do you want me to wire these into the tests (CPU/memory/disk checks and polling intervals)?

tests/cnf/ran/node-health/node_health_suite_test.go (2)

27-35: LGTM: failure diagnostics and XML report hook

The JustAfterEach failure dump and ReportAfterSuite XML generation align with existing eco-gotests patterns.


20-25: No action required: RunSpecs invocation is correct
Passing Label(...) and the mutated reporterConfig into RunSpecs is consistent with all other suites in this repo and aligns with Ginkgo v2’s DSL (labels as []string, reporter overrides via ReporterConfig).

tests/cnf/ran/node-health/tests/node_health_validation.go (1)

32-40: LGTM: node discovery and preconditions

Listing nodes once in BeforeAll and asserting non-empty cluster is fine for this suite.

tests/cnf/ran/node-health/internal/nodehealthparams/const.go (2)

66-71: MCD constants look correct.

Label and container name match typical MCO deployments.


60-65: Validate OpenShift kubelet selector
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go:60-65, KubeletPodSelector = "k8s-app=kubelet" won’t match any pods on OpenShift (kubelet runs as a systemd service). Manually confirm with:

oc get pods -A -l k8s-app=kubelet

If no pods are returned, replace pod-based checks with node conditions (Ready) or query systemd status via the machine-config-daemon in the openshift-machine-config-operator namespace.

Comment on lines +3 to +6
```go
import (
	"github.com/openshift-kni/k8sreporter"
	corev1 "k8s.io/api/core/v1"
)
```

⚠️ Potential issue

Fix moved module import path (build will fail after repo move).

Update k8sreporter to the new module path per PR comment about the repo move.

Apply:

-import (
-	"github.com/openshift-kni/k8sreporter"
-	corev1 "k8s.io/api/core/v1"
-)
+import (
+	k8sreporter "github.com/rh-ecosystem-edge/eco-goinfra/pkg/k8sreporter"
+	corev1 "k8s.io/api/core/v1"
+)
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go around lines
3–6, the import path for k8sreporter needs to be updated to the repo's new
module path after the move; replace "github.com/openshift-kni/k8sreporter" with
the new module path specified in the PR (adjusting any trailing /vX versioned
path if required), then run go mod tidy to update dependencies and fix any
import usages accordingly.

Comment on lines +8 to +14
```go
const (
	// Label is used to select tests for Node Health monitoring.
	Label = "node-health"

	// NodeHealthLogLevel configures logging level for Node Health related tests.
	NodeHealthLogLevel = 90
```


💡 Verification agent

🧩 Analysis chain

Confirm intended log level scale (90).

Value 90 is unusually high; ensure it matches your logging facade’s expectations (range, semantics). If unused, drop it.

Run:


🏁 Script executed:

```bash
#!/bin/bash
rg -n "NodeHealthLogLevel"
```

Length of output: 258


Drop the unused NodeHealthLogLevel constant. It’s declared at tests/cnf/ran/node-health/internal/nodehealthparams/const.go:13 but never referenced; remove it if there’s no intended usage.

🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/internal/nodehealthparams/const.go around lines 8
to 14, the constant NodeHealthLogLevel is declared but unused; remove the
NodeHealthLogLevel constant definition from this file (and any trailing blank
line if needed) so only the used Label constant remains; if the log level was
intended to be configurable, instead add a TODO and a clear comment or implement
its usage where logging is configured.

Comment on lines +60 to +66
### 4. Kubelet Status Validation
- **Test ID**: `node-health-009` to `node-health-010`
- **Purpose**: Ensure kubelet service health essential for CNF pod scheduling
- **Labels**: `kubelet-pod-status`, `kubelet-service-check`
- **Description**: Verifies kubelet pods are running and ready on all nodes

### 5. Node Conditions Validation

⚠️ Potential issue

Kubelet “pod” wording is incorrect for OpenShift; tests will fail as written

On OpenShift/OKD, kubelet runs as a systemd service on the node, not as a Pod in kube-system. The current description (and matching tests) expect kubelet Pods and will consistently find none.

Consider updating both docs and tests to validate kubelet health via Node conditions (Ready) and/or kubelet health endpoint on the node, rather than listing Pods. If you keep a kubelet status section, reword to avoid “kubelet pods.”


Comment on lines +121 to +126
```bash
# Run tests with specific labels
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-readiness"
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-pressure"
go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-kubelet"
```

🛠️ Refactor suggestion

Fix label names in run examples

Examples use non-existent labels. Replace them with labels actually present in the code.

Apply:

-# Run tests with specific labels
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-readiness"
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-pressure"
-go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-health-kubelet"
+# Run tests with specific labels
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-readiness"
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="node-pressure"
+go test ./tests/cnf/ran/node-health/ -v -ginkgo.label-filter="kubelet-status"
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/README.md around lines 121 to 126, the example go
test commands reference nonexistent labels; update those three example lines to
use the actual label names used in the test suite: inspect the test files
(ginkgo Describe/It labels or any label metadata) to determine the correct label
keys/values (e.g., readiness, pressure, kubelet may be named differently) and
replace "node-health-readiness", "node-health-pressure", and
"node-health-kubelet" with the exact labels found; ensure the updated commands
preserve the same flags and quoting and run a quick local go test with one
corrected label to verify correctness.

Comment on lines +210 to +216
### Debug Mode
Enable verbose logging by setting the log level:

```bash
export GLOMAXLEVEL=5
go test ./tests/cnf/ran/node-health/ -v
```

⚠️ Potential issue

Wrong env var for verbosity

Use ECO_VERBOSE_LEVEL (per tests/internal/config/config.go), not GLOMAXLEVEL.

-export GLOMAXLEVEL=5
+export ECO_VERBOSE_LEVEL=5
 go test ./tests/cnf/ran/node-health/ -v

🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/README.md around lines 210 to 216, the README shows
the wrong environment variable for test verbosity; update the instructions to
use ECO_VERBOSE_LEVEL (as defined in tests/internal/config/config.go) instead of
GLOMAXLEVEL and demonstrate exporting a numeric level (e.g., export
ECO_VERBOSE_LEVEL=5) before running the go test command to enable verbose
logging.

Comment on lines +231 to +299
Context("Kubelet Status Validation", Label("kubelet-status"), func() {
It("Verify kubelet pods are running on all nodes",
Label("kubelet-pod-status"),
reportxml.ID("node-health-009"),
func() {
for _, nodeBuilder := range nodesList {
nodeObj := nodeBuilder.Object
glog.Infof("Checking kubelet status on node: %s", nodeObj.Name)

// Check if kubelet pod is running on this node
podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List(
context.TODO(),
metav1.ListOptions{
FieldSelector: fmt.Sprintf("spec.nodeName=%s", nodeObj.Name),
LabelSelector: nodehealthparams.KubeletPodSelector,
},
)
Expect(err).NotTo(HaveOccurred(), "Failed to list kubelet pods on node %s", nodeObj.Name)

// Verify at least one kubelet pod is running
hasRunningKubelet := false
for _, pod := range podList.Items {
if pod.Status.Phase == corev1.PodRunning {
hasRunningKubelet = true
break
}
}

Expect(hasRunningKubelet).To(BeTrue(), "No running kubelet pod found on node %s", nodeObj.Name)
glog.Infof("Kubelet pod is running on node %s", nodeObj.Name)
}
})

It("Verify kubelet service is responding",
Label("kubelet-service-check"),
reportxml.ID("node-health-010"),
func() {
// This test would typically check kubelet health endpoint
// For now, we'll verify the kubelet pods are healthy
for _, nodeBuilder := range nodesList {
nodeObj := nodeBuilder.Object
glog.Infof("Checking kubelet service health on node: %s", nodeObj.Name)

// Check kubelet pod readiness
podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List(
context.TODO(),
metav1.ListOptions{
FieldSelector: fmt.Sprintf("spec.nodeName=%s", nodeObj.Name),
LabelSelector: nodehealthparams.KubeletPodSelector,
},
)
Expect(err).NotTo(HaveOccurred(), "Failed to list kubelet pods on node %s", nodeObj.Name)

for _, pod := range podList.Items {
// Check if pod is ready
isReady := false
for _, condition := range pod.Status.Conditions {
if condition.Type == corev1.PodReady {
isReady = condition.Status == corev1.ConditionTrue
break
}
}

Expect(isReady).To(BeTrue(), "Kubelet pod %s on node %s is not ready", pod.Name, nodeObj.Name)
glog.Infof("Kubelet pod %s on node %s is ready", pod.Name, nodeObj.Name)
}
}
})
})

⚠️ Potential issue

Kubelet checks assume kubelet Pods exist; this will fail on OpenShift

Kubelet isn’t a Pod on OpenShift/OKD. Listing kubelet Pods in kube-system with label k8s-app=kubelet will return nothing and the tests will fail.

Replace both kubelet tests with condition-based checks (or node heartbeat), e.g.:

- It("Verify kubelet pods are running on all nodes",
-   Label("kubelet-pod-status"),
-   reportxml.ID("node-health-009"),
-   func() {
-     ...
-     podList, err := APIClient.CoreV1Interface.Pods(nodehealthparams.KubeletNamespace).List(...)
-     ...
-     hasRunningKubelet := false
-     for _, pod := range podList.Items {
-       if pod.Status.Phase == corev1.PodRunning { hasRunningKubelet = true; break }
-     }
-     Expect(hasRunningKubelet).To(BeTrue(), "No running kubelet pod found on node %s", nodeObj.Name)
-   })
+ It("Verify kubelet is reporting Ready on all nodes",
+   Label("kubelet-service-check"),
+   reportxml.ID("node-health-009"),
+   func() {
+     for _, nodeBuilder := range nodesList {
+       nodeObj := nodeBuilder.Object
+       var readyCond *corev1.NodeCondition
+       for i := range nodeObj.Status.Conditions {
+         if nodeObj.Status.Conditions[i].Type == corev1.NodeReady {
+           readyCond = &nodeObj.Status.Conditions[i]
+           break
+         }
+       }
+       Expect(readyCond).NotTo(BeNil(), "Node %s has no Ready condition", nodeObj.Name)
+       Expect(readyCond.Status).To(Equal(corev1.ConditionTrue), "Node %s kubelet is not Ready", nodeObj.Name)
+     }
+   })
 
- It("Verify kubelet service is responding",
-   Label("kubelet-service-check"),
-   reportxml.ID("node-health-010"),
-   func() {
-     // Lists kubelet pods and checks readiness
-     ...
-   })
+ // Optionally, add a heartbeat freshness check by inspecting LastHeartbeatTime if desired.

Comment on lines +316 to +320
```go
if condition.Type == corev1.NodeNetworkUnavailable &&
	strings.Contains(nodeObj.Name, "master") {
	// Master nodes might have NetworkUnavailable=True during initial setup
	continue
}
```

🛠️ Refactor suggestion

Don’t infer control-plane nodes from name substrings

Checking strings.Contains(nodeObj.Name, "master") is brittle. Use role labels.

- if condition.Type == corev1.NodeNetworkUnavailable &&
-   strings.Contains(nodeObj.Name, "master") {
+ if condition.Type == corev1.NodeNetworkUnavailable && isControlPlaneNode(nodeObj) {
    continue
 }

Add helper (outside this hunk):

```go
func isControlPlaneNode(n *corev1.Node) bool {
	if n == nil {
		return false
	}

	_, hasCP := n.Labels["node-role.kubernetes.io/control-plane"]
	_, hasMaster := n.Labels["node-role.kubernetes.io/master"]

	return hasCP || hasMaster
}
```
🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/tests/node_health_validation.go around lines 316 to
320, the code infers control-plane nodes by checking if nodeObj.Name contains
"master", which is brittle; add a helper function (placed outside this hunk)
named isControlPlaneNode that returns true if the node has either
node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label,
then replace the strings.Contains(nodeObj.Name, "master") check with a call to
isControlPlaneNode(&nodeObj) so the NetworkUnavailable exception is based on
role labels rather than name substrings.

Comment on lines +343 to +370
It("Verify node last transition times are recent",
Label("node-transition-time-check"),
reportxml.ID("node-health-012"),
func() {
for _, nodeBuilder := range nodesList {
nodeObj := nodeBuilder.Object
glog.Infof("Checking transition times on node: %s", nodeObj.Name)

// Check if node conditions have recent transition times
for _, condition := range nodeObj.Status.Conditions {
if condition.LastTransitionTime.IsZero() {
continue // Skip conditions without transition time
}

// Check if transition time is within reasonable bounds (not too old)
timeSinceTransition := time.Since(condition.LastTransitionTime.Time)
Expect(timeSinceTransition).To(BeNumerically("<", 24*time.Hour),
"Node %s condition %s has very old transition time: %s",
nodeObj.Name, condition.Type, condition.LastTransitionTime)

glog.Infof("Node %s condition %s transition time: %s (age: %v)",
nodeObj.Name, condition.Type, condition.LastTransitionTime, timeSinceTransition)
}

glog.Infof("Transition times on node %s are recent", nodeObj.Name)
}
})
})

🛠️ Refactor suggestion

Transition time < 24h will flake on stable clusters

Many clusters have not transitioned conditions for days/weeks. The 24h bound will cause false failures.

- Expect(timeSinceTransition).To(BeNumerically("<", 24*time.Hour),
+ Expect(timeSinceTransition).To(BeNumerically("<", 30*24*time.Hour),
   "Node %s condition %s has very old transition time: %s",
   nodeObj.Name, condition.Type, condition.LastTransitionTime)

Optionally, make this threshold configurable (env or nodehealthparams) and only warn/log when exceeded.
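The warn-only variant could look like this (transitionThreshold assumed to come from env or nodehealthparams):

```go
// Log a warning instead of failing the spec when a condition is stale.
if timeSinceTransition > transitionThreshold {
	glog.Warningf("node %s condition %s last transitioned %v ago (threshold %v)",
		nodeObj.Name, condition.Type, timeSinceTransition, transitionThreshold)
}
```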

🤖 Prompt for AI Agents
In tests/cnf/ran/node-health/tests/node_health_validation.go around lines 343 to
370, the hardcoded 24-hour threshold for LastTransitionTime causes flakes
because many clusters don't transition conditions within 24h; replace the
hardcoded value with a configurable duration and stop failing the test on
exceedance—log or warn instead. Concretely: introduce a configurable threshold
(env var like NODE_HEALTH_TRANSITION_THRESHOLD or a nodeHealthParams field)
parsed as a time.Duration with a sensible default (e.g., 7*24*time.Hour),
replace the BeNumerically("<", 24*time.Hour) Expect with a conditional that if
timeSinceTransition > threshold then glog.Warningf (or Ginkgo Log/Warn) with the
node, condition, transition time and age (do not call Expect to fail), otherwise
keep logging the recent transition; ensure parsing the env var falls back to the
default and add a short comment explaining the change.
