Skip to content

Conversation

@zhiqiangf
Copy link

@zhiqiangf zhiqiangf commented Dec 5, 2025

Add SR-IOV test suite and basic test implementations:

  • sriov_suite_test.go: Suite setup, cleanup, and reporter config
  • basic.go: 9 test cases covering VF spoof/trust, VLAN, link state,
    MTU configuration, and DPDK functionality

Test cases use caseID variables for resource naming and support
dynamic device configuration via SRIOV_DEVICES env variable."

Summary by CodeRabbit

  • Tests

    • Added a comprehensive SR‑IOV test suite covering VF initialization, DPDK flows, network scenarios (spoofing, trust, VLAN, QoS, link state), MTU updates, traffic verification, interface checks, and cleanup utilities.
  • New Features

    • Config now supports per‑device entries and VF count configurable via env or YAML with env parsing overrides and sensible defaults.
    • Added many test parameters and timeouts to improve test stability and configurability.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

Warning

Rate limit exceeded

@zhiqiangf has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca8a42 and 56491cc.

📒 Files selected for processing (2)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go
  • tests/ocp/sriov/tests/basic.go
📝 Walkthrough

Walkthrough

Adds SR‑IOV test infrastructure: extended config parsing (env/YAML) and device model, default YAML device list and vf_num, many test constants, a large sriovenv utilities module for network/policy/pod/VF lifecycle and verification, and a new basic SR‑IOV test suite.

Changes

Cohort / File(s) Change Summary
SR‑IOV configuration
tests/ocp/sriov/internal/ocpsriovconfig/config.go
Added DeviceConfig type; extended SriovOcpConfig with Devices, DevicesEnv, VFNum; added GetSriovDevices(), GetVFNum(), and parseSriovDevicesEnv(); changed default config path to default.yaml; improved config-path resolution and error logging.
Default YAML
tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
Added vf_num: 2 and a default devices list of NIC entries (device_id, vendor, interface_name, supports_min_tx_rate).
SR‑IOV environment utilities
tests/ocp/sriov/internal/sriovenv/sriovenv.go
New large module: network/NAD creation & removal, policy management, VF/DPDK init, pod helpers, interface verification, traffic checks, PCI address extraction, cleanup routines, and related exported helpers/constants.
Test parameters / constants
tests/ocp/sriov/internal/tsparams/consts.go
Added many test constants (timeouts, polling intervals, MAC/IPs, MTU, VLAN/QoS, TX rates, BCM vendor ID, labels).
Test vars update
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go
Added package doc comment; removed DefaultTimeout variable and unnecessary time import; clarified Labels comment.
Basic SR‑IOV tests
tests/ocp/sriov/tests/basic.go
New Ginkgo test suite with setup/teardown, per-device namespaces/networks, VF and DPDK flows, spoof/trust/VLAN/QoS/link-state/MTU tests, pass‑traffic verification, PCI annotation checks, and cleanup helpers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • evgenLevin
  • ajaggapa
  • gkopels

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'ocp-sriov: add test suite and basic test cases' directly and clearly describes the main changes: introducing a new SR-IOV test suite with basic test cases, which aligns with the comprehensive changes across config, environment setup, and test implementations.
Docstring Coverage ✅ Passed Docstring coverage is 95.35% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch 8 times, most recently from d001197 to 8a89658 Compare December 5, 2025 19:39
@zhiqiangf zhiqiangf marked this pull request as ready for review December 5, 2025 19:48
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
tests/ocp/sriov/internal/tsparams/consts.go (1)

51-52: Consider documenting the MTU choice.

The value 1800 is reasonable for testing, but a brief comment explaining why this specific value was chosen (e.g., "above standard 1500 to test jumbo frame support" or "between minimum and standard to verify MTU configuration") would be helpful for maintainability.

Apply this diff to add clarifying documentation:

-	// DefaultTestMTU is the default MTU value used in MTU configuration tests.
+	// DefaultTestMTU is the default MTU value used in MTU configuration tests.
+	// Set to 1800 to verify non-standard MTU configuration support.
 	DefaultTestMTU = 1800
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

57-62: Defensive check is appropriate but consider logging more context.

The runtime.Caller(0) check handles an edge case. Consider including the error or additional context in the log message for debugging purposes.

 	_, filename, _, ok := runtime.Caller(0)
 	if !ok {
-		log.Print("Error: unable to determine config file path")
+		log.Print("Error: runtime.Caller(0) failed - unable to determine config file path")
 
 		return nil
 	}
tests/ocp/sriov/internal/tsparams/sriovvars.go (2)

24-35: Consider documenting the default device list origin.

The hardcoded device configurations appear environment-specific. Consider adding a comment explaining where these defaults come from (e.g., lab hardware inventory) to help maintainers understand when updates are needed.


65-71: Consider adding basic validation for deviceID and vendor fields.

The parsed values are used directly without validation. Consider validating that DeviceID and Vendor are valid hex strings (typical PCI ID format) to catch configuration errors early.

+		// Basic validation for PCI vendor/device ID format (4 hex chars)
+		deviceID := strings.TrimSpace(parts[1])
+		vendor := strings.TrimSpace(parts[2])
+		if len(deviceID) != 4 || len(vendor) != 4 {
+			klog.Warningf("Skipping SRIOV_DEVICES entry %q - deviceID and vendor should be 4 hex characters", entry)
+			continue
+		}
+
 		devices = append(devices, DeviceConfig{
 			Name:              strings.TrimSpace(parts[0]),
-			DeviceID:          strings.TrimSpace(parts[1]),
-			Vendor:            strings.TrimSpace(parts[2]),
+			DeviceID:          deviceID,
+			Vendor:            vendor,
 			InterfaceName:     strings.TrimSpace(parts[3]),
 			SupportsMinTxRate: true, // Default to true for env-configured devices
 		})
tests/ocp/sriov/tests/basic.go (1)

101-119: Consider asserting cleanup success or documenting the lenient approach.

Policy cleanup errors are logged but the test suite continues. While this is acceptable for cleanup, consider either:

  1. Using Expect to fail if critical cleanup fails, or
  2. Adding a comment explaining why lenient cleanup is acceptable here.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)

167-177: Consider distinguishing "not found" from other pull errors.

Returning nil for any pull error assumes the resource is already deleted. Consider explicitly checking for "not found" errors to avoid masking other failures.

 func RemoveSriovNetwork(name string, timeout time.Duration) error {
 	sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace
 
 	network, err := sriov.PullNetwork(APIClient, name, sriovOpNs)
 	if err != nil {
-		return nil // Already deleted
+		// Assume not found means already deleted
+		// Note: eco-goinfra doesn't expose typed errors, so we treat all pull failures as "not found"
+		return nil
 	}
 
 	return network.DeleteAndWait(timeout)
 }

290-343: Delete-and-recreate introduces a brief gap without policy.

The workaround is documented, but the brief window between delete and recreate might cause issues if VFs are actively used. Consider if WaitForSriovPolicyReady should be called after delete to ensure clean state before recreation.

 	// Delete the existing policy
 	if err = policy.Delete(); err != nil {
 		return fmt.Errorf("failed to delete policy %q for MTU update: %w", policyName, err)
 	}
 
+	// Wait for deletion to complete before recreating
+	_ = wait.PollUntilContextTimeout(context.Background(), tsparams.PollingInterval, tsparams.DefaultTimeout, true,
+		func(ctx context.Context) (bool, error) {
+			_, pullErr := sriov.PullPolicy(APIClient, policyName, sriovOpNs)
+			return pullErr != nil, nil
+		})
+
 	// Recreate with updated MTU, preserving all original settings
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1c772b and 8a896580e12eebd3339e6d82ec625d3846d25166.

📒 Files selected for processing (9)
  • tests/ocp/internal/ocpconfig/config.go (2 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (4 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (0 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/consts.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (0 hunks)
  • tests/ocp/sriov/internal/tsparams/sriovvars.go (1 hunks)
  • tests/ocp/sriov/sriov_suite_test.go (1 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
💤 Files with no reviewable changes (2)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/sriov_suite_test.go (2)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (2)
  • SriovOcpConfig (13-13)
  • APIClient (11-11)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
  • CleanupLeftoverResources (735-801)
  • CheckSriovOperatorStatus (40-42)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
  • SriovOcpConfig (13-13)
🔇 Additional comments (27)
tests/ocp/sriov/internal/tsparams/consts.go (3)

6-50: LGTM! Well-structured timeout constants.

The timeout and interval constants are comprehensive and appropriate for SR-IOV testing operations. The derived polling intervals (lines 45, 47) are a good practice, making it easy to maintain consistent timing relationships.


59-64: Excellent MAC address documentation.

The MAC addresses are correctly configured as locally administered unicast addresses, and the comments clearly explain the addressing scheme. This is helpful for anyone reviewing or maintaining the test suite.


67-72: LGTM! Clear documentation for the Labels pattern.

The documentation clearly explains how individual test files extend the Labels slice with their own specific labels, which helps maintainers understand the test filtering mechanism.

tests/ocp/internal/ocpconfig/config.go (1)

38-55: The save/restore pattern for GeneralConfig is a necessary workaround for yaml.v2 behavior, not an optional defensive measure. When yaml.v2 unmarshals YAML without a corresponding field section, embedded pointer fields remain nil—so preserving the pre-initialized config and restoring it after YAML decode is the standard approach for merging YAML data with defaults. This pattern is correct and commonly used with yaml.v2. No changes needed.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)

28-41: Good practice: Decoupled configuration structs prevent embedded config mutation.

The helper structs sriovYAMLConfig and sriovEnvConfig cleanly isolate SR-IOV-specific field decoding from the embedded OcpConfig/GeneralConfig. This prevents unintended side effects during YAML/env parsing.

Note that SriovInterfaces is only in sriovEnvConfig (not in YAML struct), which means it can only be set via environment variable. Ensure this is intentional.


94-107: LGTM!

The temporary struct pattern correctly isolates YAML decoding from the embedded config structs, preventing unintended mutations.


113-134: LGTM!

The conditional application of non-empty environment values correctly implements override semantics while preserving values loaded from YAML.

tests/ocp/sriov/internal/tsparams/sriovvars.go (3)

16-22: LGTM!

The DeviceConfig struct is well-defined with clear field semantics. The SupportsMinTxRate flag allows test cases to skip unsupported configurations gracefully.


77-92: Panic for misconfigured environment is intentional but verify test framework behavior.

Using panic for invalid SRIOV_DEVICES ensures fast failure on misconfiguration. Verify that Ginkgo handles this panic appropriately during test initialization (it should fail the suite early).


94-113: LGTM!

The VF number retrieval handles invalid values gracefully with logging and fallback. The dual environment variable support with clear priority is well implemented.

tests/ocp/sriov/tests/basic.go (6)

22-32: LGTM!

The isNoCarrierError helper function cleanly encapsulates the NO-CARRIER detection logic for consistent handling across tests.


34-60: LGTM!

The namespace setup correctly applies privileged labels, waits for creation, and registers cleanup. The naming convention with testID and device name ensures uniqueness across test iterations.


62-75: LGTM!

The network setup helper correctly creates the network and registers cleanup with appropriate timeout.


121-157: LGTM!

The test structure is clear with proper device iteration, VF initialization, network setup, and verification. The Skip conditions handle NO-CARRIER and missing device configurations appropriately.


438-443: Good practice: Documenting device-specific limitations with bug references.

The BCM NIC skip with OCPBUGS-30909 reference helps maintainers understand the limitation's origin and track when it might be resolved.


458-476: LGTM!

The NAD readiness check with Eventually and the DPDK pod creation with proper cleanup registration follow good test patterns.

tests/ocp/sriov/sriov_suite_test.go (4)

36-41: LGTM!

The WorkerLabel validation with clear failure message ensures configuration issues are caught early with actionable guidance.


53-75: Good recovery logic for pre-existing namespaces.

The delete-and-recreate approach for existing test namespaces handles partial failure recovery well. The namespace builder recreation after deletion ensures labels are properly applied.


82-88: LGTM!

The nil check on testNS handles the edge case where BeforeSuite failed before creating the namespace.


98-114: LGTM!

The report path printing is helpful for CI pipelines and debugging. The disabled message provides clarity when reporting is turned off.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (7)

39-42: LGTM!

The operator status check correctly delegates to the existing sriovoperator.IsSriovDeployed function.


49-81: LGTM!

The stability check correctly polls for both SR-IOV node state synchronization and MCP updated conditions. Returning (false, nil) for intermediate states is correct for wait.PollUntilContextTimeout.


87-137: LGTM!

The functional options pattern provides a clean, composable API for network configuration. Each option is focused and well-documented.


215-266: Clarify single-node policy creation behavior.

The function creates a policy on the first successful node and returns. If multi-node deployment is intended, consider documenting this limitation or modifying the logic. Current behavior is reasonable for single-node testing.


645-684: LGTM!

The debug pod approach for node-level command execution is well-implemented with proper name length handling, existing pod cleanup, and deferred deletion.


690-727: LGTM!

The PCI address extraction correctly parses the network-status annotation and handles namespaced network names with the LastIndex approach.


764-764: The regex pattern ^\d{5}-|\w+dpdknet$ is correctly formed and behaves as intended.

The alternation operator | has the lowest precedence in regex, so the pattern parses as (^\d{5}-)|(​\w+dpdknet$), matching either networks starting with 5 digits followed by a dash, or networks ending with the "dpdknet" suffix. This correctly identifies the two types of test networks created by the test suite.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch 4 times, most recently from 0fff32b to e0d6393 Compare December 9, 2025 18:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (10)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

75-86: Potential maintenance burden: duplicated device defaults.

GetDefaultDevices() duplicates the exact same device list defined in default.yaml. If either is updated, they can drift out of sync. Consider loading defaults from the YAML file or removing one source of truth.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (3)

268-274: Consider waiting for MCP to stabilize after policy deletion on failure.

When WaitForSriovPolicyReady fails and the policy is deleted, the MCP may still be in an updating state. The next iteration immediately tries a different node, which could cause issues if MCP hasn't settled. Consider adding a brief stabilization wait or logging a warning.


562-571: Consider using explicit parameters instead of parsing description strings.

The logic branches based on whether description contains "spoof checking" and "off". This is fragile and could break if description wording changes. Consider adding explicit parameters for verification options.

-func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string,
-	timeout time.Duration) error {
+type VFStatusOptions struct {
+	VerifySpoofCheck bool
+	ExpectedSpoofState string // "on" or "off"
+}
+
+func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string,
+	timeout time.Duration, opts ...VFStatusOptions) error {

385-405: CreateDpdkTestPod hardcodes client IP/MAC, limiting reusability.

This function always assigns TestPodClientIP and TestPodClientMAC, which prevents creating multiple DPDK pods (they'd have IP/MAC conflicts). Consider parameterizing these values similar to CreateTestPod.

-func CreateDpdkTestPod(name, namespace, networkName string) (*pod.Builder, error) {
-	secNetwork := pod.StaticIPAnnotationWithMacAddress(networkName,
-		[]string{tsparams.TestPodClientIP}, tsparams.TestPodClientMAC)
+func CreateDpdkTestPod(name, namespace, networkName, ip, mac string) (*pod.Builder, error) {
+	secNetwork := pod.StaticIPAnnotationWithMacAddress(networkName,
+		[]string{ip}, mac)
tests/ocp/sriov/sriov_suite_test.go (1)

48-75: LGTM with minor suggestion: consider extracting label application.

The namespace create-or-recreate logic handles partial failures correctly. The label application is duplicated (lines 49-51 and 66-68), which could be extracted to a helper function for maintainability, but this is a minor point.

tests/ocp/sriov/tests/basic.go (5)

78-120: Suite-level setup/teardown covers core SR-IOV operator state

BeforeAll’s operator status check and worker discovery, paired with AfterAll policy cleanup and final WaitForSriovPolicyReady, give the suite a solid guardrail around cluster state. If you ever need stricter guarantees on cleanup, you could promote cleanupErrors from a warning to a failing expectation, but the current behavior makes sense for flaky environments.


122-263: Spoof/trust VF tests share identical flow; consider a small helper to reduce duplication

The four tests around spoof/trust combinations all follow the same pattern (init VF → create ns/network → CheckVFStatusWithPassTraffic → NO-CARRIER skip handling). A small helper that takes caseID, description string, and a list of sriovenv.NetworkOption values would significantly cut duplication and ensure any future behavior changes (e.g., better logging, changed timeouts) remain consistent across all four specs.


309-385: Link-state tests are well-structured; you might reuse verification helper for the 'auto' case

The split between an “auto” link-state test (traffic-only) and an “enable” test (explicit VerifyLinkStateConfiguration + traffic) is clear and the NO-CARRIER handling is sane. If you decide you also care about asserting the configured state in the “auto” test, consider reusing VerifyLinkStateConfiguration there as well to make both specs symmetric.


387-430: MTU policy update sequence is reasonable; be mindful of suite-wide interactions

Using UpdateSriovPolicyMTU followed by WaitForSriovPolicyReady before creating test workloads is a sensible way to validate MTU propagation end‑to‑end. Just be aware that this deletes/recreates SR-IOV policies, so if you later introduce other suites sharing the same policies, you’ll want to ensure they are ordered/serialized appropriately.


432-509: DPDK validation test is thorough; a couple of micro-optimizations are possible

The DPDK test does a nice job: per-device gating (including skipping BCM NICs), namespace/network isolation, NAD readiness check, pod readiness, PCI address extraction via GetPciAddress, and annotation validation all line up well.

Two optional cleanups you could consider:

  • CreateSriovNetwork already waits for NAD creation, and CreateDpdkTestPod already waits for the pod to be ready, so the extra Eventually(nad.Pull...) and WaitForPodWithLabelReady are slightly redundant and add latency.
  • You could capture the pod returned by CreateDpdkTestPod instead of re‑pulling it with pod.Pull when validating annotations, saving one API call.

These are minor and non-blocking; the current implementation is functionally sound.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a896580e12eebd3339e6d82ec625d3846d25166 and e0d6393db78bc5b77e589c4b1474f427a2903c71.

📒 Files selected for processing (9)
  • tests/ocp/internal/ocpconfig/config.go (2 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (4 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (1 hunks)
  • tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/consts.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1 hunks)
  • tests/ocp/sriov/sriov_suite_test.go (1 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go
  • tests/ocp/internal/ocpconfig/config.go
🧰 Additional context used
🧬 Code graph analysis (3)
tests/ocp/sriov/tests/basic.go (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • DeviceConfig (26-32)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (14)
  • NetworkOption (91-91)
  • CreateSriovNetwork (143-158)
  • RemoveSriovNetwork (171-180)
  • WithSpoof (94-98)
  • WithTrust (101-105)
  • WithVLAN (108-112)
  • WithVlanQoS (115-119)
  • WithMinTxRate (122-126)
  • WithMaxTxRate (129-133)
  • WithLinkState (136-140)
  • UpdateSriovPolicyMTU (308-359)
  • InitDpdkVF (220-222)
  • CreateDpdkTestPod (386-405)
  • GetPciAddress (732-768)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
  • Name (22-22)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
  • SriovOcpConfig (15-15)
tests/ocp/sriov/sriov_suite_test.go (7)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (2)
  • SriovOcpConfig (15-15)
  • APIClient (13-13)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • SriovOcpConfig (35-42)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (3)
  • Labels (14-14)
  • ReporterNamespacesToDump (26-28)
  • ReporterCRDsToDump (17-23)
tests/cnf/core/network/internal/netparam/netvars.go (2)
  • Labels (11-11)
  • DefaultTimeout (17-17)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
  • CleanupLeftoverResources (776-842)
  • CheckSriovOperatorStatus (43-45)
tests/system-tests/internal/link/link.go (1)
  • NewBuilder (48-77)
tests/ocp/sriov/internal/tsparams/consts.go (2)
  • TestNamespaceName (8-8)
  • DefaultTimeout (17-17)
🔇 Additional comments (17)
tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (1)

5-45: LGTM!

The default device configuration structure is well-organized with clear documentation on the environment variable override format. The device list covers common SR-IOV NICs (Intel, Broadcom, Mellanox) with appropriate vendor/device ID mappings.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)

268-288: LGTM!

The legacy SRIOV_VF_NUM parsing logic correctly gives precedence to the new ECO_OCP_SRIOV_VF_NUM environment variable and YAML configuration. The comment clearly explains the intention.


44-73: Excellent documentation of the yaml.v2 embedded pointer behavior.

The detailed comment explaining why intermediate structs are needed prevents future confusion and accidental refactoring that could reintroduce the bug.


88-137: LGTM!

The initialization flow with pre-read defaults, file loading, environment overrides, and legacy parsing is well-structured. Error handling is consistent with the existing pattern in the codebase.

tests/ocp/sriov/internal/tsparams/consts.go (2)

14-52: LGTM!

The timeout constants are well-organized with clear documentation. The 20-minute PolicyApplicationTimeout appropriately accounts for MCP updates during SR-IOV policy application.


56-66: LGTM!

The test IP addresses use an appropriate private range, and MAC addresses correctly use the locally administered unicast format (first octet 0x02), which prevents conflicts with manufacturer-assigned addresses.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (5)

52-84: LGTM!

The polling logic correctly checks both SR-IOV node sync status and MCP stability conditions. The nil-safety check on ns.Objects prevents panics.


686-725: LGTM!

The debug pod creation for on-node command execution is well-implemented with proper cleanup of existing pods, name length handling, and correct namespace flags for host access.


90-158: LGTM!

The functional options pattern for network configuration is idiomatic and provides a clean API for building SR-IOV networks with various settings.


731-768: LGTM!

The PCI address extraction correctly parses the CNI network-status annotation and handles namespaced network names by stripping the namespace prefix.


787-795: Verify namespace cleanup prefix against actual test namespace configuration.

The cleanup searches for namespaces prefixed with "e2e-", but this may not match the main test namespace. Verify that tsparams.TestNamespaceName is not expected to be cleaned by this function and confirm whether the "e2e-" pattern is intentional for temporary test namespaces only.

tests/ocp/sriov/sriov_suite_test.go (3)

82-88: LGTM!

The nil-safety check prevents panics when BeforeSuite fails before namespace creation.


98-114: LGTM!

The report location printing is helpful for CI environments. The conditional message when reporting is disabled provides clear feedback.


37-41: LGTM!

The WorkerLabel validation provides early failure with a clear error message if the configuration is missing.

tests/ocp/sriov/tests/basic.go (3)

23-33: NO-CARRIER helper is simple and nil-safe

The helper cleanly handles nil and centralizes the environment-specific substrings, which keeps the specs readable and avoids duplication.


35-76: Namespace and SR-IOV network setup/cleanup utilities look robust

Centralizing namespace and network creation plus DeferCleanup-based teardown is a good pattern for keeping tests hermetic and avoiding leftover resources. Labeling privileged namespaces here is also a nice touch.


265-307: VLAN and rate-limiting test is gated correctly on device capability

Checking data.SupportsMinTxRate before applying min/max rate options avoids misconfiguring unsupported hardware, and reusing isNoCarrierError keeps the NO-CARRIER handling consistent with the other VF tests. The overall flow looks correct.

Comment on lines 250 to 221
devices = append(devices, DeviceConfig{
Name: strings.TrimSpace(parts[0]),
DeviceID: strings.TrimSpace(parts[1]),
Vendor: strings.TrimSpace(parts[2]),
InterfaceName: strings.TrimSpace(parts[3]),
SupportsMinTxRate: true, // Default to true for env-configured devices
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Default SupportsMinTxRate: true may not be accurate for all devices.

When parsing devices from SRIOV_DEVICES env var, SupportsMinTxRate defaults to true. However, some NICs (e.g., BCM, older Intel) don't support min TX rate. Consider either:

  1. Adding a 5th field to the env format for this flag, or
  2. Documenting this limitation clearly.
🤖 Prompt for AI Agents
In tests/ocp/sriov/internal/ocpsriovconfig/config.go around lines 250 to 256,
the code unconditionally sets SupportsMinTxRate: true when building DeviceConfig
from SRIOV_DEVICES; change parsing to accept an optional 5th CSV field for the
supports-min-tx-rate flag (if parts length >= 5, parse it as a boolean with
strict true/false handling, falling back to false on parse error), and if the
5th field is absent default SupportsMinTxRate to false (safer for older NICs);
also add a concise comment explaining the new optional field and update any
related env var documentation to show the new format.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch 6 times, most recently from e0d97e3 to 585ecef Compare December 12, 2025 18:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

333-348: Some policy spec fields may not be preserved during MTU update.

This concern was raised in a previous review. The delete/recreate pattern only copies specific fields (DeviceType, NicSelector.Vendor, NicSelector.DeviceID, Priority). Other fields like IsRdma, LinkType, ExternallyManaged, RootDevices, and NetFilter would be lost if set on the original policy.

Consider copying the entire NicSelector via DeepCopy() and preserving additional spec fields to ensure no configuration is lost.

🧹 Nitpick comments (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)

45-57: Hardcoded interface names may not match all environments.

The default device configurations include specific interface names (e.g., eno12409, ens2f2) that are hardware-dependent. While the SRIOV_DEVICES environment variable provides an override mechanism, users might be surprised when defaults don't work on their hardware.

Consider adding a comment noting that these are reference values for a specific test environment, and users should configure via SRIOV_DEVICES or YAML for their setup.


208-221: Default SupportsMinTxRate: true when 5th field is omitted may not suit all NICs.

The 5th field for SupportsMinTxRate is now supported, which addresses the previous review concern. However, when the field is omitted (4-field format), the default is still true (line 208). Some NICs (BCM, older Intel) don't support min TX rate, and users using the shorter 4-field format might encounter issues.

Consider defaulting to false when omitted (safer), or clearly document that the 4-field format assumes min TX rate support.

-		supportsMinTxRate := true // default
+		supportsMinTxRate := false // safer default for unknown NICs

 		if len(parts) == 5 {
 			val := strings.ToLower(strings.TrimSpace(parts[4]))
 			supportsMinTxRate = val == "true" || val == "1" || val == "yes"
 		}
tests/ocp/sriov/tests/basic.go (1)

473-480: Redundant NAD readiness check.

The Eventually block waiting for NAD readiness (lines 475-480) is redundant because setupSriovNetworkCreateSriovNetwork already calls WaitForNADCreation with tsparams.NADTimeout. This double-wait adds unnecessary test execution time.

 			setupSriovNetwork(networkName, data.Name, testNamespace)

-			// Wait for NAD to be ready
-			By("Waiting for NetworkAttachmentDefinition to be fully ready")
-			Eventually(func() error {
-				_, err := nad.Pull(APIClient, networkName, testNamespace)
-
-				return err
-			}, tsparams.NADTimeout, tsparams.PollingInterval).ShouldNot(HaveOccurred(),
-				"NAD %q should be ready", networkName)
-
 			// Create DPDK test pod
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0d6393db78bc5b77e589c4b1474f427a2903c71 and 585ecef2aa5f78b67c384ca43a7c7ec9208139af.

📒 Files selected for processing (6)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (5 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (1 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/consts.go (2 hunks)
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (2 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
  • tests/ocp/sriov/internal/tsparams/consts.go
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/tests/basic.go (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • DeviceConfig (27-33)
tests/ocp/sriov/internal/tsparams/consts.go (11)
  • BCMVendorID (72-72)
  • NamespaceTimeout (30-30)
  • CleanupTimeout (34-34)
  • PodReadyTimeout (32-32)
  • TestVLAN (61-61)
  • TestVlanQoS (63-63)
  • TestMinTxRate (65-65)
  • TestMaxTxRate (67-67)
  • DefaultTestMTU (56-56)
  • NADTimeout (40-40)
  • PollingInterval (28-28)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (20)
  • NetworkOption (88-88)
  • CreateSriovNetwork (140-155)
  • RemoveSriovNetwork (168-177)
  • CheckSriovOperatorStatus (40-42)
  • InitVF (209-211)
  • WithSpoof (91-95)
  • WithTrust (98-102)
  • CheckVFStatusWithPassTraffic (510-585)
  • WithVLAN (105-109)
  • WithVlanQoS (112-116)
  • WithMinTxRate (119-123)
  • WithMaxTxRate (126-130)
  • WithLinkState (133-137)
  • VerifyLinkStateConfiguration (589-638)
  • UpdateSriovPolicyMTU (305-356)
  • InitDpdkVF (217-219)
  • CreateDpdkTestPod (383-402)
  • DeleteDpdkTestPod (405-414)
  • WaitForPodWithLabelReady (417-454)
  • GetPciAddress (729-765)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)
tests/ocp/sriov/internal/tsparams/consts.go (1)
  • LabelSuite (10-10)
🔇 Additional comments (16)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)

1-33: LGTM!

The documentation improvements are clear and helpful. The package doc comment and updated Labels comment provide better context for maintainers. The refactoring of DefaultTimeout to the centralized consts.go file aligns with better code organization.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

73-103: LGTM!

Good defensive programming with the runtime.Caller ok check. The config initialization flow with YAML loading, environment processing, and legacy env var parsing is well-structured. The ordering ensures environment variables can override YAML values appropriately.

tests/ocp/sriov/tests/basic.go (6)

25-36: LGTM!

The isNoCarrierError helper is well-implemented with clear documentation. Checking for both "NO-CARRIER" and "no physical connection" strings covers common error message patterns.


38-88: LGTM!

The helper functions follow good patterns:

  • isBCMDevice correctly identifies Broadcom NICs by name or vendor ID with appropriate OCPBUGS reference.
  • setupTestNamespace and setupSriovNetwork properly encapsulate resource creation with DeferCleanup for automatic cleanup, following Ginkgo best practices.

96-132: LGTM!

The test setup and teardown are well-structured:

  • BeforeAll validates operator status and node discovery as prerequisites.
  • AfterAll uses best-effort cleanup for policies (logging warnings) while ensuring cluster stability before completion (with Expect).

This design appropriately balances cleanup reliability with test isolation.


134-275: LGTM!

The spoof and trust test cases follow a consistent, clear pattern. Each test:

  1. Iterates through configured devices
  2. Initializes VFs with appropriate settings
  3. Creates isolated test namespace and network
  4. Verifies VF status with traffic tests
  5. Handles NO-CARRIER gracefully with Skip

The repetition is acceptable for test maintainability and debuggability.


277-443: LGTM!

Good coverage of VLAN, link state, and MTU configurations:

  • VLAN test properly checks SupportsMinTxRate capability before applying rate limits.
  • Link state "enable" test (lines 357-398) correctly separates configuration verification from connectivity testing.
  • MTU test properly sequences policy update → wait for ready → traffic verification.

445-523: LGTM overall for DPDK test!

The DPDK test provides good coverage:

  • Correctly skips BCM NICs with bug reference (OCPBUGS-30909).
  • Verifies PCI address assignment and network-status annotation.
  • Proper cleanup with DeferCleanup.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (8)

39-81: LGTM!

The operator status and stability functions are well-implemented:

  • CheckSriovOperatorStatus delegates to sriovoperator.IsSriovDeployed appropriately.
  • WaitForSriovAndMCPStable correctly checks both SR-IOV node states (SyncStatus == "Succeeded") and MCP conditions (MachineConfigPoolUpdated).

87-155: LGTM!

The network creation helpers follow a clean functional options pattern. CreateSriovNetwork properly chains network builder configuration and waits for NAD readiness before returning.


183-279: LGTM!

The policy management functions are well-structured:

  • RemoveSriovPolicy properly waits for deletion confirmation.
  • initVFWithDevType implements smart retry across nodes, cleaning up failed attempts before trying the next node.
  • Good documentation noting this is for single-node testing scenarios.

362-454: LGTM!

Pod creation and readiness helpers are well-implemented:

  • CreateTestPod and CreateDpdkTestPod properly configure privileged pods with secondary networks.
  • WaitForPodWithLabelReady thoroughly checks both pod phase and container readiness states.
  • DeleteDpdkTestPod handles already-deleted pods gracefully.

460-503: LGTM!

Interface verification functions are robust:

  • Proper use of ip link show output parsing.
  • ExtractPodInterfaceMAC correctly identifies the link/ether field.
  • Good error messages with interface context.

509-585: LGTM!

CheckVFStatusWithPassTraffic provides comprehensive VF validation:

  • Creates isolated client/server pod pairs.
  • Verifies interface readiness and carrier status.
  • Performs connectivity test with ping.
  • Proper cleanup via defer blocks.

The spoof check verification based on description string content (lines 559-568) works but is implicit. Consider using explicit parameters if this pattern expands.


644-722: LGTM!

The spoof verification and node execution functions are well-designed:

  • verifySpoofCheck correctly identifies VF by MAC address and verifies spoof state.
  • executeOnNode uses a debug pod pattern with nsenter for host namespace access, appropriate for test environments.
  • Proper cleanup of existing debug pods and 63-char name truncation for Kubernetes compliance.

728-850: LGTM!

The PCI address extraction and cleanup functions are solid:

  • GetPciAddress correctly parses the CNI network-status annotation JSON, handling namespace prefixes in network names.
  • CleanupLeftoverResources is appropriately best-effort, logging warnings but continuing cleanup for robustness.
  • The regex pattern (^\d{5}-|\w+dpdknet$) correctly matches the test naming conventions used in basic.go.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch from 585ecef to ddf26dc Compare December 12, 2025 20:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

132-137: Default SupportsMinTxRate: true may cause test failures for unsupported devices.

When the optional 5th field is omitted, supportsMinTxRate defaults to true. For NICs that don't support min TX rate (e.g., BCM, older Intel), this could cause tests to attempt rate limiting configuration that will fail. Consider defaulting to false for safety, requiring explicit opt-in.

This was flagged in a previous review. The current implementation does support the 5th field but defaults to true when absent.

🧹 Nitpick comments (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

151-159: Consider using a named constant for the fallback value.

The magic number 2 on line 155 would be clearer as a named constant (e.g., DefaultVFNum). This would make the intent explicit and ensure consistency if this value is used elsewhere.

+const (
+	// DefaultVFNum is the default number of virtual functions when not configured.
+	DefaultVFNum = 2
+)
+
 // GetVFNum returns the configured number of virtual functions.
 // Returns the value from YAML/env config, or 2 as fallback if not set.
 func (sriovOcpConfig *SriovOcpConfig) GetVFNum() int {
 	if sriovOcpConfig.VFNum <= 0 {
-		return 2 // fallback if YAML wasn't loaded properly
+		return DefaultVFNum // fallback if YAML wasn't loaded properly
 	}

 	return sriovOcpConfig.VFNum
 }
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)

333-356: Some policy fields may still be lost during MTU update.

The past review noted that manually copying specific fields could lose others (e.g., IsRdma, LinkType, ExternallyManaged). While the commit marked this as addressed, the implementation still selectively copies fields rather than preserving the entire spec. For test code this may be acceptable, but be aware that policies with RDMA or LinkType configured would lose those settings.


558-568: String-based spoof check trigger is fragile.

The spoof check verification is triggered by strings.Contains(description, "spoof checking"). This couples the verification to the exact wording used when calling the function. Consider passing an explicit parameter instead.

-func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string,
+type VFCheckOptions struct {
+	VerifySpoofCheck bool
+	SpoofCheckState  string // "on" or "off"
+}
+
+func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string,
-	timeout time.Duration) error {
+	timeout time.Duration, opts ...VFCheckOptions) error {

However, for test code this level of coupling may be acceptable if descriptions are consistent.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 585ecef2aa5f78b67c384ca43a7c7ec9208139af and ddf26dc11e94fbccb5204b1c5b9b23b7c2ba99ed.

📒 Files selected for processing (4)
  • tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml (1 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (5 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/tests/basic.go (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • DeviceConfig (25-31)
tests/ocp/sriov/internal/tsparams/consts.go (12)
  • BCMVendorID (72-72)
  • NamespaceTimeout (30-30)
  • CleanupTimeout (34-34)
  • LabelBasic (15-15)
  • PodReadyTimeout (32-32)
  • TestVLAN (61-61)
  • TestVlanQoS (63-63)
  • TestMinTxRate (65-65)
  • TestMaxTxRate (67-67)
  • DefaultTestMTU (56-56)
  • NADTimeout (40-40)
  • PollingInterval (28-28)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (15)
  • NetworkOption (88-88)
  • CreateSriovNetwork (140-155)
  • RemoveSriovNetwork (168-177)
  • RemoveSriovPolicy (184-203)
  • WaitForSriovPolicyReady (45-47)
  • InitVF (209-211)
  • WithSpoof (91-95)
  • WithTrust (98-102)
  • VerifyLinkStateConfiguration (589-638)
  • UpdateSriovPolicyMTU (305-356)
  • InitDpdkVF (217-219)
  • CreateDpdkTestPod (383-402)
  • DeleteDpdkTestPod (405-414)
  • WaitForPodWithLabelReady (417-454)
  • GetPciAddress (729-765)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
  • SriovOcpConfig (13-13)
🔇 Additional comments (17)
tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml (1)

1-50: LGTM!

The default configuration provides a sensible baseline for common SR-IOV NICs (Intel, Broadcom, Mellanox). The structure aligns with DeviceConfig in config.go, and the documentation clearly indicates these are overridable via the SRIOV_DEVICES environment variable.

tests/ocp/sriov/tests/basic.go (5)

25-43: LGTM!

The helper functions are well-documented and handle edge cases appropriately. isBCMDevice correctly identifies Broadcom NICs by both name pattern and vendor ID for DPDK exclusion (OCPBUGS-30909).


45-72: LGTM!

The namespace setup follows good test isolation patterns with proper cleanup registration via DeferCleanup. The Eventually block ensures the namespace is ready before proceeding.


74-88: LGTM!

Clean helper that properly abstracts network creation with cleanup registration.


139-175: LGTM!

The test structure is well-organized with proper device iteration, skip conditions for environmental issues (NO-CARRIER, no matching devices), and cleanup via helpers.


450-528: LGTM!

The DPDK test is comprehensive with proper BCM skip (referencing OCPBUGS-30909), NAD readiness wait, and thorough PCI address verification from network-status annotations.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)

24-31: LGTM!

The DeviceConfig struct is well-defined with appropriate YAML tags matching the configuration file structure.


218-239: LGTM!

The implementation correctly checks if ECO_OCP_SRIOV_VF_NUM was explicitly set before falling back to the legacy SRIOV_VF_NUM variable. This addresses the concern from the previous review about the logic flaw when comparing against DefaultVFNum.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (9)

39-81: LGTM!

The operator and cluster stability checks are well-implemented with proper polling and dual verification (SR-IOV node states + MCP stability).


87-137: LGTM!

Clean functional options pattern for configuring SR-IOV networks. Each option follows a consistent closure-based approach.


139-155: LGTM!

Network creation correctly applies options and waits for NAD readiness before returning.


205-279: LGTM!

The VF initialization functions are well-documented about their single-node targeting behavior. The cleanup of failed policies (line 268) prevents orphaned resources.


362-454: LGTM!

Pod creation and waiting functions are well-implemented with proper timeout handling and readiness checks.


460-503: LGTM!

Interface verification helpers are straightforward and correctly parse ip link show output.


683-722: LGTM!

The executeOnNode function correctly uses a debug pod with host network/PID and nsenter to execute commands on the node. The pod name truncation to 63 characters respects Kubernetes naming limits.


728-765: LGTM!

PCI address extraction correctly parses the network-status annotation and handles namespaced network names.


771-842: LGTM!

The cleanup function is thorough with best-effort semantics, cleaning up namespaces, networks, and policies that match test naming conventions. The regex pattern correctly identifies test resources.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch 2 times, most recently from 638ff3f to b3de097 Compare December 13, 2025 00:33
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

24-26: Default SupportsMinTxRate: true may cause issues with unsupported NICs.

Some NICs (e.g., BCM, older Intel) don't support min TX rate. When parsing from SRIOV_DEVICES without the optional 5th field, SupportsMinTxRate defaults to true, which may cause test failures on unsupported hardware.

Consider defaulting to false for safer behavior, or document this limitation clearly in the env var format documentation.

tests/ocp/sriov/tests/basic.go (1)

96-100: Move vfNum initialization inside BeforeAll to ensure config is loaded.

vfNum is assigned during Describe block registration, which may happen before SriovOcpConfig is fully initialized. Move this assignment into BeforeAll after configuration validation.

 		var (
-			vfNum       = SriovOcpConfig.GetVFNum()
+			vfNum       int
 			workerNodes []*nodes.Builder
 			testData    []sriovconfig.DeviceConfig
 		)

 		BeforeAll(func() {
 			By("Checking the SR-IOV operator is running")
 			err := sriovenv.CheckSriovOperatorStatus()
 			Expect(err).ToNot(HaveOccurred(), "SR-IOV operator is not running")

+			vfNum = SriovOcpConfig.GetVFNum()
+
 			By("Loading SR-IOV device configuration")
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

333-356: Policy spec fields may still be lost during MTU update.

While the past review was marked as addressed, the current implementation still only copies specific fields (DeviceType, Vendor, DeviceID, Priority). Other policy fields like IsRdma, LinkType, ExternallyManaged, and the full NicSelector may be lost during the delete/recreate cycle.

Consider using DeepCopy for the entire NicSelector or spec:

 	// Preserve other policy settings from original spec
 	if spec.DeviceType != "" {
 		newPolicy = newPolicy.WithDevType(spec.DeviceType)
 	}

-	if spec.NicSelector.Vendor != "" {
-		newPolicy.Definition.Spec.NicSelector.Vendor = spec.NicSelector.Vendor
-	}
-
-	if spec.NicSelector.DeviceID != "" {
-		newPolicy.Definition.Spec.NicSelector.DeviceID = spec.NicSelector.DeviceID
-	}
+	// Copy NicSelector fields (PfNames already set in constructor)
+	newPolicy.Definition.Spec.NicSelector.Vendor = spec.NicSelector.Vendor
+	newPolicy.Definition.Spec.NicSelector.DeviceID = spec.NicSelector.DeviceID
+	newPolicy.Definition.Spec.NicSelector.RootDevices = spec.NicSelector.RootDevices
+	newPolicy.Definition.Spec.NicSelector.NetFilter = spec.NicSelector.NetFilter

 	if spec.Priority != 0 {
 		newPolicy.Definition.Spec.Priority = spec.Priority
 	}
+
+	// Preserve additional spec fields
+	newPolicy.Definition.Spec.IsRdma = spec.IsRdma
+	newPolicy.Definition.Spec.LinkType = spec.LinkType
🧹 Nitpick comments (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

155-163: Potential inconsistency between GetVFNum() fallback and YAML default.

GetVFNum() returns DefaultVFNum when VFNum <= 0, but the YAML config might intentionally set a value. If YAML parsing fails silently or sets VFNum = 0, this method would mask that failure. Consider logging when the fallback is used.

 func (sriovOcpConfig *SriovOcpConfig) GetVFNum() int {
 	if sriovOcpConfig.VFNum <= 0 {
+		klog.V(90).Infof("VFNum not configured or invalid, using default: %d", DefaultVFNum)
 		return DefaultVFNum // fallback if YAML wasn't loaded properly
 	}

 	return sriovOcpConfig.VFNum
 }
tests/ocp/sriov/tests/basic.go (1)

139-175: Test pattern is solid but consider extracting common logic.

The test structure is correct with proper VF initialization, namespace setup, network creation, and verification. However, the pattern is repeated across multiple tests (spoof on/off, trust on/off, etc.) with only minor variations. Consider a table-driven test approach to reduce duplication.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

802-802: Regex pattern may have unintended matches.

The pattern ^\d{5}-|\w+dpdknet$ uses | (OR), meaning it matches:

  1. Anything starting with 5 digits and a hyphen, OR
  2. Anything ending with dpdknet (with word chars before)

The \w+dpdknet$ part would also match legitimate non-test networks if they happen to end with dpdknet. Consider making the pattern more specific or documenting the expected naming conventions clearly.

-	pattern := regexp.MustCompile(`^\d{5}-|\w+dpdknet$`)
+	// Match test networks: "XXXXX-*" (5-digit caseID prefix) or "*-dpdk" (DPDK suffix)
+	pattern := regexp.MustCompile(`^\d{5}-.*|-dpdk$`)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddf26dc11e94fbccb5204b1c5b9b23b7c2ba99ed and b3de097.

📒 Files selected for processing (6)
  • .cursorrules (1 hunks)
  • .gitignore (1 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml (1 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (5 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (1)
  • IsSriovDeployed (32-54)
tests/ocp/sriov/internal/tsparams/consts.go (13)
  • MCPStableInterval (38-38)
  • NADTimeout (40-40)
  • PollingInterval (28-28)
  • NamespaceTimeout (30-30)
  • PolicyApplicationTimeout (42-42)
  • PodReadyTimeout (32-32)
  • TestPodClientIP (51-51)
  • TestPodClientMAC (47-47)
  • CleanupTimeout (34-34)
  • TestPodServerIP (53-53)
  • TestPodServerMAC (49-49)
  • CarrierWaitTimeout (36-36)
  • DebugPodCleanupTimeout (44-44)
🔇 Additional comments (12)
.cursorrules (2)

1-807: Comprehensive project guidelines document.

This is a well-structured documentation file covering test conventions, code organization, and best practices. The guidelines align with the implementation patterns seen in the other files in this PR.


9-9: Go version v1.24.x is valid and was released in 2025. The latest stable version is currently 1.25.5 (as of December 2025), so this requirement is reasonable. No action needed.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)

61-66: Good defensive check for runtime path discovery.

The check for runtime.Caller success prevents potential panics when determining the config file path. This is a solid improvement for reliability.


222-243: Legacy VFNum parsing correctly checks for new env var first.

The implementation now properly checks if ECO_OCP_SRIOV_VF_NUM is set before falling back to the legacy SRIOV_VF_NUM variable. This addresses the previous review concern about the parsing logic.

tests/ocp/sriov/tests/basic.go (3)

45-72: Well-structured namespace setup with proper cleanup.

The setupTestNamespace helper properly creates namespaces with required labels, waits for existence confirmation, and registers cleanup via DeferCleanup. This follows the project guidelines for resource management.


450-528: DPDK test implementation is thorough.

The DPDK test properly:

  • Skips BCM NICs with documented bug reference (OCPBUGS-30909)
  • Waits for NAD readiness before pod creation
  • Validates PCI address assignment
  • Checks network status annotations

Good implementation following the test patterns.


119-137: AfterAll cleanup continues despite individual failures - good resilience.

The cleanup logic collects errors from individual policy removals and only logs warnings, allowing cleanup to continue for remaining policies. This is a good pattern for test cleanup.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (5)

39-42: Good reuse of existing operator check.

CheckSriovOperatorStatus delegates to the existing sriovoperator.IsSriovDeployed function from tests/internal/sriovoperator, following the guideline to reuse existing functionality rather than reimplementing.


87-137: Clean functional options pattern for network configuration.

The NetworkOption type with With* functions provides a flexible, extensible API for configuring SR-IOV networks. This pattern allows adding new options without breaking existing callers.


683-722: Debug pod for node-level command execution is well-implemented.

The executeOnNode function properly:

  • Truncates pod name to 63 characters for K8s compliance
  • Cleans up existing debug pods before creating new ones
  • Uses nsenter to access host namespaces
  • Ensures cleanup via defer

509-585: CheckVFStatusWithPassTraffic has comprehensive verification flow.

The function creates client/server pods, verifies interfaces, checks carrier status, validates spoof checking when applicable, and tests connectivity. The deferred cleanup ensures pods are removed even on failure.

One minor note: the function uses hardcoded interface name "net1" (lines 540, 544, 549, 656). Consider making this configurable if different interface names might be used.


167-177: Correct handling of "already deleted" case in cleanup.

RemoveSriovNetwork returns nil when the network pull fails (indicating it's already deleted). This is the correct pattern for idempotent cleanup operations.

.gitignore Outdated
reports
.vscode

.cursorrules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistency: .cursorrules is both added and ignored.

The .cursorrules file is being added to the repository (as shown in this PR) but is also being added to .gitignore. This is contradictory:

  • If it's meant to be a shared project guideline file, remove it from .gitignore
  • If it's a local IDE configuration file, don't commit it to the repository

Based on the content (project-wide guidelines), it appears this should be a committed file and should not be in .gitignore.

🤖 Prompt for AI Agents
In .gitignore around line 24, the .cursorrules entry conflicts with the PR which
adds a .cursorrules file; remove the .cursorrules line from .gitignore so the
shared project guideline file remains tracked, or alternatively if it should be
local only, revert the commit adding .cursorrules and keep the ignore — choose
the former (remove the ignore entry) given the file is project-wide.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch from b3de097 to a23189a Compare December 19, 2025 21:01
- Add DeviceConfig struct for SR-IOV device configuration
- Add GetSriovDevices() method with ECO_OCP_SRIOV_DEVICES env var support
- Add GetVFNum() method with error handling
- Move all defaults to default.yaml
- Fix suite-level labels to exclude feature-specific labels
- Add constants for timeouts, labels, and test parameters
@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch 2 times, most recently from 4a8f652 to ead7a0a Compare December 19, 2025 22:27
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)

36-51: LGTM! Clean delegation to existing utilities.

The functions properly delegate to existing SR-IOV operator utilities. The hardcoded "worker" MCP label in WaitForSriovPolicyReady (line 48) is reasonable for most OCP clusters, though parameterizing it could provide additional flexibility for custom MCP configurations.


273-326: MTU update logic works correctly for common use cases.

The delete/recreate pattern for MTU updates is appropriate given the PolicyBuilder API limitations. The code preserves the most commonly used policy fields (DeviceType, Vendor, DeviceID, Priority). For completeness, consider whether other Spec fields like IsRdma, LinkType, or ExternallyManaged might need preservation in your test scenarios.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3de097 and ead7a0a.

📒 Files selected for processing (6)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go (3 hunks)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (1 hunks)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go (1 hunks)
  • tests/ocp/sriov/internal/tsparams/consts.go (2 hunks)
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (2 hunks)
  • tests/ocp/sriov/tests/basic.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
  • tests/ocp/sriov/internal/tsparams/ocpsriovvars.go
🧰 Additional context used
🧬 Code graph analysis (3)
tests/ocp/sriov/tests/basic.go (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • DeviceConfig (25-31)
tests/ocp/sriov/internal/tsparams/consts.go (12)
  • BCMVendorID (72-72)
  • NamespaceTimeout (30-30)
  • CleanupTimeout (34-34)
  • LabelBasic (15-15)
  • PodReadyTimeout (32-32)
  • TestVLAN (61-61)
  • TestVlanQoS (63-63)
  • TestMinTxRate (65-65)
  • TestMaxTxRate (67-67)
  • DefaultTestMTU (56-56)
  • NADTimeout (40-40)
  • PollingInterval (28-28)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (10)
  • NetworkOption (58-58)
  • CreateSriovNetwork (110-125)
  • WithSpoof (61-65)
  • WithTrust (68-72)
  • UpdateSriovPolicyMTU (275-326)
  • InitDpdkVF (187-189)
  • CreateDpdkTestPod (353-372)
  • DeleteDpdkTestPod (375-384)
  • WaitForPodWithLabelReady (387-424)
  • GetPciAddress (672-707)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
  • Name (22-22)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (4)
  • IsSriovDeployed (32-54)
  • WaitForSriovAndMCPStable (80-100)
  • RemoveAllSriovNetworks (156-176)
  • RemoveAllPoliciesAndWaitForSriovAndMCPStable (180-195)
tests/internal/cluster/cluster.go (1)
  • ExecCmdWithStdout (98-180)
🔇 Additional comments (18)
tests/ocp/sriov/internal/tsparams/consts.go (1)

1-73: LGTM! Well-structured test constants.

The constants are well-documented, follow consistent naming conventions, and use appropriate values for SR-IOV testing. Timeout values are reasonable for OCP operations, and test data (MAC addresses, IPs, MTU, VLAN/QoS values) are all within valid ranges.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (4)

19-22: Path change aligns with runtime.Caller usage.

The removal of the ./ prefix from PathToDefaultOcpSriovParamsFile is appropriate given that the path is now resolved using runtime.Caller and filepath.Join (lines 58-66), which provides a more robust way to locate the config file relative to the source file location.


24-31: LGTM! Clean struct definition.

The DeviceConfig struct is well-designed with appropriate fields for SR-IOV device configuration. YAML tags follow conventional snake_case naming.


58-83: Excellent improvement in config path resolution.

Using runtime.Caller to determine the config file path is a robust pattern that ensures the configuration file is found relative to the source file location, regardless of where the tests are executed from. The enhanced error messages also improve debuggability.


85-157: Well-implemented device configuration with flexible parsing.

The implementation properly handles both environment variable overrides and YAML configuration. The parseSriovDevicesEnv function includes good format validation and clear error messages. The optional 5th field for minTxRate provides flexibility for device-specific capabilities.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (8)

57-147: Excellent use of functional options pattern.

The network management functions are well-designed with the functional options pattern providing flexible configuration. The RemoveSriovNetwork function correctly handles the already-deleted case (line 143), which is important for cleanup reliability.


175-249: VF initialization logic is sound with appropriate single-node constraints.

The VF initialization functions correctly document their single-node behavior (lines 176-178, 184-186) and implement a robust first-success pattern. The cleanup-on-failure approach (lines 235-240) prevents resource leaks. The comments clearly communicate the intended usage for single-node or targeted testing scenarios.


332-424: Robust pod creation and management with proper readiness checks.

The pod creation functions correctly configure privileged containers and secondary networks for SR-IOV testing. The WaitForPodWithLabelReady function includes comprehensive checks for both pod phase and container readiness (lines 402-419), ensuring pods are truly ready before tests proceed.


430-473: LGTM! Interface verification functions are well-implemented.

The functions correctly parse standard ip link show output to verify interface state and extract MAC addresses. Error handling is appropriate, and the string parsing logic is clear and robust.


479-604: Comprehensive connectivity testing with proper cleanup.

The connectivity test functions are well-designed with deferred cleanup for all resources. The VerifyLinkStateConfiguration function appropriately returns (false, nil) on carrier timeout (lines 597-600), allowing tests to skip rather than fail when there's no physical connection. This design provides good test flexibility for different hardware setups.


610-664: Node-level spoof check verification is correctly implemented.

The function properly executes commands on the node using the cluster helper and includes robust hostname matching logic (lines 637-644) to handle variations in hostname formats. The spoof check string matching covers both verbose and abbreviated formats (line 654-655).


670-707: LGTM! PCI address extraction correctly parses network-status annotation.

The function properly unmarshals the Kubernetes network-status annotation and extracts the PCI address. Error handling covers both missing annotations and interfaces without PCI addresses, providing clear feedback for debugging.


713-750: Comprehensive cleanup with appropriate error resilience.

The cleanup function properly handles leftover test resources and continues even if individual cleanup steps fail, logging warnings instead of failing (lines 723, 731, 738, 744). This resilience is important for cleanup routines to ensure maximum resource removal even when some operations fail.

tests/ocp/sriov/tests/basic.go (5)

25-88: Well-designed helper functions with proper cleanup.

The helper functions are cleanly implemented with appropriate cleanup using DeferCleanup. The isNoCarrierError function correctly identifies NO-CARRIER conditions (lines 34-35), and isBCMDevice properly checks both name and vendor ID for BCM devices (lines 41-42) as referenced in OCPBUGS-30909.


90-141: Proper test suite setup with validation and cleanup.

The suite setup correctly initializes vfNum inside BeforeAll (line 108) after config validation, addressing the initialization timing concern from previous reviews. The AfterAll cleanup properly removes policies and waits for cluster stability (lines 138-140), ensuring a clean state for subsequent test runs.


143-329: Test cases follow consistent patterns with proper skip logic.

The first five test cases are well-structured with consistent patterns for device iteration, VF initialization, and verification. The VLAN test correctly checks SupportsMinTxRate (lines 293-297) before attempting to configure rate limits, preventing failures on unsupported hardware. The skip logic for NO-CARRIER conditions is appropriate for handling physical connection issues.


366-407: Excellent two-part verification for link state test.

The enabled link state test (lines 366-407) uses a smart two-part approach: first verifying the configuration is applied (lines 389-391), then testing connectivity only if carrier is present (lines 393-401). This allows the test to validate configuration even when physical connections are unavailable, improving test coverage and reducing false negatives.


454-532: Comprehensive DPDK test with appropriate BCM handling.

The DPDK test correctly skips BCM devices (lines 462-467) per OCPBUGS-30909 and includes thorough validation: NAD readiness (lines 484-489), pod creation with cleanup (lines 496-500), PCI address verification (lines 507-511), and network-status annotation checks (lines 520-526). The test properly validates all aspects of DPDK SR-IOV functionality.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch from ead7a0a to 5b2e42c Compare December 23, 2025 03:08
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

273-326: MTU update may still not preserve all policy fields.

The delete-and-recreate pattern at lines 293-318 only explicitly copies DeviceType, NicSelector.Vendor, NicSelector.DeviceID, and Priority. Despite deep-copying the spec at line 284, other fields like IsRdma, LinkType, ExternallyManaged, and any future spec additions would be lost during the update.

Although a past review flagged this and was marked as addressed, the current implementation doesn't preserve all fields. Consider one of these approaches:

  1. Assign the entire deep-copied spec and only override MTU:

    newPolicy.Definition.Spec = *spec
    newPolicy.Definition.Spec.Mtu = mtuValue
  2. Or preserve each spec field explicitly if selective copy is required.

🔎 Shell script to verify which SR-IOV policy spec fields exist
#!/bin/bash
# Check SR-IOV NetworkNodePolicy CRD to identify all spec fields
kubectl get crd sriovnetworknodepolicies.sriovnetwork.openshift.io -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema.properties.spec.properties}' | jq 'keys'
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)

110-147: Default SupportsMinTxRate: true may be unsafe for some devices.

While the implementation correctly supports an optional 5th field for minTxRate (lines 132-135), it defaults to true (line 130) when the field is absent. Some NICs (e.g., BCM, older Intel) don't support minimum TX rate configuration, and attempting to configure it can cause policy application failures.

A past review comment suggested this issue, and while the optional field was added, the default remains true. Consider defaulting to false for safer compatibility, or document this limitation clearly in the format description comment (lines 111-112).

🔎 Suggested safer default
 func parseSriovDevicesEnv(envDevices string) ([]DeviceConfig, error) {
 	var devices []DeviceConfig
 
 	entries := strings.Split(envDevices, ",")
 	for _, entry := range entries {
 		entry = strings.TrimSpace(entry)
 		if entry == "" {
 			continue
 		}
 
 		parts := strings.Split(entry, ":")
 		if len(parts) != 4 && len(parts) != 5 {
 			return nil, fmt.Errorf(
 				"invalid ECO_OCP_SRIOV_DEVICES entry %q - expected format: name:deviceid:vendor:interface[:minTxRate]",
 				entry)
 		}
 
-		supportsMinTxRate := defaultSupportsMinTxRate
+		// Default to false for safer compatibility with older NICs
+		supportsMinTxRate := false
 
 		if len(parts) == 5 {
 			val := strings.ToLower(strings.TrimSpace(parts[4]))
 			supportsMinTxRate = val == "true" || val == "1" || val == "yes"
 		}
🧹 Nitpick comments (5)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (4)

41-51: Consider using DefaultMcpLabel constant for consistency.

Line 48 hardcodes "worker" as the MCP name, but line 29 defines DefaultMcpLabel = "machineconfiguration.openshift.io/role=worker". While these serve different purposes (MCP name vs label selector), this creates potential confusion. If the intent is to always use worker nodes, consider extracting "worker" as a named constant (e.g., DefaultMcpName) or adding a comment explaining why the constant isn't used here.


175-189: VF initialization creates policies on only one node.

Both InitVF and InitDpdkVF create an SR-IOV policy on the first node where device discovery succeeds and immediately return. While the function comments document this behavior (lines 177-178, 185-186), this single-node limitation could cause issues in multi-node test scenarios where VFs are expected on multiple nodes. If tests require VFs on specific nodes or all nodes, consider either:

  1. Adding a InitVFOnAllNodes variant that iterates all nodes without early return, or
  2. Accepting a target node parameter to allow explicit node selection.

430-452: Interface status parsing relies on brittle string matching.

VerifyInterfaceReady (line 437) checks for "UP" substring and CheckInterfaceCarrier (line 451) checks for absence of "NO-CARRIER" in ip link show output. While this works for current versions, the output format could change with different kernel versions or iproute2 versions. Consider using structured parsing (e.g., ip -json link show) if available, or at minimum add comments documenting the expected output format and any known version dependencies.


479-551: Spoof check verification depends on description string content.

Lines 529-538 conditionally verify spoof checking only if the word "spoof checking" appears in the description parameter, and infer the expected state ("on" or "off") by checking if "off" is in the description. This creates fragile coupling between function behavior and informal string content. Consider adding explicit boolean parameters (verifySpoofCheck bool, expectedSpoofState string) to make the function contract clearer and less error-prone.

tests/ocp/sriov/tests/basic.go (1)

123-141: AfterAll stability check may block subsequent test runs.

Line 139 uses Expect to assert cluster stability after cleanup. If the cluster fails to stabilize (e.g., due to stuck MCP update or SR-IOV operator issue), the AfterAll will fail and prevent subsequent test suite runs from starting cleanly. Consider using a warning log instead of Expect here, or implementing a forced cleanup that bypasses the stability check after a timeout.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ead7a0a and 5b2e42c.

📒 Files selected for processing (3)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go
  • tests/ocp/sriov/tests/basic.go
🧰 Additional context used
🧬 Code graph analysis (3)
tests/ocp/sriov/tests/basic.go (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
  • DeviceConfig (26-32)
tests/ocp/sriov/internal/tsparams/consts.go (12)
  • BCMVendorID (72-72)
  • NamespaceTimeout (30-30)
  • CleanupTimeout (34-34)
  • LabelBasic (15-15)
  • PodReadyTimeout (32-32)
  • TestVLAN (61-61)
  • TestVlanQoS (63-63)
  • TestMinTxRate (65-65)
  • TestMaxTxRate (67-67)
  • DefaultTestMTU (56-56)
  • NADTimeout (40-40)
  • PollingInterval (28-28)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (17)
  • NetworkOption (58-58)
  • CreateSriovNetwork (110-125)
  • RemoveSriovNetwork (138-147)
  • WaitForSriovPolicyReady (43-51)
  • InitVF (179-181)
  • WithSpoof (61-65)
  • WithTrust (68-72)
  • WithVLAN (75-79)
  • WithVlanQoS (82-86)
  • WithMinTxRate (89-93)
  • WithMaxTxRate (96-100)
  • WithLinkState (103-107)
  • InitDpdkVF (187-189)
  • CreateDpdkTestPod (353-372)
  • DeleteDpdkTestPod (375-384)
  • WaitForPodWithLabelReady (387-424)
  • GetPciAddress (672-707)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
  • Name (22-22)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
  • SriovOcpConfig (13-13)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (4)
  • IsSriovDeployed (32-54)
  • WaitForSriovAndMCPStable (80-100)
  • RemoveAllSriovNetworks (156-176)
  • RemoveAllPoliciesAndWaitForSriovAndMCPStable (180-195)
tests/internal/cluster/cluster.go (1)
  • ExecCmdWithStdout (98-180)
🔇 Additional comments (6)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)

553-604: LGTM! Good error handling design for carrier timeout.

VerifyLinkStateConfiguration returns (false, nil) when carrier wait times out (line 600) rather than returning an error. This allows calling tests to gracefully skip when there's no physical connection rather than failing outright. This is a good pattern for test environments where NO-CARRIER might be an acceptable condition depending on lab setup.


670-707: PCI address extraction is well-implemented.

The JSON parsing at lines 683-694 correctly handles the nested structure of the network-status annotation, and the error messages clearly distinguish between "PCI address not present" (line 702) vs "interface not found" (line 706). This is helpful for debugging.

tests/ocp/sriov/tests/basic.go (2)

143-532: Test isolation is well-designed with per-test resources.

Each test case creates its own namespace (with unique testID prefix) and SR-IOV network using setupTestNamespace and setupSriovNetwork with proper DeferCleanup registration. This ensures test isolation and prevents resource conflicts. The pattern of checking executed flag and skipping when no devices match (e.g., lines 176-178) provides clear feedback when tests are skipped due to configuration.


454-532: DPDK test correctly skips BCM devices per known issue.

Lines 462-467 skip Broadcom NICs for DPDK tests with reference to OCPBUGS-30909. The test then comprehensively validates DPDK VF functionality: NAD creation, DPDK pod creation with proper label, PCI address assignment, and network-status annotation presence. The two-stage annotation check (lines 520-526) - first for "pci-address" substring, then for the actual value - is good defensive validation.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)

45-84: LGTM! Improved config file path resolution.

Lines 59-67 use runtime.Caller(0) to determine the config file path relative to the source file location, which is more robust than the previous relative path approach. This ensures default.yaml is always found correctly regardless of where tests are executed from. The enhanced error logging (lines 71, 78) also improves debuggability.


188-206: Good fix for EOF handling in YAML decoding.

Line 201 correctly handles the EOF condition from the YAML decoder - io.EOF is expected at end of file and shouldn't be treated as an error. The condition err != nil && err != io.EOF prevents false errors during config loading.

@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch from 5b2e42c to 8ca8a42 Compare December 23, 2025 15:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/ocp/sriov/tests/basic.go (1)

143-532: Consider extracting common test pattern to reduce duplication.

All nine test cases follow a nearly identical pattern:

  1. Initialize executed flag
  2. Loop over testData
  3. Call InitVF or InitDpdkVF
  4. Create namespace and network
  5. Verify with pass traffic or other checks
  6. Handle NO-CARRIER with Skip
  7. Skip if no devices executed

This repetition spans ~390 lines. Consider extracting a helper function that accepts test-specific parameters (caseID, network options, verification function) to reduce duplication and improve maintainability.

Example refactor approach
// testVFConfiguration is a helper that runs a standard VF test pattern
func testVFConfiguration(caseID string, testData []sriovconfig.DeviceConfig, 
    vfNum int, workerNodes []*nodes.Builder,
    networkOpts []sriovenv.NetworkOption,
    verifyFunc func(string, string, string) error) {
    
    executed := false
    for _, data := range testData {
        // common init/setup logic
        // call verifyFunc for test-specific verification
    }
    if !executed {
        Skip("No SR-IOV devices matched the requested configuration")
    }
}

Then each test becomes:

It("SR-IOV VF with spoof checking enabled", Label("25959"), reportxml.ID("25959"), func() {
    testVFConfiguration("25959", testData, vfNum, workerNodes,
        []sriovenv.NetworkOption{sriovenv.WithSpoof(true), sriovenv.WithTrust(false)},
        func(networkName, interfaceName, namespace string) error {
            return sriovenv.CheckVFStatusWithPassTraffic(networkName, interfaceName,
                namespace, "spoof checking on", tsparams.PodReadyTimeout)
        })
})
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

292-346: Field preservation during MTU update is incomplete but not manifested in current tests.

The delete-and-recreate pattern preserves DeviceType, NicSelector.Vendor, NicSelector.DeviceID, and Priority. However, it doesn't preserve other policy fields like ExternallyManaged, IsRdma, LinkType, or EswitchMode if they were set on the original policy. This is a design limitation but doesn't currently affect tests: the only test using UpdateSriovPolicyMTU (in basic.go) creates policies via InitVF(), which doesn't set these additional fields. Tests that set ExternallyManaged (externalllymanaged.go, qinq.go) don't use MTU updates. To be more robust, consider either preserving all spec fields or providing a safer update mechanism that doesn't require deletion.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b2e42c and 8ca8a42.

📒 Files selected for processing (5)
  • tests/ocp/sriov/internal/ocpsriovconfig/config.go
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go
  • tests/ocp/sriov/internal/tsparams/consts.go
  • tests/ocp/sriov/tests/basic.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/internal/ocpconfig/config.go (1)
  • OcpConfig (20-22)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (4)
  • IsSriovDeployed (32-54)
  • WaitForSriovAndMCPStable (80-100)
  • RemoveAllSriovNetworks (156-176)
  • RemoveAllPoliciesAndWaitForSriovAndMCPStable (180-195)
tests/internal/cluster/cluster.go (1)
  • ExecCmdWithStdout (98-180)
🔇 Additional comments (16)
tests/ocp/sriov/tests/basic.go (3)

25-88: LGTM: Helper functions are well-structured.

The helper functions properly handle error checking, resource creation, and cleanup registration using Ginkgo's DeferCleanup. The NO-CARRIER detection and BCM device identification logic is clear and correct.


102-121: LGTM: BeforeAll setup is correct.

The setup properly loads VF configuration, device data, and discovers worker nodes with appropriate error handling. The vfNum initialization issue from the past review has been correctly addressed by moving the call inside BeforeAll.


454-532: LGTM: DPDK test implementation is thorough.

The DPDK test properly:

  • Skips BCM NICs with bug reference (OCPBUGS-30909)
  • Waits for NAD readiness before pod creation
  • Registers cleanup for DPDK pods
  • Verifies PCI address assignment
  • Validates network-status annotation contains PCI address

The multi-stage verification approach ensures DPDK VF functionality is properly validated.

tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)

59-84: LGTM: Enhanced config initialization with proper error handling.

The use of runtime.Caller(0) correctly determines the config file path relative to this source file, and the enhanced error logging provides better diagnostics when config loading fails.


86-108: LGTM: Device configuration resolution with proper fallback.

The function correctly prioritizes the environment variable override (ECO_OCP_SRIOV_DEVICES) over YAML configuration, with clear error messages when neither source provides devices.


151-160: LGTM: VF number validation is correct.

The function properly validates that VFNum is configured and positive, with a clear error message directing users to the environment variable.

tests/ocp/sriov/internal/tsparams/consts.go (1)

1-78: LGTM: Test constants are well-organized.

The constants are logically grouped (timeouts, test parameters, VLAN/QoS, vendor IDs, resource labels) with clear documentation. The timeout values are reasonable for SR-IOV operations which can involve hardware resets and node reboots.

tests/ocp/sriov/internal/sriovenv/sriovenv.go (9)

37-152: LGTM: Operator checks and network management are well-implemented.

The operator status checks properly delegate to existing utilities, and the functional options pattern for network configuration is clean and extensible. The NAD creation wait logic correctly polls until the NetworkAttachmentDefinition is available.


158-182: LGTM: Policy removal with proper deletion verification.

The function correctly handles the case where the policy is already deleted and polls to confirm deletion completes, which is important for SR-IOV policy cleanup that triggers node-level changes.


184-268: LGTM: VF initialization with first-success pattern.

The implementation correctly:

  • Validates VF number
  • Cleans up existing policies
  • Discovers interfaces when not explicitly provided
  • Creates policy on the first node where device is found
  • Waits for policy to be ready before returning

The single-node design is documented in comments (lines 185-187, 193-195) and appropriate for the test scenarios.


352-448: LGTM: Pod lifecycle management is comprehensive.

The pod creation functions properly:

  • Configure secondary networks with static IP/MAC
  • Set privileged flag for SR-IOV access
  • Wait for pod readiness with appropriate timeouts
  • Handle deletion with proper error checking

The WaitForPodWithLabelReady function correctly validates both pod phase and container ready status.


454-497: LGTM: Interface verification utilities are correct.

The functions properly use ip link show commands and parse output to verify interface state, carrier status, and MAC address. The string parsing is straightforward and handles the expected output format.


503-637: LGTM: Traffic verification with proper NO-CARRIER handling.

Both functions correctly:

  • Generate unique pod names to avoid collisions (lines 509, 590)
  • Register cleanup handlers for test pods
  • Verify interface readiness before testing
  • Handle NO-CARRIER scenarios appropriately

VerifyLinkStateConfiguration intelligently returns a boolean carrier status rather than failing, allowing callers to decide whether to skip tests when physical connections are absent (lines 630-634).


643-697: LGTM: Spoof check verification with robust hostname matching.

The function correctly:

  • Extracts pod node name and MAC address
  • Executes ip link show on the target node via cluster utilities
  • Handles hostname variations (short vs FQDN) in output map (lines 671-677)
  • Searches for both "spoof checking" and "spoofchk" formats in output

703-740: LGTM: PCI address extraction from network-status annotation.

The function properly:

  • Pulls the pod to get current annotations
  • Parses the CNI network-status JSON annotation
  • Finds the requested interface and extracts PCI address
  • Returns clear errors when annotation, interface, or PCI address is missing

746-800: LGTM: Comprehensive cleanup with graceful error handling.

The cleanup function properly:

  • Uses label selector for safe cleanup in shared clusters (lines 755-768)
  • Falls back to prefix-based cleanup for backward compatibility (lines 771-784)
  • Delegates to existing cleanup utilities for networks and policies
  • Logs warnings without failing when cleanup encounters errors (appropriate for best-effort cleanup)

Comment on lines +110 to +149
// parseSriovDevicesEnv parses device configuration from ECO_OCP_SRIOV_DEVICES environment variable.
// Format: "name1:deviceid1:vendor1:interface1[:minTxRate],name2:deviceid2:vendor2:interface2[:minTxRate],..."
// Example: "e810xxv:159b:8086:ens2f0,e810c:1593:8086:ens2f2".
// Note: SupportsMinTxRate defaults to true when omitted from env var.
// For YAML-defined devices, ensure supports_min_tx_rate is explicitly set.
func parseSriovDevicesEnv(envDevices string) ([]DeviceConfig, error) {
var devices []DeviceConfig

entries := strings.Split(envDevices, ",")
for _, entry := range entries {
entry = strings.TrimSpace(entry)
if entry == "" {
continue
}

parts := strings.Split(entry, ":")
if len(parts) != 4 && len(parts) != 5 {
return nil, fmt.Errorf(
"invalid ECO_OCP_SRIOV_DEVICES entry %q - expected format: name:deviceid:vendor:interface[:minTxRate]",
entry)
}

supportsMinTxRate := defaultSupportsMinTxRate

if len(parts) == 5 {
val := strings.ToLower(strings.TrimSpace(parts[4]))
supportsMinTxRate = val == "true" || val == "1" || val == "yes"
}

devices = append(devices, DeviceConfig{
Name: strings.TrimSpace(parts[0]),
DeviceID: strings.TrimSpace(parts[1]),
Vendor: strings.TrimSpace(parts[2]),
InterfaceName: strings.TrimSpace(parts[3]),
SupportsMinTxRate: supportsMinTxRate,
})
}

return devices, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for documentation about SupportsMinTxRate, ECO_OCP_SRIOV_DEVICES, and supports_min_tx_rate
rg -n -C3 "SupportsMinTxRate|ECO_OCP_SRIOV_DEVICES|supports_min_tx_rate" \
  --type md --type yaml --type txt

Repository: rh-ecosystem-edge/eco-gotests

Length of output: 3669


🏁 Script executed:

# Find the definition of defaultSupportsMinTxRate
rg -n "defaultSupportsMinTxRate" --type go

Repository: rh-ecosystem-edge/eco-gotests

Length of output: 408


🏁 Script executed:

# Look at the full config.go file to understand context
wc -l tests/ocp/sriov/internal/ocpsriovconfig/config.go

Repository: rh-ecosystem-edge/eco-gotests

Length of output: 126


🏁 Script executed:

# Check if there are tests for the parseSriovDevicesEnv function
rg -n "parseSriovDevicesEnv|TestParseSriov" --type go -A5

Repository: rh-ecosystem-edge/eco-gotests

Length of output: 1752


Env var default for SupportsMinTxRate contradicts YAML-defined device capabilities.

The parseSriovDevicesEnv function defaults SupportsMinTxRate to true when the optional 5th field is omitted, but the YAML defaults show many common devices explicitly set to false (e.g., Intel x710, Broadcom bcm57414, bcm57508). This creates an inconsistency: users specifying devices via ECO_OCP_SRIOV_DEVICES get a permissive default, while YAML-defined devices require explicit configuration.

While the code comment (lines 113-114) documents this behavior, it places the burden on users to remember the :false suffix for known unsupporting devices.

Consider either changing the env var default to false (aligning with the YAML defaults for most device types) or providing a mapping of device types to capabilities.

🤖 Prompt for AI Agents
In tests/ocp/sriov/internal/ocpsriovconfig/config.go around lines 110 to 149 the
env parser defaults SupportsMinTxRate to true when the 5th field is omitted,
which contradicts YAML device defaults (many common devices default to false);
change the env parsing to default SupportsMinTxRate to false (or consult a
device-capability map if you prefer per-device accuracy) by replacing the use of
defaultSupportsMinTxRate here with false (or a lookup) and update the function
comment to reflect the new default; also add/adjust unit tests to cover entries
without the 5th field expecting false.

root added 2 commits December 23, 2025 11:29
- Add SR-IOV environment helper functions using eco-goinfra
- Use existing sriovoperator functions for stability checks and cleanup
- Add network creation with NetworkOption pattern
- Add policy creation and VF initialization helpers
- Add pod verification and traffic testing utilities
- Add 9 basic SR-IOV test cases with test IDs as labels
- Test spoof checking, trust, VLAN, link state, MTU, min/max tx rate
- Add DPDK test with BCM NIC skip (OCPBUGS-30909)
- Use constants from tsparams for test values
- Add proper cleanup in AfterAll and AfterEach
@zhiqiangf zhiqiangf force-pushed the pr3-sriov-suite-tests branch from 8ca8a42 to 56491cc Compare December 23, 2025 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant