-
Notifications
You must be signed in to change notification settings - Fork 49
ocp-sriov: add sriovenv helper functions #985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds SR‑IOV device configuration parsing and VF count handling, a new comprehensive SR‑IOV test environment module (network/NAD/policy/VF lifecycle, pod helpers, discovery, and cleanup), many test timing/network constants, and a small tsparams cleanup (removed DefaultTimeout/import). Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
bfb7a3b to
7e0a1c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/ocp/internal/ocpconfig/config.go (1)
77-84: Handleio.EOFfor empty config files.
yaml.Decoder.Decodereturnsio.EOFwhen the file is empty. Since the GeneralConfig preservation logic (lines 38-55) is designed to handle empty files, this should not be treated as a fatal error.+import "io" + func readFile(ocpConfig *OcpConfig, cfgFile string) error { openedCfgFile, err := os.Open(cfgFile) if err != nil { return err } defer func() { _ = openedCfgFile.Close() }() decoder := yaml.NewDecoder(openedCfgFile) err = decoder.Decode(ocpConfig) - if err != nil { + if err != nil && err != io.EOF { return err } return nil }
🧹 Nitpick comments (4)
tests/ocp/sriov/internal/tsparams/sriovvars.go (2)
65-71: DefaultSupportsMinTxRate: truemay not be accurate for all devices.When users specify devices via
SRIOV_DEVICES, defaultingSupportsMinTxRatetotruecould cause test failures for devices that don't support it. Consider adding an optional 5th field to the format (e.g.,name:deviceid:vendor:interface:minTxRate).
85-89: Consider returning error instead of panic for invalid config.Panicking in configuration parsing can crash the entire test suite. Returning an error would allow callers to handle this more gracefully (e.g., skip tests that require device config).
-// GetDeviceConfig returns device configuration from environment variable or defaults. -func GetDeviceConfig() []DeviceConfig { +// GetDeviceConfig returns device configuration from environment variable or defaults. +// Returns error if SRIOV_DEVICES is set but contains no valid entries. +func GetDeviceConfig() ([]DeviceConfig, error) { envDevices := os.Getenv("SRIOV_DEVICES") if devices := parseDeviceConfig(); len(devices) > 0 { - return devices + return devices, nil } if envDevices != "" { - panic(fmt.Sprintf( + return nil, fmt.Errorf( "SRIOV_DEVICES is set to %q but no valid entries could be parsed; "+ - "expected format: name:deviceid:vendor:interface", envDevices)) + "expected format: name:deviceid:vendor:interface", envDevices) } - return GetDefaultDeviceConfig() + return GetDefaultDeviceConfig(), nil }tests/ocp/sriov/internal/tsparams/consts.go (1)
59-62: MAC addresses are not locally administered.The test MAC addresses
20:04:0f:f1:88:01and20:04:0f:f1:88:02are not locally administered (the second least significant bit of the first octet should be 1, e.g.,22:04:...). While this likely won't cause issues in isolated test environments, using locally administered MACs is best practice to avoid potential conflicts.- // TestPodClientMAC is the default MAC address for test client pods. - TestPodClientMAC = "20:04:0f:f1:88:01" - // TestPodServerMAC is the default MAC address for test server pods. - TestPodServerMAC = "20:04:0f:f1:88:02" + // TestPodClientMAC is the default MAC address for test client pods (locally administered). + TestPodClientMAC = "02:04:0f:f1:88:01" + // TestPodServerMAC is the default MAC address for test server pods (locally administered). + TestPodServerMAC = "02:04:0f:f1:88:02"tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
21-21: Dot import forocpsriovinittools.Using dot imports (
. "...") can reduce code clarity by making it unclear where symbols likeAPIClientandSriovOcpConfigoriginate. Consider using a named import for better readability, unless this is an established pattern in the codebase.- . "github.com/rh-ecosystem-edge/eco-gotests/tests/ocp/sriov/internal/ocpsriovinittools" + "github.com/rh-ecosystem-edge/eco-gotests/tests/ocp/sriov/internal/ocpsriovinittools"Then update usages to
ocpsriovinittools.APIClient,ocpsriovinittools.SriovOcpConfig, etc.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between b1c772b and 7e0a1c52190b13670ccdebc28da2e89907a87109.
📒 Files selected for processing (6)
tests/ocp/internal/ocpconfig/config.go(2 hunks)tests/ocp/sriov/internal/ocpsriovconfig/config.go(4 hunks)tests/ocp/sriov/internal/ocpsriovconfig/default.yaml(0 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)tests/ocp/sriov/internal/tsparams/consts.go(1 hunks)tests/ocp/sriov/internal/tsparams/sriovvars.go(1 hunks)
💤 Files with no reviewable changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
🧰 Additional context used
🧬 Code graph analysis (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
🔇 Additional comments (10)
tests/ocp/internal/ocpconfig/config.go (1)
38-55: Good defensive pattern for preserving embedded config during YAML decode.The snapshot-and-restore approach correctly handles the case where
yaml.v2zeros out embedded pointer fields when decoding empty or incomplete YAML files.tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)
28-41: Well-structured decoupling of SRIOV-specific config fields.Using separate temporary structs for YAML and environment decoding prevents side effects on embedded
OcpConfig/GeneralConfig. This is a clean separation of concerns.
57-62: Good addition of runtime.Caller validation.Checking the
okreturn value prevents potential issues from using an empty or invalid filename path.
122-133: Intentional non-empty check prevents clearing YAML values via empty env vars.The conditional overrides only apply non-empty environment values, preserving YAML defaults. This is typically the desired behavior but worth noting: setting
ECO_OCP_SRIOV_OPERATOR_NAMESPACE=""won't clear a YAML-configured value.tests/ocp/sriov/internal/tsparams/sriovvars.go (2)
24-35: Hardcoded default device configurations.These defaults provide reasonable fallbacks for common SR-IOV NICs. The
SupportsMinTxRatefield correctly identifies devices that lack this capability.
94-113: LGTM! Well-structured VF number retrieval with validation.The function correctly prioritizes env vars, validates parsed values, warns on invalid input, and provides a sensible default.
tests/ocp/sriov/internal/tsparams/consts.go (1)
14-49: Well-organized timeout and interval constants.The constants are clearly documented and provide sensible defaults for various operations. The separation between different operation types (namespace, pod, policy, cleanup) improves code clarity.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (3)
167-177: Good idempotent cleanup behavior.Returning
nilwhen the network doesn't exist ensures cleanup operations are idempotent and don't fail unnecessarily.
215-266: LGTM! Robust node fallback pattern for VF initialization.The function correctly iterates over worker nodes, attempting policy creation on each until one succeeds. Cleanup of failed policies and logging at
V(90)provides debugging capability without noise.
628-667: Good debug pod pattern for node-level command execution.The function correctly creates a privileged debug pod with host namespaces, uses
nsenterfor host access, and ensures cleanup viadefer. The pod name truncation to 63 characters ensures DNS compliance.
7e0a1c5 to
87103d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/ocp/internal/ocpconfig/config.go (1)
38-55: Pointer preservation looks good; consider treating empty YAML (io.EOF) as non-fatal.Capturing
generalConfigand restoring it whenocpConf.GeneralConfigcomes backnilis a sensible safeguard against YAML decoding wiping the embeddedGeneralConfigpointer.One remaining edge case:
readFilestill propagates anyDecodeerror, includingio.EOFfor an empty config file. If an emptydefault.yamlis meant to be valid (i.e., "just use defaults/env"),NewOcpConfigwill still fail and returnnileven though the pointer-preservation logic would otherwise be sufficient.You can keep the new behavior and make empty YAMLs safe with something like:
func readFile(ocpConfig *OcpConfig, cfgFile string) error { @@ - err = decoder.Decode(ocpConfig) - if err != nil { - return err - } + err = decoder.Decode(ocpConfig) + if err != nil && err != io.EOF { + return err + } return nil }(and add
ioto the imports). This aligns with the intent of preservingGeneralConfigwhen there is no YAML content at all.Also applies to: 77-82
♻️ Duplicate comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
94-107: Still treat empty YAML (io.EOF) as non-fatal when decoding SR-IOV defaults.The temporary
sriovYAMLConfigand copy-back pattern are good, butdecoder.Decode(&yamlConf)will still returnio.EOFfor an emptydefault.yaml, which currently bubbles up as an error and causesNewSriovOcpConfigto returnnil. Given SR-IOV-specific fields are optional and may be provided via env, an empty YAML file should likely be acceptable.You can align this with the earlier review suggestion by special-casing
io.EOF:func readFile(sriovOcpConfig *SriovOcpConfig, cfgFile string) error { @@ - err = decoder.Decode(&yamlConf) - if err != nil { - return err - } + err = decoder.Decode(&yamlConf) + if err != nil && err != io.EOF { + return err + } @@ // Copy decoded values to the actual config sriovOcpConfig.OcpSriovOperatorNamespace = yamlConf.OcpSriovOperatorNamespace sriovOcpConfig.OcpSriovTestContainer = yamlConf.OcpSriovTestContainer(and add
ioto the imports). This lets you keep YAML entirely empty while still loading defaults and env-based overrides.
🧹 Nitpick comments (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
112-134: Env-only struct cleanly isolates SR-IOV overrides; minor duplication trade-off.Decoding into
sriovEnvConfigand then conditionally copying non-empty values ontoSriovOcpConfigneatly avoids clobbering embedded config fields. The string-based checks are straightforward for these keys.Longer term, consider whether you want to centralize the env tags (they now exist on both
SriovOcpConfigandsriovEnvConfig) to avoid drift if variable names change, but that’s a minor maintenance concern and not a blocker.tests/ocp/sriov/internal/tsparams/consts.go (1)
1-71: Timeouts and test network defaults look consistent; watch out for mutatingLabels.The consolidated timeout/polling constants and the test IP/MAC/MTU defaults are well-structured and self-documenting, making SR-IOV tests easier to tune and reason about. The MAC choices as locally administered unicast addresses are appropriate for test usage.
For
Labels, exposing a mutable slice is slightly risky if any test code does:tsparams.Labels = append(tsparams.Labels, tsparams.LabelBasic)rather than working on a copy; that could introduce implicit coupling between test files and potential data races under
go test -parallel. If that’s a concern, you might prefer a helper like:func SuiteLabels() []string { return []string{LabelSuite} }and have each test file
appendto the returned slice, or at least treatLabelsas read-only by convention. Not a blocker, just something to keep in mind.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 7e0a1c52190b13670ccdebc28da2e89907a87109 and 87103d4beb0262b6bd9128f2a8b557ba9af63faa.
📒 Files selected for processing (6)
tests/ocp/internal/ocpconfig/config.go(2 hunks)tests/ocp/sriov/internal/ocpsriovconfig/config.go(4 hunks)tests/ocp/sriov/internal/ocpsriovconfig/default.yaml(0 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)tests/ocp/sriov/internal/tsparams/consts.go(1 hunks)tests/ocp/sriov/internal/tsparams/sriovvars.go(1 hunks)
💤 Files with no reviewable changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
✅ Files skipped from review due to trivial changes (1)
- tests/ocp/sriov/internal/tsparams/sriovvars.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/sriovenv/sriovenv.go
🧰 Additional context used
🧬 Code graph analysis (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
🔇 Additional comments (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
1-42: Good separation of SR-IOV-specific config from embedded OcpConfig.The package doc plus the new
sriovYAMLConfig/sriovEnvConfigtypes clearly scope YAML/env handling to SR-IOV fields only, avoiding side effects on the embeddedOcpConfig/GeneralConfig. This matches the intent of keeping general OCP config loading independent of SR-IOV defaults.
57-62:runtime.Callerguard improves robustness of config path resolution.Checking the
okreturn fromruntime.Callerand failing early if it’s false is a reasonable defensive improvement for locatingdefault.yaml; the log message is sufficient for debugging in this context.
2e589ec to
3aadda1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
201-204: Handleio.EOFfor empty YAML files.When decoding an empty YAML file,
yaml.Decoder.Decodereturnsio.EOF. Since the function sets defaults before reading config files and SR-IOV-specific fields are optional, an empty YAML file should not cause initialization to fail.Apply this diff to handle empty files gracefully:
+import "io" + err = decoder.Decode(&yamlConf) - if err != nil { + if err != nil && err != io.EOF { return err }
🧹 Nitpick comments (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
255-306: Consider simplifying warning message format.The function correctly parses the legacy
SRIOV_DEVICESformat with appropriate validation. However, line 276 uses%qwhich will add extra quotes around the already-quoted entry in the output.Apply this diff for cleaner log output:
- klog.Warningf("Skipping invalid SRIOV_DEVICES entry %q - expected format: name:deviceid:vendor:interface[:minTxRate]", - entry) + klog.Warningf("Skipping invalid SRIOV_DEVICES entry %s - expected format: name:deviceid:vendor:interface[:minTxRate]", + entry)tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
460-472: Consider more robust link state checking.The
VerifyInterfaceReadyfunction checks for "UP" in the output but doesn't verify the interface is actually in an operational state. For test environments this is likely sufficient, but a more robust check could parse the specific link state flags.If stricter verification is needed in the future, consider parsing the state field:
// Enhanced version that checks for both UP and LOWER_UP flags output, err := podObj.ExecCommand([]string{"ip", "link", "show", interfaceName}) if err != nil { return fmt.Errorf("failed to check interface %q: %w", interfaceName, err) } if !strings.Contains(output.String(), "UP") || !strings.Contains(output.String(), "LOWER_UP") { return fmt.Errorf("interface %q is not fully operational", interfaceName) }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 9041e6c9ac6070d586b5b0fa67cb605b80576c9c and 3aadda1a59b63d949ccf6eda2279a4a8026450ba.
📒 Files selected for processing (5)
tests/ocp/sriov/internal/ocpsriovconfig/config.go(4 hunks)tests/ocp/sriov/internal/ocpsriovconfig/default.yaml(1 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)tests/ocp/sriov/internal/tsparams/consts.go(2 hunks)tests/ocp/sriov/internal/tsparams/ocpsriovvars.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)
tests/ocp/sriov/internal/tsparams/consts.go (2)
LabelSuite(10-10)LabelBasic(15-15)
🔇 Additional comments (19)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)
1-17: LGTM!The reorganization cleanly consolidates timing constants into
consts.goand extends the test labels to includeLabelBasic. The changes align well with the PR objectives.tests/ocp/sriov/internal/tsparams/consts.go (3)
14-15: LGTM!The
LabelBasicconstant follows the established pattern and enables filtering for basic test cases.
21-44: LGTM!The timeout values are well-suited for SR-IOV operations. The
PolicyApplicationTimeoutof 35 minutes is appropriate given that SR-IOV policy changes can trigger MachineConfigPool updates and node reboots.
46-56: LGTM!The test pod networking constants are well-chosen. The locally administered MAC addresses and private IP range are appropriate for isolated test environments, and the 9000 MTU is standard for SR-IOV performance testing.
tests/ocp/sriov/internal/ocpsriovconfig/config.go (6)
26-88: LGTM!The
DeviceConfigtype and default device list provide comprehensive coverage of major SR-IOV NIC vendors. TheSupportsMinTxRatefield correctly identifies device capabilities, which is important for rate-limiting test scenarios.
45-74: Excellent documentation of the temporary struct pattern.The extensive comment block clearly explains why the temporary struct approach is necessary to avoid yaml.v2 zeroing embedded pointers. This pattern prevents configuration clobbering and is well-justified.
90-139: LGTM!The initialization sequence establishes proper configuration precedence: defaults → YAML file → environment variables → legacy environment variables. The enhanced error logging with context improves debuggability.
141-157: LGTM!The accessor methods provide sensible fallbacks to default values when configuration is not explicitly set, following Go best practices.
226-253: LGTM!The
readEnvfunction correctly uses the temporary struct pattern to preserve parent config fields, consistent with thereadFileimplementation.
308-328: LGTM!The
parseLegacyVFNumfunction correctly implements fallback to the legacySRIOV_VF_NUMenvironment variable only when the newECO_OCP_SRIOV_VF_NUMis not set. The validation ensures positive values and logs warnings for invalid input without failing initialization.tests/ocp/sriov/internal/sriovenv/sriovenv.go (9)
40-81: LGTM!The operator and cluster stability checks properly poll for SR-IOV node state synchronization and MachineConfigPool readiness. The error handling correctly continues polling on transient errors.
87-177: LGTM!The network creation helpers use the functional options pattern effectively, providing a clean API for SR-IOV network configuration. The defensive check in
RemoveSriovNetwork(line 173) correctly treats an already-deleted network as success.
183-203: LGTM!The
RemoveSriovPolicyfunction correctly waits for policy deletion by polling until the policy can no longer be retrieved. The defensive check for already-deleted policies prevents spurious errors.
205-279: LGTM with documented limitation.The VF initialization functions correctly create SR-IOV policies on the first node where the device is found. The function comments (lines 206-208, 214-216) explicitly document that this is single-node behavior suitable for testing scenarios. The interface discovery logic (lines 239-243) appropriately allows explicit interface specification for multi-port NICs.
362-454: LGTM!The pod creation and readiness functions correctly handle SR-IOV network attachment and wait for complete readiness (both phase and container statuses). The defensive check in
DeleteDpdkTestPodprevents errors when pods are already deleted.
509-638: LGTM!The traffic verification functions correctly handle the complete lifecycle: pod creation, interface verification, connectivity testing, and cleanup. The
VerifyLinkStateConfigurationfunction appropriately returns(false, nil)on carrier timeout (line 634), allowing tests to skip scenarios where physical connections are not established rather than failing.
644-722: LGTM!The spoof check verification correctly uses a debug pod with host network and PID namespaces to inspect VF settings on the node. The cleanup logic (lines 690-692, 707) ensures debug pods don't accumulate. The nsenter command construction (line 714) properly enters all necessary namespaces.
728-765: LGTM!The
GetPciAddressfunction correctly parses the CNI network status annotation to extract PCI address information. The logic to strip the namespace prefix (lines 754-757) ensures correct matching for namespaced network names.
771-850: LGTM!The cleanup functions implement best-effort cleanup with appropriate error handling. Errors are logged (lines 781, 789, 799, 809, 817, 828) but don't halt the cleanup process, ensuring maximum cleanup even when some operations fail. The regex pattern (line 802) correctly identifies test resources without affecting production networks.
101d22f to
f4504af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
162-170: Handleio.EOFfor empty YAML files.
yaml.Decoder.Decodereturnsio.EOFfor empty files. Since SR-IOV fields have defaults viaGetDefaultDevices()andGetVFNum(), an empty YAML file should not be a fatal error.+import "io" + func readFile(sriovOcpConfig *SriovOcpConfig, cfgFile string) error { // ... file open code ... err = decoder.Decode(&sriovOcpConfig) - if err != nil { + if err != nil && err != io.EOF { return err } return nil }tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
303-356: Document potential service disruption.The delete-and-recreate approach causes VFs to be deallocated and reallocated, resulting in brief service disruption. Add a note to help users decide when to use this function.
// UpdateSriovPolicyMTU updates the MTU of an existing SR-IOV policy. // PolicyBuilder does not have Update method, so we delete and recreate. +// Note: This causes brief service disruption as VFs are deallocated and reallocated. +// Consider scheduling this operation during maintenance windows. func UpdateSriovPolicyMTU(policyName string, mtuValue int) error {
🧹 Nitpick comments (4)
tests/ocp/sriov/internal/tsparams/consts.go (1)
50-53: Duplicate IP constants withocpsriovvars.go.
TestPodClientIPandTestPodServerIPduplicateClientIPv4IPAddressandServerIPv4IPAddressfromocpsriovvars.go. Consider consolidating to avoid divergence.Either remove the duplicates here or in
ocpsriovvars.go, and use a single source of truth:- // TestPodClientIP is the default IP address for test client pods. - TestPodClientIP = "192.168.0.1/24" - // TestPodServerIP is the default IP address for test server pods. - TestPodServerIP = "192.168.0.2/24"Then update references to use
ClientIPv4IPAddressandServerIPv4IPAddressfromocpsriovvars.go.tests/ocp/sriov/internal/sriovenv/sriovenv.go (3)
167-177: Error handling may mask actual failures.
PullNetworkcould return errors other than "not found" (e.g., network issues, permission errors). Swallowing all errors withreturn nilmay hide actual failures.func RemoveSriovNetwork(name string, timeout time.Duration) error { sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace network, err := sriov.PullNetwork(APIClient, name, sriovOpNs) if err != nil { - return nil // Already deleted + // Check if it's a "not found" error vs other errors + if strings.Contains(err.Error(), "not found") { + return nil // Already deleted + } + return fmt.Errorf("failed to pull network %q: %w", name, err) } return network.DeleteAndWait(timeout) }
183-203: Same error handling concern asRemoveSriovNetwork.Consider distinguishing "not found" from other errors for better debuggability.
683-722: Potential pod name collision with long node names.When truncating to 63 characters, different node names could produce the same pod name if they share the first ~50 characters. This is unlikely in practice but could cause issues in large clusters.
Consider appending a short hash of the full node name:
func executeOnNode(nodeName, namespace string, cmd []string) (string, error) { - debugPodName := "sriov-debug-" + strings.ReplaceAll(nodeName, ".", "-") - if len(debugPodName) > 63 { - debugPodName = debugPodName[:63] - } + sanitized := strings.ReplaceAll(nodeName, ".", "-") + if len(sanitized) > 50 { + // Use first 45 chars + hash suffix to ensure uniqueness + sanitized = sanitized[:45] + fmt.Sprintf("%x", sha256.Sum256([]byte(nodeName)))[:4] + } + debugPodName := "sriov-debug-" + sanitized
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 3aadda1a59b63d949ccf6eda2279a4a8026450ba and f4504af64e28d3de907288e0a6cb1ebf233dd35e.
📒 Files selected for processing (5)
tests/ocp/sriov/internal/ocpsriovconfig/config.go(5 hunks)tests/ocp/sriov/internal/ocpsriovconfig/default.yaml(1 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)tests/ocp/sriov/internal/tsparams/consts.go(2 hunks)tests/ocp/sriov/internal/tsparams/ocpsriovvars.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)
tests/ocp/sriov/internal/tsparams/consts.go (1)
LabelSuite(10-10)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
Name(22-22)tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
🔇 Additional comments (14)
tests/ocp/sriov/internal/tsparams/ocpsriovvars.go (1)
1-19: LGTM!The documentation improvements clarify the intended usage of suite-level vs feature-specific labels.
tests/ocp/sriov/internal/ocpsriovconfig/config.go (4)
26-57: LGTM!The
DeviceConfigstruct and default devices list provide a clean abstraction for SR-IOV device configuration with appropriate vendor-specific feature flags.
36-43: LGTM!The new
DevicesandVFNumfields with proper YAML/envconfig tags extend the configuration model cleanly.
106-122: LGTM!Accessor methods properly return defaults when configuration is not explicitly set.
181-232: LGTM!The environment variable parsing handles validation, empty entries, and optional fields gracefully with appropriate warning logs.
tests/ocp/sriov/internal/tsparams/consts.go (1)
1-73: LGTM on the remaining constants.The timeout constants provide good defaults for SR-IOV operations, and the test configuration values (MTU, VLAN, rate limits) are reasonable for testing purposes.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (8)
39-81: LGTM!The operator and cluster stability functions properly delegate responsibilities and include appropriate nil checks before accessing nested status fields.
87-137: LGTM!Clean functional options pattern that provides flexible network configuration while maintaining type safety.
205-279: LGTM!The VF initialization logic correctly iterates through worker nodes and creates a policy on the first successful node. The interface discovery fallback and cleanup behavior are appropriate.
362-454: LGTM!Pod creation and readiness verification functions properly handle creation, timeout, and container status checks.
460-503: LGTM!Interface verification functions use standard
ip linkoutput parsing which is appropriate for test utilities.
509-585: LGTM!Comprehensive VF status verification with proper cleanup handling and clear error messages.
587-638: LGTM!Returning
(false, nil)on carrier timeout allows calling tests to decide whether to skip rather than fail, which is appropriate for hardware-dependent verification.
728-850: LGTM!
GetPciAddresscorrectly parses the CNI network-status annotation, andCleanupLeftoverResourcesimplements appropriate best-effort cleanup with logging.
f4504af to
0ddb446
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
303-305: Add service disruption warning to docstring.The delete-and-recreate approach causes VFs to be deallocated and reallocated, resulting in brief service disruption for any pods using this policy. This should be documented to help callers decide when to use this function.
Apply this diff:
// UpdateSriovPolicyMTU updates the MTU of an existing SR-IOV policy. // PolicyBuilder does not have Update method, so we delete and recreate. +// Note: This causes brief service disruption as VFs are deallocated and reallocated. func UpdateSriovPolicyMTU(policyName string, mtuValue int) error {tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
199-207: Handleio.EOFfor empty YAML files.
yaml.Decoder.Decodereturnsio.EOFwhen decoding an empty file. Since SR-IOV fields have defaults or can be overridden via environment variables, an empty YAML file should not be a fatal error.+import "io" + func readFile(sriovOcpConfig *SriovOcpConfig, cfgFile string) error { // ... decoder := yaml.NewDecoder(openedCfgFile) err = decoder.Decode(&sriovOcpConfig) - if err != nil { + if err != nil && err != io.EOF { return err } return nil }
🧹 Nitpick comments (2)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
559-568: Consider explicit parameter instead of parsing description.The spoof check verification is triggered by parsing the
descriptionstring for "spoof checking". This is fragile and could lead to unexpected behavior if the description format changes. Consider adding an explicit parameter or options struct.Example alternative approach:
type VFStatusOptions struct { VerifySpoofCheck bool SpoofCheckState string // "on" or "off" } func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string, timeout time.Duration, opts *VFStatusOptions) error { // ... if opts != nil && opts.VerifySpoofCheck { if err := verifySpoofCheck(clientPod, interfaceName, opts.SpoofCheckState); err != nil { return fmt.Errorf("spoof check verification failed: %w", err) } } // ... }tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
112-149: Document default value for minTxRate in env format.The
supportsMinTxRatedefaults totruewhen omitted from the environment variable format. This should be documented in the function comment to avoid confusion.// parseSriovDevicesEnv parses device configuration from SRIOV_DEVICES environment variable. // Format: "name1:deviceid1:vendor1:interface1[:minTxRate],name2:deviceid2:vendor2:interface2[:minTxRate],..." // Example: "e810xxv:159b:8086:ens2f0,e810c:1593:8086:ens2f2". +// Note: minTxRate defaults to true if omitted. func parseSriovDevicesEnv(envDevices string) ([]DeviceConfig, error) {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between f4504af64e28d3de907288e0a6cb1ebf233dd35e and 0ddb446523c4a08113b730e29d3ca5f642cda0a0.
📒 Files selected for processing (3)
tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml(1 hunks)tests/ocp/sriov/internal/ocpsriovconfig/config.go(5 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
🔇 Additional comments (4)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
39-42: LGTM - Well-structured SR-IOV environment helpers.The overall implementation follows good patterns:
- Functional options for network configuration
- Proper cleanup with defer statements
- Best-effort cleanup with logged warnings
- Good use of eco-goinfra builders
tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml (1)
1-50: LGTM - Comprehensive default device configurations.The configuration covers major SR-IOV NIC vendors (Intel, Broadcom, Mellanox) with appropriate device IDs and default settings. The
supports_min_tx_ratefield correctly reflects hardware capabilities.tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
218-239: LGTM - Legacy VFNum handling looks correct.The logic correctly prioritizes
ECO_OCP_SRIOV_VF_NUMover the legacySRIOV_VF_NUM, and only applies the legacy value when the new env var is not explicitly set.
24-41: LGTM - Clean configuration structure.The
DeviceConfigtype and updatedSriovOcpConfigfields provide a clean API for SR-IOV device configuration with proper YAML and environment variable tags.
0ddb446 to
5ad8751
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
189-207: Fix YAML decode target: currently decoding into**SriovOcpConfig(likely no-op / wrong).
readFilereceivessriovOcpConfig *SriovOcpConfigbut decodes into&sriovOcpConfig(Line 201), i.e.,**SriovOcpConfig. This typically won’t populate the struct you expect and can leave config fields at zero values.Also, consider treating empty YAML (
io.EOF) as non-fatal (per prior review note).import ( + "errors" "fmt" + "io" "log" "os" "path/filepath" "runtime" "strconv" "strings" @@ func readFile(sriovOcpConfig *SriovOcpConfig, cfgFile string) error { @@ decoder := yaml.NewDecoder(openedCfgFile) - err = decoder.Decode(&sriovOcpConfig) - if err != nil { + err = decoder.Decode(sriovOcpConfig) + if err != nil && !errors.Is(err, io.EOF) { return err } return nil }
♻️ Duplicate comments (2)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
303-356: Document disruption + wait for readiness after MTU “update” (delete/recreate).This delete+recreate path can briefly deallocate/reallocate VFs (service disruption) and should be called out in the docstring (requested previously). Also, after recreating, it should likely
WaitForSriovPolicyReady(...)(or equivalent) so callers don’t proceed while the cluster is still reconciling.tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
218-239: LegacySRIOV_VF_NUMcurrently overrides YAMLvf_num(likely unintended).
parseLegacyVFNumonly gates onECO_OCP_SRIOV_VF_NUM(Line 221-224), so if YAML setsvf_numand legacySRIOV_VF_NUMis present, legacy will overwrite YAML. If the intent is “legacy only when VFNum is still unset”, add a VFNum guard:func parseLegacyVFNum(sriovOcpConfig *SriovOcpConfig) { // Skip if the new env var was explicitly set if os.Getenv("ECO_OCP_SRIOV_VF_NUM") != "" { return } + + // Skip if YAML already provided a value + if sriovOcpConfig.VFNum > 0 { + return + } @@ sriovOcpConfig.VFNum = vfNum }
🧹 Nitpick comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
151-159: Prefer a named default VF constant (avoid magic “2”).
GetVFNumhardcodes2(Line 155). Considerconst DefaultVFNum = 2shared across config/parsers, so future changes don’t miss call sites.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 0ddb446523c4a08113b730e29d3ca5f642cda0a0 and 5ad8751098b250e4498913893f9fa67ac278790f.
📒 Files selected for processing (4)
.cursorrules(1 hunks)tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml(1 hunks)tests/ocp/sriov/internal/ocpsriovconfig/config.go(5 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .cursorrules
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
Name(22-22)tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (1)
IsSriovDeployed(32-54)tests/ocp/sriov/internal/tsparams/consts.go (13)
MCPStableInterval(38-38)NADTimeout(40-40)PollingInterval(28-28)NamespaceTimeout(30-30)PolicyApplicationTimeout(42-42)PodReadyTimeout(32-32)TestPodClientIP(51-51)TestPodClientMAC(47-47)CleanupTimeout(34-34)TestPodServerIP(53-53)TestPodServerMAC(49-49)CarrierWaitTimeout(36-36)DebugPodCleanupTimeout(44-44)
| // parseSriovDevicesEnv parses device configuration from SRIOV_DEVICES environment variable. | ||
| // Format: "name1:deviceid1:vendor1:interface1[:minTxRate],name2:deviceid2:vendor2:interface2[:minTxRate],..." | ||
| // Example: "e810xxv:159b:8086:ens2f0,e810c:1593:8086:ens2f2". | ||
| func parseSriovDevicesEnv(envDevices string) ([]DeviceConfig, error) { | ||
| var devices []DeviceConfig | ||
|
|
||
| entries := strings.Split(envDevices, ",") | ||
| for _, entry := range entries { | ||
| entry = strings.TrimSpace(entry) | ||
| if entry == "" { | ||
| continue | ||
| } | ||
|
|
||
| parts := strings.Split(entry, ":") | ||
| if len(parts) != 4 && len(parts) != 5 { | ||
| return nil, fmt.Errorf( | ||
| "invalid SRIOV_DEVICES entry %q - expected format: name:deviceid:vendor:interface[:minTxRate]", | ||
| entry) | ||
| } | ||
|
|
||
| supportsMinTxRate := true // default | ||
|
|
||
| if len(parts) == 5 { | ||
| val := strings.ToLower(strings.TrimSpace(parts[4])) | ||
| supportsMinTxRate = val == "true" || val == "1" || val == "yes" | ||
| } | ||
|
|
||
| devices = append(devices, DeviceConfig{ | ||
| Name: strings.TrimSpace(parts[0]), | ||
| DeviceID: strings.TrimSpace(parts[1]), | ||
| Vendor: strings.TrimSpace(parts[2]), | ||
| InterfaceName: strings.TrimSpace(parts[3]), | ||
| SupportsMinTxRate: supportsMinTxRate, | ||
| }) | ||
| } | ||
|
|
||
| return devices, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
find . -name "config.go" -type f -path "*/ocpsriovconfig/*"Repository: rh-ecosystem-edge/eco-gotests
Length of output: 124
🏁 Script executed:
find . -name "*.go" -path "*sriov*" | grep -i configRepository: rh-ecosystem-edge/eco-gotests
Length of output: 187
🏁 Script executed:
rg "SupportsMinTxRate" -B 5 -A 5Repository: rh-ecosystem-edge/eco-gotests
Length of output: 1858
🏁 Script executed:
rg "parseDevicesFromYAML|parseDevices|YAML" tests/Repository: rh-ecosystem-edge/eco-gotests
Length of output: 3991
🏁 Script executed:
git ls-files tests/ocp/sriov/internal/ocpsriovconfig/Repository: rh-ecosystem-edge/eco-gotests
Length of output: 176
🏁 Script executed:
rg "DeviceConfig" -B 3 -A 10 --type go | head -100Repository: rh-ecosystem-edge/eco-gotests
Length of output: 8335
🏁 Script executed:
rg "yaml|YAML" tests/ocp/sriov/ -A 3 -B 3 --type go | head -150Repository: rh-ecosystem-edge/eco-gotests
Length of output: 6156
🏁 Script executed:
# Search for Unmarshal or YAML parsing
rg "Unmarshal|yaml" tests/ocp/sriov/ --type go -B 2 -A 5Repository: rh-ecosystem-edge/eco-gotests
Length of output: 4197
🏁 Script executed:
# Search for DeviceConfig struct in the entire repository
rg "type DeviceConfig" --type go -A 15Repository: rh-ecosystem-edge/eco-gotests
Length of output: 1709
🏁 Script executed:
# Also search for files that might parse SRIOV config
rg "SRIOV_DEVICES" --type go -B 2 -A 2Repository: rh-ecosystem-edge/eco-gotests
Length of output: 2351
🏁 Script executed:
# Search for any Go files that contain both DeviceConfig and SupportsMinTxRate
rg "DeviceConfig|SupportsMinTxRate" --type go -l | head -10Repository: rh-ecosystem-edge/eco-gotests
Length of output: 533
🏁 Script executed:
# Search across all files for SupportsMinTxRate
rg "SupportsMinTxRate" --type goRepository: rh-ecosystem-edge/eco-gotests
Length of output: 269
🏁 Script executed:
# Also check if there's any reference to parsing devices from YAML
find . -name "*.go" -type f -exec grep -l "Devices.*yaml\|YAML.*[Dd]evices" {} \;Repository: rh-ecosystem-edge/eco-gotests
Length of output: 124
🏁 Script executed:
# Check if there are any YAML-related parsing patterns
git ls-files "*.go" | xargs grep -l "Devices.*yaml\|json:" 2>/dev/null | head -20Repository: rh-ecosystem-edge/eco-gotests
Length of output: 1171
🏁 Script executed:
# Count files and show what's in tests
git ls-files tests/ | head -30Repository: rh-ecosystem-edge/eco-gotests
Length of output: 1578
🏁 Script executed:
# Search for the exact comment text
rg "parseSriovDevicesEnv parses device" --type goRepository: rh-ecosystem-edge/eco-gotests
Length of output: 215
🏁 Script executed:
# Try to find any file with the function parseSriovDevicesEnv
git log --all --name-only --pretty="" | grep -i sriov | sort -u | head -20Repository: rh-ecosystem-edge/eco-gotests
Length of output: 1145
🏁 Script executed:
# Search for minTxRate in the codebase
rg "minTxRate|MinTxRate" --type go -B 3 -A 3Repository: rh-ecosystem-edge/eco-gotests
Length of output: 4474
🏁 Script executed:
# Check if there are any recent commits with sriov
git log --oneline --all | grep -i sriov | head -10Repository: rh-ecosystem-edge/eco-gotests
Length of output: 120
🏁 Script executed:
# Search for struct with Name, DeviceID, Vendor fields
rg "Name.*string|DeviceID|Vendor" --type go | grep -i "type\|struct" | head -20Repository: rh-ecosystem-edge/eco-gotests
Length of output: 2753
🏁 Script executed:
cat -n tests/ocp/sriov/internal/ocpsriovconfig/config.go | sed -n '1,160p'Repository: rh-ecosystem-edge/eco-gotests
Length of output: 5860
🏁 Script executed:
# Also check if there's any post-processing of YAML that might set a default
cat -n tests/ocp/sriov/internal/ocpsriovconfig/config.go | sed -n '160,220p'Repository: rh-ecosystem-edge/eco-gotests
Length of output: 2135
SupportsMinTxRate defaults differ between env and YAML parsing.
parseSriovDevicesEnv defaults SupportsMinTxRate to true (line 132), but YAML omission yields Go's zero value false. Align the defaults: either change env default to false, or use *bool to distinguish "unset" from "false" and apply a single consistent default in one location.
🤖 Prompt for AI Agents
In tests/ocp/sriov/internal/ocpsriovconfig/config.go around lines 112–149,
parseSriovDevicesEnv currently defaults SupportsMinTxRate to true causing
inconsistent behavior with YAML parsing (which yields false when omitted);
change the env parser to default supportsMinTxRate to false (or alternatively
parse into a *bool and keep a single canonical default applied later) and ensure
the parsed boolean logic only sets SupportsMinTxRate when an explicit 5th field
is present so both env and YAML produce the same default behavior.
5ad8751 to
88e6a88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (9)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (2)
199-207: Handleio.EOFfor empty YAML files.
yaml.Decoder.Decodereturnsio.EOFfor empty files. Since SR-IOV-specific fields have defaults or can be set via environment variables, an empty YAML file should not be a fatal error.+import "io" + err = decoder.Decode(&sriovOcpConfig) - if err != nil { + if err != nil && err != io.EOF { return err }
132-137:SupportsMinTxRatedefaults differ between env and YAML parsing.When parsing from environment variable,
SupportsMinTxRatedefaults totrue(line 132), but when parsing from YAML with the field omitted, Go's zero value yieldsfalse. This inconsistency may cause unexpected behavior depending on how devices are configured.Consider aligning the defaults by either:
- Defaulting to
falsein env parsing to match YAML behavior- Using
*boolto distinguish "unset" from "false" and apply a consistent defaulttests/ocp/sriov/internal/sriovenv/sriovenv.go (7)
53-81: Silently swallowing poll errors can mask persistent failures.List errors return
(false, nil)which causes the poll to continue retrying. If the failure is persistent (RBAC, wrong namespace, API down), callers wait until timeout with no diagnostic signal.At minimum, log errors to aid debugging:
func(ctx context.Context) (bool, error) { // Check SR-IOV node states synced nodeStates, err := sriov.ListNetworkNodeState(APIClient, sriovOpNs, client.ListOptions{}) - if err != nil || len(nodeStates) == 0 { + if err != nil { + klog.V(90).Infof("ListNetworkNodeState failed: %v", err) + return false, nil + } + if len(nodeStates) == 0 { return false, nil } // ... // Check MCP stability mcpList, err := mco.ListMCP(APIClient) if err != nil { + klog.V(90).Infof("ListMCP failed: %v", err) return false, nil }
167-203: Useapierrors.IsNotFound()to distinguish resource-not-found from other errors.
RemoveSriovNetworkandRemoveSriovPolicyreturnnilon any pull error, which masks real problems like RBAC failures or transport errors.Differentiate NotFound errors using
k8s.io/apimachinery/pkg/api/errors.IsNotFound():+import apierrors "k8s.io/apimachinery/pkg/api/errors" + func RemoveSriovNetwork(name string, timeout time.Duration) error { sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace network, err := sriov.PullNetwork(APIClient, name, sriovOpNs) if err != nil { - return nil // Already deleted + if apierrors.IsNotFound(err) { + return nil // Already deleted + } + return fmt.Errorf("failed to pull network %q: %w", name, err) } return network.DeleteAndWait(timeout) } func RemoveSriovPolicy(name string, timeout time.Duration) error { sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace policy, err := sriov.PullPolicy(APIClient, name, sriovOpNs) if err != nil { - return nil // Already deleted + if apierrors.IsNotFound(err) { + return nil // Already deleted + } + return fmt.Errorf("failed to pull policy %q: %w", name, err) }
224-279: ValidatevfNumand interface before buildingpfSelector.
pfSelector := fmt.Sprintf("%s#0-%d", actualInterface, vfNum-1)(line 245) produces invalid output forvfNum <= 0(e.g.,"ens1f0#0--1"), andactualInterfacecan remain empty if discovery fails.func initVFWithDevType(name, deviceID, interfaceName, vendor, devType string, vfNum int, workerNodes []*nodes.Builder) (bool, error) { + if vfNum <= 0 { + return false, fmt.Errorf("vfNum must be > 0, got %d", vfNum) + } + sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace // Cleanup existing policy _ = RemoveSriovPolicy(name, tsparams.NamespaceTimeout) for _, node := range workerNodes { nodeName := node.Definition.Name // Try to discover the actual interface name actualInterface := interfaceName // Only discover interface if not explicitly provided if interfaceName == "" && vendor != "" && deviceID != "" { if discovered, err := discoverInterfaceName(nodeName, vendor, deviceID); err == nil { actualInterface = discovered } } + + if actualInterface == "" { + klog.V(90).Infof("No interface resolved on node %q (vendor=%q deviceID=%q)", nodeName, vendor, deviceID) + continue + } pfSelector := fmt.Sprintf("%s#0-%d", actualInterface, vfNum-1)
303-306: Document potential service disruption.The delete-and-recreate approach causes brief service disruption as VFs are deallocated and reallocated. Add a note to help users understand when to use this function.
// UpdateSriovPolicyMTU updates the MTU of an existing SR-IOV policy. // PolicyBuilder does not have Update method, so we delete and recreate. +// Note: This causes brief service disruption as VFs are deallocated and reallocated. func UpdateSriovPolicyMTU(policyName string, mtuValue int) error {
404-414: Useapierrors.IsNotFound()for pod deletion.Same issue as network/policy removal - returning
nilon any pull error masks real problems.func DeleteDpdkTestPod(name, namespace string, timeout time.Duration) error { podBuilder, err := pod.Pull(APIClient, name, namespace) if err != nil { - return nil // Already deleted + if apierrors.IsNotFound(err) { + return nil // Already deleted + } + return fmt.Errorf("failed to pull pod %q: %w", name, err) } _, err = podBuilder.DeleteAndWait(timeout) return err }
509-537: Hard-coded pod names can collide across retries or parallel runs.Pods named
"client"and"server"may conflict if tests share a namespace or rerun quickly. Consider suffixing with a timestamp/random string or allowing caller-provided names.+import "github.com/google/uuid" + func CheckVFStatusWithPassTraffic(networkName, interfaceName, namespace, description string, timeout time.Duration) error { + suffix := uuid.New().String()[:8] + clientName := fmt.Sprintf("client-%s", suffix) + serverName := fmt.Sprintf("server-%s", suffix) + // Create client and server pods - clientPod, err := CreateTestPod("client", namespace, networkName, + clientPod, err := CreateTestPod(clientName, namespace, networkName, tsparams.TestPodClientIP, tsparams.TestPodClientMAC)
771-842: Cleanup logic may be destructive in shared clusters.Deleting any namespace prefixed with
"e2e-"and policies matching device names could affect non-test resources in shared clusters.Consider scoping cleanup by a label/annotation that your tests explicitly set, and use label selectors instead of name heuristics:
+const TestResourceLabel = "eco-gotests.sriov/test-resource" + // In test setup, add label to created resources: // namespace.WithLabel(TestResourceLabel, "true") // policy.WithLabel(TestResourceLabel, "true") // In cleanup: namespaces, err := namespace.List(APIClient, metav1.ListOptions{ + LabelSelector: TestResourceLabel + "=true", })
🧹 Nitpick comments (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
151-159: Consider extracting the fallback value as a constant.The fallback value
2is hardcoded. Consider using a named constant for clarity and maintainability.+const ( + // DefaultVFNum is the default number of virtual functions. + DefaultVFNum = 2 +) + // GetVFNum returns the configured number of virtual functions. -// Returns the value from YAML/env config, or 2 as fallback if not set. +// Returns the value from YAML/env config, or DefaultVFNum as fallback if not set. func (sriovOcpConfig *SriovOcpConfig) GetVFNum() int { if sriovOcpConfig.VFNum <= 0 { - return 2 // fallback if YAML wasn't loaded properly + return DefaultVFNum // fallback if YAML wasn't loaded properly } return sriovOcpConfig.VFNum }tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
21-21: Consider avoiding dot import for better readability.Dot imports (
. "github.com/...") can make it harder to understand where symbols likeAPIClientandSriovOcpConfigcome from. Consider using a named import instead.- . "github.com/rh-ecosystem-edge/eco-gotests/tests/ocp/sriov/internal/ocpsriovinittools" + "github.com/rh-ecosystem-edge/eco-gotests/tests/ocp/sriov/internal/ocpsriovinittools"Then reference as
ocpsriovinittools.APIClient, etc.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 5ad8751098b250e4498913893f9fa67ac278790f and 88e6a88.
📒 Files selected for processing (5)
.cursorrules(1 hunks).gitignore(1 hunks)tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml(1 hunks)tests/ocp/sriov/internal/ocpsriovconfig/config.go(5 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/ocp/sriov/internal/ocpsriovconfig/.default.yaml
- .cursorrules
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (3)
tests/cnf/ran/ptp/internal/iface/iface.go (1)
Name(22-22)tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)tests/ocp/sriov/internal/ocpsriovinittools/ocpsriovinittools.go (1)
SriovOcpConfig(13-13)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (1)
IsSriovDeployed(32-54)tests/ocp/sriov/internal/tsparams/consts.go (13)
MCPStableInterval(38-38)NADTimeout(40-40)PollingInterval(28-28)NamespaceTimeout(30-30)PolicyApplicationTimeout(42-42)PodReadyTimeout(32-32)TestPodClientIP(51-51)TestPodClientMAC(47-47)CleanupTimeout(34-34)TestPodServerIP(53-53)TestPodServerMAC(49-49)CarrierWaitTimeout(36-36)DebugPodCleanupTimeout(44-44)
🔇 Additional comments (8)
.gitignore (1)
24-24: LGTM!Ignoring
.cursorrulesis appropriate to prevent IDE-specific configuration from being committed to the repository.tests/ocp/sriov/internal/ocpsriovconfig/config.go (4)
24-41: LGTM!The
DeviceConfigstruct andSriovOcpConfigextensions are well-structured with appropriate YAML and envconfig tags.
57-62: LGTM!Good defensive check for
runtime.Callerfailure. This prevents potential nil dereference issues.
87-110: LGTM!The fallback logic from environment variable to YAML configuration is well-implemented with appropriate error handling.
218-239: LGTM!The legacy environment variable handling is correctly implemented with proper validation and warning logging for invalid values.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (3)
87-155: LGTM!The functional options pattern for
NetworkOptionis well-implemented and provides a clean, extensible API for network configuration.
683-722: LGTM!Good handling of Kubernetes name length limits and proper cleanup of existing debug pods before creation. The
nsenterapproach for host namespace access is appropriate.
728-765: LGTM!The PCI address extraction from network status annotation is correctly implemented with proper handling of namespaced network names.
88e6a88 to
f0dc4d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (8)
273-276: Document potential service disruption in docstring.The delete-and-recreate approach causes VFs to be deallocated and reallocated, briefly disrupting traffic on affected pods. Add this to the function's documentation to help users understand when to use this function.
🔎 Proposed fix
// UpdateSriovPolicyMTU updates the MTU of an existing SR-IOV policy. // PolicyBuilder does not have Update method, so we delete and recreate. +// Note: This causes brief service disruption as VFs are deallocated and reallocated. +// Schedule this operation during maintenance windows or when temporary disruption is acceptable. func UpdateSriovPolicyMTU(policyName string, mtuValue int) error {
485-498: Static pod names"client"and"server"can collide across parallel runs or retries.Consider generating unique names (e.g., with timestamp or random suffix) or accepting caller-provided names to avoid
AlreadyExistserrors.
559-563: Static pod name"linkstate-test"has the same collision risk.Consider using a unique suffix as recommended for client/server pods.
713-750: Consider scoping cleanup by labels for safer operation in shared clusters.While using the "e2e-" prefix is reasonable, and delegating to
sriovoperator.RemoveAllSriovNetworks/Policiesis appropriate, in shared clusters this could affect resources from other concurrent test runs. Consider adding a unique test-run label to resources and filtering by that label during cleanup for better isolation.
137-147: Useapierrors.IsNotFound()to distinguish resource-not-found from other errors.Returning
nilon any Pull error masks real problems like RBAC failures or transport errors. Only returnnilfor NotFound errors; propagate others.🔎 Proposed fix
+import ( + apierrors "k8s.io/apimachinery/pkg/api/errors" +) func RemoveSriovNetwork(name string, timeout time.Duration) error { sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace network, err := sriov.PullNetwork(APIClient, name, sriovOpNs) if err != nil { - return nil // Already deleted + if apierrors.IsNotFound(err) { + return nil // Already deleted + } + return fmt.Errorf("failed to pull network %q: %w", name, err) } return network.DeleteAndWait(timeout) }
153-173: Same error handling issue: useapierrors.IsNotFound().Apply the same fix as suggested for
RemoveSriovNetworkto properly distinguish NotFound from other errors.
194-249: ValidatevfNumandactualInterfacebefore buildingpfSelector.Line 215 creates an invalid range if
vfNum <= 0(e.g.,"interface#0--1") or invalid selector ifactualInterfaceis empty (e.g.,"#0-2"). Add upfront validation to fail fast with clear error messages.🔎 Proposed fix
func initVFWithDevType(name, deviceID, interfaceName, vendor, devType string, vfNum int, workerNodes []*nodes.Builder) (bool, error) { + if vfNum <= 0 { + return false, fmt.Errorf("vfNum must be > 0, got %d", vfNum) + } + sriovOpNs := SriovOcpConfig.OcpSriovOperatorNamespace // Cleanup existing policy _ = RemoveSriovPolicy(name, tsparams.NamespaceTimeout) for _, node := range workerNodes { nodeName := node.Definition.Name // Try to discover the actual interface name actualInterface := interfaceName // Only discover interface if not explicitly provided if interfaceName == "" && vendor != "" && deviceID != "" { if discovered, err := discoverInterfaceName(nodeName, vendor, deviceID); err == nil { actualInterface = discovered } } + if actualInterface == "" { + klog.V(90).Infof("No interface resolved on node %q (vendor=%q deviceID=%q), skipping", nodeName, vendor, deviceID) + continue + } + pfSelector := fmt.Sprintf("%s#0-%d", actualInterface, vfNum-1)
374-384: Same error handling issue: useapierrors.IsNotFound().Apply the same fix to distinguish NotFound from other errors.
🔎 Proposed fix
func DeleteDpdkTestPod(name, namespace string, timeout time.Duration) error { podBuilder, err := pod.Pull(APIClient, name, namespace) if err != nil { - return nil // Already deleted + if apierrors.IsNotFound(err) { + return nil // Already deleted + } + return fmt.Errorf("failed to pull pod %q: %w", name, err) } _, err = podBuilder.DeleteAndWait(timeout) return err }
🧹 Nitpick comments (4)
tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (1)
50-50: Add trailing newline.YAML files should end with a newline character for better compatibility with text processing tools and version control systems.
🔎 Suggested fix
- name: cx7 device_id: "1021" vendor: "15b3" interface_name: "ens8f0np0" supports_min_tx_rate: true +tests/ocp/sriov/internal/sriovenv/sriovenv.go (3)
19-19: Consider avoiding dot imports for clarity.Dot imports can make code harder to read as it's not immediately clear where
APIClientandSriovOcpConfigcome from. However, if this is an established pattern in the codebase for test utilities, it's acceptable.
352-372: Consider accepting IP/MAC as parameters for flexibility.Unlike
CreateTestPod, this function hardcodes IP and MAC fromtsparams. This limits reuse and could cause conflicts if multiple DPDK pods need to coexist. Consider accepting these as parameters likeCreateTestPoddoes.🔎 Proposed signature
-func CreateDpdkTestPod(name, namespace, networkName string) (*pod.Builder, error) { - secNetwork := pod.StaticIPAnnotationWithMacAddress(networkName, - []string{tsparams.TestPodClientIP}, tsparams.TestPodClientMAC) +func CreateDpdkTestPod(name, namespace, networkName, ip, mac string) (*pod.Builder, error) { + secNetwork := pod.StaticIPAnnotationWithMacAddress(networkName, []string{ip}, mac)
127-135: Consider logging Pull errors for debugging persistent failures.While returning
(false, nil)is correct for polling, logging the error at a verbose level would help diagnose why NAD creation is taking longer than expected.🔎 Proposed fix
return wait.PollUntilContextTimeout(context.Background(), tsparams.PollingInterval, timeout, true, func(ctx context.Context) (bool, error) { _, err := nad.Pull(APIClient, name, namespace) - - return err == nil, nil + if err != nil { + klog.V(90).Infof("Waiting for NAD %s/%s: %v", namespace, name, err) + return false, nil + } + return true, nil })
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/ocp/sriov/internal/ocpsriovconfig/config.go(3 hunks)tests/ocp/sriov/internal/ocpsriovconfig/default.yaml(1 hunks)tests/ocp/sriov/internal/sriovenv/sriovenv.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/config.go
🔇 Additional comments (4)
tests/ocp/sriov/internal/ocpsriovconfig/default.yaml (3)
6-8: LGTM!The default VF count of 2 is reasonable for basic SR-IOV testing scenarios. The configuration aligns with the VFNum field mentioned in the related config.go changes.
36-45: Correct the device_id for cx6 from "101d" to "1018".Device ID 101d maps to the MT2892 Family [ConnectX-6 Dx], not ConnectX-6. ConnectX-6 has a valid PCI Device ID of 1018. The cx6 entry (lines 36-40) has an incorrect device_id that should be changed to "1018" to match the official hardware specification.
Likely an incorrect or invalid review comment.
11-50: Review device ID configurations for potential errors.The device IDs for Intel adapters have been verified against official specifications and are correct:
- Intel e810xxv (159b) and e810c (1593) are correctly specified
- Intel x710 (1572) is correct
However, there is a critical issue with the Mellanox devices: Both cx6 and cx6dx are assigned device_id "101d", which corresponds to MT2892 Family [ConnectX-6 Dx]. This is incorrect for cx6 since the same device ID cannot apply to two different adapter types. The cx6 entry likely has the wrong device_id and should be verified.
Verify that cx7 (1021) corresponds to MT2910 Family [ConnectX-7] — this is correct.
The Broadcom device IDs (bcm57414: 16d7, bcm57508: 1750) could not be verified against official public sources.
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
670-707: LGTM!The implementation correctly uses
networkStatus.Interface(e.g., "net1", "net2") which is unique per pod, addressing the concern about pods with multiple SR-IOV interfaces using the same network.
- Add DeviceConfig struct for SR-IOV device configuration - Add GetSriovDevices() method with ECO_OCP_SRIOV_DEVICES env var support - Add GetVFNum() method with error handling - Move all defaults to default.yaml - Fix suite-level labels to exclude feature-specific labels - Add constants for timeouts, labels, and test parameters
fca5cf1 to
11f99bf
Compare
11f99bf to
7991bc5
Compare
- Add SR-IOV environment helper functions using eco-goinfra - Use existing sriovoperator functions for stability checks and cleanup - Add network creation with NetworkOption pattern - Add policy creation and VF initialization helpers - Add pod verification and traffic testing utilities
7991bc5 to
16f530f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
21-22:SupportsMinTxRatedefault inconsistency remains unresolved.The past review comment about
SupportsMinTxRatedefaulting differently between env and YAML parsing is still valid. Line 22 definesdefaultSupportsMinTxRate = true, and line 132 applies this default when parsing from the environment variable. However, YAML parsing yieldsfalse(Go's zero value) when the field is omitted.This inconsistency means the same device configuration behaves differently depending on whether it's defined in YAML or via the
ECO_OCP_SRIOV_DEVICESenvironment variable. While lines 113-114 document this difference, the documentation doesn't resolve the underlying problem.Consider one of these solutions:
- Change the env default to
falseto match YAML behavior- Use
*boolinDeviceConfigto distinguish "unset" from explicitfalse, then apply a single consistent default inGetSriovDevices()Also applies to: 110-149
🧹 Nitpick comments (1)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
397-435: Consider logging errors in poll callback to aid debugging.
WaitForPodWithLabelReadyreturns(false, nil)whenpod.Listfails (line 402), which silently swallows errors. If the failure is persistent (e.g., RBAC issues, wrong namespace, API unavailability), callers wait until timeout with no diagnostic signal.Consider logging the error to help with debugging:
func(ctx context.Context) (bool, error) { podList, err := pod.List(APIClient, namespace, metav1.ListOptions{LabelSelector: labelSelector}) - if err != nil || len(podList) == 0 { + if err != nil { + klog.V(90).Infof("pod.List failed for label %q: %v", labelSelector, err) return false, nil } + if len(podList) == 0 { + return false, nil + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/ocp/sriov/internal/ocpsriovconfig/config.gotests/ocp/sriov/internal/ocpsriovconfig/default.yamltests/ocp/sriov/internal/sriovenv/sriovenv.gotests/ocp/sriov/internal/tsparams/consts.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ocp/sriov/internal/ocpsriovconfig/default.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/ocp/sriov/internal/ocpsriovconfig/config.go (1)
tests/ocp/internal/ocpconfig/config.go (1)
OcpConfig(20-22)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (2)
tests/internal/sriovoperator/sriovenv.go (4)
IsSriovDeployed(32-54)WaitForSriovAndMCPStable(80-100)RemoveAllSriovNetworks(156-176)RemoveAllPoliciesAndWaitForSriovAndMCPStable(180-195)tests/internal/cluster/cluster.go (1)
ExecCmdWithStdout(98-180)
🔇 Additional comments (2)
tests/ocp/sriov/internal/tsparams/consts.go (1)
1-78: LGTM! Well-organized test constants.The new constants are clearly documented and provide comprehensive test parameters for SR-IOV testing scenarios. The values are reasonable (timeouts, network configurations, VLAN/QoS settings, rate limits, and resource labeling).
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
1-787: Comprehensive SR-IOV test environment helpers.This file provides a well-structured and extensive set of SR-IOV test utilities that follow eco-goinfra patterns and return errors appropriately (as per PR objectives). The code properly handles:
- Network and policy lifecycle with builder options
- VF initialization with device discovery
- Pod creation with unique naming to avoid collisions
- Robust cleanup using label selectors
- Proper NotFound error handling throughout
The implementation addresses all critical issues from past reviews and demonstrates good engineering practices.
Add comprehensive SR-IOV environment helper functions:
All functions use eco-goinfra builders and return errors
instead of using Gomega assertions (as per framework rules).
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.