Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com>
📝 Walkthrough

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/performance/utils/evaluate.py (1)
615-629: ⚠️ Potential issue | 🟠 Major

Fail fast when golden GPU utilization is missing instead of storing None.

At line 619, a missing "GPU utilization" is stored as None. This value flows to lines 656-657, where np.array([golden_gpu_util.get(s, float("nan")) for s in steps]) doesn't help: the .get() default is only used if the key doesn't exist, but here the key already exists with a None value. The result is a numpy array with object dtype containing mixed None and float values, which destabilizes np.nanmean behavior at line 42 of the validate_performance function. Raise an explicit error instead, tracking which steps are missing GPU utilization data:
Proposed fix
-        golden_gpu_util = {}
+        golden_gpu_util: dict[str, float] = {}
+        missing_golden_gpu_util_steps: list[str] = []
@@
-            golden_gpu_util[key] = value.get("GPU utilization")
+            gpu_util = value.get("GPU utilization")
+            if gpu_util is None:
+                missing_golden_gpu_util_steps.append(key)
+                continue
+            golden_gpu_util[key] = float(gpu_util)
+
+    if missing_golden_gpu_util_steps:
+        raise ValueError(
+            "Missing 'GPU utilization' in golden values for steps: "
+            + ", ".join(missing_golden_gpu_util_steps[:10])
+        )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/utils/evaluate.py` around lines 615 - 629, The loop that populates golden_gpu_util stores None when "GPU utilization" is missing, which later produces an object-dtype array and breaks np.nanmean in validate_performance; modify the loop that iterates expected_golden_values (referencing alloc_metric, max_alloc_metric, steps, golden_gpu_util) to check value.get("GPU utilization") and immediately raise a descriptive ValueError (including the affected step keys) if the metric is missing or is None instead of inserting None, so downstream code can assume numeric GPU utilization values.
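The pitfall described above can be reproduced in isolation. The data below is hypothetical; it only illustrates how a stored None defeats the .get() default and forces the array to object dtype:

```python
import numpy as np

# Hypothetical golden values: one step's "GPU utilization" was stored as None.
golden_gpu_util = {"step_10": 91.5, "step_20": None}
steps = ["step_10", "step_20"]

# .get(s, float("nan")) only substitutes the default for *absent* keys;
# "step_20" is present with value None, so None lands in the array.
arr = np.array([golden_gpu_util.get(s, float("nan")) for s in steps])
print(arr.dtype)  # object dtype, not float64
```

With an object-dtype array, np.nanmean can no longer rely on vectorized float operations, which is why the finding asks to reject None values up front.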
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/performance/utils/evaluate.py`:
- Around line 292-298: The function signature using parameters
current_gpu_util_values, golden_gpu_util_values, steps, logger, wandb_run, and
config should be updated to modern Python 3.10+ typing: replace List[str] with
built-in list[str], Dict[str, Any] with dict[str, Any], and
Optional["wandb.Run"] with "wandb.Run | None" (or wandb.Run | None if not using
string forward refs) and keep np.ndarray as the array type; adjust any
imports/forward references if needed. Locate the function where these parameters
are declared and change the type hints accordingly (steps → list[str], wandb_run
→ wandb.Run | None, config → dict[str, Any]); ensure the rest of the code that
references these types still type-checks.
- Around line 327-337: The slicing for GPU-util windows (variables skip,
current_stable, golden_stable) can produce empty arrays causing np.nanmean to
return NaN and break downstream comparisons; update the logic in evaluate.py
around the computation of current_avg, golden_avg and signed_diff to validate
that current_stable and golden_stable are non-empty and contain finite values
before calling np.nanmean (e.g., check len(...)>0 and np.isfinite on the
arrays), and if validation fails, set a clear failure/invalid flag or raise an
explicit error so the test doesn't silently pass on NaN results; ensure any code
that reads signed_diff handles this explicit error/flag path.
---
Outside diff comments:
In `@scripts/performance/utils/evaluate.py`:
- Around line 615-629: The loop that populates golden_gpu_util stores None when
"GPU utilization" is missing, which later produces an object-dtype array and
breaks np.nanmean in validate_performance; modify the loop that iterates
expected_golden_values (referencing alloc_metric, max_alloc_metric, steps,
golden_gpu_util) to check value.get("GPU utilization") and immediately raise a
descriptive ValueError (including the affected step keys) if the metric is
missing or is None instead of inserting None, so downstream code can assume
numeric GPU utilization values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7c587ee0-a737-471a-bb65-d90b65f46ef4
📒 Files selected for processing (1)
scripts/performance/utils/evaluate.py
    current_gpu_util_values: "np.ndarray",
    golden_gpu_util_values: "np.ndarray",
    steps: List[str],
    logger: logging.Logger,
    wandb_run: Optional["wandb.Run"] = None,
    config: Dict[str, Any] = None,
) -> Dict[str, Any]:
🧩 Analysis chain
🏁 Script executed:
# Check the specific lines in the file
head -300 scripts/performance/utils/evaluate.py | tail -15

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 457
🏁 Script executed:
# Get the full function signature and check imports
sed -n '285,305p' scripts/performance/utils/evaluate.py
Length of output: 625
🏁 Script executed:
# Check what typing imports are used in the file
head -50 scripts/performance/utils/evaluate.py | grep -E "^from typing|^import typing"
Length of output: 115
🏁 Script executed:
# Check Python version requirement (setup.py, pyproject.toml, etc.)
find . -maxdepth 2 -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" \) -exec grep -l "python_requires\|python_version" {} \;
Length of output: 53
Update function signature to use Python 3.10+ type-hint syntax.
Lines 296–298 use outdated typing constructs (Optional[...], Dict[...], List[...]) that conflict with coding guidelines requiring PEP 604 unions and built-in generics.
Proposed fix
def validate_performance(
current_gpu_util_values: "np.ndarray",
golden_gpu_util_values: "np.ndarray",
- steps: List[str],
+ steps: list[str],
logger: logging.Logger,
- wandb_run: Optional["wandb.Run"] = None,
- config: Dict[str, Any] = None,
-) -> Dict[str, Any]:
+ wandb_run: "wandb.Run | None" = None,
+ config: dict[str, Any] | None = None,
+) -> dict[str, Any]:

Per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'" and "Use built-in generics (list, dict, tuple) instead of typing equivalents."
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 297-297: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/utils/evaluate.py` around lines 292 - 298, The function
signature using parameters current_gpu_util_values, golden_gpu_util_values,
steps, logger, wandb_run, and config should be updated to modern Python 3.10+
typing: replace List[str] with built-in list[str], Dict[str, Any] with dict[str,
Any], and Optional["wandb.Run"] with "wandb.Run | None" (or wandb.Run | None if
not using string forward refs) and keep np.ndarray as the array type; adjust any
imports/forward references if needed. Locate the function where these parameters
are declared and change the type hints accordingly (steps → list[str], wandb_run
→ wandb.Run | None, config → dict[str, Any]); ensure the rest of the code that
references these types still type-checks.
    # Discard first N% of iterations for stable comparison
    skip = max(1, int(len(steps) * config["skip_first_percent_time"]))
    current_stable = current_gpu_util_values[skip:]
    golden_stable = golden_gpu_util_values[skip:]

    # Calculate average step timing
    current_avg_timing = np.mean(current_timing_stable)
    golden_avg_timing = np.mean(golden_timing_stable)
    current_avg = float(np.nanmean(current_stable))
    golden_avg = float(np.nanmean(golden_stable))

    # Calculate timing difference
    timing_diff = abs(current_avg_timing - golden_avg_timing) / golden_avg_timing
    # Signed diff: positive = improvement (higher util), negative = regression (lower util)
    signed_diff = (current_avg - golden_avg) / golden_avg if golden_avg != 0 else 0.0
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, verify the file exists and check its size
ls -la scripts/performance/utils/evaluate.py
wc -l scripts/performance/utils/evaluate.py
Length of output: 200
🏁 Script executed:
#!/bin/bash
# Read the relevant section of the file with context
sed -n '320,350p' scripts/performance/utils/evaluate.py
Length of output: 1380
🏁 Script executed:
#!/bin/bash
# Look for regression checks mentioned (lines 338-340)
sed -n '330,345p' scripts/performance/utils/evaluate.py
Length of output: 801
🏁 Script executed:
#!/bin/bash
# Test what np.nanmean returns for empty arrays
python3 << 'EOF'
import numpy as np
# Test empty array
result = np.nanmean([])
print(f"np.nanmean([]) = {result}")
print(f"Type: {type(result)}")
print(f"Is NaN: {np.isnan(result)}")
# Test NaN comparison behavior
nan_value = float('nan')
print(f"\nNaN < -0.05: {nan_value < -0.05}")
print(f"NaN > 0.05: {nan_value > 0.05}")
print(f"NaN == 0: {nan_value == 0}")
print(f"bool(NaN < -0.05): {bool(nan_value < -0.05)}")
# Test converting NaN to float
print(f"\nfloat(np.nanmean([])) = {float(np.nanmean([]))}")
EOF
Length of output: 188
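The script's output is collapsed above; the behavior it probes is standard IEEE 754 / NumPy semantics, condensed here for reference:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # suppress "Mean of empty slice"
    empty_mean = np.nanmean([])

assert np.isnan(empty_mean)   # an empty window averages to NaN, not an error

nan = float("nan")
# Every ordered comparison with NaN is False, so threshold checks such as
# `signed_diff < -0.05` silently pass when signed_diff is NaN.
assert (nan < -0.05) is False
assert (nan > 0.05) is False
assert (nan == 0) is False
```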
🏁 Script executed:
#!/bin/bash
# Check for error handling and look at the broader function context
sed -n '300,360p' scripts/performance/utils/evaluate.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2705
🏁 Script executed:
#!/bin/bash
# Look for where config["skip_first_percent_time"] is set/used
rg -n "skip_first_percent_time" scripts/performance/utils/evaluate.py
Length of output: 569
🏁 Script executed:
#!/bin/bash
# Check if there's any error handling after the regression checks
sed -n '337,380p' scripts/performance/utils/evaluate.py | head -50
Length of output: 2051
🏁 Script executed:
#!/bin/bash
# Verify how this function is called and if callers handle NaN/exceptions
rg -A 3 "validate_gpu_utilization" scripts/performance/utils/evaluate.py | head -30
Length of output: 53
Guard against empty/invalid GPU-util windows before pass/fail decisions.
Lines 327–333 can produce empty arrays on short runs: when skip >= len(steps), the slices become empty. Then np.nanmean(...) returns NaN, and the comparison checks at lines 337–338 silently fail to trigger—NaN comparisons always evaluate to False, allowing invalid test results to pass without detection.
🛡️ Proposed fix
+ if len(steps) == 0:
+ raise ValueError("No steps available for GPU utilization validation")
+
- skip = max(1, int(len(steps) * config["skip_first_percent_time"]))
+ skip = int(len(steps) * config["skip_first_percent_time"])
+ if skip >= len(steps):
+ skip = 0
current_stable = current_gpu_util_values[skip:]
golden_stable = golden_gpu_util_values[skip:]
current_avg = float(np.nanmean(current_stable))
golden_avg = float(np.nanmean(golden_stable))
+ if np.isnan(current_avg) or np.isnan(golden_avg):
+        raise ValueError("GPU utilization metrics are missing or non-finite for compared steps")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/utils/evaluate.py` around lines 327 - 337, The slicing
for GPU-util windows (variables skip, current_stable, golden_stable) can produce
empty arrays causing np.nanmean to return NaN and break downstream comparisons;
update the logic in evaluate.py around the computation of current_avg,
golden_avg and signed_diff to validate that current_stable and golden_stable are
non-empty and contain finite values before calling np.nanmean (e.g., check
len(...)>0 and np.isfinite on the arrays), and if validation fails, set a clear
failure/invalid flag or raise an explicit error so the test doesn't silently
pass on NaN results; ensure any code that reads signed_diff handles this
explicit error/flag path.
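A minimal, hedged sketch of the guard both findings converge on. The helper name stable_window_mean is invented for illustration and is not part of evaluate.py:

```python
import numpy as np

def stable_window_mean(values: np.ndarray, skip: int) -> float:
    """Average the post-warmup window, failing loudly on empty or all-NaN data."""
    window = np.asarray(values, dtype=float)[skip:]
    if window.size == 0:
        raise ValueError(f"Stable window is empty: skip={skip} >= {len(values)} steps")
    if not np.isfinite(window).any():
        raise ValueError("Stable window contains no finite GPU utilization values")
    return float(np.nanmean(window))

# A run where skip swallows every step now raises instead of letting
# NaN comparisons silently pass downstream.
print(stable_window_mean(np.array([80.0, 90.0, 95.0]), skip=1))  # 92.5
```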
Signed-off-by: oliver könig <okoenig@nvidia.com>
What does this PR do ?
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
New Features
Improvements