
Changing hybrid_override_pattern to hybrid_layer_pattern to mirror mcore change #2628

Open
adityavavreNVDA wants to merge 15 commits into main from avavre/hybrid_pattern_fix

Conversation


adityavavreNVDA (Contributor) commented Mar 3, 2026

What does this PR do ?

This PR mirrors MLM PR #3377, which deprecates hybrid_override_pattern, hybrid_attention_ratio, and hybrid_mlp_ratio in favour of hybrid_layer_pattern.

Summary by CodeRabbit

  • New Features

    • Introduced new hybrid_layer_pattern configuration option for improved hybrid layer processing specification across all model providers.
  • Deprecation

    • Deprecated hybrid_attention_ratio, hybrid_mlp_ratio, and hybrid_override_pattern configuration parameters. These will be removed in a future release. Users should migrate to the new hybrid_layer_pattern option.
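The deprecation pattern discussed throughout this PR can be sketched as follows. This is a hypothetical, simplified config class (not the actual provider classes in the repo), showing how a renamed dataclass field can keep accepting the legacy kwarg with a DeprecationWarning instead of a hard TypeError:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridConfig:
    """Minimal sketch of a provider config carrying the rename.

    Field names match the PR; the class itself is illustrative only.
    """

    hybrid_layer_pattern: Optional[str] = None
    hybrid_override_pattern: Optional[str] = None  # deprecated alias

    def __post_init__(self) -> None:
        if self.hybrid_override_pattern is not None:
            warnings.warn(
                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Map the legacy value onto the new field when the new one is unset.
            if self.hybrid_layer_pattern is None:
                self.hybrid_layer_pattern = self.hybrid_override_pattern


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = HybridConfig(hybrid_override_pattern="M-M*-")

print(cfg.hybrid_layer_pattern)  # legacy value carried over to the new field
```

A caller passing only the old kwarg still gets a working config plus a warning, which is the soft-landing behavior the review comments below argue for.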

Signed-off-by: Aditya Vavre <avavre@nvidia.com>

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Aditya Vavre <avavre@nvidia.com>

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

This PR systematically renames and propagates the configuration field hybrid_override_pattern to hybrid_layer_pattern across Mamba, Nemotron-H, and Nemotron-VL model builders, providers, and bridges, with corresponding updates to training utilities and tests. A deprecation notice is added for the old fields.

Changes

Cohort / File(s) Summary
Mamba Model Configuration
src/megatron/bridge/models/mamba/mamba_builder.py, src/megatron/bridge/models/mamba/mamba_provider.py
Introduce new hybrid_layer_pattern field and propagate it to MCoreMambaModel construction, replacing hybrid_override_pattern with deprecation notice.
Nemotron-H Configuration
src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py, src/megatron/bridge/models/nemotronh/nemotron_h_provider.py
Update CONFIG_MAPPING to target hybrid_layer_pattern and rename field across all NemotronH provider classes.
Nemotron-VL Configuration
src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py, src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
Add CONFIG_MAPPING entry for the hybrid_override_pattern → hybrid_layer_pattern mapping and update the LLaVAModel constructor call.
Training Utilities
src/megatron/bridge/training/utils/flop_utils.py, src/megatron/bridge/training/utils/train_utils.py, src/megatron/bridge/training/mlm_compat/model.py
Replace references from hybrid_override_pattern to hybrid_layer_pattern in layer counting and MoE tracking logic.
Unit Tests
tests/unit_tests/models/mamba/..., tests/unit_tests/models/nemotronh/..., tests/unit_tests/training/...
Update test expectations and assertions to use hybrid_layer_pattern instead of hybrid_override_pattern.
Functional Tests
tests/functional_tests/models/nemotron_vl/test_nemotron_vl_conversion.py, tests/functional_tests/models/nemotronh/test_nemotron_h_conversion.py, tests/functional_tests/recipes/test_nemotronh_recipes_*.py
Add runtime config modifications and update model_overrides dictionaries to use hybrid_layer_pattern for parallelism configuration.
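The layer-counting changes in the training utilities above iterate over the pattern string character by character. As a rough sketch of that logic (character semantics taken from the counts dict in the flop_utils review below: "M" = Mamba, "*" = attention, "-" = MLP, "E" = MoE; the function name is illustrative, not the repo's):

```python
from collections import Counter


def count_layer_types(pattern: str) -> dict:
    """Count layer types in a hybrid layer pattern string.

    Assumed semantics: "M" = Mamba, "*" = attention, "-" = MLP, "E" = MoE.
    Unknown characters are ignored, mirroring the fallback loop in flop_utils
    that only increments keys present in the counts dict.
    """
    counts = Counter(c for c in pattern if c in "M*-E")
    return {c: counts.get(c, 0) for c in "M*-E"}


print(count_layer_types("M-M-M-M*-M-M-M-M-M*-"))
# {'M': 9, '*': 2, '-': 9, 'E': 0}
```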

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

  • PR #2241: Introduced MambaModelConfig/MambaModelBuilder architecture that this PR modifies to add and propagate the new hybrid_layer_pattern field.
  • PR #2523: Removes the Mamba builder/config module that this PR updates for hybrid layer pattern renaming.
  • PR #1914: Introduced NemotronH/Nemotron-VL provider and bridge configuration fields that this PR renames and consolidates.

Suggested labels

enhancement

Suggested reviewers

  • yaoyu-33
  • maanug-nv
  • erhoo82
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: PR description lacks test results, performance data, convergence verification, or testing methodology documentation for the refactoring changes. Resolution: add test execution results, convergence/numerical verification, backward-compatibility testing documentation, and address the missing guards identified in review comments.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change: renaming hybrid_override_pattern to hybrid_layer_pattern across the codebase to align with mcore's configuration convention.
  • Docstring Coverage — ✅ Passed: docstring coverage is 92.68%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/training/utils/flop_utils.py (1)

39-56: ⚠️ Potential issue | 🟠 Major

FLOPs path should fallback to deprecated hybrid key while deprecation is active.

The updated logic only reads hybrid_layer_pattern. For configs that still provide only hybrid_override_pattern, layer/MTP counts can be incorrect.

Proposed fix
     def calculate_layer_counts():
         """Calculate the number of attention, Mamba, MLP, and MoE layers."""
-        if hasattr(cfg.model, "hybrid_layer_pattern") and cfg.model.hybrid_layer_pattern:
+        hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+            cfg.model, "hybrid_override_pattern", None
+        )
+        if hybrid_pattern:
             counts = {"M": 0, "*": 0, "-": 0, "E": 0}
             try:
                 parse_hybrid_pattern = importlib.import_module(
                     "megatron.core.ssm.mamba_hybrid_layer_allocation"
                 ).parse_hybrid_pattern
-                parsed = parse_hybrid_pattern(cfg.model.hybrid_layer_pattern)
+                parsed = parse_hybrid_pattern(hybrid_pattern)
                 if parsed.main_pattern:
                     for layer_type in parsed.main_pattern:
                         if layer_type in counts:
                             counts[layer_type] += 1
                 if parsed.mtp_pattern and parsed.mtp_num_depths > 0:
                     for layer_type in parsed.mtp_pattern:
                         if layer_type in counts:
                             counts[layer_type] += parsed.mtp_num_depths
             except (ImportError, ModuleNotFoundError):
-                for layer_type in cfg.model.hybrid_layer_pattern:
+                for layer_type in hybrid_pattern:
                     if layer_type in counts:
                         counts[layer_type] += 1
@@
-            hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None)
+            hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+                cfg.model, "hybrid_override_pattern", None
+            )

Also applies to: 434-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/flop_utils.py` around lines 39 - 56, The
FLOPs counting block only reads cfg.model.hybrid_layer_pattern causing configs
that still use the deprecated cfg.model.hybrid_override_pattern to be ignored;
update the logic in the function containing the parse_hybrid_pattern import (and
the similar block around the second occurrence) to fall back to
cfg.model.hybrid_override_pattern when cfg.model.hybrid_layer_pattern is not
set, i.e., use hybrid_layer_pattern if present else hybrid_override_pattern,
then pass that value to parse_hybrid_pattern and the subsequent loops that
update counts (refer to parse_hybrid_pattern, parsed.main_pattern,
parsed.mtp_pattern, parsed.mtp_num_depths, and counts) so layer/MTP counts are
computed correctly for both new and deprecated keys.
🧹 Nitpick comments (4)
tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py (1)

323-323: Make the key mapping resilient during deprecation cleanup.

Line 323 hard-codes the deprecated source key and can become a brittle KeyError if the HF toy override dict is renamed later.

♻️ Optional hardening diff
 MEGATRON_NEMOTRON_3_NANO_OVERRIDES = {
     "num_layers": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["num_hidden_layers"],
-    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get(
+        "hybrid_layer_pattern",
+        HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    ),
     "hidden_size": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hidden_size"],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py` at line
323, The test currently hard-codes
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"] when building
the dict entry for "hybrid_layer_pattern", which will raise KeyError if that
source key is renamed; change this to a resilient lookup such as using
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get("hybrid_override_pattern") with a
sensible fallback, or use a safe dynamic lookup like next((v for k,v in
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.items() if "hybrid" in k), default_value)
so the "hybrid_layer_pattern" assignment never throws and remains robust during
deprecation cleanup.
src/megatron/bridge/models/mamba/mamba_builder.py (1)

227-230: Add an explicit conflict rule between new and deprecated hybrid settings.

Lines 227–230 forward both APIs at once. If users set hybrid_layer_pattern and deprecated hybrid_* knobs together, precedence is implicit and easy to misconfigure.

🧭 Suggested guard (example)
+        if self._model_config.hybrid_layer_pattern is not None and (
+            self._model_config.hybrid_attention_ratio != 0.0
+            or self._model_config.hybrid_mlp_ratio != 0.0
+            or self._model_config.hybrid_override_pattern is not None
+        ):
+            raise ValueError(
+                "Use either hybrid_layer_pattern or deprecated hybrid_* settings, not both."
+            )
+
         return MCoreMambaModel(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/mamba/mamba_builder.py` around lines 227 - 230,
Detect and block simultaneous use of the new and deprecated hybrid APIs before
forwarding values: in the builder method that references _model_config (the
block passing hybrid_layer_pattern, hybrid_attention_ratio, hybrid_mlp_ratio,
hybrid_override_pattern), check if _model_config.hybrid_layer_pattern is set AND
any of _model_config.hybrid_attention_ratio, _model_config.hybrid_mlp_ratio, or
_model_config.hybrid_override_pattern are also set; if so, raise a clear
ValueError (or log an error and abort) instructing the user to choose either the
new hybrid_layer_pattern API or the deprecated hybrid_* knobs, so precedence is
explicit and not implicit.
tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py (1)

193-196: Use exact-match assertions for canonical hybrid pattern defaults.

For fixed default patterns, in is too permissive and can hide regressions. Prefer == in these two tests.

Suggested test tightening
-        assert (
-            "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
-            in provider.hybrid_layer_pattern
-        )
+        assert (
+            provider.hybrid_layer_pattern
+            == "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
+        )

-        assert (
-            "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
-            in provider.hybrid_layer_pattern
-        )
+        assert (
+            provider.hybrid_layer_pattern
+            == "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+        )

Also applies to: 227-230

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py` around lines
193 - 196, Replace the permissive substring assertion with an exact-match
assertion: change the test that currently uses "in" against
provider.hybrid_layer_pattern to assert equality (==) with the full canonical
pattern string for provider.hybrid_layer_pattern; apply the same change to the
other similar assertion in the same test file that checks the default hybrid
pattern (the one around the later occurrence) so both tests require an exact
match rather than a substring.
tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py (1)

161-161: Add a test case for the new config key path (hybrid_layer_pattern).

This assertion covers legacy-key compatibility (hybrid_override_pattern → provider hybrid_layer_pattern), but there’s no explicit test for configs that already emit hybrid_layer_pattern.

Suggested additional unit test
+    def test_provider_bridge_mamba_config_accepts_new_hybrid_key(self, nemotronh_8b_config_dict):
+        """Test Mamba config mapping when config uses hybrid_layer_pattern directly."""
+        cfg_dict = {k: v for k, v in nemotronh_8b_config_dict.items() if k != "hybrid_override_pattern"}
+        cfg_dict["hybrid_layer_pattern"] = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+
+        cfg = Mock(spec=[])
+        for k, v in cfg_dict.items():
+            setattr(cfg, k, v)
+
+        mock_pretrained = Mock(spec=PreTrainedCausalLM)
+        mock_pretrained.config = cfg
+
+        bridge = NemotronHBridge()
+        result = bridge.provider_bridge(mock_pretrained)
+        assert result.hybrid_layer_pattern == cfg.hybrid_layer_pattern
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py` at line 161, Add
an explicit unit test that verifies the new config key path
`hybrid_layer_pattern` is honored (not only the legacy
`hybrid_override_pattern`): create a test in test_nemotron_h_bridge.py that
instantiates the same setup used earlier (using mock_nemotronh_config but
setting `hybrid_layer_pattern` instead of `hybrid_override_pattern`), run the
codepath that produces `result` (the same call that yields
`result.hybrid_layer_pattern`), and assert that `result.hybrid_layer_pattern ==
mock_nemotronh_config.hybrid_layer_pattern` to ensure direct-key behavior is
covered alongside the legacy-key test.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b3d00 and 8b20cd9.

📒 Files selected for processing (19)
  • src/megatron/bridge/models/mamba/mamba_builder.py
  • src/megatron/bridge/models/mamba/mamba_provider.py
  • src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py
  • src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
  • src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py
  • src/megatron/bridge/models/nemotronh/nemotron_h_provider.py
  • src/megatron/bridge/training/mlm_compat/model.py
  • src/megatron/bridge/training/utils/flop_utils.py
  • src/megatron/bridge/training/utils/train_utils.py
  • tests/functional_tests/models/nemotron_vl/test_nemotron_vl_conversion.py
  • tests/functional_tests/models/nemotronh/test_nemotron_h_conversion.py
  • tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py
  • tests/functional_tests/recipes/test_nemotronh_recipes_pretrain.py
  • tests/unit_tests/models/mamba/test_mamba_builder.py
  • tests/unit_tests/models/mamba/test_mamba_provider.py
  • tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py
  • tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py
  • tests/unit_tests/training/mlm_compat/test_model.py
  • tests/unit_tests/training/utils/test_flop_utils.py

"""Configuration for a 4B parameter Nemotron-H model."""

-hybrid_override_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+hybrid_layer_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"

⚠️ Potential issue | 🟠 Major

Deprecation is currently a hard break for provider constructor kwargs.

These renames remove acceptance of legacy kwargs at dataclass init time. Any caller still passing hybrid_override_pattern (or the other deprecated hybrid knobs mentioned in the PR) will now fail with TypeError instead of getting a deprecation warning.

Proposed compatibility bridge for deprecated config fields
 @dataclass
 class NemotronHModelProvider(MambaModelProvider):
     """Configuration for Nemotron-H models."""
+    hybrid_layer_pattern: str | None = None
+    hybrid_override_pattern: str | None = None
+    hybrid_attention_ratio: float | None = None
+    hybrid_mlp_ratio: float | None = None
+
+    def __post_init__(self) -> None:
+        if self.hybrid_override_pattern is not None:
+            warnings.warn(
+                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            if self.hybrid_layer_pattern is not None and self.hybrid_layer_pattern != self.hybrid_override_pattern:
+                raise ValueError(
+                    "Both hybrid_layer_pattern and hybrid_override_pattern were provided with different values."
+                )
+            self.hybrid_layer_pattern = self.hybrid_override_pattern
+
+        if self.hybrid_attention_ratio is not None or self.hybrid_mlp_ratio is not None:
+            warnings.warn(
+                "hybrid_attention_ratio and hybrid_mlp_ratio are deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
+        super().__post_init__()

Also applies to: 78-78, 91-91, 106-106, 124-124, 140-140, 158-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/nemotronh/nemotron_h_provider.py` at line 63, The
dataclass now uses renamed fields like hybrid_layer_pattern which breaks callers
passing legacy kwargs (e.g., hybrid_override_pattern); add
backward-compatibility by accepting the old deprecated kwargs as optional fields
(e.g., hybrid_override_pattern: Optional[str] = None) and in the class
__post_init__ map any deprecated field values onto the new fields (set
hybrid_layer_pattern from hybrid_override_pattern when present), emit a
DeprecationWarning via warnings.warn, and repeat this pattern for the other
renamed hybrid knobs (the fields referenced at the other commented locations) so
legacy callers do not get a TypeError.

Comment on lines +167 to +170
hybrid_attention_ratio=args.hybrid_attention_ratio,
hybrid_mlp_ratio=args.hybrid_mlp_ratio,
hybrid_override_pattern=args.hybrid_override_pattern,
hybrid_layer_pattern=args.hybrid_layer_pattern,

⚠️ Potential issue | 🟠 Major

Guard deprecated/new hybrid args with safe defaults to prevent AttributeError.

Line 169 assumes args.hybrid_override_pattern always exists. With the rename rollout, this can crash when older/deprecated args are absent.

🐛 Proposed fix
     model = MambaModel(
         config=config,
         mamba_stack_spec=mamba_stack_spec,
         vocab_size=args.padded_vocab_size,
         max_sequence_length=args.max_position_embeddings,
         pre_process=pre_process,
-        hybrid_attention_ratio=args.hybrid_attention_ratio,
-        hybrid_mlp_ratio=args.hybrid_mlp_ratio,
-        hybrid_override_pattern=args.hybrid_override_pattern,
-        hybrid_layer_pattern=args.hybrid_layer_pattern,
+        hybrid_attention_ratio=getattr(args, "hybrid_attention_ratio", 0.0),
+        hybrid_mlp_ratio=getattr(args, "hybrid_mlp_ratio", 0.0),
+        hybrid_override_pattern=getattr(args, "hybrid_override_pattern", None),
+        hybrid_layer_pattern=getattr(args, "hybrid_layer_pattern", None),
         post_process=post_process,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/mlm_compat/model.py` around lines 167 - 170, The
constructor call is assuming args.hybrid_override_pattern (and similar
newly-introduced hybrid_* fields) always exist and can raise AttributeError for
older arg versions; update the call site in model.py (where
hybrid_attention_ratio, hybrid_mlp_ratio, hybrid_override_pattern,
hybrid_layer_pattern are passed) to guard each optional argument by using safe
attribute access (e.g., getattr(args, 'hybrid_override_pattern', <sane default>)
and getattr(args, 'hybrid_layer_pattern', <sane default>)) so missing/deprecated
fields fall back to sensible defaults (None or the original default used
elsewhere) instead of crashing.


 if config.model.is_hybrid_model:
-    layers = config.model.hybrid_override_pattern.count("E")
+    layers = config.model.hybrid_layer_pattern.count("E")

⚠️ Potential issue | 🟠 Major

Guard hybrid pattern access during deprecation window.

Line 627 assumes hybrid_layer_pattern is always populated. Hybrid configs that still only set the deprecated key can fail in logging.

Proposed fix
-        if config.model.is_hybrid_model:
-            layers = config.model.hybrid_layer_pattern.count("E")
+        if config.model.is_hybrid_model:
+            hybrid_pattern = getattr(config.model, "hybrid_layer_pattern", None) or getattr(
+                config.model, "hybrid_override_pattern", None
+            )
+            layers = hybrid_pattern.count("E") if hybrid_pattern else config.model.num_layers
         else:
             layers = config.model.num_layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/train_utils.py` at line 627, The current
assignment to layers assumes config.model.hybrid_layer_pattern exists; change it
to safely access the attribute (e.g., pattern = getattr(config.model,
"hybrid_layer_pattern", "") or check hasattr(config.model,
"hybrid_layer_pattern") and fall back to an empty string or the deprecated key
if present) and then compute layers = pattern.count("E") so logging won't fail
for configs still using the deprecated key.

@adityavavreNVDA
Contributor Author

@yaoyu-33 We also need to remove num_layers and calculate it from the pattern, to mirror MLM: https://github.com/duncanriach/Megatron-LM/blob/aca6ca3970f2eddc1774ac706df8701e8b15ddcd/megatron/training/arguments.py#L601-L616

Where do you think this goes?
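The derivation this comment refers to can be sketched as follows. This is a hypothetical helper (not the actual MLM code at the linked lines), under the assumption that every character in a Nemotron-H pattern ("M", "*", "-", "E") denotes exactly one layer, so the layer count is the pattern length:

```python
def num_layers_from_pattern(pattern: str) -> int:
    """Derive the total layer count from a hybrid layer pattern.

    Assumption: each character denotes one layer ("M" = Mamba,
    "*" = attention, "-" = MLP, "E" = MoE), so the count is len(pattern).
    Illustrative sketch only, not the Megatron-LM implementation.
    """
    valid = set("M*-E")
    unknown = set(pattern) - valid
    if unknown:
        raise ValueError(f"Unknown layer types in pattern: {sorted(unknown)}")
    return len(pattern)


print(num_layers_from_pattern("M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"))  # 30
```

With such a helper in place, a user-supplied num_layers that disagrees with the pattern length could be rejected at config-validation time rather than silently ignored.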

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test fede9aa

yaoyu-33
yaoyu-33 previously approved these changes Mar 4, 2026
@yaoyu-33
Contributor

yaoyu-33 commented Mar 4, 2026

/ok to test 0ce7b9f

@yaoyu-33 yaoyu-33 enabled auto-merge (squash) March 4, 2026 03:17
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
dimapihtar and others added 2 commits March 4, 2026 11:11
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…6-03-04-main

Made-with: Cursor

# Conflicts:
#	uv.lock
@adityavavreNVDA adityavavreNVDA requested a review from a team as a code owner March 4, 2026 19:16
@adityavavreNVDA
Contributor Author

/ok to test 4dcf187

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test 8377697

@yaoyu-33
Copy link
Contributor

yaoyu-33 commented Mar 5, 2026

/ok to test eebd1f8

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test 1055d0d

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test c19f025
