
Changing hybrid_override_pattern to hybrid_layer_pattern to mirror mcore change #2628

Open
adityavavreNVDA wants to merge 15 commits into main from avavre/hybrid_pattern_fix

Conversation


adityavavreNVDA (Contributor) commented Mar 3, 2026

What does this PR do ?

This PR mirrors MLM PR #3377, which deprecates hybrid_override_pattern, hybrid_attention_ratio, and hybrid_mlp_ratio in favour of hybrid_layer_pattern.

Summary by CodeRabbit

  • New Features

    • Introduced new hybrid_layer_pattern configuration option for improved hybrid layer processing specification across all model providers.
  • Deprecation

    • Deprecated hybrid_attention_ratio, hybrid_mlp_ratio, and hybrid_override_pattern configuration parameters. These will be removed in a future release. Users should migrate to the new hybrid_layer_pattern option.
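The deprecation pattern discussed throughout this PR can be sketched as follows. This is a hypothetical, simplified config class (not the actual provider classes in the repo), showing how a renamed dataclass field can keep accepting the legacy kwarg with a DeprecationWarning instead of a hard TypeError:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridConfig:
    """Minimal sketch of a provider config carrying the rename.

    Field names match the PR; the class itself is illustrative only.
    """

    hybrid_layer_pattern: Optional[str] = None
    hybrid_override_pattern: Optional[str] = None  # deprecated alias

    def __post_init__(self) -> None:
        if self.hybrid_override_pattern is not None:
            warnings.warn(
                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Map the legacy value onto the new field when the new one is unset.
            if self.hybrid_layer_pattern is None:
                self.hybrid_layer_pattern = self.hybrid_override_pattern


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = HybridConfig(hybrid_override_pattern="M-M*-")

print(cfg.hybrid_layer_pattern)  # legacy value carried over to the new field
```

A caller passing only the old kwarg still gets a working config plus a warning, which is the soft-landing behavior the review comments below argue for.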

Signed-off-by: Aditya Vavre <avavre@nvidia.com>

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Aditya Vavre <avavre@nvidia.com>

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

This PR systematically renames and propagates the configuration field hybrid_override_pattern to hybrid_layer_pattern across Mamba, Nemotron-H, and Nemotron-VL model builders, providers, and bridges, with corresponding updates to training utilities and tests. A deprecation notice is added for the old fields.

Changes

Cohort / File(s) Summary
Mamba Model Configuration
src/megatron/bridge/models/mamba/mamba_builder.py, src/megatron/bridge/models/mamba/mamba_provider.py
Introduce new hybrid_layer_pattern field and propagate it to MCoreMambaModel construction, replacing hybrid_override_pattern with deprecation notice.
Nemotron-H Configuration
src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py, src/megatron/bridge/models/nemotronh/nemotron_h_provider.py
Update CONFIG_MAPPING to target hybrid_layer_pattern and rename field across all NemotronH provider classes.
Nemotron-VL Configuration
src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py, src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
Add CONFIG_MAPPING entry for the hybrid_override_pattern → hybrid_layer_pattern mapping and update the LLaVAModel constructor call.
Training Utilities
src/megatron/bridge/training/utils/flop_utils.py, src/megatron/bridge/training/utils/train_utils.py, src/megatron/bridge/training/mlm_compat/model.py
Replace references from hybrid_override_pattern to hybrid_layer_pattern in layer counting and MoE tracking logic.
Unit Tests
tests/unit_tests/models/mamba/..., tests/unit_tests/models/nemotronh/..., tests/unit_tests/training/...
Update test expectations and assertions to use hybrid_layer_pattern instead of hybrid_override_pattern.
Functional Tests
tests/functional_tests/models/nemotron_vl/test_nemotron_vl_conversion.py, tests/functional_tests/models/nemotronh/test_nemotron_h_conversion.py, tests/functional_tests/recipes/test_nemotronh_recipes_*.py
Add runtime config modifications and update model_overrides dictionaries to use hybrid_layer_pattern for parallelism configuration.
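The layer-counting changes in the training utilities above iterate over the pattern string character by character. As a rough sketch of that logic (character semantics taken from the counts dict in the flop_utils review below: "M" = Mamba, "*" = attention, "-" = MLP, "E" = MoE; the function name is illustrative, not the repo's):

```python
from collections import Counter


def count_layer_types(pattern: str) -> dict:
    """Count layer types in a hybrid layer pattern string.

    Assumed semantics: "M" = Mamba, "*" = attention, "-" = MLP, "E" = MoE.
    Unknown characters are ignored, mirroring the fallback loop in flop_utils
    that only increments keys present in the counts dict.
    """
    counts = Counter(c for c in pattern if c in "M*-E")
    return {c: counts.get(c, 0) for c in "M*-E"}


print(count_layer_types("M-M-M-M*-M-M-M-M-M*-"))
# {'M': 9, '*': 2, '-': 9, 'E': 0}
```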

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

  • PR #2241: Introduced MambaModelConfig/MambaModelBuilder architecture that this PR modifies to add and propagate the new hybrid_layer_pattern field.
  • PR #2523: Removes the Mamba builder/config module that this PR updates for hybrid layer pattern renaming.
  • PR #1914: Introduced NemotronH/Nemotron-VL provider and bridge configuration fields that this PR renames and consolidates.

Suggested labels

enhancement

Suggested reviewers

  • yaoyu-33
  • maanug-nv
  • erhoo82
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: PR description lacks test results, performance data, convergence verification, or testing methodology documentation for the refactoring changes. Resolution: add test execution results, convergence/numerical verification, backward-compatibility testing documentation, and address the missing guards identified in review comments.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change: renaming hybrid_override_pattern to hybrid_layer_pattern across the codebase to align with mcore's configuration convention.
  • Docstring Coverage — ✅ Passed: docstring coverage is 92.68%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/training/utils/flop_utils.py (1)

39-56: ⚠️ Potential issue | 🟠 Major

FLOPs path should fallback to deprecated hybrid key while deprecation is active.

The updated logic only reads hybrid_layer_pattern. For configs that still provide only hybrid_override_pattern, layer/MTP counts can be incorrect.

Proposed fix
     def calculate_layer_counts():
         """Calculate the number of attention, Mamba, MLP, and MoE layers."""
-        if hasattr(cfg.model, "hybrid_layer_pattern") and cfg.model.hybrid_layer_pattern:
+        hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+            cfg.model, "hybrid_override_pattern", None
+        )
+        if hybrid_pattern:
             counts = {"M": 0, "*": 0, "-": 0, "E": 0}
             try:
                 parse_hybrid_pattern = importlib.import_module(
                     "megatron.core.ssm.mamba_hybrid_layer_allocation"
                 ).parse_hybrid_pattern
-                parsed = parse_hybrid_pattern(cfg.model.hybrid_layer_pattern)
+                parsed = parse_hybrid_pattern(hybrid_pattern)
                 if parsed.main_pattern:
                     for layer_type in parsed.main_pattern:
                         if layer_type in counts:
                             counts[layer_type] += 1
                 if parsed.mtp_pattern and parsed.mtp_num_depths > 0:
                     for layer_type in parsed.mtp_pattern:
                         if layer_type in counts:
                             counts[layer_type] += parsed.mtp_num_depths
             except (ImportError, ModuleNotFoundError):
-                for layer_type in cfg.model.hybrid_layer_pattern:
+                for layer_type in hybrid_pattern:
                     if layer_type in counts:
                         counts[layer_type] += 1
@@
-            hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None)
+            hybrid_pattern = getattr(cfg.model, "hybrid_layer_pattern", None) or getattr(
+                cfg.model, "hybrid_override_pattern", None
+            )

Also applies to: 434-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/flop_utils.py` around lines 39 - 56, The
FLOPs counting block only reads cfg.model.hybrid_layer_pattern causing configs
that still use the deprecated cfg.model.hybrid_override_pattern to be ignored;
update the logic in the function containing the parse_hybrid_pattern import (and
the similar block around the second occurrence) to fall back to
cfg.model.hybrid_override_pattern when cfg.model.hybrid_layer_pattern is not
set, i.e., use hybrid_layer_pattern if present else hybrid_override_pattern,
then pass that value to parse_hybrid_pattern and the subsequent loops that
update counts (refer to parse_hybrid_pattern, parsed.main_pattern,
parsed.mtp_pattern, parsed.mtp_num_depths, and counts) so layer/MTP counts are
computed correctly for both new and deprecated keys.
🧹 Nitpick comments (4)
tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py (1)

323-323: Make the key mapping resilient during deprecation cleanup.

Line 323 hard-codes the deprecated source key and can become a brittle KeyError if the HF toy override dict is renamed later.

♻️ Optional hardening diff
 MEGATRON_NEMOTRON_3_NANO_OVERRIDES = {
     "num_layers": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["num_hidden_layers"],
-    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    "hybrid_layer_pattern": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get(
+        "hybrid_layer_pattern",
+        HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"],
+    ),
     "hidden_size": HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hidden_size"],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py` at line
323, The test currently hard-codes
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES["hybrid_override_pattern"] when building
the dict entry for "hybrid_layer_pattern", which will raise KeyError if that
source key is renamed; change this to a resilient lookup such as using
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.get("hybrid_override_pattern") with a
sensible fallback, or use a safe dynamic lookup like next((v for k,v in
HF_NEMOTRON_3_NANO_TOY_MODEL_OVERRIDES.items() if "hybrid" in k), default_value)
so the "hybrid_layer_pattern" assignment never throws and remains robust during
deprecation cleanup.
src/megatron/bridge/models/mamba/mamba_builder.py (1)

227-230: Add an explicit conflict rule between new and deprecated hybrid settings.

Lines 227–230 forward both APIs at once. If users set hybrid_layer_pattern and deprecated hybrid_* knobs together, precedence is implicit and easy to misconfigure.

🧭 Suggested guard (example)
+        if self._model_config.hybrid_layer_pattern is not None and (
+            self._model_config.hybrid_attention_ratio != 0.0
+            or self._model_config.hybrid_mlp_ratio != 0.0
+            or self._model_config.hybrid_override_pattern is not None
+        ):
+            raise ValueError(
+                "Use either hybrid_layer_pattern or deprecated hybrid_* settings, not both."
+            )
+
         return MCoreMambaModel(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/mamba/mamba_builder.py` around lines 227 - 230,
Detect and block simultaneous use of the new and deprecated hybrid APIs before
forwarding values: in the builder method that references _model_config (the
block passing hybrid_layer_pattern, hybrid_attention_ratio, hybrid_mlp_ratio,
hybrid_override_pattern), check if _model_config.hybrid_layer_pattern is set AND
any of _model_config.hybrid_attention_ratio, _model_config.hybrid_mlp_ratio, or
_model_config.hybrid_override_pattern are also set; if so, raise a clear
ValueError (or log an error and abort) instructing the user to choose either the
new hybrid_layer_pattern API or the deprecated hybrid_* knobs, so precedence is
explicit and not implicit.
tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py (1)

193-196: Use exact-match assertions for canonical hybrid pattern defaults.

For fixed default patterns, in is too permissive and can hide regressions. Prefer == in these two tests.

Suggested test tightening
-        assert (
-            "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
-            in provider.hybrid_layer_pattern
-        )
+        assert (
+            provider.hybrid_layer_pattern
+            == "M-M-M-M-M-M-M-M-M*-M-M-M-M-M-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-M-M---MM---M-M*-M-M-M-M-M-"
+        )

-        assert (
-            "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
-            in provider.hybrid_layer_pattern
-        )
+        assert (
+            provider.hybrid_layer_pattern
+            == "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+        )

Also applies to: 227-230

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py` around lines
193 - 196, Replace the permissive substring assertion with an exact-match
assertion: change the test that currently uses "in" against
provider.hybrid_layer_pattern to assert equality (==) with the full canonical
pattern string for provider.hybrid_layer_pattern; apply the same change to the
other similar assertion in the same test file that checks the default hybrid
pattern (the one around the later occurrence) so both tests require an exact
match rather than a substring.
tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py (1)

161-161: Add a test case for the new config key path (hybrid_layer_pattern).

This assertion covers legacy-key compatibility (hybrid_override_pattern → provider hybrid_layer_pattern), but there’s no explicit test for configs that already emit hybrid_layer_pattern.

Suggested additional unit test
+    def test_provider_bridge_mamba_config_accepts_new_hybrid_key(self, nemotronh_8b_config_dict):
+        """Test Mamba config mapping when config uses hybrid_layer_pattern directly."""
+        cfg_dict = {k: v for k, v in nemotronh_8b_config_dict.items() if k != "hybrid_override_pattern"}
+        cfg_dict["hybrid_layer_pattern"] = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+
+        cfg = Mock(spec=[])
+        for k, v in cfg_dict.items():
+            setattr(cfg, k, v)
+
+        mock_pretrained = Mock(spec=PreTrainedCausalLM)
+        mock_pretrained.config = cfg
+
+        bridge = NemotronHBridge()
+        result = bridge.provider_bridge(mock_pretrained)
+        assert result.hybrid_layer_pattern == cfg.hybrid_layer_pattern
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py` at line 161, Add
an explicit unit test that verifies the new config key path
`hybrid_layer_pattern` is honored (not only the legacy
`hybrid_override_pattern`): create a test in test_nemotron_h_bridge.py that
instantiates the same setup used earlier (using mock_nemotronh_config but
setting `hybrid_layer_pattern` instead of `hybrid_override_pattern`), run the
codepath that produces `result` (the same call that yields
`result.hybrid_layer_pattern`), and assert that `result.hybrid_layer_pattern ==
mock_nemotronh_config.hybrid_layer_pattern` to ensure direct-key behavior is
covered alongside the legacy-key test.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b3d00 and 8b20cd9.

📒 Files selected for processing (19)
  • src/megatron/bridge/models/mamba/mamba_builder.py
  • src/megatron/bridge/models/mamba/mamba_provider.py
  • src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py
  • src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
  • src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py
  • src/megatron/bridge/models/nemotronh/nemotron_h_provider.py
  • src/megatron/bridge/training/mlm_compat/model.py
  • src/megatron/bridge/training/utils/flop_utils.py
  • src/megatron/bridge/training/utils/train_utils.py
  • tests/functional_tests/models/nemotron_vl/test_nemotron_vl_conversion.py
  • tests/functional_tests/models/nemotronh/test_nemotron_h_conversion.py
  • tests/functional_tests/recipes/test_nemotronh_recipes_finetune.py
  • tests/functional_tests/recipes/test_nemotronh_recipes_pretrain.py
  • tests/unit_tests/models/mamba/test_mamba_builder.py
  • tests/unit_tests/models/mamba/test_mamba_provider.py
  • tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py
  • tests/unit_tests/models/nemotronh/test_nemotron_h_provider.py
  • tests/unit_tests/training/mlm_compat/test_model.py
  • tests/unit_tests/training/utils/test_flop_utils.py

"""Configuration for a 4B parameter Nemotron-H model."""

-hybrid_override_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"
+hybrid_layer_pattern: str = "M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"

⚠️ Potential issue | 🟠 Major

Deprecation is currently a hard break for provider constructor kwargs.

These renames remove acceptance of legacy kwargs at dataclass init time. Any caller still passing hybrid_override_pattern (or the other deprecated hybrid knobs mentioned in the PR) will now fail with TypeError instead of getting a deprecation warning.

Proposed compatibility bridge for deprecated config fields
 @dataclass
 class NemotronHModelProvider(MambaModelProvider):
     """Configuration for Nemotron-H models."""
+    hybrid_layer_pattern: str | None = None
+    hybrid_override_pattern: str | None = None
+    hybrid_attention_ratio: float | None = None
+    hybrid_mlp_ratio: float | None = None
+
+    def __post_init__(self) -> None:
+        if self.hybrid_override_pattern is not None:
+            warnings.warn(
+                "hybrid_override_pattern is deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            if self.hybrid_layer_pattern is not None and self.hybrid_layer_pattern != self.hybrid_override_pattern:
+                raise ValueError(
+                    "Both hybrid_layer_pattern and hybrid_override_pattern were provided with different values."
+                )
+            self.hybrid_layer_pattern = self.hybrid_override_pattern
+
+        if self.hybrid_attention_ratio is not None or self.hybrid_mlp_ratio is not None:
+            warnings.warn(
+                "hybrid_attention_ratio and hybrid_mlp_ratio are deprecated; use hybrid_layer_pattern.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
+        super().__post_init__()

Also applies to: 78-78, 91-91, 106-106, 124-124, 140-140, 158-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/nemotronh/nemotron_h_provider.py` at line 63, The
dataclass now uses renamed fields like hybrid_layer_pattern which breaks callers
passing legacy kwargs (e.g., hybrid_override_pattern); add
backward-compatibility by accepting the old deprecated kwargs as optional fields
(e.g., hybrid_override_pattern: Optional[str] = None) and in the class
__post_init__ map any deprecated field values onto the new fields (set
hybrid_layer_pattern from hybrid_override_pattern when present), emit a
DeprecationWarning via warnings.warn, and repeat this pattern for the other
renamed hybrid knobs (the fields referenced at the other commented locations) so
legacy callers do not get a TypeError.

Comment on lines +167 to +170
hybrid_attention_ratio=args.hybrid_attention_ratio,
hybrid_mlp_ratio=args.hybrid_mlp_ratio,
hybrid_override_pattern=args.hybrid_override_pattern,
hybrid_layer_pattern=args.hybrid_layer_pattern,

⚠️ Potential issue | 🟠 Major

Guard deprecated/new hybrid args with safe defaults to prevent AttributeError.

Line 169 assumes args.hybrid_override_pattern always exists. With the rename rollout, this can crash when older/deprecated args are absent.

🐛 Proposed fix
     model = MambaModel(
         config=config,
         mamba_stack_spec=mamba_stack_spec,
         vocab_size=args.padded_vocab_size,
         max_sequence_length=args.max_position_embeddings,
         pre_process=pre_process,
-        hybrid_attention_ratio=args.hybrid_attention_ratio,
-        hybrid_mlp_ratio=args.hybrid_mlp_ratio,
-        hybrid_override_pattern=args.hybrid_override_pattern,
-        hybrid_layer_pattern=args.hybrid_layer_pattern,
+        hybrid_attention_ratio=getattr(args, "hybrid_attention_ratio", 0.0),
+        hybrid_mlp_ratio=getattr(args, "hybrid_mlp_ratio", 0.0),
+        hybrid_override_pattern=getattr(args, "hybrid_override_pattern", None),
+        hybrid_layer_pattern=getattr(args, "hybrid_layer_pattern", None),
         post_process=post_process,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/mlm_compat/model.py` around lines 167 - 170, The
constructor call is assuming args.hybrid_override_pattern (and similar
newly-introduced hybrid_* fields) always exist and can raise AttributeError for
older arg versions; update the call site in model.py (where
hybrid_attention_ratio, hybrid_mlp_ratio, hybrid_override_pattern,
hybrid_layer_pattern are passed) to guard each optional argument by using safe
attribute access (e.g., getattr(args, 'hybrid_override_pattern', <sane default>)
and getattr(args, 'hybrid_layer_pattern', <sane default>)) so missing/deprecated
fields fall back to sensible defaults (None or the original default used
elsewhere) instead of crashing.


 if config.model.is_hybrid_model:
-    layers = config.model.hybrid_override_pattern.count("E")
+    layers = config.model.hybrid_layer_pattern.count("E")

⚠️ Potential issue | 🟠 Major

Guard hybrid pattern access during deprecation window.

Line 627 assumes hybrid_layer_pattern is always populated. Hybrid configs that still only set the deprecated key can fail in logging.

Proposed fix
-        if config.model.is_hybrid_model:
-            layers = config.model.hybrid_layer_pattern.count("E")
+        if config.model.is_hybrid_model:
+            hybrid_pattern = getattr(config.model, "hybrid_layer_pattern", None) or getattr(
+                config.model, "hybrid_override_pattern", None
+            )
+            layers = hybrid_pattern.count("E") if hybrid_pattern else config.model.num_layers
         else:
             layers = config.model.num_layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/training/utils/train_utils.py` at line 627, The current
assignment to layers assumes config.model.hybrid_layer_pattern exists; change it
to safely access the attribute (e.g., pattern = getattr(config.model,
"hybrid_layer_pattern", "") or check hasattr(config.model,
"hybrid_layer_pattern") and fall back to an empty string or the deprecated key
if present) and then compute layers = pattern.count("E") so logging won't fail
for configs still using the deprecated key.

@adityavavreNVDA
Contributor Author

@yaoyu-33 We also need to remove num_layers and calculate it from the pattern, to mirror MLM: https://github.com/duncanriach/Megatron-LM/blob/aca6ca3970f2eddc1774ac706df8701e8b15ddcd/megatron/training/arguments.py#L601-L616

Where do you think this goes?
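The derivation this comment refers to can be sketched as follows. This is a hypothetical helper (not the actual MLM code at the linked lines), under the assumption that every character in a Nemotron-H pattern ("M", "*", "-", "E") denotes exactly one layer, so the layer count is the pattern length:

```python
def num_layers_from_pattern(pattern: str) -> int:
    """Derive the total layer count from a hybrid layer pattern.

    Assumption: each character denotes one layer ("M" = Mamba,
    "*" = attention, "-" = MLP, "E" = MoE), so the count is len(pattern).
    Illustrative sketch only, not the Megatron-LM implementation.
    """
    valid = set("M*-E")
    unknown = set(pattern) - valid
    if unknown:
        raise ValueError(f"Unknown layer types in pattern: {sorted(unknown)}")
    return len(pattern)


print(num_layers_from_pattern("M-M-M-M*-M-M-M-M-M*-M-M-M-M-M-"))  # 30
```

With such a helper in place, a user-supplied num_layers that disagrees with the pattern length could be rejected at config-validation time rather than silently ignored.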

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test fede9aa

yaoyu-33
yaoyu-33 previously approved these changes Mar 4, 2026
@yaoyu-33
Contributor

yaoyu-33 commented Mar 4, 2026

/ok to test 0ce7b9f

@yaoyu-33 yaoyu-33 enabled auto-merge (squash) March 4, 2026 03:17
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
dimapihtar and others added 2 commits March 4, 2026 11:11
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…6-03-04-main

Made-with: Cursor

# Conflicts:
#	uv.lock
@adityavavreNVDA adityavavreNVDA requested a review from a team as a code owner March 4, 2026 19:16
@adityavavreNVDA
Contributor Author

/ok to test 4dcf187

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test 8377697

@yaoyu-33
Copy link
Contributor

yaoyu-33 commented Mar 5, 2026

/ok to test eebd1f8

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test 1055d0d

Signed-off-by: Aditya Vavre <avavre@nvidia.com>
@adityavavreNVDA
Contributor Author

/ok to test c19f025
