
Store hf_pretrained as properties of Megatron*Bridge classes #2644

Open
HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from HollowMan6:hf_pretrained

Conversation


@HollowMan6 HollowMan6 commented Mar 4, 2026

What does this PR do ?

This stores hf_pretrained as a property so that downstream model bridges that need hf_pretrained config information to build mapping_registry no longer need to override build_conversion_tasks (e.g., the GLM 4.5 bridge).

Changelog

  • Add hf_pretrained and hf_config properties to MegatronModelBridge and MegatronPeftBridge
  • Update GLM 4.5 MTP mapping
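Based on the changelog above, a minimal sketch of what such properties could look like; the class name and attribute flow are assumptions from the PR description, not the actual Megatron-Bridge code:

```python
# Hypothetical sketch: the base bridge stores the HF pretrained handle once
# and derives the config from it, so subclasses can read self.hf_pretrained
# and self.hf_config without overriding build_conversion_tasks.
from types import SimpleNamespace


class MegatronModelBridge:
    def __init__(self):
        self._hf_pretrained = None

    @property
    def hf_pretrained(self):
        if self._hf_pretrained is None:
            raise AttributeError(
                "hf_pretrained is not set; call build_conversion_tasks() first"
            )
        return self._hf_pretrained

    @hf_pretrained.setter
    def hf_pretrained(self, value):
        self._hf_pretrained = value

    @property
    def hf_config(self):
        # Derived from the stored pretrained handle, never cached separately.
        return self.hf_pretrained.config

    def build_conversion_tasks(self, hf_pretrained):
        # Store once here; mapping_registry() in subclasses can then use it.
        self.hf_pretrained = hf_pretrained


bridge = MegatronModelBridge()
bridge.build_conversion_tasks(
    SimpleNamespace(config=SimpleNamespace(num_hidden_layers=4))
)
print(bridge.hf_config.num_hidden_layers)  # 4
```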

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

Release Notes

  • Refactor

    • Improved HuggingFace model context handling throughout weight conversion and adapter export paths for better consistency.
    • Simplified configuration caching logic in specialized model bridges.
  • Tests

    • Updated unit tests to reflect refined adapter and weight conversion method signatures.

@HollowMan6 HollowMan6 requested review from Copilot March 4, 2026 15:11

copy-pr-bot bot commented Mar 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Copilot AI left a comment


Pull request overview

This PR aims to make HuggingFace pretrained/config metadata available on Megatron bridge instances so downstream bridges can build mapping registries (and related conversion tasks) without overriding build_conversion_tasks, and updates GLM 4.5 MTP mapping accordingly.

Changes:

  • Thread hf_pretrained/hf_config through adapter conversion/export paths (including export_adapter_weights).
  • Remove GLM 4.5 / GLM 4.5V overrides that cached HF config/state source; switch to reading from self.hf_pretrained.
  • Update GLM 4.5 MTP mapping logic and adjust unit tests for updated method signatures.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/megatron/bridge/models/conversion/model_bridge.py: Passes hf_pretrained into adapter task building; stores hf_pretrained/hf_config on bridge instances during task build/dispatch.
  • src/megatron/bridge/models/conversion/peft_bridge.py: Updates adapter task/stream APIs to accept hf_pretrained and stores it on the bridge for downstream access.
  • src/megatron/bridge/models/conversion/auto_bridge.py: Plumbs self.hf_pretrained into adapter export dispatch.
  • src/megatron/bridge/models/glm/glm45_bridge.py: Removes config caching override; changes fused-expert detection and adds revised MTP mappings using bridge-held HF metadata.
  • src/megatron/bridge/models/glm_vl/glm_45v_bridge.py: Removes config caching override; changes fused-expert detection to use bridge-held HF metadata.
  • tests/unit_tests/models/test_model_bridge_lora.py: Updates tests for new adapter task/stream method signatures.
  • tests/unit_tests/models/test_auto_bridge.py: Updates export tests and adds coverage for export_adapter_weights.
  • tests/unit_tests/models/glm/test_glm45_bridge.py: Updates mocks to provide state.source and sets hf_pretrained for mapping registry usage.
  • tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py: Updates fixtures/mocks to provide state.source and sets hf_pretrained on the bridge.
Comments suppressed due to low confidence (3)

src/megatron/bridge/models/glm/glm45_bridge.py:232

  • Inside the for layer_prefix in ("transformer_layer", "mtp_model_layer") loop, the MTP-specific AutoMappings for enorm/hnorm/eh_proj/final_layernorm don’t depend on layer_prefix, so they get appended twice per MTP layer. Moving those fixed mappings outside the layer_prefix loop avoids redundant duplicate mappings in the registry.
            for layer_prefix in ("transformer_layer", "mtp_model_layer"):
                for megatron_param, hf_param in layer_specific_mappings.items():
                    megatron_param = (
                        megatron_param.replace(".*", f".*.{layer_prefix}")
                        .replace("decoder", "mtp")
                        .replace(".*", f".{mtp_layer}")
                    )
                    hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
                    mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))

                # MTP specific mappings
                mapping_list.extend(
                    [
                        AutoMapping(
                            megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
                            hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
                        ),
                        AutoMapping(
                            megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
                            hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
                        ),
                        AutoMapping(
                            megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
                            hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
                        ),
                        AutoMapping(
                            megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
                            hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
                        ),
                    ]
                )
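The restructuring suggested in the comment above can be exercised standalone. This is a simplified sketch, not the actual bridge code: mappings are plain tuples rather than AutoMapping objects, and the layer counts are made up for illustration.

```python
# Simplified illustration of the suggested fix: the prefix-independent MTP
# mappings are appended AFTER the inner layer_prefix loop, so each appears
# exactly once per mtp_layer instead of twice.
num_transformer_layers = 46  # illustrative values, not GLM 4.5's real config
num_mtp_layers = 1
layer_specific_mappings = {"decoder.*.mlp.weight": "model.layers.*.mlp.weight"}

mapping_list = []
for mtp_layer in range(num_mtp_layers):
    for layer_prefix in ("transformer_layer", "mtp_model_layer"):
        for megatron_param, hf_param in layer_specific_mappings.items():
            megatron_param = (
                megatron_param.replace(".*", f".*.{layer_prefix}")
                .replace("decoder", "mtp")
                .replace(".*", f".{mtp_layer}")
            )
            hf_param = hf_param.replace(
                "layers.*", f"layers.{mtp_layer + num_transformer_layers}"
            )
            mapping_list.append((megatron_param, hf_param))

    # Moved outside the layer_prefix loop: these mappings do not depend on
    # layer_prefix, so appending them here adds each one only once.
    hf_layer = mtp_layer + num_transformer_layers
    for megatron_name, hf_name in [
        ("enorm.weight", "enorm.weight"),
        ("hnorm.weight", "hnorm.weight"),
        ("eh_proj.weight", "eh_proj.weight"),
        ("final_layernorm.weight", "shared_head.norm.weight"),
    ]:
        mapping_list.append(
            (f"mtp.layers.{mtp_layer}.{megatron_name}",
             f"model.layers.{hf_layer}.{hf_name}")
        )

# Two prefix-dependent mappings plus four fixed ones, all distinct.
assert len(mapping_list) == len(set(mapping_list))
```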

src/megatron/bridge/models/glm/glm45_bridge.py:319

  • list(self.hf_pretrained.state.source.get_all_keys()) makes an unnecessary copy (the source already returns a list and may cache it). Reusing the returned list directly (or caching once on the bridge) avoids repeated list allocations and repeated key retrieval work across _uses_fused_experts() / _hf_expert_suffix().
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if hf_keys:
            if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
                "mlp.experts.down_proj" in key for key in hf_keys
            ):
                return True

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None:
            return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")

        return False

    def _hf_expert_suffix(self, base_name: str) -> str:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if any(f"{base_name}.weight" in key for key in hf_keys):

src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:227

  • list(self.hf_pretrained.state.source.get_all_keys()) copies the list of keys on every call. Since get_all_keys() already returns a list (and may internally cache it), consider using it directly (or caching on the bridge) to avoid repeated allocations when mapping_registry() calls _uses_fused_experts() and _hf_expert_suffix().
    def _uses_fused_experts(self) -> bool:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if hf_keys:
            if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
                "mlp.experts.down_proj" in key for key in hf_keys
            ):
                return True

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None:
            return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")

        return False

    def _hf_expert_suffix(self, base_name: str) -> str:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if any(f"{base_name}.weight" in key for key in hf_keys):
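The caching suggestion in the comments above can be sketched with stand-in objects; the class and attribute names here are illustrative, not the real bridge API. A `functools.cached_property` fetches the key list once and both helpers share it:

```python
# Hedged sketch: fetch the HF key list once per bridge and reuse it across
# helpers, instead of calling list(source.get_all_keys()) in each one.
from functools import cached_property


class FakeSource:
    """Stand-in for an HF state source; counts how often keys are fetched."""

    def __init__(self, keys):
        self._keys = keys
        self.calls = 0

    def get_all_keys(self):
        self.calls += 1
        return self._keys


class Bridge:
    def __init__(self, source):
        self._source = source

    @cached_property
    def _hf_keys(self):
        # Single fetch, no list(...) copy; cached for the bridge's lifetime.
        return self._source.get_all_keys()

    def uses_fused_experts(self):
        return any("mlp.experts.gate_up_proj" in k for k in self._hf_keys)

    def hf_expert_suffix(self, base_name):
        if any(f"{base_name}.weight" in k for k in self._hf_keys):
            return ".weight"
        return ""


src = FakeSource([
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj.weight",
])
bridge = Bridge(src)
bridge.uses_fused_experts()
bridge.hf_expert_suffix("down_proj")
print(src.calls)  # 1: both helpers share one cached fetch
```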



coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

The PR threads the HuggingFace pretrained model context (hf_pretrained parameter) through Megatron bridge weight export and adapter weight streaming pathways, enabling bridge instances to access HF configuration during conversion. GLM-specific bridges are refactored to remove local HF config caching and access configuration via public attributes instead.

Changes

Changes by cohort:

  • Core Bridge Parameter Threading (src/megatron/bridge/models/conversion/auto_bridge.py, src/megatron/bridge/models/conversion/model_bridge.py, src/megatron/bridge/models/conversion/peft_bridge.py): Added hf_pretrained parameter to weight streaming and adapter conversion functions. Updated registration paths to store hf_pretrained and derive hf_config on bridge instances, threading HF context through export and adapter weight pathways.
  • GLM Bridge Refactoring (src/megatron/bridge/models/glm/glm45_bridge.py, src/megatron/bridge/models/glm_vl/glm_45v_bridge.py): Removed build_conversion_tasks overrides and local HF config caching. Replaced cached _hf_keys and _hf_state_source attributes with direct access to self.hf_pretrained.state.source. Refactored MTP mapping generation in GLM45Bridge to use generalized per-layer loops and simplified mappings. Updated helper methods to read from public hf_pretrained and hf_config attributes.
  • Test Fixture Updates (tests/unit_tests/models/glm/test_glm45_bridge.py, tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py, tests/unit_tests/models/test_auto_bridge.py, tests/unit_tests/models/test_model_bridge_lora.py): Updated test fixtures to mock HF pretrained state and added state/source initialization. Modified test signatures to accept and wire mock pretrained models. Updated test calls to pass the new hf_pretrained parameter to builder and streaming methods.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • yaoyu-33
  • cuichenx
  • chtruong814
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ Warning: PR contains major breaking API changes and architectural refactoring without explicit test execution results or CI workflow confirmation in the description. Resolution: update the PR description with explicit test execution results, regression testing confirmation for breaking API changes, and a performance impact assessment.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly summarizes the main change: storing hf_pretrained as properties on Megatron bridge classes to enable access without overriding build_conversion_tasks.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py (1)

212-232: Guard direct hf_pretrained.state/config dereferences for safer conversion paths.

Line 212/Line 226 and Line 248 assume self.hf_pretrained always exposes .state.source and .config. A config-only bridge context (or partially initialized pretrained mock/object) will raise AttributeError before any fallback logic.

Proposed defensive fix
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None:
             return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
@@
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
             return ".weight"
@@
-        text_config = getattr(self.hf_pretrained.config, "text_config", self.hf_pretrained.config)
+        hf_config = getattr(self, "hf_config", getattr(self.hf_pretrained, "config", self.hf_pretrained))
+        text_config = getattr(hf_config, "text_config", hf_config)

Also applies to: 248-249
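The getattr-chain guard from the diff above can be tried in isolation; this is a minimal sketch with stand-in objects, not the real bridge classes:

```python
# Minimal sketch of the proposed defensive access: a missing .state or
# .source yields an empty key list instead of raising AttributeError.
from types import SimpleNamespace


def safe_hf_keys(hf_pretrained):
    # getattr on None still works and returns the default, so one chained
    # expression guards both levels of the attribute path.
    hf_source = getattr(getattr(hf_pretrained, "state", None), "source", None)
    return list(hf_source.get_all_keys()) if hf_source is not None else []


full = SimpleNamespace(
    state=SimpleNamespace(source=SimpleNamespace(get_all_keys=lambda: ["a", "b"]))
)
config_only = SimpleNamespace()  # config-only context: no .state at all

print(safe_hf_keys(full))         # ['a', 'b']
print(safe_hf_keys(config_only))  # []
```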

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Around line 197-232: The MTP-specific AutoMapping entries (the mappings that
set megatron_param starting with "mtp.layers.{mtp_layer}..." for enorm, hnorm,
eh_proj, and final_layernorm) are being appended inside the inner loop over
layer_prefix and therefore are added twice; move the block that extends
mapping_list with these prefix-independent mappings so it executes once per
mtp_layer (i.e., place the mapping_list.extend([...AutoMapping(...)...])
immediately after the inner for layer_prefix in ("transformer_layer",
"mtp_model_layer") loop ends), keeping the existing megatron_param and hf_param
strings and leaving the prefix-dependent mapping generation inside the inner
loop unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 550d18c5-62de-4f2a-b799-c822f0f559c0

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb1af6 and d13b868.

📒 Files selected for processing (9)
  • src/megatron/bridge/models/conversion/auto_bridge.py
  • src/megatron/bridge/models/conversion/model_bridge.py
  • src/megatron/bridge/models/conversion/peft_bridge.py
  • src/megatron/bridge/models/glm/glm45_bridge.py
  • src/megatron/bridge/models/glm_vl/glm_45v_bridge.py
  • tests/unit_tests/models/glm/test_glm45_bridge.py
  • tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py
  • tests/unit_tests/models/test_auto_bridge.py
  • tests/unit_tests/models/test_model_bridge_lora.py


Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (5)

src/megatron/bridge/models/glm/glm45_bridge.py:200

  • mapping_registry() now assumes self.hf_config is always set. If someone calls mapping_registry() directly on a fresh bridge instance (as some tests/utilities do), this will raise an AttributeError instead of gracefully skipping MTP mappings.

Consider either (a) restoring a safe fallback (e.g., treat missing hf_config as num_mtp_layers=0) or (b) raising a clear ValueError explaining that hf_pretrained/hf_config must be set (via build_conversion_tasks(...) or assignment) before calling mapping_registry().

        # add MTP mappings
        hf_config = self.hf_config
        num_mtp_layers = getattr(hf_config, "num_nextn_predict_layers", 0)
        num_transformer_layers = hf_config.num_hidden_layers
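Option (a) above can be sketched standalone; the bridge object and attribute names here mirror the snippet but are stand-ins, not the actual GLM45Bridge API:

```python
# Illustrative sketch of the safe fallback: treat a missing hf_config as
# num_mtp_layers = 0, so mapping_registry() skips MTP mappings on a fresh
# bridge instead of raising AttributeError.
from types import SimpleNamespace


def num_mtp_layers_for(bridge):
    hf_config = getattr(bridge, "hf_config", None)
    if hf_config is None:
        return 0  # fresh bridge: gracefully skip MTP mappings
    return getattr(hf_config, "num_nextn_predict_layers", 0)


fresh = SimpleNamespace()  # no hf_config set yet
configured = SimpleNamespace(
    hf_config=SimpleNamespace(num_nextn_predict_layers=2)
)
print(num_mtp_layers_for(fresh), num_mtp_layers_for(configured))  # 0 2
```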

src/megatron/bridge/models/glm/glm45_bridge.py:314

  • hf_keys = list(self.hf_pretrained.state.source.get_all_keys()) makes an extra full copy of the key list every time _uses_fused_experts() runs. StateSource.get_all_keys() already returns a cached List[str] for common sources; copying it can be costly for large checkpoints.

Prefer using the list returned by get_all_keys() directly (no list(...)), or cache hf_keys / hf_source once on the bridge instance for reuse across _uses_fused_experts() and _hf_expert_suffix().

        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if hf_keys:
            if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
                "mlp.experts.down_proj" in key for key in hf_keys
            ):
                return True

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None:
            return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")

src/megatron/bridge/models/glm/glm45_bridge.py:324

  • hf_keys = list(self.hf_pretrained.state.source.get_all_keys()) duplicates the (potentially very large) HF key list on every _hf_expert_suffix() call. Since mapping_registry() calls _hf_expert_suffix() multiple times, this can turn into repeated O(N) copies.

Consider reusing a single hf_keys value (or caching it on self) during mapping construction, and avoid list(...) unless you intend to mutate the keys.

    def _hf_expert_suffix(self, base_name: str) -> str:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if any(f"{base_name}.weight" in key for key in hf_keys):
            return ".weight"

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
            return ".weight"

src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:222

  • hf_keys = list(self.hf_pretrained.state.source.get_all_keys()) creates a full copy of the checkpoint key list every time _uses_fused_experts() runs. For SafeTensorsStateSource, get_all_keys() is cached but the extra list(...) copy is still O(N) and can be expensive.

Prefer using hf_source.get_all_keys() directly (no copy), or cache hf_keys once on the bridge instance for reuse across helper methods.

    def _uses_fused_experts(self) -> bool:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if hf_keys:
            if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
                "mlp.experts.down_proj" in key for key in hf_keys
            ):
                return True

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None:
            return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")

src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:233

  • hf_keys = list(self.hf_pretrained.state.source.get_all_keys()) duplicates the full key list on each _hf_expert_suffix() call. Since mapping_registry() calls this helper multiple times, it can cause repeated O(N) copies for large checkpoints.

Consider reusing a single hf_keys (or caching it on self) during mapping construction, and avoid list(...) unless you intend to mutate the returned list.

    def _hf_expert_suffix(self, base_name: str) -> str:
        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
        if any(f"{base_name}.weight" in key for key in hf_keys):
            return ".weight"

        hf_source = self.hf_pretrained.state.source
        if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
            return ".weight"



@HollowMan6 HollowMan6 force-pushed the hf_pretrained branch 2 times, most recently from 61a26de to 6a48916 on March 4, 2026 at 19:17
So that downstream model bridges that need hf_pretrained configs
information to build mapping_registry no longer need to
override build_conversion_tasks (e.g. GLM 4.5 bridge).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
