Store hf_pretrained as properties of Megatron*Bridge classes #2644
HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from
Conversation
Pull request overview
This PR aims to make HuggingFace pretrained/config metadata available on Megatron bridge instances so downstream bridges can build mapping registries (and related conversion tasks) without overriding build_conversion_tasks, and updates GLM 4.5 MTP mapping accordingly.
Changes:
- Thread `hf_pretrained`/`hf_config` through adapter conversion/export paths (including `export_adapter_weights`).
- Remove GLM 4.5 / GLM 4.5V overrides that cached HF config/state source; switch to reading from `self.hf_pretrained`.
- Update GLM 4.5 MTP mapping logic and adjust unit tests for updated method signatures.
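To make the intent of these changes concrete, here is a minimal, hypothetical sketch of the pattern the PR enables: the framework populates `hf_pretrained`/`hf_config` on the bridge before `mapping_registry()` runs, so a downstream bridge can read HF metadata directly instead of overriding `build_conversion_tasks`. Everything below except the attribute names `hf_pretrained`/`hf_config` and the method name `mapping_registry` is a stand-in, not the real Megatron Bridge API.

```python
# Hypothetical stand-in for a downstream bridge; the real classes live in
# megatron.bridge.models.conversion and are not reproduced here.
from types import SimpleNamespace


class StubBridge:
    def __init__(self):
        # The framework sets these during task build/dispatch (per this PR),
        # so mapping_registry() can consult them without extra plumbing.
        self.hf_pretrained = None
        self.hf_config = None

    def mapping_registry(self):
        # Read HF metadata straight off the bridge-held config.
        num_mtp_layers = getattr(self.hf_config, "num_nextn_predict_layers", 0)
        return [f"mtp.layers.{i}.enorm.weight" for i in range(num_mtp_layers)]


bridge = StubBridge()
# Simulating what the framework does inside build_conversion_tasks/dispatch:
bridge.hf_config = SimpleNamespace(num_nextn_predict_layers=2, num_hidden_layers=4)
print(bridge.mapping_registry())  # → ['mtp.layers.0.enorm.weight', 'mtp.layers.1.enorm.weight']
```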
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/megatron/bridge/models/conversion/model_bridge.py | Passes hf_pretrained into adapter task building; stores hf_pretrained/hf_config on bridge instances during task build/dispatch. |
| src/megatron/bridge/models/conversion/peft_bridge.py | Updates adapter task/stream APIs to accept hf_pretrained and stores it on the bridge for downstream access. |
| src/megatron/bridge/models/conversion/auto_bridge.py | Plumbs self.hf_pretrained into adapter export dispatch. |
| src/megatron/bridge/models/glm/glm45_bridge.py | Removes config caching override; changes fused-expert detection and adds revised MTP mappings using bridge-held HF metadata. |
| src/megatron/bridge/models/glm_vl/glm_45v_bridge.py | Removes config caching override; changes fused-expert detection to use bridge-held HF metadata. |
| tests/unit_tests/models/test_model_bridge_lora.py | Updates tests for new adapter task/stream method signatures. |
| tests/unit_tests/models/test_auto_bridge.py | Updates export tests and adds coverage for export_adapter_weights. |
| tests/unit_tests/models/glm/test_glm45_bridge.py | Updates mocks to provide state.source and sets hf_pretrained for mapping registry usage. |
| tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py | Updates fixtures/mocks to provide state.source and sets hf_pretrained on the bridge. |
Comments suppressed due to low confidence (3)
src/megatron/bridge/models/glm/glm45_bridge.py:232
- Inside the `for layer_prefix in ("transformer_layer", "mtp_model_layer")` loop, the MTP-specific `AutoMapping`s for `enorm`/`hnorm`/`eh_proj`/`final_layernorm` don't depend on `layer_prefix`, so they get appended twice per MTP layer. Moving those fixed mappings outside the `layer_prefix` loop avoids redundant duplicate mappings in the registry.
```python
for layer_prefix in ("transformer_layer", "mtp_model_layer"):
    for megatron_param, hf_param in layer_specific_mappings.items():
        megatron_param = (
            megatron_param.replace(".*", f".*.{layer_prefix}")
            .replace("decoder", "mtp")
            .replace(".*", f".{mtp_layer}")
        )
        hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
        mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))
    # MTP specific mappings
    mapping_list.extend(
        [
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
            ),
        ]
    )
```
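A runnable sketch of the suggested fix: the prefix-independent MTP mappings are appended once per `mtp_layer`, after the inner `layer_prefix` loop rather than inside it. `AutoMapping` objects are replaced by plain tuples and the layer counts and mapping dict are illustrative, not taken from the real bridge.

```python
# Illustrative values; the real bridge derives these from hf_config.
num_transformer_layers = 4
num_mtp_layers = 1
layer_specific_mappings = {
    "decoder.*.input_layernorm.weight": "model.layers.*.input_layernorm.weight",
}

mapping_list = []
for mtp_layer in range(num_mtp_layers):
    for layer_prefix in ("transformer_layer", "mtp_model_layer"):
        for megatron_param, hf_param in layer_specific_mappings.items():
            megatron_param = (
                megatron_param.replace(".*", f".*.{layer_prefix}")
                .replace("decoder", "mtp")
                .replace(".*", f".{mtp_layer}")
            )
            hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
            mapping_list.append((megatron_param, hf_param))
    # Prefix-independent MTP mappings now live here, outside the layer_prefix
    # loop, so each is added exactly once per mtp_layer.
    for megatron_name, hf_name in (
        ("enorm", "enorm"),
        ("hnorm", "hnorm"),
        ("eh_proj", "eh_proj"),
        ("final_layernorm", "shared_head.norm"),
    ):
        mapping_list.append(
            (f"mtp.layers.{mtp_layer}.{megatron_name}.weight",
             f"model.layers.{mtp_layer + num_transformer_layers}.{hf_name}.weight")
        )

print(len(mapping_list))  # 2 prefix-dependent + 4 fixed = 6
```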
src/megatron/bridge/models/glm/glm45_bridge.py:319
`list(self.hf_pretrained.state.source.get_all_keys())` makes an unnecessary copy (the source already returns a list and may cache it). Reusing the returned list directly (or caching once on the bridge) avoids repeated list allocations and repeated key retrieval work across `_uses_fused_experts()`/`_hf_expert_suffix()`.
```python
hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
if hf_keys:
    if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
        "mlp.experts.down_proj" in key for key in hf_keys
    ):
        return True
hf_source = self.hf_pretrained.state.source
if hf_source is not None:
    return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
return False

def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
```
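One way to implement the caching suggested in this comment is a `functools.cached_property` that fetches the key list once and is shared by both helpers. The bridge and source classes below are stand-ins for illustration, not the real Megatron Bridge types.

```python
from functools import cached_property


class StubSource:
    """Stand-in for a HF state source; counts get_all_keys() calls."""

    def __init__(self, keys):
        self._keys = keys
        self.calls = 0

    def get_all_keys(self):
        self.calls += 1
        return self._keys


class StubBridge:
    def __init__(self, source):
        self._source = source

    @cached_property
    def _hf_keys(self):
        # One retrieval, no extra list(...) copy; cached for the bridge's lifetime.
        return self._source.get_all_keys()

    def _uses_fused_experts(self):
        return any(
            "mlp.experts.gate_up_proj" in k or "mlp.experts.down_proj" in k
            for k in self._hf_keys
        )

    def _hf_expert_suffix(self, base_name):
        return ".weight" if any(f"{base_name}.weight" in k for k in self._hf_keys) else ""


src = StubSource(["model.layers.0.mlp.experts.gate_up_proj.weight"])
bridge = StubBridge(src)
assert bridge._uses_fused_experts()
assert bridge._hf_expert_suffix("gate_up_proj") == ".weight"
assert src.calls == 1  # both helpers reused the single cached retrieval
```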
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:227
`list(self.hf_pretrained.state.source.get_all_keys())` copies the list of keys on every call. Since `get_all_keys()` already returns a list (and may internally cache it), consider using it directly (or caching on the bridge) to avoid repeated allocations when `mapping_registry()` calls `_uses_fused_experts()` and `_hf_expert_suffix()`.
```python
def _uses_fused_experts(self) -> bool:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if hf_keys:
        if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
            "mlp.experts.down_proj" in key for key in hf_keys
        ):
            return True
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None:
        return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
    return False

def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
```
📝 Walkthrough: The PR threads the HuggingFace pretrained model context (`hf_pretrained`/`hf_config`) through the bridge conversion and export paths.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py (1)
212-232: Guard direct `hf_pretrained.state`/`config` dereferences for safer conversion paths.

Line 212/Line 226 and Line 248 assume `self.hf_pretrained` always exposes `.state.source` and `.config`. A config-only bridge context (or partially initialized pretrained mock/object) will raise `AttributeError` before any fallback logic.

Proposed defensive fix:

```diff
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None:
             return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
@@
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
             return ".weight"
@@
-        text_config = getattr(self.hf_pretrained.config, "text_config", self.hf_pretrained.config)
+        hf_config = getattr(self, "hf_config", getattr(self.hf_pretrained, "config", self.hf_pretrained))
+        text_config = getattr(hf_config, "text_config", hf_config)
```

Also applies to: 248-249
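The defensive pattern proposed in this comment can be distilled into a small runnable function: nested `getattr()` calls so that a pretrained object with no `.state` (or no `.state.source`) yields an empty key list instead of raising `AttributeError`. `SimpleNamespace` stands in for the real pretrained object here.

```python
from types import SimpleNamespace


def safe_hf_keys(hf_pretrained):
    # getattr chain: None at any hop falls through to the empty-list fallback.
    hf_source = getattr(getattr(hf_pretrained, "state", None), "source", None)
    return list(hf_source.get_all_keys()) if hf_source is not None else []


# Fully-populated pretrained object: keys come through unchanged.
full = SimpleNamespace(
    state=SimpleNamespace(source=SimpleNamespace(get_all_keys=lambda: ["a", "b"]))
)
assert safe_hf_keys(full) == ["a", "b"]

# Config-only context (no .state at all): no AttributeError, just [].
assert safe_hf_keys(SimpleNamespace()) == []
```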
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Around line 197-232: The MTP-specific AutoMapping entries (the mappings that
set megatron_param starting with "mtp.layers.{mtp_layer}..." for enorm, hnorm,
eh_proj, and final_layernorm) are being appended inside the inner loop over
layer_prefix and therefore are added twice; move the block that extends
mapping_list with these prefix-independent mappings so it executes once per
mtp_layer (i.e., place the mapping_list.extend([...AutoMapping(...)...])
immediately after the inner for layer_prefix in ("transformer_layer",
"mtp_model_layer") loop ends), keeping the existing megatron_param and hf_param
strings and leaving the prefix-dependent mapping generation inside the inner
loop unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 550d18c5-62de-4f2a-b799-c822f0f559c0
📒 Files selected for processing (9)
- src/megatron/bridge/models/conversion/auto_bridge.py
- src/megatron/bridge/models/conversion/model_bridge.py
- src/megatron/bridge/models/conversion/peft_bridge.py
- src/megatron/bridge/models/glm/glm45_bridge.py
- src/megatron/bridge/models/glm_vl/glm_45v_bridge.py
- tests/unit_tests/models/glm/test_glm45_bridge.py
- tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py
- tests/unit_tests/models/test_auto_bridge.py
- tests/unit_tests/models/test_model_bridge_lora.py
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (5)
src/megatron/bridge/models/glm/glm45_bridge.py:200
`mapping_registry()` now assumes `self.hf_config` is always set. If someone calls `mapping_registry()` directly on a fresh bridge instance (as some tests/utilities do), this will raise an `AttributeError` instead of gracefully skipping MTP mappings.
Consider either (a) restoring a safe fallback (e.g., treating a missing `hf_config` as `num_mtp_layers = 0`) or (b) raising a clear `ValueError` explaining that `hf_pretrained`/`hf_config` must be set (via `build_conversion_tasks(...)` or assignment) before calling `mapping_registry()`.
```python
# add MTP mappings
hf_config = self.hf_config
num_mtp_layers = getattr(hf_config, "num_nextn_predict_layers", 0)
num_transformer_layers = hf_config.num_hidden_layers
```
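Fallback option (a) from the comment above can be sketched as follows: treat a missing `hf_config` as zero MTP layers so `mapping_registry()` degrades gracefully instead of raising `AttributeError`. Apart from `hf_config` and `num_nextn_predict_layers`, the names here are illustrative.

```python
class StubBridge:
    """Hypothetical bridge; hf_config is populated later by the framework."""

    hf_config = None  # not yet set by build_conversion_tasks / assignment

    def mtp_layer_count(self):
        hf_config = getattr(self, "hf_config", None)
        if hf_config is None:
            return 0  # skip MTP mappings entirely on a fresh instance
        return getattr(hf_config, "num_nextn_predict_layers", 0)


# A fresh bridge no longer blows up; it simply produces no MTP mappings.
assert StubBridge().mtp_layer_count() == 0
```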
src/megatron/bridge/models/glm/glm45_bridge.py:314
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` makes an extra full copy of the key list every time `_uses_fused_experts()` runs. `StateSource.get_all_keys()` already returns a cached `List[str]` for common sources; copying it can be costly for large checkpoints.
Prefer using the list returned by `get_all_keys()` directly (no `list(...)`), or cache `hf_keys`/`hf_source` once on the bridge instance for reuse across `_uses_fused_experts()` and `_hf_expert_suffix()`.
```python
hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
if hf_keys:
    if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
        "mlp.experts.down_proj" in key for key in hf_keys
    ):
        return True
hf_source = self.hf_pretrained.state.source
if hf_source is not None:
    return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
```
src/megatron/bridge/models/glm/glm45_bridge.py:324
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` duplicates the (potentially very large) HF key list on every `_hf_expert_suffix()` call. Since `mapping_registry()` calls `_hf_expert_suffix()` multiple times, this can turn into repeated O(N) copies.
Consider reusing a single `hf_keys` value (or caching it on `self`) during mapping construction, and avoid `list(...)` unless you intend to mutate the keys.
```python
def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
        return ".weight"
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
        return ".weight"
```
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:222
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` creates a full copy of the checkpoint key list every time `_uses_fused_experts()` runs. For `SafeTensorsStateSource`, `get_all_keys()` is cached, but the extra `list(...)` copy is still O(N) and can be expensive.
Prefer using `hf_source.get_all_keys()` directly (no copy), or cache `hf_keys` once on the bridge instance for reuse across helper methods.
```python
def _uses_fused_experts(self) -> bool:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if hf_keys:
        if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
            "mlp.experts.down_proj" in key for key in hf_keys
        ):
            return True
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None:
        return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
```
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:233
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` duplicates the full key list on each `_hf_expert_suffix()` call. Since `mapping_registry()` calls this helper multiple times, it can cause repeated O(N) copies for large checkpoints.
Consider reusing a single `hf_keys` (or caching it on `self`) during mapping construction, and avoid `list(...)` unless you intend to mutate the returned list.
```python
def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
        return ".weight"
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
        return ".weight"
```
61a26de to 6a48916 (Compare)
So that downstream model bridges that need hf_pretrained configs information to build mapping_registry no longer need to override build_conversion_tasks (e.g. GLM 4.5 bridge). Signed-off-by: Hollow Man <hollowman@opensuse.org>
6a48916 to 0477f72 (Compare)
What does this PR do ?
So that downstream model bridges that need hf_pretrained configs information to build mapping_registry no longer need to override build_conversion_tasks (e.g. GLM 4.5 bridge).
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit
Release Notes
Refactor
Tests