Store hf_pretrained as properties of Megatron*Bridge classes #2644
HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from
Conversation
Pull request overview
This PR aims to make HuggingFace pretrained/config metadata available on Megatron bridge instances so downstream bridges can build mapping registries (and related conversion tasks) without overriding build_conversion_tasks, and updates GLM 4.5 MTP mapping accordingly.
Changes:
- Thread `hf_pretrained`/`hf_config` through adapter conversion/export paths (including `export_adapter_weights`).
- Remove GLM 4.5 / GLM 4.5V overrides that cached HF config/state source; switch to reading from `self.hf_pretrained`.
- Update GLM 4.5 MTP mapping logic and adjust unit tests for updated method signatures.
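To make the intent of these changes concrete, here is a minimal, hypothetical sketch of the pattern the PR enables: the framework populates `hf_pretrained`/`hf_config` on the bridge before `mapping_registry()` runs, so a downstream bridge can read HF metadata directly instead of overriding `build_conversion_tasks`. Everything below except the attribute names `hf_pretrained`/`hf_config` and the method name `mapping_registry` is a stand-in, not the real Megatron Bridge API.

```python
# Hypothetical stand-in for a downstream bridge; the real classes live in
# megatron.bridge.models.conversion and are not reproduced here.
from types import SimpleNamespace


class StubBridge:
    def __init__(self):
        # The framework sets these during task build/dispatch (per this PR),
        # so mapping_registry() can consult them without extra plumbing.
        self.hf_pretrained = None
        self.hf_config = None

    def mapping_registry(self):
        # Read HF metadata straight off the bridge-held config.
        num_mtp_layers = getattr(self.hf_config, "num_nextn_predict_layers", 0)
        return [f"mtp.layers.{i}.enorm.weight" for i in range(num_mtp_layers)]


bridge = StubBridge()
# Simulating what the framework does inside build_conversion_tasks/dispatch:
bridge.hf_config = SimpleNamespace(num_nextn_predict_layers=2, num_hidden_layers=4)
print(bridge.mapping_registry())  # → ['mtp.layers.0.enorm.weight', 'mtp.layers.1.enorm.weight']
```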
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/megatron/bridge/models/conversion/model_bridge.py | Passes hf_pretrained into adapter task building; stores hf_pretrained/hf_config on bridge instances during task build/dispatch. |
| src/megatron/bridge/models/conversion/peft_bridge.py | Updates adapter task/stream APIs to accept hf_pretrained and stores it on the bridge for downstream access. |
| src/megatron/bridge/models/conversion/auto_bridge.py | Plumbs self.hf_pretrained into adapter export dispatch. |
| src/megatron/bridge/models/glm/glm45_bridge.py | Removes config caching override; changes fused-expert detection and adds revised MTP mappings using bridge-held HF metadata. |
| src/megatron/bridge/models/glm_vl/glm_45v_bridge.py | Removes config caching override; changes fused-expert detection to use bridge-held HF metadata. |
| tests/unit_tests/models/test_model_bridge_lora.py | Updates tests for new adapter task/stream method signatures. |
| tests/unit_tests/models/test_auto_bridge.py | Updates export tests and adds coverage for export_adapter_weights. |
| tests/unit_tests/models/glm/test_glm45_bridge.py | Updates mocks to provide state.source and sets hf_pretrained for mapping registry usage. |
| tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py | Updates fixtures/mocks to provide state.source and sets hf_pretrained on the bridge. |
Comments suppressed due to low confidence (3)
src/megatron/bridge/models/glm/glm45_bridge.py:232
- Inside the `for layer_prefix in ("transformer_layer", "mtp_model_layer")` loop, the MTP-specific `AutoMapping`s for `enorm`/`hnorm`/`eh_proj`/`final_layernorm` don't depend on `layer_prefix`, so they get appended twice per MTP layer. Moving those fixed mappings outside the `layer_prefix` loop avoids redundant duplicate mappings in the registry.
```python
for layer_prefix in ("transformer_layer", "mtp_model_layer"):
    for megatron_param, hf_param in layer_specific_mappings.items():
        megatron_param = (
            megatron_param.replace(".*", f".*.{layer_prefix}")
            .replace("decoder", "mtp")
            .replace(".*", f".{mtp_layer}")
        )
        hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
        mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))
    # MTP specific mappings
    mapping_list.extend(
        [
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
            ),
            AutoMapping(
                megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
                hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
            ),
        ]
    )
```
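A runnable sketch of the suggested fix: the prefix-independent MTP mappings are appended once per `mtp_layer`, after the inner `layer_prefix` loop rather than inside it. `AutoMapping` objects are replaced by plain tuples and the layer counts and mapping dict are illustrative, not taken from the real bridge.

```python
# Illustrative values; the real bridge derives these from hf_config.
num_transformer_layers = 4
num_mtp_layers = 1
layer_specific_mappings = {
    "decoder.*.input_layernorm.weight": "model.layers.*.input_layernorm.weight",
}

mapping_list = []
for mtp_layer in range(num_mtp_layers):
    for layer_prefix in ("transformer_layer", "mtp_model_layer"):
        for megatron_param, hf_param in layer_specific_mappings.items():
            megatron_param = (
                megatron_param.replace(".*", f".*.{layer_prefix}")
                .replace("decoder", "mtp")
                .replace(".*", f".{mtp_layer}")
            )
            hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
            mapping_list.append((megatron_param, hf_param))
    # Prefix-independent MTP mappings now live here, outside the layer_prefix
    # loop, so each is added exactly once per mtp_layer.
    for megatron_name, hf_name in (
        ("enorm", "enorm"),
        ("hnorm", "hnorm"),
        ("eh_proj", "eh_proj"),
        ("final_layernorm", "shared_head.norm"),
    ):
        mapping_list.append(
            (f"mtp.layers.{mtp_layer}.{megatron_name}.weight",
             f"model.layers.{mtp_layer + num_transformer_layers}.{hf_name}.weight")
        )

print(len(mapping_list))  # 2 prefix-dependent + 4 fixed = 6
```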
src/megatron/bridge/models/glm/glm45_bridge.py:319
`list(self.hf_pretrained.state.source.get_all_keys())` makes an unnecessary copy (the source already returns a list and may cache it). Reusing the returned list directly (or caching once on the bridge) avoids repeated list allocations and repeated key retrieval work across `_uses_fused_experts()`/`_hf_expert_suffix()`.
```python
hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
if hf_keys:
    if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
        "mlp.experts.down_proj" in key for key in hf_keys
    ):
        return True
hf_source = self.hf_pretrained.state.source
if hf_source is not None:
    return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
return False

def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
```
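One way to implement the caching suggested in this comment is a `functools.cached_property` that fetches the key list once and is shared by both helpers. The bridge and source classes below are stand-ins for illustration, not the real Megatron Bridge types.

```python
from functools import cached_property


class StubSource:
    """Stand-in for a HF state source; counts get_all_keys() calls."""

    def __init__(self, keys):
        self._keys = keys
        self.calls = 0

    def get_all_keys(self):
        self.calls += 1
        return self._keys


class StubBridge:
    def __init__(self, source):
        self._source = source

    @cached_property
    def _hf_keys(self):
        # One retrieval, no extra list(...) copy; cached for the bridge's lifetime.
        return self._source.get_all_keys()

    def _uses_fused_experts(self):
        return any(
            "mlp.experts.gate_up_proj" in k or "mlp.experts.down_proj" in k
            for k in self._hf_keys
        )

    def _hf_expert_suffix(self, base_name):
        return ".weight" if any(f"{base_name}.weight" in k for k in self._hf_keys) else ""


src = StubSource(["model.layers.0.mlp.experts.gate_up_proj.weight"])
bridge = StubBridge(src)
assert bridge._uses_fused_experts()
assert bridge._hf_expert_suffix("gate_up_proj") == ".weight"
assert src.calls == 1  # both helpers reused the single cached retrieval
```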
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:227
`list(self.hf_pretrained.state.source.get_all_keys())` copies the list of keys on every call. Since `get_all_keys()` already returns a list (and may internally cache it), consider using it directly (or caching on the bridge) to avoid repeated allocations when `mapping_registry()` calls `_uses_fused_experts()` and `_hf_expert_suffix()`.
```python
def _uses_fused_experts(self) -> bool:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if hf_keys:
        if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
            "mlp.experts.down_proj" in key for key in hf_keys
        ):
            return True
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None:
        return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
    return False

def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
```
📝 Walkthrough: The PR threads the HuggingFace pretrained model context (`hf_pretrained`/`hf_config`) through the bridge conversion and export paths.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py (1)
212-232: Guard direct `hf_pretrained.state`/`config` dereferences for safer conversion paths.

Line 212/Line 226 and Line 248 assume `self.hf_pretrained` always exposes `.state.source` and `.config`. A config-only bridge context (or partially initialized pretrained mock/object) will raise `AttributeError` before any fallback logic.

Proposed defensive fix:

```diff
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None:
             return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
@@
-        hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
+        hf_source = getattr(getattr(self.hf_pretrained, "state", None), "source", None)
+        hf_keys = list(hf_source.get_all_keys()) if hf_source is not None else []
@@
-        hf_source = self.hf_pretrained.state.source
         if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
             return ".weight"
@@
-        text_config = getattr(self.hf_pretrained.config, "text_config", self.hf_pretrained.config)
+        hf_config = getattr(self, "hf_config", getattr(self.hf_pretrained, "config", self.hf_pretrained))
+        text_config = getattr(hf_config, "text_config", hf_config)
```

Also applies to: 248-249
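The defensive pattern proposed in this comment can be distilled into a small runnable function: nested `getattr()` calls so that a pretrained object with no `.state` (or no `.state.source`) yields an empty key list instead of raising `AttributeError`. `SimpleNamespace` stands in for the real pretrained object here.

```python
from types import SimpleNamespace


def safe_hf_keys(hf_pretrained):
    # getattr chain: None at any hop falls through to the empty-list fallback.
    hf_source = getattr(getattr(hf_pretrained, "state", None), "source", None)
    return list(hf_source.get_all_keys()) if hf_source is not None else []


# Fully-populated pretrained object: keys come through unchanged.
full = SimpleNamespace(
    state=SimpleNamespace(source=SimpleNamespace(get_all_keys=lambda: ["a", "b"]))
)
assert safe_hf_keys(full) == ["a", "b"]

# Config-only context (no .state at all): no AttributeError, just [].
assert safe_hf_keys(SimpleNamespace()) == []
```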
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Around line 197-232: The MTP-specific AutoMapping entries (the mappings that
set megatron_param starting with "mtp.layers.{mtp_layer}..." for enorm, hnorm,
eh_proj, and final_layernorm) are being appended inside the inner loop over
layer_prefix and therefore are added twice; move the block that extends
mapping_list with these prefix-independent mappings so it executes once per
mtp_layer (i.e., place the mapping_list.extend([...AutoMapping(...)...])
immediately after the inner for layer_prefix in ("transformer_layer",
"mtp_model_layer") loop ends), keeping the existing megatron_param and hf_param
strings and leaving the prefix-dependent mapping generation inside the inner
loop unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 550d18c5-62de-4f2a-b799-c822f0f559c0
📒 Files selected for processing (9)
- src/megatron/bridge/models/conversion/auto_bridge.py
- src/megatron/bridge/models/conversion/model_bridge.py
- src/megatron/bridge/models/conversion/peft_bridge.py
- src/megatron/bridge/models/glm/glm45_bridge.py
- src/megatron/bridge/models/glm_vl/glm_45v_bridge.py
- tests/unit_tests/models/glm/test_glm45_bridge.py
- tests/unit_tests/models/glm_vl/test_glm_45v_bridge.py
- tests/unit_tests/models/test_auto_bridge.py
- tests/unit_tests/models/test_model_bridge_lora.py
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (5)
src/megatron/bridge/models/glm/glm45_bridge.py:200
`mapping_registry()` now assumes `self.hf_config` is always set. If someone calls `mapping_registry()` directly on a fresh bridge instance (as some tests/utilities do), this will raise an `AttributeError` instead of gracefully skipping MTP mappings.
Consider either (a) restoring a safe fallback (e.g., treating a missing `hf_config` as `num_mtp_layers = 0`) or (b) raising a clear `ValueError` explaining that `hf_pretrained`/`hf_config` must be set (via `build_conversion_tasks(...)` or assignment) before calling `mapping_registry()`.
```python
# add MTP mappings
hf_config = self.hf_config
num_mtp_layers = getattr(hf_config, "num_nextn_predict_layers", 0)
num_transformer_layers = hf_config.num_hidden_layers
```
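Fallback option (a) from the comment above can be sketched as follows: treat a missing `hf_config` as zero MTP layers so `mapping_registry()` degrades gracefully instead of raising `AttributeError`. Apart from `hf_config` and `num_nextn_predict_layers`, the names here are illustrative.

```python
class StubBridge:
    """Hypothetical bridge; hf_config is populated later by the framework."""

    hf_config = None  # not yet set by build_conversion_tasks / assignment

    def mtp_layer_count(self):
        hf_config = getattr(self, "hf_config", None)
        if hf_config is None:
            return 0  # skip MTP mappings entirely on a fresh instance
        return getattr(hf_config, "num_nextn_predict_layers", 0)


# A fresh bridge no longer blows up; it simply produces no MTP mappings.
assert StubBridge().mtp_layer_count() == 0
```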
src/megatron/bridge/models/glm/glm45_bridge.py:314
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` makes an extra full copy of the key list every time `_uses_fused_experts()` runs. `StateSource.get_all_keys()` already returns a cached `List[str]` for common sources; copying it can be costly for large checkpoints.
Prefer using the list returned by `get_all_keys()` directly (no `list(...)`), or cache `hf_keys`/`hf_source` once on the bridge instance for reuse across `_uses_fused_experts()` and `_hf_expert_suffix()`.
```python
hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
if hf_keys:
    if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
        "mlp.experts.down_proj" in key for key in hf_keys
    ):
        return True
hf_source = self.hf_pretrained.state.source
if hf_source is not None:
    return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
```
src/megatron/bridge/models/glm/glm45_bridge.py:324
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` duplicates the (potentially very large) HF key list on every `_hf_expert_suffix()` call. Since `mapping_registry()` calls `_hf_expert_suffix()` multiple times, this can turn into repeated O(N) copies.
Consider reusing a single `hf_keys` value (or caching it on `self`) during mapping construction, and avoid `list(...)` unless you intend to mutate the keys.
```python
def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
        return ".weight"
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
        return ".weight"
```
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:222
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` creates a full copy of the checkpoint key list every time `_uses_fused_experts()` runs. For `SafeTensorsStateSource`, `get_all_keys()` is cached, but the extra `list(...)` copy is still O(N) and can be expensive.
Prefer using `hf_source.get_all_keys()` directly (no copy), or cache `hf_keys` once on the bridge instance for reuse across helper methods.
```python
def _uses_fused_experts(self) -> bool:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if hf_keys:
        if any("mlp.experts.gate_up_proj" in key for key in hf_keys) or any(
            "mlp.experts.down_proj" in key for key in hf_keys
        ):
            return True
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None:
        return hf_source.has_glob("*mlp.experts.gate_up_proj*") or hf_source.has_glob("*mlp.experts.down_proj*")
```
src/megatron/bridge/models/glm_vl/glm_45v_bridge.py:233
`hf_keys = list(self.hf_pretrained.state.source.get_all_keys())` duplicates the full key list on each `_hf_expert_suffix()` call. Since `mapping_registry()` calls this helper multiple times, it can cause repeated O(N) copies for large checkpoints.
Consider reusing a single `hf_keys` (or caching it on `self`) during mapping construction, and avoid `list(...)` unless you intend to mutate the returned list.
```python
def _hf_expert_suffix(self, base_name: str) -> str:
    hf_keys = list(self.hf_pretrained.state.source.get_all_keys())
    if any(f"{base_name}.weight" in key for key in hf_keys):
        return ".weight"
    hf_source = self.hf_pretrained.state.source
    if hf_source is not None and hf_source.has_glob(f"*{base_name}.weight"):
        return ".weight"
```
61a26de to 6a48916 (Compare)
So that downstream model bridges that need hf_pretrained configs information to build mapping_registry no longer need to override build_conversion_tasks (e.g. GLM 4.5 bridge). Signed-off-by: Hollow Man <hollowman@opensuse.org>
6a48916 to 0477f72 (Compare)
What does this PR do ?
So that downstream model bridges that need hf_pretrained configs information to build mapping_registry no longer need to override build_conversion_tasks (e.g. GLM 4.5 bridge).
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit
Release Notes
Refactor
Tests