
Add GLM5 support#2469

Open
pengdurice wants to merge 19 commits into NVIDIA-NeMo:main from pengdurice:peng-add-glm5

Conversation


@pengdurice pengdurice commented Feb 20, 2026

What does this PR do ?

This PR adds support for the GLM5 model.

Changelog

  • Add the GLM5 model bridge and model provider classes
  • The GLM5 model is only available starting with transformers 5.2.0, and Megatron-Bridge has not yet migrated to transformers 5, so additional changes are needed to support it without breaking backward compatibility. In the new transformers, the experts are fused into a single 3D tensor whose 0th dimension indexes the experts, which breaks the 1-1 mapping between HF and Megatron tensors. We therefore override the maybe_modify_loaded_hf_weight and maybe_modify_converted_hf_weight functions so that the HF tensor is sliced into per-expert Megatron tensors on load, and the Megatron tensors are concatenated back into the fused HF tensor on export.
  • Please update your local transformers installation to 5.2.0 before using this feature.
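The slice/concat round trip described above can be sketched as follows. This is a minimal illustration with hypothetical shapes and key names, not the actual bridge code:

```python
import torch

# Hypothetical toy shapes; the real fused HF tensor packs all experts along dim 0.
num_experts, in_dim, out_dim = 4, 8, 16
fused_hf = torch.randn(num_experts, in_dim, out_dim)

# HF -> Megatron: slice the fused 3D tensor into one 2D weight per expert.
per_expert = {f"experts.{e}.weight": fused_hf[e] for e in range(num_experts)}

# Megatron -> HF: stack the per-expert tensors back into the fused layout.
repacked = torch.stack(
    [per_expert[f"experts.{e}.weight"] for e in range(num_experts)], dim=0
)
assert torch.equal(repacked, fused_hf)
```

Each slice `fused_hf[e]` is a view with the same storage layout a per-expert Megatron weight expects, so the round trip is lossless.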

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Summary by CodeRabbit

  • New Features

    • Added GLM-5 MoE model support for converting between HuggingFace and Megatron-Core formats, including provider/config translation and export/import hooks for MoE weights.
    • Exposed GLM-5 bridge and model provider in the public API.
  • Bug Fixes / Improvements

    • Improved handling of per-expert (MoE) weights for import/export and added logic to tolerate certain HuggingFace parameter name patterns for gated MLPs.
  • Tests

    • Added functional test suite validating GLM-5 MoE conversion workflows across parallelism configurations.


copy-pr-bot bot commented Feb 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@HollowMan6 HollowMan6 left a comment


Probably we can also integrate the tilelang fused kernels for DSA (https://github.com/THUDM/slime/tree/14413cf7c5c0d1c9854a28f471d3ab575461aa51/slime_plugins/models/glm5) to accelerate training.

@pengdurice
Author

Probably we can also integrate the tilelang fused kernels for DSA (https://github.com/THUDM/slime/tree/14413cf7c5c0d1c9854a28f471d3ab575461aa51/slime_plugins/models/glm5) to accelerate training.

that's a great idea, let me add it in the next PR;-)

@pengdurice pengdurice marked this pull request as ready for review February 24, 2026 18:32
@pengdurice pengdurice changed the title [Draft] Add GLM5 support Add GLM5 support Feb 24, 2026
@pengdurice pengdurice changed the title Add GLM5 support [Draft] Add GLM5 support Feb 24, 2026

pengdurice commented Feb 24, 2026

@HollowMan6 would you mind reviewing this PR when you get a chance? Thanks! Edit: rebasing caused some 3rd-party module changes to be wrongly picked up; I need to fix them first, so I'm converting back to draft.

Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

Adds GLM-5 MoE import/export support: a GLM5Bridge and GLM5ModelProvider, MoE-aware weight slicing/packing and mapping logic, updates to gated MLP mapping, functional tests for GLM-5 conversion, and a Megatron-LM submodule pointer update.

Changes

Cohort / File(s) Summary
Submodule Update
3rdparty/Megatron-LM
Updated submodule pointer to a newer commit for GLM-5 MoE compatibility.
Bridge & Provider
src/megatron/bridge/models/glm_moe_dsa/__init__.py, src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py, src/megatron/bridge/models/glm_moe_dsa/glm5_provider.py, src/megatron/bridge/models/__init__.py
Adds GLM5Bridge (bridge registration, config translation, mapping registry, MoE-aware weight hooks) and GLM5ModelProvider (dataclass with DSA/MoE defaults); exports added to package init.
Weight Mapping & Conversion
src/megatron/bridge/models/conversion/param_mapping.py, src/megatron/bridge/models/glm/glm45_bridge.py
GatedMLPMapping: new helper to detect HF expert-name patterns and flag allow_hf_name_mismatch. GLM45Bridge: new methods to slice per-expert HF weights on load and pack per-expert weights for HF export, plus related helpers and caching.
Tests
tests/functional_tests/models/glm_moe_dsa/__init__.py, tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py
Adds functional tests that build a toy GLM-5 MoE model, run roundtrip conversions across TP/PP/EP combinations, and assert presence/structure of MoE config and weight artifacts.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Bridge as GLM5Bridge
    participant Config as ConfigBuilder
    participant Registry as MappingRegistry
    participant Weights as WeightHandler
    participant Target as MegatronModel

    User->>Bridge: start HF→Megatron conversion
    Bridge->>Config: _get_glm5_configs(hf_pretrained)
    Config-->>Bridge: provider_config
    Bridge->>Bridge: provider_bridge() -> GLM5ModelProvider
    Bridge->>Registry: mapping_registry()
    Registry-->>Bridge: param mappings
    Bridge->>Weights: maybe_modify_loaded_hf_weight(hf_param, hf_state_dict)
    Weights-->>Bridge: sliced per-expert tensors
    Bridge->>Target: apply mapped weights and provider config
    Target-->>User: converted Megatron-Core model

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Suggested labels

Run CICD

Suggested reviewers

  • erhoo82
  • malay-nagda
  • yaoyu-33
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning PR adds major GLM5 MoE support but lacks documented test results, numeric validation, or convergence verification despite complex weight conversion logic. Document and include test execution results, numeric validation, and convergence characteristics in the PR description to verify the changes work correctly without regressions.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title 'Add GLM5 support' accurately and concisely summarizes the main objective of the pull request, which is to introduce GLM5 model support with a new bridge and provider.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@pengdurice pengdurice marked this pull request as draft February 24, 2026 18:41
@pengdurice pengdurice marked this pull request as ready for review February 24, 2026 22:31
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (8)
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py-331-332 (1)

331-332: ⚠️ Potential issue | 🟡 Minor

Commented-out # try: is a leftover debug artifact.

As per coding guidelines: "If code is commented out, include a comment describing its usage and why it is commented out; otherwise remove it as debug code before merging."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` around
lines 331 - 332, Remove the leftover commented-out debug token "# try:" above
the subprocess.run call in test_glm5_conversion.py (the block where result =
subprocess.run(cmd, capture_output=True, text=True, cwd=repo_root) is invoked);
either delete the commented marker entirely or replace it with a proper
try/except around subprocess.run (handling exceptions and failing the test) if
you intended to catch errors — ensure the change updates the surrounding test
logic (the test function that invokes subprocess.run) consistently and does not
leave unexplained commented-out code.
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py-194-197 (1)

194-197: ⚠️ Potential issue | 🟡 Minor

Docstring mentions "GLM 4.5" but this is the GLM 5 bridge test class.

🐛 Proposed fix
-    """
-    Test GLM 4.5 MoE model conversion from local HuggingFace model with different parallelism configurations.
-    """
+    """Test GLM 5 MoE model conversion from local HuggingFace model with different parallelism configurations."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` around
lines 194 - 197, The class docstring for TestGLM5Conversion incorrectly
references "GLM 4.5"; update the docstring in the TestGLM5Conversion class to
mention "GLM 5" (or a neutral phrase like "GLM 5 bridge") so it accurately
describes the test coverage for the GLM 5 MoE model conversion; locate the
TestGLM5Conversion class and edit its triple-quoted docstring accordingly.
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py-335-338 (1)

335-338: ⚠️ Potential issue | 🟡 Minor

assert False is stripped under python -O; use raise AssertionError (B011).

This pattern applies to both lines 338 and 434.

🐛 Proposed fix (applies similarly to line 434)
-            assert False, f"GLM 5 MoE {test_name} conversion failed with return code {result.returncode}"
+            raise AssertionError(f"GLM 5 MoE {test_name} conversion failed with return code {result.returncode}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` around
lines 335 - 338, Replace the use of "assert False" which is ignored under
optimized Python with explicit raises: in the failure branches that inspect
result.returncode (using the result variable and test_name) change the "assert
False, f'GLM 5 MoE {test_name} conversion failed with return code
{result.returncode}'" to "raise AssertionError(...)" so the failure always
raises regardless of optimization; make the same change for the similar
assertion at the other occurrence later in the file (the second block around
line 434).
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py-293-293 (1)

293-293: ⚠️ Potential issue | 🟡 Minor

Line exceeds the 119-character maximum.

The print(...) at line 293 is a very long line and also contains debug output that should be removed before merging.

As per coding guidelines: "Maximum line length is 119 characters (matching ruff configuration)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` at line
293, Remove the long debug print statement that exceeds 119 chars: delete the
print(f"second_layer mlp experts: {second_layer.mlp.experts} and type:
{type(second_layer.mlp.experts)} and size: {total_size} and shapes:
{total_shapes}") in
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py; if you need
to keep information for debugging, replace it with a concise logger.debug or a
focused assertion that checks specific properties of second_layer.mlp.experts,
total_size, or total_shapes and keep any kept message well under 119 characters.
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py-404-404 (1)

404-404: ⚠️ Potential issue | 🟡 Minor

Unused method argument tmp_path (ARG002).

test_glm5_conversion_parallelism_local_model declares tmp_path but never uses it; all output goes under local_root_path.

♻️ Proposed fix
-    def test_glm5_conversion_parallelism_local_model(self, tmp_path, tp, pp, ep, test_name):
+    def test_glm5_conversion_parallelism_local_model(self, tp, pp, ep, test_name):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` at line
404, The test function test_glm5_conversion_parallelism_local_model declares an
unused pytest fixture parameter tmp_path; either remove tmp_path from the
function signature or rename it to _tmp_path to silence the unused-argument
warning. Update the signature of test_glm5_conversion_parallelism_local_model
accordingly (and any direct references if present) so the test uses only
local_root_path, leaving other parameters tp, pp, ep, test_name unchanged.
src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py-108-142 (1)

108-142: ⚠️ Potential issue | 🟡 Minor

make_vocab_size_divisible_by is set twice in provider_bridge.

Line 116 sets it in the configs dict, and line 141 sets it again as a direct attribute on the constructed provider. The second assignment is redundant.

🐛 Proposed fix
-        configs["make_vocab_size_divisible_by"] = 1280
         configs["moe_router_score_function"] = "sigmoid"
         ...
-        provider.make_vocab_size_divisible_by = 1280

Keep only the constructor-time assignment (configs["make_vocab_size_divisible_by"] = 1280) since the provider is constructed at line 122 with **configs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 108 -
142, The provider_bridge function sets make_vocab_size_divisible_by twice
(configs["make_vocab_size_divisible_by"] and
provider.make_vocab_size_divisible_by) which is redundant; remove the later
direct attribute assignment on the provider (the
provider.make_vocab_size_divisible_by = 1280 line) and keep the constructor-time
setting in configs so GLM5ModelProvider(**configs) receives the value once.
src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py-51-58 (1)

51-58: ⚠️ Potential issue | 🟡 Minor

Docstring contains two inaccuracies.

  1. Line 52: "Glm5MoeForCausalLM" — the registered source class (line 47) is GlmMoeDsaForCausalLM.
  2. Line 57: The example shows zai-org/GLM-4.5 instead of a GLM-5 model path.
🐛 Proposed fix
-    This bridge handles the conversion between HuggingFace Glm5MoeForCausalLM
-    (used for GLM 5 models) and Megatron-Core GPTModel formats.
+    This bridge handles the conversion between HuggingFace GlmMoeDsaForCausalLM
+    (used for GLM 5 models) and Megatron-Core GPTModel formats.

     Example:
         >>> from megatron.bridge import AutoBridge
-        >>> bridge = AutoBridge.from_hf_pretrained("zai-org/GLM-4.5")
+        >>> bridge = AutoBridge.from_hf_pretrained("zai-org/GLM-5")
         >>> provider = bridge.to_megatron_provider()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 51 - 58,
Update the docstring to correct two inaccuracies: replace the incorrect source
class name "Glm5MoeForCausalLM" with the registered class name
GlmMoeDsaForCausalLM, and change the example model path from "zai-org/GLM-4.5"
to a GLM-5 example such as "zai-org/GLM-5" in the AutoBridge.from_hf_pretrained
usage so the example and description correctly reflect GLM-5 and the
GlmMoeDsaForCausalLM source class (references: GlmMoeDsaForCausalLM,
AutoBridge.from_hf_pretrained, to_megatron_provider).
src/megatron/bridge/models/glm/glm45_bridge.py-326-326 (1)

326-326: ⚠️ Potential issue | 🟡 Minor

Import WeightConversionTask directly instead of using a forward reference string.

Line 326 uses a string forward reference task: "WeightConversionTask", but this class should be imported directly from megatron.bridge.models.conversion.model_bridge like other bridge implementations (gpt_oss_bridge.py, deepseek_v3_bridge.py, qwen3_vl_bridge.py). Update the import on line 26 to include WeightConversionTask and remove the quotes from the type hint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm/glm45_bridge.py` at line 326, Replace the
string-forward type hint for WeightConversionTask with a direct class import:
add WeightConversionTask to the module imports (import it from
megatron.bridge.models.conversion.model_bridge alongside the other bridge
imports) and change the annotated field from task: "WeightConversionTask" to
task: WeightConversionTask so the type is a real symbol rather than a
forward-reference string.
🧹 Nitpick comments (2)
src/megatron/bridge/models/glm_moe_dsa/__init__.py (1)

21-24: Sort __all__ alphabetically.

Ruff (RUF022) flags the unsorted order; "GLM5Bridge" should precede "GLM5ModelProvider".

♻️ Proposed fix
 __all__ = [
-    "GLM5ModelProvider",
     "GLM5Bridge",
+    "GLM5ModelProvider",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/__init__.py` around lines 21 - 24, The
__all__ list is not alphabetized which triggers RUF022; update the list in
__init__.py so entries are sorted alphabetically, placing "GLM5Bridge" before
"GLM5ModelProvider" (i.e., change the order of the symbols GLM5Bridge and
GLM5ModelProvider in the __all__ declaration).
src/megatron/bridge/models/glm/glm45_bridge.py (1)

18-18: Replace deprecated typing generics with Python 3.10+ built-ins.

Dict, Optional, and Tuple should be replaced with their built-in equivalents; Mapping should come from collections.abc.

♻️ Proposed fix
-from typing import Dict, Mapping, Optional, Tuple
+from collections.abc import Mapping

Then update all usages in the changed lines:

  • Dict[str, torch.Tensor]dict[str, torch.Tensor]
  • Optional[...]... | None
  • Tuple[str, int, Optional[str]]tuple[str, int, str | None]

As per coding guidelines: "Use built-in generics (list, dict, tuple) instead of typing equivalents", "Use 'T | None' for nullable types instead of 'Optional[T]'".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm/glm45_bridge.py` at line 18, The import and
type annotations in glm45_bridge.py use deprecated typing generics; replace
"from typing import Dict, Mapping, Optional, Tuple" with "from collections.abc
import Mapping" (only) and use built-in generics and PEP 604 unions in
annotations: change Dict[...] → dict[...], Tuple[...] → tuple[...], Optional[T]
→ T | None, and ensure any Mapping stays typed as collections.abc.Mapping;
update all occurrences such as Dict[str, torch.Tensor] → dict[str,
torch.Tensor], Optional[...] → ... | None, and Tuple[str, int, Optional[str]] →
tuple[str, int, str | None] (adjust other similar annotations accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/models/conversion/param_mapping.py`:
- Around line 1985-1995: Update the _should_allow_hf_name_mismatch signature to
use built-in generics and PEP 604 union syntax: change the parameter annotation
from Union[str, Dict[str, str]] to str | dict[str, str] and add the return type
bool; add type hints to the nested helpers has_expert_wildcard(param_name: str)
-> bool and has_resolved_expert(param_name: str) -> bool so all functions in the
_should_allow_hf_name_mismatch scope have explicit parameter and return type
annotations (keep logic unchanged).

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py`:
- Around line 385-397: Add an explicit __init__ to GLM5Bridge that initializes
the instance attributes _hf_expert_cache, _hf_expert_expected and _hf_config
(e.g., to empty dicts or None as appropriate) so their lifecycle is
deterministic; then remove the lazy hasattr guards that create those attributes
(the check in maybe_modify_converted_hf_weight that sets
_hf_expert_cache/_hf_expert_expected and the guard that sets _hf_config
elsewhere) and rely on the constructor-initialized members instead, updating any
usages to handle their initialized (possibly empty/None) values.
- Line 18: The import and type hints use typing generics; change those to Python
3.10+ built-ins and appropriate stdlib ABCs: remove Dict, Optional, Tuple (and
Mapping) from the typing import line in glm5_bridge.py and update all
annotations—use dict[...] instead of Dict[...], tuple[...] instead of
Tuple[...], and use X | None instead of Optional[X]; for Mapping use
collections.abc.Mapping (import it) or replace with dict if a concrete mapping
is intended; update all function/method signatures and return annotations (e.g.,
"-> Dict[str, torch.Tensor]" → "-> dict[str, torch.Tensor]",
"Optional[torch.Tensor]" → "torch.Tensor | None", "Tuple[str, int,
Optional[str]]" → "tuple[str, int, str | None]") and remove the now-unused
typing imports.
- Around line 414-428: Replace the broad "except Exception: pass" in
_should_pack_experts with a narrow exception handler (e.g., catch
AttributeError/TypeError) and log the caught error instead of silencing it;
specifically handle cases where accessing hf_state_dict attributes or methods
may fail, and include the error and base_key in the log message. Apply the same
change to the analogous try/except in _expected_experts, and prefer checking for
hasattr/getattr safely before calling has_glob to avoid exceptions where
possible.
- Around line 385-390: The annotated type "WeightConversionTask" used in
maybe_modify_converted_hf_weight is not imported; add an import for
WeightConversionTask from megatron.bridge.models.conversion.model_bridge (or, if
circular imports are possible, import it inside a TYPE_CHECKING block from
typing import TYPE_CHECKING and then from
megatron.bridge.models.conversion.model_bridge import WeightConversionTask
within that block) so the annotation resolves for type checkers while leaving
runtime behavior unchanged.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_provider.py`:
- Around line 24-32: Override GLM5ModelProvider.__post_init__ to suppress the
inherited deprecation message: implement a __post_init__ in GLM5ModelProvider
that temporarily replaces DeepSeekV3ModelProvider._warn_deprecated with a no-op,
calls super().__post_init__ (so any other base init runs), and then restores the
original _warn_deprecated. This prevents DeepSeekV3ModelProvider.__post_init__
from emitting the misleading warning for GLM5ModelProvider while preserving the
base initialization behavior.
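One way to implement the suggested override, sketched with stand-in classes (the real providers carry many config fields; only the warning mechanics are shown, and the method names mirror those in the prompt above):

```python
import warnings
from dataclasses import dataclass


@dataclass
class DeepSeekV3ModelProvider:  # stand-in for the real base provider
    def __post_init__(self):
        self._warn_deprecated()

    def _warn_deprecated(self):
        warnings.warn("use the newer provider instead", DeprecationWarning)


@dataclass
class GLM5ModelProvider(DeepSeekV3ModelProvider):  # stand-in subclass
    def __post_init__(self):
        # Temporarily no-op the base-class warning, run the base init,
        # then restore the original method even if init raises.
        original = DeepSeekV3ModelProvider._warn_deprecated
        DeepSeekV3ModelProvider._warn_deprecated = lambda self: None
        try:
            super().__post_init__()
        finally:
            DeepSeekV3ModelProvider._warn_deprecated = original
```

Constructing `GLM5ModelProvider()` now runs the base `__post_init__` without emitting the deprecation warning, while `DeepSeekV3ModelProvider()` still warns as before.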

In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Around line 407-425: The assembly logic in glm45_bridge.py (using
self._hf_expert_cache, base_key, expert_id, expected, output_base_key) assumes
cached keys are 0..expected-1 and so does torch.stack([cache[i] for i in
range(expected)]), which breaks under expert-parallel shards whose global expert
IDs are a non-zero-based range; update the assembly to (a) check that all
expected global expert IDs are present in cache (e.g., compare set(cache.keys())
to the expected global ID set), then sort the actual global IDs and stack using
that sorted order so tensors align to their global expert index, or (b) if you
intend to support only EP==1, add an explicit assertion that cached expert IDs
are contiguous and start at 0 and fail fast; apply the same fix for both the
single-tensor path and the per-kind ("gate"/"up") path that currently uses
cache[i].
- Around line 334-336: The attributes _hf_config, _hf_expert_cache and
_hf_expert_expected are being created lazily inside
maybe_modify_converted_hf_weight, which can cause AttributeError if accessed
earlier; add an __init__ override on the class to call super().__init__(*args,
**kwargs) and initialize self._hf_config = None, self._hf_expert_cache = {} and
self._hf_expert_expected = {} so these members always exist, and then remove the
hasattr guard around _hf_expert_cache in maybe_modify_converted_hf_weight.
- Around line 353-367: The helper _should_pack_experts swallows all exceptions
around the hf_state_dict membership checks, hiding real bugs; replace the broad
"except Exception: pass" with a narrow handler that only catches expected errors
from the membership test (e.g., TypeError for unhashable keys) and lets other
exceptions propagate; update the try/except around "if base_key in
hf_state_dict" and the ".weight" variant to catch TypeError (or ValueError if
applicable) only, and re-raise any other exception so unexpected
AttributeError/TypeError logic bugs in _should_pack_experts or hf_state_dict
surface.
- Around line 252-254: The return type of maybe_modify_loaded_hf_weight is
wrong: it currently declares -> torch.Tensor but the branch where hf_param is a
dict returns a dict[str, torch.Tensor]; update the annotation to reflect both
possibilities (e.g., -> torch.Tensor | dict[str, torch.Tensor] or use
typing.Union[torch.Tensor, dict[str, torch.Tensor]]), and ensure any necessary
typing imports (from __future__ import annotations or typing.Union/Dict) are
added so type checkers accept the union; keep the existing logic in
maybe_modify_loaded_hf_weight and only change the return type to match both the
tensor and dict branches.
- Around line 373-379: In _expected_experts, replace the broad "except
Exception" around the .shape[0] access for base_key/hf_state_dict with a narrow
except that only catches the anticipated errors (e.g., KeyError, IndexError,
AttributeError) so programming errors are not swallowed; specifically, in the
block that reads hf_state_dict[base_key].shape[0] (and the .weight fallback
using base_key[:-len(".weight")]) catch only those exceptions and set expected =
None in those cases.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py`:
- Around line 273-278: The test currently calls
GlmMoeDsaForCausalLM.from_pretrained(..., trust_remote_code=True); remove the
security risk by deleting the trust_remote_code argument or setting
trust_remote_code=False when loading from the local glm5_toy_model_path in
test_glm5_conversion.py so the known local class is used without executing
remote code.
- Around line 119-152: The test fixture _create_glm5_toy_model uses
AutoConfig.from_pretrained("zai-org/GLM-5") and
AutoTokenizer.from_pretrained("zai-org/GLM-5") which creates a network
dependency; replace these with a hermetic local setup by constructing the config
with AutoConfig.for_model("glm_moe_dsa", **HF_GLM5_TOY_MODEL_CONFIG) (so the
test uses HF_GLM5_TOY_MODEL_CONFIG directly and still sets
config.torch_dtype=torch.bfloat16) and stop calling
AutoTokenizer.from_pretrained; instead create a minimal tokenizer
stub/tokenizer_config.json in model_dir (matching patterns used in other tests)
and save that file before calling model.save_pretrained, ensuring
tokenizer.save_pretrained is not used to fetch from HF Hub and the local
config.json is written from HF_GLM5_TOY_MODEL_CONFIG as already done.

---

Minor comments:
In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py`:
- Around line 108-142: The provider_bridge function sets
make_vocab_size_divisible_by twice (configs["make_vocab_size_divisible_by"] and
provider.make_vocab_size_divisible_by) which is redundant; remove the later
direct attribute assignment on the provider (the
provider.make_vocab_size_divisible_by = 1280 line) and keep the constructor-time
setting in configs so GLM5ModelProvider(**configs) receives the value once.
- Around line 51-58: Update the docstring to correct two inaccuracies: replace
the incorrect source class name "Glm5MoeForCausalLM" with the registered class
name GlmMoeDsaForCausalLM, and change the example model path from
"zai-org/GLM-4.5" to a GLM-5 example such as "zai-org/GLM-5" in the
AutoBridge.from_hf_pretrained usage so the example and description correctly
reflect GLM-5 and the GlmMoeDsaForCausalLM source class (references:
GlmMoeDsaForCausalLM, AutoBridge.from_hf_pretrained, to_megatron_provider).

In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Line 326: Replace the string-forward type hint for WeightConversionTask with a
direct class import: add WeightConversionTask to the module imports (import it
from megatron.bridge.models.conversion.model_bridge alongside the other bridge
imports) and change the annotated field from task: "WeightConversionTask" to
task: WeightConversionTask so the type is a real symbol rather than a
forward-reference string.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py`:
- Around line 331-332: Remove the leftover commented-out debug token "# try:"
above the subprocess.run call in test_glm5_conversion.py (the block where result
= subprocess.run(cmd, capture_output=True, text=True, cwd=repo_root) is
invoked); either delete the commented marker entirely or replace it with a
proper try/except around subprocess.run (handling exceptions and failing the
test) if you intended to catch errors — ensure the change updates the
surrounding test logic (the test function that invokes subprocess.run)
consistently and does not leave unexplained commented-out code.
- Around line 194-197: The class docstring for TestGLM5Conversion incorrectly
references "GLM 4.5"; update the docstring in the TestGLM5Conversion class to
mention "GLM 5" (or a neutral phrase like "GLM 5 bridge") so it accurately
describes the test coverage for the GLM 5 MoE model conversion; locate the
TestGLM5Conversion class and edit its triple-quoted docstring accordingly.
- Around line 335-338: Replace the use of "assert False" which is ignored under
optimized Python with explicit raises: in the failure branches that inspect
result.returncode (using the result variable and test_name) change the "assert
False, f'GLM 5 MoE {test_name} conversion failed with return code
{result.returncode}'" to "raise AssertionError(...)" so the failure always
raises regardless of optimization; make the same change for the similar
assertion at the other occurrence later in the file (the second block around
line 434).
- Line 293: Remove the long debug print statement that exceeds 119 chars: delete
the print(f"second_layer mlp experts: {second_layer.mlp.experts} and type:
{type(second_layer.mlp.experts)} and size: {total_size} and shapes:
{total_shapes}") in
tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py; if you need
to keep information for debugging, replace it with a concise logger.debug or a
focused assertion that checks specific properties of second_layer.mlp.experts,
total_size, or total_shapes and keep any kept message well under 119 characters.
- Line 404: The test function test_glm5_conversion_parallelism_local_model
declares an unused pytest fixture parameter tmp_path; either remove tmp_path
from the function signature or rename it to _tmp_path to silence the
unused-argument warning. Update the signature of
test_glm5_conversion_parallelism_local_model accordingly (and any direct
references if present) so the test uses only local_root_path, leaving other
parameters tp, pp, ep, test_name unchanged.

---

Nitpick comments:
In `@src/megatron/bridge/models/glm_moe_dsa/__init__.py`:
- Around line 21-24: The __all__ list is not alphabetized which triggers RUF022;
update the list in __init__.py so entries are sorted alphabetically, placing
"GLM5Bridge" before "GLM5ModelProvider" (i.e., change the order of the symbols
GLM5Bridge and GLM5ModelProvider in the __all__ declaration).

In `@src/megatron/bridge/models/glm/glm45_bridge.py`:
- Line 18: The import and type annotations in glm45_bridge.py use deprecated
typing generics; replace "from typing import Dict, Mapping, Optional, Tuple"
with "from collections.abc import Mapping" (only) and use built-in generics and
PEP 604 unions in annotations: change Dict[...] → dict[...], Tuple[...] →
tuple[...], Optional[T] → T | None, and ensure any Mapping stays typed as
collections.abc.Mapping; update all occurrences such as Dict[str, torch.Tensor]
→ dict[str, torch.Tensor], Optional[...] → ... | None, and Tuple[str, int,
Optional[str]] → tuple[str, int, str | None] (adjust other similar annotations
accordingly).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cbcf4a and d308a0b.

📒 Files selected for processing (9)
  • 3rdparty/Megatron-LM
  • src/megatron/bridge/models/__init__.py
  • src/megatron/bridge/models/conversion/param_mapping.py
  • src/megatron/bridge/models/glm/glm45_bridge.py
  • src/megatron/bridge/models/glm_moe_dsa/__init__.py
  • src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py
  • src/megatron/bridge/models/glm_moe_dsa/glm5_provider.py
  • tests/functional_tests/models/glm_moe_dsa/__init__.py
  • tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py

Comment on lines +1985 to +1995
@staticmethod
def _should_allow_hf_name_mismatch(hf_param: Union[str, Dict[str, str]]) -> bool:
"""Return True when HF params may be synthesized (e.g., expert slices)."""
def has_expert_wildcard(param_name: str) -> bool:
return ".experts.*." in param_name or ".experts.**." in param_name
def has_resolved_expert(param_name: str) -> bool:
return re.search(r"\.experts\.\d+\.", param_name) is not None

if isinstance(hf_param, str):
return has_expert_wildcard(hf_param) or has_resolved_expert(hf_param)
return any(has_expert_wildcard(name) or has_resolved_expert(name) for name in hf_param.values())

🛠️ Refactor suggestion | 🟠 Major

Apply type hints and use built-in generics per coding guidelines.

  1. The hf_param annotation uses Union[str, Dict[str, str]] — should be str | dict[str, str].
  2. The nested helpers has_expert_wildcard and has_resolved_expert lack type hints.
♻️ Proposed fix
     @staticmethod
-    def _should_allow_hf_name_mismatch(hf_param: Union[str, Dict[str, str]]) -> bool:
+    def _should_allow_hf_name_mismatch(hf_param: str | dict[str, str]) -> bool:
         """Return True when HF params may be synthesized (e.g., expert slices)."""
-        def has_expert_wildcard(param_name: str) -> bool:
+        def has_expert_wildcard(param_name: str) -> bool:  # noqa: E306
             return ".experts.*." in param_name or ".experts.**." in param_name
-        def has_resolved_expert(param_name: str) -> bool:
+        def has_resolved_expert(param_name: str) -> bool:
             return re.search(r"\.experts\.\d+\.", param_name) is not None
 
         if isinstance(hf_param, str):
             return has_expert_wildcard(hf_param) or has_resolved_expert(hf_param)
         return any(has_expert_wildcard(name) or has_resolved_expert(name) for name in hf_param.values())

As per coding guidelines: "Use type hints for function arguments and return types" and "Use built-in generics (list, dict, tuple) instead of typing equivalents" and "Use 'X | Y' for union types instead of 'Union[X, Y]'".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/conversion/param_mapping.py` around lines 1985 -
1995, Update the _should_allow_hf_name_mismatch signature to use built-in
generics and PEP 604 union syntax: change the parameter annotation from
Union[str, Dict[str, str]] to str | dict[str, str] and add the return type bool;
add type hints to the nested helpers has_expert_wildcard(param_name: str) ->
bool and has_resolved_expert(param_name: str) -> bool so all functions in the
_should_allow_hf_name_mismatch scope have explicit parameter and return type
annotations (keep logic unchanged).

import logging
import re
from functools import partial
from typing import Dict, Mapping, Optional, Tuple

🛠️ Refactor suggestion | 🟠 Major

Replace typing generics with built-in equivalents per coding guidelines.

Dict, Optional, and Tuple from typing should be dict, X | None, and tuple respectively (Python 3.10+).

♻️ Proposed fix
-from typing import Dict, Mapping, Optional, Tuple
+from collections.abc import Mapping
+from typing import Optional, Tuple  # remove once all usages are updated

Then update method signatures throughout the file, e.g.:

  • `-> Dict[str, torch.Tensor]` → `-> dict[str, torch.Tensor]`
  • `Optional[torch.Tensor]` → `torch.Tensor | None`
  • `Tuple[str, int, Optional[str]]` → `tuple[str, int, str | None]`

As per coding guidelines: "Use built-in generics (list, dict, tuple) instead of typing equivalents" and "Use 'T | None' for nullable types instead of 'Optional[T]'".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` at line 18, The import
and type hints use typing generics; change those to Python 3.10+ built-ins and
appropriate stdlib ABCs: remove Dict, Optional, Tuple (and Mapping) from the
typing import line in glm5_bridge.py and update all annotations—use dict[...]
instead of Dict[...], tuple[...] instead of Tuple[...], and use X | None instead
of Optional[X]; for Mapping use collections.abc.Mapping (import it) or replace
with dict if a concrete mapping is intended; update all function/method
signatures and return annotations (e.g., "-> Dict[str, torch.Tensor]" → "->
dict[str, torch.Tensor]", "Optional[torch.Tensor]" → "torch.Tensor | None",
"Tuple[str, int, Optional[str]]" → "tuple[str, int, str | None]") and remove the
now-unused typing imports.

Comment on lines +385 to +397
def maybe_modify_converted_hf_weight(
self,
task: "WeightConversionTask",
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) -> Dict[str, torch.Tensor]:
"""Pack per-expert weights into GLM MoE tensors when exporting to HF."""
if not converted_weights_dict:
return converted_weights_dict

if not hasattr(self, "_hf_expert_cache"):
self._hf_expert_cache = {}
self._hf_expert_expected = {}

🛠️ Refactor suggestion | 🟠 Major

Instance state should be initialized in __init__, not lazily via hasattr.

_hf_expert_cache, _hf_expert_expected (lines 396-397) and _hf_config (line 147) are all created via hasattr guards scattered across methods. This violates the coding guideline "Initialize all externally visible members of a class in the constructor" and makes the lifecycle of these caches opaque. If maybe_modify_converted_hf_weight is called concurrently or build_conversion_tasks is skipped, the state is silently absent.

Add an __init__ to GLM5Bridge that initialises these to None / empty dicts.

♻️ Proposed fix
 class GLM5Bridge(MegatronModelBridge):
+    def __init__(self):
+        super().__init__()
+        self._hf_config = None
+        self._hf_expert_cache: dict = {}
+        self._hf_expert_expected: dict = {}
+

Then remove the if not hasattr(self, "_hf_expert_cache"): block at lines 395-397 and the if not hasattr(self, "_hf_config"): guard at line 241.

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 387-387: undefined name 'WeightConversionTask'

(F821)

🪛 Ruff (0.15.2)

[error] 387-387: Undefined name WeightConversionTask

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 385 -
397, Add an explicit __init__ to GLM5Bridge that initializes the instance
attributes _hf_expert_cache, _hf_expert_expected and _hf_config (e.g., to empty
dicts or None as appropriate) so their lifecycle is deterministic; then remove
the lazy hasattr guards that create those attributes (the check in
maybe_modify_converted_hf_weight that sets _hf_expert_cache/_hf_expert_expected
and the guard that sets _hf_config elsewhere) and rely on the
constructor-initialized members instead, updating any usages to handle their
initialized (possibly empty/None) values.

Comment on lines +385 to +390
def maybe_modify_converted_hf_weight(
self,
task: "WeightConversionTask",
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) -> Dict[str, torch.Tensor]:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "WeightConversionTask" --type py -C 3

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 37586


🏁 Script executed:

head -100 src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 5084


🏁 Script executed:

rg "from.*import" src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py | head -30

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 768


🏁 Script executed:

# Check if glm5_bridge.py has a TYPE_CHECKING block
grep -n "TYPE_CHECKING" src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53


🏁 Script executed:

# Check the full imports section of glm5_bridge.py to see if WeightConversionTask should be there
sed -n '1,50p' src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1793


Import WeightConversionTask or use a TYPE_CHECKING block to resolve the type annotation.

The method signature on line 387 uses task: "WeightConversionTask" as a string annotation, but WeightConversionTask is never imported. While Python evaluates string annotations lazily, type checkers cannot resolve the annotation without the type being in scope. This violates the coding guideline to use proper type hints. Compare with other bridge implementations (e.g., deepseek_v3_bridge.py, qwen3_vl_bridge.py) that import WeightConversionTask directly from megatron.bridge.models.conversion.model_bridge.

Add WeightConversionTask to the existing import on line 26, or import it conditionally within a TYPE_CHECKING block if circular imports are a concern.

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 387-387: undefined name 'WeightConversionTask'

(F821)

🪛 Ruff (0.15.2)

[error] 387-387: Undefined name WeightConversionTask

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 385 -
390, The annotated type "WeightConversionTask" used in
maybe_modify_converted_hf_weight is not imported; add an import for
WeightConversionTask from megatron.bridge.models.conversion.model_bridge (or, if
circular imports are possible, guard it with `from typing import TYPE_CHECKING`
and import WeightConversionTask from
megatron.bridge.models.conversion.model_bridge inside the `if TYPE_CHECKING:`
block) so the annotation resolves for type checkers while leaving runtime
behavior unchanged.
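
The TYPE_CHECKING pattern suggested here has this general shape (the module path and class name below are placeholders standing in for the real ones):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers; this block is never executed at
    # runtime, so it cannot create an import cycle.
    from some.heavy.module import WeightConversionTask  # hypothetical path


def describe(task: "WeightConversionTask") -> str:
    # The string annotation is resolved lazily, so the name only needs
    # to exist for static analysis, not at runtime.
    return f"task={task!r}"
```

At runtime the annotation stays a plain string, so the function works even though the guarded import never ran.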

Comment on lines +414 to +428
def _should_pack_experts(base_key: str) -> bool:
try:
if base_key in hf_state_dict:
return True
if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
return True
except Exception:
pass
has_glob = getattr(hf_state_dict, "has_glob", None)
if callable(has_glob):
if has_glob(base_key):
return True
if base_key.endswith(".weight") and has_glob(base_key[: -len(".weight")]):
return True
return False

⚠️ Potential issue | 🟠 Major

except Exception: pass silently swallows all errors.

_should_pack_experts catches a broad Exception and suppresses it without logging (lines 420-421). This makes diagnosing failures during HF weight export very difficult. At minimum the exception should be logged.

♻️ Proposed fix
         try:
             if base_key in hf_state_dict:
                 return True
             if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
                 return True
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug("Exception while probing hf_state_dict for %s: %s", base_key, e)

The same pattern applies to lines 439-440 in _expected_experts. Based on coding guidelines: "When using try-except blocks, limit the except clause to the smallest set of errors possible."

🧰 Tools
🪛 Ruff (0.15.2)

[error] 420-421: try-except-pass detected, consider logging the exception

(S110)


[warning] 420-420: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm_moe_dsa/glm5_bridge.py` around lines 414 -
428, Replace the broad "except Exception: pass" in _should_pack_experts with a
narrow exception handler (e.g., catch AttributeError/TypeError) and log the
caught error instead of silencing it; specifically handle cases where accessing
hf_state_dict attributes or methods may fail, and include the error and base_key
in the log message. Apply the same change to the analogous try/except in
_expected_experts, and prefer checking for hasattr/getattr safely before calling
has_glob to avoid exceptions where possible.
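
The narrowed try/except with logging that this comment asks for might look like the following sketch (hf_state_dict here is any mapping-like object; the helper name and suffix handling are illustrative, not the bridge's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def should_pack(base_key: str, hf_state_dict) -> bool:
    """Probe a state dict for base_key, catching only expected errors."""
    try:
        if base_key in hf_state_dict:
            return True
        stripped = base_key.removesuffix(".weight")
        if stripped != base_key and stripped in hf_state_dict:
            return True
    except TypeError as e:  # e.g. unhashable key; log instead of silencing
        logger.debug("Probe of hf_state_dict for %r failed: %s", base_key, e)
    return False
```

Unexpected errors (AttributeError from a malformed state dict, logic bugs) now propagate instead of being swallowed.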

Comment on lines +353 to +367
def _should_pack_experts(base_key: str) -> bool:
try:
if base_key in hf_state_dict:
return True
if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
return True
except Exception:
pass
has_glob = getattr(hf_state_dict, "has_glob", None)
if callable(has_glob):
if has_glob(base_key):
return True
if base_key.endswith(".weight") and has_glob(base_key[: -len(".weight")]):
return True
return False

🛠️ Refactor suggestion | 🟠 Major

Swallow-all except Exception: pass masks real failures in _should_pack_experts.

Lines 354–360 catch every possible exception silently, including bugs like attribute errors or type errors that should surface immediately. The outer try body only performs a dict membership test (in hf_state_dict) which should raise at most a TypeError if the key is unhashable.

🐛 Proposed fix
-        def _should_pack_experts(base_key: str) -> bool:
-            try:
-                if base_key in hf_state_dict:
-                    return True
-                if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
-                    return True
-            except Exception:
-                pass
+        def _should_pack_experts(base_key: str) -> bool:
+            try:
+                if base_key in hf_state_dict:
+                    return True
+                if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
+                    return True
+            except TypeError:
+                pass

As per coding guidelines: "When using try-except blocks, limit the except clause to the smallest set of errors possible".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _should_pack_experts(base_key: str) -> bool:
try:
if base_key in hf_state_dict:
return True
if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
return True
except Exception:
pass
has_glob = getattr(hf_state_dict, "has_glob", None)
if callable(has_glob):
if has_glob(base_key):
return True
if base_key.endswith(".weight") and has_glob(base_key[: -len(".weight")]):
return True
return False
def _should_pack_experts(base_key: str) -> bool:
try:
if base_key in hf_state_dict:
return True
if base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
return True
except TypeError:
pass
has_glob = getattr(hf_state_dict, "has_glob", None)
if callable(has_glob):
if has_glob(base_key):
return True
if base_key.endswith(".weight") and has_glob(base_key[: -len(".weight")]):
return True
return False
🧰 Tools
🪛 Ruff (0.15.2)

[error] 359-360: try-except-pass detected, consider logging the exception

(S110)


[warning] 359-359: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm/glm45_bridge.py` around lines 353 - 367, The
helper _should_pack_experts swallows all exceptions around the hf_state_dict
membership checks, hiding real bugs; replace the broad "except Exception: pass"
with a narrow handler that only catches expected errors from the membership test
(e.g., TypeError for unhashable keys) and lets other exceptions propagate;
update the try/except around "if base_key in hf_state_dict" and the ".weight"
variant to catch TypeError (or ValueError if applicable) only, and re-raise any
other exception so unexpected AttributeError/TypeError logic bugs in
_should_pack_experts or hf_state_dict surface.

Comment on lines +373 to +379
try:
if base_key in hf_state_dict:
expected = int(hf_state_dict[base_key].shape[0])
elif base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
expected = int(hf_state_dict[base_key[: -len(".weight")]].shape[0])
except Exception:
expected = None

🛠️ Refactor suggestion | 🟠 Major

Broad except Exception in _expected_experts hides shape-access bugs.

The try body accesses .shape[0] on a tensor. The only expected exceptions are KeyError (key absent) and potentially IndexError (scalar tensor). Catching all Exception silently discards programming errors.

🐛 Proposed fix
-            try:
-                if base_key in hf_state_dict:
-                    expected = int(hf_state_dict[base_key].shape[0])
-                elif base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
-                    expected = int(hf_state_dict[base_key[: -len(".weight")]].shape[0])
-            except Exception:
-                expected = None
+            try:
+                if base_key in hf_state_dict:
+                    expected = int(hf_state_dict[base_key].shape[0])
+                elif base_key.endswith(".weight") and base_key[: -len(".weight")] in hf_state_dict:
+                    expected = int(hf_state_dict[base_key[: -len(".weight")]].shape[0])
+            except (KeyError, IndexError, TypeError):
+                expected = None

As per coding guidelines: "When using try-except blocks, limit the except clause to the smallest set of errors possible".

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 378-378: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm/glm45_bridge.py` around lines 373 - 379, In
_expected_experts, replace the broad "except Exception" around the .shape[0]
access for base_key/hf_state_dict with a narrow except that only catches the
anticipated errors (e.g., KeyError, IndexError, AttributeError) so programming
errors are not swallowed; specifically, in the block that reads
hf_state_dict[base_key].shape[0] (and the .weight fallback using
base_key[:-len(".weight")]) catch only those exceptions and set expected = None
in those cases.

Comment on lines +407 to +425
cache = self._hf_expert_cache.setdefault(base_key, {})
if kind is None:
cache[expert_id] = tensor
if len(cache) == expected:
stacked = torch.stack([cache[i] for i in range(expected)], dim=0)
output[output_base_key] = stacked
del self._hf_expert_cache[base_key]
else:
per_kind = cache.setdefault(expert_id, {})
per_kind[kind] = tensor
if all(
(i in cache and "gate" in cache[i] and "up" in cache[i]) for i in range(expected)
):
stacked = torch.stack(
[torch.cat([cache[i]["gate"], cache[i]["up"]], dim=0) for i in range(expected)],
dim=0,
)
output[output_base_key] = stacked
del self._hf_expert_cache[base_key]

⚠️ Potential issue | 🟠 Major

Expert packing assumes contiguous global expert IDs starting at 0 — breaks under expert parallelism.

When packing per-expert slices back into the fused [num_experts, ...] tensor at lines 411 and 420–422, the code iterates range(expected) and looks up cache[i]. Under expert parallelism (EP > 1), the Megatron→HF conversion task on a given rank will only produce weights for the local expert shard, whose global expert IDs are a non-zero-based contiguous range (e.g., experts 64–127 on rank 1). The local cache will contain those global IDs as keys, so cache[i] for i in range(expected) will raise a KeyError or silently assemble the wrong tensor.

The fix is to either (a) reassemble only when all expected global expert IDs are present and use the sorted keys, or (b) document that this path is only safe for single-rank EP=1 conversion and add an assertion.

As per coding guidelines: "Handle tensor parallel and pipeline parallel distribution correctly in weight conversion".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/glm/glm45_bridge.py` around lines 407 - 425, The
assembly logic in glm45_bridge.py (using self._hf_expert_cache, base_key,
expert_id, expected, output_base_key) assumes cached keys are 0..expected-1 and
so does torch.stack([cache[i] for i in range(expected)]), which breaks under
expert-parallel shards whose global expert IDs are a non-zero-based range;
update the assembly to (a) check that all expected global expert IDs are present
in cache (e.g., compare set(cache.keys()) to the expected global ID set), then
sort the actual global IDs and stack using that sorted order so tensors align to
their global expert index, or (b) if you intend to support only EP==1, add an
explicit assertion that cached expert IDs are contiguous and start at 0 and fail
fast; apply the same fix for both the single-tensor path and the per-kind
("gate"/"up") path that currently uses cache[i].
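
Option (a) above can be sketched with plain lists standing in for tensors (the real code would use torch.stack; expected_ids as the rank's set of global expert IDs is my assumption about how the expected count would be generalized):

```python
def try_assemble(cache: dict[int, list[float]], expected_ids: set[int]):
    """Stack cached per-expert slices once every expected global ID is present.

    Works for non-zero-based ID ranges (e.g. experts 64..127 on EP rank 1)
    by sorting the actual keys instead of assuming range(num_experts).
    """
    if set(cache.keys()) != expected_ids:
        return None  # still waiting for more expert slices
    # Sort by global expert ID so row i of the result is the i-th expert
    # of this shard, regardless of the shard's starting offset.
    return [cache[eid] for eid in sorted(cache)]
```

Returning None until the full ID set arrives also makes the "partially filled cache" state explicit rather than relying on len(cache) == expected.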

Comment on lines +119 to +152
def _create_glm5_toy_model(model_dir: Path) -> None:
model_dir.mkdir(parents=True, exist_ok=True)

# Create GLM 4.5 config from the toy model config using AutoConfig
config = AutoConfig.from_pretrained("zai-org/GLM-5")

# Override with toy model config
for key, value in HF_GLM5_TOY_MODEL_CONFIG.items():
setattr(config, key, value)

config.torch_dtype = torch.bfloat16 # Explicitly set the torch_dtype in config

# Create model with random weights and convert to bfloat16
from transformers import GlmMoeDsaForCausalLM

model = GlmMoeDsaForCausalLM(config)

model = model.bfloat16() # Use .bfloat16() method instead of .to()
for k, v in model.named_buffers():
if "e_score_correction_bias" in k:
v.data = v.data.to(torch.float32)

# Download and save tokenizer from a reference GLM model
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-5")
tokenizer.save_pretrained(model_dir)

# Save model and config to directory
model.save_pretrained(model_dir, safe_serialization=True)

# Also save config.json explicitly to ensure compatibility with correct torch_dtype
config_to_save = HF_GLM5_TOY_MODEL_CONFIG.copy()
config_path = model_dir / "config.json"
with open(config_path, "w") as f:
json.dump(config_to_save, f, indent=2)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "test_glm5_conversion.py" -type f

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 138


🏁 Script executed:

cat -n tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py | sed -n '119,152p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1652


🏁 Script executed:

rg "GlmMoeDsaConfig|AutoConfig\.for_model" --type py -A 2 -B 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 604


🏁 Script executed:

rg "from transformers import" tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py -A 5

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 717


🏁 Script executed:

rg "save_pretrained|tokenizer" tests/functional_tests/models/olmoe/test_olmoe_conversion.py -B 2 -A 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1484


🏁 Script executed:

rg "class.*Config|PretrainedConfig" tests/functional_tests/ -t py | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 166


🏁 Script executed:

rg "PretrainedConfig|from_config" tests/functional_tests/ -t py | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 490


🏁 Script executed:

grep -r "tokenizer_config.json" tests/functional_tests/ --include="*.py" -B 3 -A 3

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 30162


🏁 Script executed:

rg "vocab_size|vocab\\.json|merges\\.txt" tests/functional_tests/ --include="*.py" -B 2 -A 2 | head -30

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 512


🏁 Script executed:

cat -n tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py | sed -n '1,30p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1229


🏁 Script executed:

cat -n tests/functional_tests/models/llama_nemotron/test_llama_nemotron_conversion.py | sed -n '50,100p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2448


Use local config construction and tokenizer stub to eliminate network dependency.

Lines 123 and 142 call AutoConfig.from_pretrained("zai-org/GLM-5") and AutoTokenizer.from_pretrained("zai-org/GLM-5"), creating a non-hermetic test fixture that fails in offline CI environments, when the HuggingFace Hub is unreachable, or when credentials are required.

Replace with:

  • AutoConfig.for_model("glm_moe_dsa", **HF_GLM5_TOY_MODEL_CONFIG) to construct config directly from the toy model config (pattern already used in tests/functional_tests/models/olmoe/test_olmoe_conversion.py)
  • Create a minimal tokenizer_config.json locally instead of downloading (pattern already used in several test files like llama_nemotron and qwen_vl)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` around
lines 119 - 152, The test fixture _create_glm5_toy_model uses
AutoConfig.from_pretrained("zai-org/GLM-5") and
AutoTokenizer.from_pretrained("zai-org/GLM-5") which creates a network
dependency; replace these with a hermetic local setup by constructing the config
with AutoConfig.for_model("glm_moe_dsa", **HF_GLM5_TOY_MODEL_CONFIG) (so the
test uses HF_GLM5_TOY_MODEL_CONFIG directly and still sets
config.torch_dtype=torch.bfloat16) and stop calling
AutoTokenizer.from_pretrained; instead create a minimal tokenizer
stub/tokenizer_config.json in model_dir (matching patterns used in other tests)
and save that file before calling model.save_pretrained, ensuring
tokenizer.save_pretrained is not used to fetch from HF Hub and the local
config.json is written from HF_GLM5_TOY_MODEL_CONFIG as already done.
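
A minimal local tokenizer stub of the kind suggested here could be written as follows (the exact fields GLM's tokenizer needs are an assumption; a real test should mirror the existing llama_nemotron/qwen_vl fixtures):

```python
import json
from pathlib import Path


def write_tokenizer_stub(model_dir: Path) -> Path:
    """Write a minimal tokenizer_config.json so tests never hit the HF Hub."""
    model_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "tokenizer_class": "PreTrainedTokenizerFast",  # assumed placeholder
        "model_max_length": 2048,
    }
    path = model_dir / "tokenizer_config.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

Calling this before model.save_pretrained keeps the fixture fully hermetic, since nothing is downloaded from "zai-org/GLM-5".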

Comment on lines +273 to +278
model = GlmMoeDsaForCausalLM.from_pretrained(
glm5_toy_model_path,
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=False, # Ensure full loading
trust_remote_code=True,
)

⚠️ Potential issue | 🟠 Major

trust_remote_code=True is a security risk in tests.

Loading a model with trust_remote_code=True executes arbitrary code from the HuggingFace Hub. Since the toy model is created locally from a known class (GlmMoeDsaForCausalLM), the flag is unnecessary here.

🐛 Proposed fix
         model = GlmMoeDsaForCausalLM.from_pretrained(
             glm5_toy_model_path,
             torch_dtype=torch.bfloat16,
             low_cpu_mem_usage=False,
-            trust_remote_code=True,
         )
📝 Committable suggestion


Suggested change
model = GlmMoeDsaForCausalLM.from_pretrained(
glm5_toy_model_path,
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=False, # Ensure full loading
trust_remote_code=True,
)
model = GlmMoeDsaForCausalLM.from_pretrained(
glm5_toy_model_path,
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=False, # Ensure full loading
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/glm_moe_dsa/test_glm5_conversion.py` around
lines 273 - 278, The test currently calls
GlmMoeDsaForCausalLM.from_pretrained(..., trust_remote_code=True); remove the
security risk by deleting the trust_remote_code argument or setting
trust_remote_code=False when loading from the local glm5_toy_model_path in
test_glm5_conversion.py so the known local class is used without executing
remote code.

@pengdurice pengdurice changed the title [Draft]Add GLM5 support → Add GLM5 support Feb 25, 2026
pengdurice and others added 10 commits February 26, 2026 16:11
… tests and cleaning

Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: maanug-nv <109391026+maanug-nv@users.noreply.github.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
@pengdurice
Copy link
Author

@malay-nagda hi, nice to e-meet you. Do you mind reviewing this PR? thanks! This is to add GLM5 support.

@wuyaoxuehun

@malay-nagda hi, nice to e-meet you. Do you mind reviewing this PR? thanks! This is to add GLM5 support.

@HollowMan6 @malay-nagda
+1

@malay-nagda
Collaborator

@malay-nagda hi, nice to e-meet you. Do you mind reviewing this PR? thanks! This is to add GLM5 support.

@HollowMan6 @malay-nagda +1

Hi. For base recipes, I think @yaoyu-33 can help review?

@pengdurice
Author

@malay-nagda hi, nice to e-meet you. Do you mind reviewing this PR? thanks! This is to add GLM5 support.

@HollowMan6 @malay-nagda +1

Hi. For base recipes, I think @yaoyu-33 can help review?

Hi @malay-nagda thank you for the reply. @yaoyu-33 , nice to meet you, do you have time for reviewing this PR? Thanks!

@pengdurice
Author

@yaoyu-33, a quick reminder about this PR, thank you! Have a nice week ahead!

Contributor

@HollowMan6 HollowMan6 left a comment

I checked this PR, and it looks like the current implementation doesn't actually make GLM5 use DSA; it wrongly falls back to MLA instead:

[screenshot]

After applying all the fixes I propose (Suggested changes) below, it seems to work fine (tested with LoRA enabled):

[screenshot]

Note that it also needs dependency NVIDIA/Megatron-LM#3026, and for now mcore doesn't support CP and thd format for DSA.

[screenshots]

@HollowMan6
Contributor

We could probably also integrate the tilelang fused kernels for DSA (https://github.com/THUDM/slime/tree/14413cf7c5c0d1c9854a28f471d3ab575461aa51/slime_plugins/models/glm5) to speed up training.

Note that it also needs dependency NVIDIA/Megatron-LM#3026, and for now mcore doesn't support CP and thd format for DSA.

I've now opened NVIDIA/Megatron-LM#3674 to integrate the tilelang fused kernels for DSA, as well as to support CP and the thd format. Please feel free to try it out and let me know if you find any issues.

pengdurice and others added 5 commits March 3, 2026 13:38
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Contributor

@HollowMan6 HollowMan6 left a comment

Enable and fix MTP mappings support

pengdurice and others added 3 commits March 3, 2026 14:57
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <j88437182@hotmail.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
@wuyaoxuehun

@pengdurice @HollowMan6 hi, is this PR ready to merge then?
Thanks a lot!

@HollowMan6
Contributor

@wuyaoxuehun I haven't tested the convergence yet, just using a dummy model for now due to constrained resources on my side. Other than that, it looks good to me.

Contributor

@HollowMan6 HollowMan6 left a comment

Some more optional changes so that we don't need to further override build_adapter_conversion_tasks (need to wait until #2644 is merged)

dsa_indexer_topk: int = 2048
dsa_indexer_loss_coeff: float = 0.001
dsa_indexer_use_sparse_loss: bool = True
mtp_num_layers: int = 1
Contributor

Maybe don't pass this param here, so that users can decide whether to enable MTP.

Suggested change
mtp_num_layers: int = 1

"dsa_indexer_loss_coeff": 0.001,
"dsa_indexer_use_sparse_loss": True,
# MTP params
"mtp_num_layers": hf_config.num_nextn_predict_layers,
Contributor

Maybe don't pass this param here, so that users can decide whether to enable MTP.

Suggested change
"mtp_num_layers": hf_config.num_nextn_predict_layers,
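If the param is dropped from the provider defaults as suggested, one way to keep MTP opt-in is an optional field that only consults the HF value when the user explicitly enables MTP. A minimal sketch, assuming hypothetical names (`Glm5ProviderConfig` and `resolve_mtp_num_layers` are illustrative, not the PR's actual classes):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Glm5ProviderConfig:  # hypothetical name, not the PR's actual provider
    dsa_indexer_topk: int = 2048
    dsa_indexer_loss_coeff: float = 0.001
    dsa_indexer_use_sparse_loss: bool = True
    # None means "user did not opt in"; MTP stays disabled by default.
    mtp_num_layers: Optional[int] = None


def resolve_mtp_num_layers(cfg: Glm5ProviderConfig, hf_num_nextn_predict_layers: int) -> int:
    """Use the HF checkpoint's value only when the user explicitly enabled MTP."""
    if cfg.mtp_num_layers is None:
        return 0  # MTP disabled unless requested
    # Clamp to what the checkpoint actually provides.
    return min(cfg.mtp_num_layers, hf_num_nextn_predict_layers)


print(resolve_mtp_num_layers(Glm5ProviderConfig(), 1))                   # 0: off by default
print(resolve_mtp_num_layers(Glm5ProviderConfig(mtp_num_layers=1), 1))   # 1: user opted in
```

With a default of `None`, deleting the `"mtp_num_layers": hf_config.num_nextn_predict_layers` line from the config translation no longer silently turns MTP on for every user.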

Comment on lines +145 to +150
def build_conversion_tasks(self, hf_pretrained, megatron_model):
"""Override to store config before mapping_registry is called."""
# Store config on instance for use in mapping_registry
self._hf_config = hf_pretrained.config
return super().build_conversion_tasks(hf_pretrained, megatron_model)

Contributor

@HollowMan6 HollowMan6 Mar 4, 2026

need to wait until #2644 is merged

Suggested change
def build_conversion_tasks(self, hf_pretrained, megatron_model):
"""Override to store config before mapping_registry is called."""
# Store config on instance for use in mapping_registry
self._hf_config = hf_pretrained.config
return super().build_conversion_tasks(hf_pretrained, megatron_model)

Comment on lines +241 to +245
# optionally add MTP mappings
if not hasattr(self, "_hf_config"):
logger.warning("No HF config found, skipping MTP mappings.")
return MegatronMappingRegistry(*mapping_list)
hf_config = self._hf_config
Contributor

@HollowMan6 HollowMan6 Mar 4, 2026

need to wait until #2644 is merged

Suggested change
# optionally add MTP mappings
if not hasattr(self, "_hf_config"):
logger.warning("No HF config found, skipping MTP mappings.")
return MegatronMappingRegistry(*mapping_list)
hf_config = self._hf_config
hf_config = self.hf_config

Comment on lines +249 to +279
for layer_prefix in ("transformer_layer", "mtp_model_layer"):
for megatron_param, hf_param in (param_mappings | layer_specific_mappings).items():
megatron_param = (
megatron_param.replace(".*", f".*.{layer_prefix}")
.replace("decoder", "mtp")
.replace(".*", f".{mtp_layer}")
)
hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))

# MTP specific mappings
mapping_list.extend(
[
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
),
]
)
Contributor

Move mappings that don't use layer_prefix out of the loop

Suggested change
for layer_prefix in ("transformer_layer", "mtp_model_layer"):
for megatron_param, hf_param in (param_mappings | layer_specific_mappings).items():
megatron_param = (
megatron_param.replace(".*", f".*.{layer_prefix}")
.replace("decoder", "mtp")
.replace(".*", f".{mtp_layer}")
)
hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))
# MTP specific mappings
mapping_list.extend(
[
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
),
]
)
# MTP specific mappings
mapping_list.extend(
[
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.enorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.enorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.hnorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.hnorm.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.eh_proj.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.eh_proj.weight",
),
AutoMapping(
megatron_param=f"mtp.layers.{mtp_layer}.final_layernorm.weight",
hf_param=f"model.layers.{mtp_layer + num_transformer_layers}.shared_head.norm.weight",
),
]
)
for layer_prefix in ("transformer_layer", "mtp_model_layer"):
for megatron_param, hf_param in (param_mappings | layer_specific_mappings).items():
megatron_param = (
megatron_param.replace(".*", f".*.{layer_prefix}")
.replace("decoder", "mtp")
.replace(".*", f".{mtp_layer}")
)
hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")
mapping_list.append(AutoMapping(megatron_param=megatron_param, hf_param=hf_param))
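For readers following the rename chain in the loop, here is a standalone trace of how one decoder mapping is rewritten for MTP layer 0. The wildcard patterns and the layer count are illustrative placeholders, not taken from the PR's actual mapping tables:

```python
mtp_layer = 0
num_transformer_layers = 46  # illustrative; read from the real HF config in practice
layer_prefix = "transformer_layer"

# Illustrative wildcard patterns in the style of the mapping tables.
megatron_param = "decoder.layers.*.self_attention.linear_qkv.weight"
hf_param = "model.layers.*.input_layernorm.weight"  # hypothetical HF-side name

# Same rewrite chain as the suggested change: insert the sub-layer prefix,
# retarget "decoder" to "mtp", then pin the wildcard to a concrete layer index.
megatron_param = (
    megatron_param.replace(".*", f".*.{layer_prefix}")
    .replace("decoder", "mtp")
    .replace(".*", f".{mtp_layer}")
)
# HF numbers MTP layers after the regular transformer layers.
hf_param = hf_param.replace("layers.*", f"layers.{mtp_layer + num_transformer_layers}")

print(megatron_param)  # mtp.layers.0.transformer_layer.self_attention.linear_qkv.weight
print(hf_param)        # model.layers.46.input_layernorm.weight
```

Note the order matters: the first `replace(".*", ...)` adds the sub-layer prefix while the pattern still contains a wildcard, and only the final `replace` pins that wildcard to the concrete MTP layer index.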

@pengdurice
Author

@wuyaoxuehun I haven't tested the convergence yet, just using a dummy model for now due to constrained resources on my side. Other than that, it looks good to me.

@HollowMan6 thank you so much for your contributions to this PR. Due to resource constraints, I was only able to run local unit tests; I am doing some more testing on my side as well. Hope we get to a workable stage soon!

6 participants