
Fix mbridge inference and use dynamic inference from mcore #627

Open
oyilmaz-nvidia wants to merge 5 commits into main from remove-direct-nemo-imports-in-inference

Conversation

@oyilmaz-nvidia
Contributor

… dynamic inference

  • Add nemo_deploy/llm/inference/nemo_utils.py which vendors standalone NeMo utilities (MCoreTokenizerWrappper, ckpt path helpers, constants) with no dependency on the nemo package, and re-exports the complex NeMo types (GPTConfig, T5Config, io, set_modelopt_spec_if_exists_in_ckpt) under a single HAVE_NEMO guard.
  • Remove direct from nemo.* imports from inference_base.py and tron_utils.py; both files now import from the local nemo_utils module instead.
  • Fix AttributeError in create_mcore_engine: GPTInferenceWrapper was called with (model, inference_context) but the deployed Megatron-LM API expects (model, inference_wrapper_config, inference_context). Add InferenceWrapperConfig built from model.config attributes; MCoreEngine then internally creates a DynamicInferenceContext and switches to DynamicInferenceEngine.
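The corrected call shape in the third bullet can be sketched as follows. The stand-in dataclass below is a stdlib-only approximation of Megatron-LM's `InferenceWrapperConfig`; its field names and defaults are assumptions modeled on the description above, not the real API.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Stand-in for megatron.core's InferenceWrapperConfig; field names and
# defaults are assumptions for illustration, kept dependency-free.
@dataclass
class InferenceWrapperConfig:
    hidden_size: int
    params_dtype: str
    padded_vocab_size: int
    inference_batch_times_seqlen_threshold: int = 1000
    fp32_residual_connection: bool = False

def build_inference_wrapper_config(model_config, padded_vocab_size):
    """Build the wrapper config from model.config attributes, as the fix does."""
    return InferenceWrapperConfig(
        hidden_size=model_config.hidden_size,
        params_dtype=model_config.params_dtype,
        padded_vocab_size=padded_vocab_size,
    )

# Old (broken) call:  GPTInferenceWrapper(model, inference_context)
# New call shape:     GPTInferenceWrapper(model, wrapper_config, inference_context)
cfg = build_inference_wrapper_config(
    SimpleNamespace(hidden_size=4096, params_dtype="bfloat16"),
    padded_vocab_size=128256,
)
print(cfg.hidden_size)  # → 4096
```

With the config passed as the second positional argument, the wrapper no longer hits the `AttributeError`, and `MCoreEngine` can build its `DynamicInferenceContext` from it.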

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oyilmaz-nvidia oyilmaz-nvidia changed the title from "inference: remove direct nemo imports, add InferenceWrapperConfig for to fix mbridge inference" to "Add InferenceWrapperConfig for to fix mbridge inference" Mar 3, 2026
@oyilmaz-nvidia
Contributor Author

/ok to test c4dcc44

- Remove unused StaticInferenceContext import
- Use inner model config for hidden_size/params_dtype instead of outer model
- Add buffer_size_gb param to create_mcore_engine and MegatronLLMDeployable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
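The "inner model config" bullet above can be illustrated with a minimal sketch. The `.module` unwrapping below is a hypothetical stand-in for the DDP/Float16 wrappers Megatron places around the core model; the real helper handles more wrapper types.

```python
from types import SimpleNamespace

def unwrap_model(model):
    """Walk .module wrappers until reaching the core model (illustrative sketch)."""
    while hasattr(model, "module"):
        model = model.module
    return model

# The outer wrapper carries no usable config; hidden_size and params_dtype
# live on the inner model's config, which is why the fix reads from there.
inner = SimpleNamespace(config=SimpleNamespace(hidden_size=4096, params_dtype="bfloat16"))
outer = SimpleNamespace(module=inner)

core = unwrap_model(outer)
print(core.config.hidden_size)  # → 4096
```

Reading `hidden_size`/`params_dtype` from the outer wrapper raised `AttributeError`; unwrapping first makes the config lookup reliable regardless of how the model is wrapped.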
@oyilmaz-nvidia
Contributor Author

/ok to test ce646ce

@oyilmaz-nvidia oyilmaz-nvidia changed the title from "Add InferenceWrapperConfig for to fix mbridge inference" to "Fix mbridge inference and use dynamic inference from mcore" Mar 3, 2026
HAVE_NEMO,
MCoreTokenizerWrappper,
ckpt_to_context_subdir,
ckpt_to_weights_subdir,
Contributor


@oyilmaz-nvidia do we want to move nemo 2.0 functionality here ? Can't we just remove it since nemo 2.0 deployment code is already removed anyway

Contributor Author


So that's for importing nemo, and I'll have another PR to remove it. It's actually a lot more challenging than just handling it here.
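The single-guard pattern being discussed looks roughly like this; the guarded name and import path below are assumptions for illustration, standing in for the NeMo types (`GPTConfig`, `T5Config`, `io`, etc.) that the PR description says are re-exported behind one `HAVE_NEMO` flag.

```python
# Optional-import guard: all heavyweight NeMo types are imported once, behind
# a single flag, so this module still loads when the nemo package is absent.
try:
    from nemo.lightning import io  # import path is an assumption
    HAVE_NEMO = True
except (ImportError, ModuleNotFoundError):
    io = None
    HAVE_NEMO = False

def require_nemo():
    """Raise a clear error at use time instead of failing at import time."""
    if not HAVE_NEMO:
        raise ImportError("The nemo package is required for this code path.")
```

Callers that genuinely need NeMo invoke `require_nemo()` first, while everything else in `nemo_utils.py` works without the dependency.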

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia oyilmaz-nvidia requested a review from a team as a code owner March 5, 2026 12:36
@oyilmaz-nvidia
Contributor Author

/ok to test 3b99d12

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor

/ok to test c5fdd40
