
Fix mbridge inference and use dynamic inference from mcore #627

Open
oyilmaz-nvidia wants to merge 5 commits into main from remove-direct-nemo-imports-in-inference

Conversation

@oyilmaz-nvidia
Contributor

… dynamic inference

  • Add nemo_deploy/llm/inference/nemo_utils.py which vendors standalone NeMo utilities (MCoreTokenizerWrappper, ckpt path helpers, constants) with no dependency on the nemo package, and re-exports the complex NeMo types (GPTConfig, T5Config, io, set_modelopt_spec_if_exists_in_ckpt) under a single HAVE_NEMO guard.
  • Remove direct from nemo.* imports from inference_base.py and tron_utils.py; both files now import from the local nemo_utils module instead.
  • Fix AttributeError in create_mcore_engine: GPTInferenceWrapper was called with (model, inference_context) but the deployed Megatron-LM API expects (model, inference_wrapper_config, inference_context). Add InferenceWrapperConfig built from model.config attributes; MCoreEngine then internally creates a DynamicInferenceContext and switches to DynamicInferenceEngine.
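The corrected call shape in the third bullet can be sketched as follows. The stand-in dataclass below is a stdlib-only approximation of Megatron-LM's `InferenceWrapperConfig`; its field names and defaults are assumptions modeled on the description above, not the real API.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Stand-in for megatron.core's InferenceWrapperConfig; field names and
# defaults are assumptions for illustration, kept dependency-free.
@dataclass
class InferenceWrapperConfig:
    hidden_size: int
    params_dtype: str
    padded_vocab_size: int
    inference_batch_times_seqlen_threshold: int = 1000
    fp32_residual_connection: bool = False

def build_inference_wrapper_config(model_config, padded_vocab_size):
    """Build the wrapper config from model.config attributes, as the fix does."""
    return InferenceWrapperConfig(
        hidden_size=model_config.hidden_size,
        params_dtype=model_config.params_dtype,
        padded_vocab_size=padded_vocab_size,
    )

# Old (broken) call:  GPTInferenceWrapper(model, inference_context)
# New call shape:     GPTInferenceWrapper(model, wrapper_config, inference_context)
cfg = build_inference_wrapper_config(
    SimpleNamespace(hidden_size=4096, params_dtype="bfloat16"),
    padded_vocab_size=128256,
)
print(cfg.hidden_size)  # → 4096
```

With the config passed as the second positional argument, the wrapper no longer hits the `AttributeError`, and `MCoreEngine` can build its `DynamicInferenceContext` from it.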

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oyilmaz-nvidia oyilmaz-nvidia changed the title from "inference: remove direct nemo imports, add InferenceWrapperConfig for to fix mbridge inference" to "Add InferenceWrapperConfig for to fix mbridge inference" Mar 3, 2026
@oyilmaz-nvidia
Contributor Author

/ok to test c4dcc44

- Remove unused StaticInferenceContext import
- Use inner model config for hidden_size/params_dtype instead of outer model
- Add buffer_size_gb param to create_mcore_engine and MegatronLLMDeployable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
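The "inner model config" bullet above can be illustrated with a minimal sketch. The `.module` unwrapping below is a hypothetical stand-in for the DDP/Float16 wrappers Megatron places around the core model; the real helper handles more wrapper types.

```python
from types import SimpleNamespace

def unwrap_model(model):
    """Walk .module wrappers until reaching the core model (illustrative sketch)."""
    while hasattr(model, "module"):
        model = model.module
    return model

# The outer wrapper carries no usable config; hidden_size and params_dtype
# live on the inner model's config, which is why the fix reads from there.
inner = SimpleNamespace(config=SimpleNamespace(hidden_size=4096, params_dtype="bfloat16"))
outer = SimpleNamespace(module=inner)

core = unwrap_model(outer)
print(core.config.hidden_size)  # → 4096
```

Reading `hidden_size`/`params_dtype` from the outer wrapper raised `AttributeError`; unwrapping first makes the config lookup reliable regardless of how the model is wrapped.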
@oyilmaz-nvidia
Contributor Author

/ok to test ce646ce

@oyilmaz-nvidia oyilmaz-nvidia changed the title from "Add InferenceWrapperConfig for to fix mbridge inference" to "Fix mbridge inference and use dynamic inference from mcore" Mar 3, 2026
HAVE_NEMO,
MCoreTokenizerWrappper,
ckpt_to_context_subdir,
ckpt_to_weights_subdir,
Contributor


@oyilmaz-nvidia do we want to move nemo 2.0 functionality here ? Can't we just remove it since nemo 2.0 deployment code is already removed anyway

Contributor Author


So that's for importing nemo, and I'll have another PR to remove it. It's actually a lot more challenging than just handling it here.
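The single-guard pattern being discussed looks roughly like this; the guarded name and import path below are assumptions for illustration, standing in for the NeMo types (`GPTConfig`, `T5Config`, `io`, etc.) that the PR description says are re-exported behind one `HAVE_NEMO` flag.

```python
# Optional-import guard: all heavyweight NeMo types are imported once, behind
# a single flag, so this module still loads when the nemo package is absent.
try:
    from nemo.lightning import io  # import path is an assumption
    HAVE_NEMO = True
except (ImportError, ModuleNotFoundError):
    io = None
    HAVE_NEMO = False

def require_nemo():
    """Raise a clear error at use time instead of failing at import time."""
    if not HAVE_NEMO:
        raise ImportError("The nemo package is required for this code path.")
```

Callers that genuinely need NeMo invoke `require_nemo()` first, while everything else in `nemo_utils.py` works without the dependency.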

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia oyilmaz-nvidia requested a review from a team as a code owner March 5, 2026 12:36
@oyilmaz-nvidia
Contributor Author

/ok to test 3b99d12

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor

/ok to test c5fdd40
