Highlights
- vLLM support for Megatron-Bridge LLM checkpoints.
- Remove NeMo 2.0 support.
- Deployment of Megatron-Bridge VLM checkpoints.
Changelog Details
- Eval logprob benchmarks support for HF via vLLM with Ray by @athitten :: PR: #479
- feat: add labeler by @pablo-garay :: PR: #483
- Support apply_chat_template in NeMo MM in-framework deployment by @meatybobby :: PR: #440
- NeMo-Export-Deploy 0.2.1 changelog by @pablo-garay :: PR: #489
- Add torch_dtype and default values by @oyilmaz-nvidia :: PR: #466
- Fix max token input by @oyilmaz-nvidia :: PR: #478
- Remove scheduled cron job from release workflow by @pablo-garay :: PR: #494
- feat: Add anchor by @pablo-garay :: PR: #495
- [Eval] Fixes for compatibility between Pytriton, Ray deployments with nemo-run by @athitten :: PR: #501
- New script path by @oyilmaz-nvidia :: PR: #487
- Update trt-llm doc for nemo 2 by @oyilmaz-nvidia :: PR: #506
- Change type for --runtime_env in ray in-fw deployment script by @athitten :: PR: #505
- fix : New peft release adjust fix by @pablo-garay :: PR: #514
- fix: ensure vLLM receives valid params regardless of env changes by @pablo-garay :: PR: #516
- Fix minor doc issue by @oyilmaz-nvidia :: PR: #521
- Update changelog for release 0.3.0 by @oyilmaz-nvidia :: PR: #522
- Update nvidia-sphinx-theme by @chtruong814 :: PR: #528
- Update changelog for version 0.3.1 by @pablo-garay :: PR: #537
- Minor fixes for MBridge nemotron deployment by @athitten :: PR: #518
- docs: Update docs version to latest by @chtruong814 :: PR: #553
- docs: Fixing version1.json by @aschilling-nv :: PR: #554
- Properly Handle DynamicInferenceRequestRecord with latest Mcore by @chtruong814 :: PR: #559
- Add vllm support for mbridge by @oyilmaz-nvidia :: PR: #555
- Temp fix for k8s issue by @ko3n1g :: PR: #565
- ci: Enable AWS runners by @chtruong814 :: PR: #557
- docs: Release docs by @ko3n1g :: PR: #566
- Remove nemo from in-framework deployment by @oyilmaz-nvidia :: PR: #568
- Fix chat endpoint support for Ray in-framework MBridge deployment by @athitten :: PR: #572
- build: Update dependencies for 26.02 by @chtruong814 :: PR: #567
- Remove nemo2 vllm support by @oyilmaz-nvidia :: PR: #571
- Update multimodal in-framework FastAPI from NeMo to Megatron Bridge by @meatybobby :: PR: #511
- Fix chat endpoint support for HF deployment with Ray by @athitten :: PR: #575
- Add Ray Serve Deployment Support for Multimodal Models by @meatybobby :: PR: #574
- cp: Add apply_chat_template to HF vllm Ray deployment (#581) into r0.4.0 by @ko3n1g :: PR: #582
- cp: Remove more nemo2 and unused code. (#584) into r0.4.0 by @ko3n1g :: PR: #587
- cp: docs: Remove uv sync with uv_args (#586) into r0.4.0 by @ko3n1g :: PR: #591
- cp: Add inference_max_seq_len to ray mbridge deployment path (#588) into r0.4.0 by @ko3n1g :: PR: #593
- cp: Fix wheel build test and publish (#595) in r0.4.0 by @chtruong814 :: PR: #596
- cp: Re-enable onnx test (#597) in r0.4.0 by @chtruong814 :: PR: #598
- cp: ci: Update release-docs workflow to use FW-CI-templates v0.72.0 (#599) into r0.4.0 by @ko3n1g :: PR: #601
- cp: ci: Update release workflows to include changelog and docs (#604) into r0.4.0 by @ko3n1g :: PR: #607
- cp: build: Remove torchao (#606) into r0.4.0 by @ko3n1g :: PR: #610
- cp: build: Upgrade vllm to 0.14.1 (#609) into r0.4.0 by @chtruong814 :: PR: #611
- docs: Update docs for 0.4.0 by @chtruong814 :: PR: #612
- cp: Update CI docker image and set vllm eager enforce_eager to False (#614) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #617
- docs: Update docs version for 0.4.0 release by @chtruong814 :: PR: #620