# vLLM
Dynamo vLLM integrates vLLM engines into Dynamo's distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with vLLM's native engine arguments. It builds on vLLM's native KV cache events, NIXL-based transfer mechanisms, and metric reporting to implement KV-aware routing and prefill/decode (P/D) disaggregation.
We recommend using uv to install:

```bash
uv venv --python 3.12 --seed
uv pip install "ai-dynamo[vllm]"
```

This installs Dynamo with the compatible vLLM version.
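Before running the install commands, it can help to confirm the required tools are available. The helper below is a hypothetical pre-flight check (not part of Dynamo), written in portable sh:

```shell
# Hypothetical pre-flight check (not part of Dynamo): report whether each
# tool needed by the install steps above is available on PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Example: verify the tools used above
check_tools uv python3
```

If `uv` is reported missing, install it first per the uv documentation before creating the venv.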
We have public images available on the NGC Catalog:

```bash
docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:<version>
./container/run.sh -it --framework VLLM --image nvcr.io/nvidia/ai-dynamo/vllm-runtime:<version>
```

Alternatively, build a container image from source:

```bash
python container/render.py --framework vllm --output-short-filename
docker build -f container/rendered.Dockerfile -t dynamo:latest-vllm .
./container/run.sh -it --framework VLLM [--mount-workspace]
```

For development, use the devcontainer, which has all dependencies pre-installed.
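The `<version>` placeholder must be replaced with a concrete tag from the NGC Catalog. A small sketch of parameterizing the image reference so scripts stay tag-agnostic (the default tag below is a stand-in, not a real release):

```shell
# Sketch: expand the <version> placeholder into a full image reference.
# "x.y.z" is a stand-in; pick a real tag from the NGC Catalog and
# override via the environment, e.g. DYNAMO_TAG=1.2.3 ./pull.sh
IMAGE_BASE="nvcr.io/nvidia/ai-dynamo/vllm-runtime"
DYNAMO_TAG="${DYNAMO_TAG:-x.y.z}"
IMAGE="${IMAGE_BASE}:${DYNAMO_TAG}"
echo "$IMAGE"
```

The same `$IMAGE` variable can then be passed to both `docker pull` and `./container/run.sh --image`.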
| Feature | Status | Notes |
|---|---|---|
| Disaggregated Serving | ✅ | Prefill/decode separation with NIXL KV transfer |
| KV-Aware Routing | ✅ | |
| SLA-Based Planner | ✅ | |
| KVBM | ✅ | |
| LMCache | ✅ | |
| FlexKV | ✅ | |
| Multimodal Support | ✅ | Via vLLM-Omni integration |
| Observability | ✅ | Metrics and monitoring |
| WideEP | ✅ | Support for DeepEP |
| DP Rank Routing | ✅ | Hybrid load balancing via external DP rank control |
| LoRA | ✅ | Dynamic loading/unloading from S3-compatible storage |
| GB200 Support | ✅ | Container functional on main |
Start infrastructure services for local development:

```bash
docker compose -f deploy/docker-compose.yml up -d
```

Launch an aggregated serving deployment:

```bash
cd $DYNAMO_HOME/examples/backends/vllm
bash launch/agg.sh
```

For more details, see:

- Reference Guide: Configuration, arguments, and operational details
- Examples: All deployment patterns with launch scripts
- KV Cache Offloading: KVBM, LMCache, and FlexKV integrations
- Observability: Metrics and monitoring
- vLLM-Omni: Multimodal model serving
- Kubernetes Deployment: Kubernetes deployment guide
- vLLM Documentation: Upstream vLLM serve arguments
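Once the quick-start `launch/agg.sh` deployment is up, you can smoke-test it from the command line. This sketch assumes an OpenAI-compatible endpoint on `localhost:8000` and uses a stand-in model name; check your launch script for the actual port and served model:

```shell
# Hypothetical smoke test for the aggregated deployment started above.
# Assumptions: frontend on localhost:8000 with an OpenAI-compatible API;
# "your-model" is a stand-in for whatever model launch/agg.sh serves.
chat_payload() {
  # Build a minimal chat-completion JSON body: $1 = model, $2 = user message
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Print the request instead of sending it, so this is safe to run anywhere;
# remove the leading echo to actually issue the request once the server is up.
echo curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(chat_payload your-model 'Hello!')"
```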