---
title: vLLM
---

# LLM Deployment using vLLM

The Dynamo vLLM backend integrates vLLM engines into Dynamo's distributed runtime while maintaining full compatibility with vLLM's native engine arguments. Dynamo leverages vLLM's native KV cache events, NIXL-based transfer mechanisms, and metric reporting to enable disaggregated (prefill/decode) serving, KV-aware routing, and request cancellation.
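
Because engine arguments pass through unchanged, a worker takes the same flags you would hand to vLLM itself. The following is a minimal sketch, assuming the `dynamo.vllm` entry point used by the launch scripts; the model name and flag values are illustrative:

```bash
# Sketch: start a Dynamo vLLM worker, forwarding vLLM's native engine
# arguments unchanged (model and flag values here are illustrative).
python -m dynamo.vllm \
  --model Qwen/Qwen3-0.6B \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9
```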

## Installation

### Install Latest Release

We recommend using uv to install:

```bash
uv venv --python 3.12 --seed
uv pip install "ai-dynamo[vllm]"
```

This installs Dynamo along with a compatible vLLM version.
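
As a quick sanity check, you can confirm the vLLM install from the new environment:

```bash
# Sanity check: the import should succeed and print the installed version.
python -c "import vllm; print(vllm.__version__)"
```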


### Container

We have public images available on NGC Catalog:

```bash
docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:<version>
./container/run.sh -it --framework VLLM --image nvcr.io/nvidia/ai-dynamo/vllm-runtime:<version>
```

To build the container locally instead:

```bash
python container/render.py --framework vllm --output-short-filename
docker build -f container/rendered.Dockerfile -t dynamo:latest-vllm .
./container/run.sh -it --framework VLLM [--mount-workspace]
```

### Development Setup

For development, use the devcontainer, which has all dependencies pre-installed.

## Feature Support Matrix

| Feature | Status | Notes |
|---------|--------|-------|
| Disaggregated Serving | | Prefill/decode separation with NIXL KV transfer |
| KV-Aware Routing | | |
| SLA-Based Planner | | |
| KVBM | | |
| LMCache | | |
| FlexKV | | |
| Multimodal Support | | Via vLLM-Omni integration |
| Observability | | Metrics and monitoring |
| WideEP | | Support for DeepEP |
| DP Rank Routing | | Hybrid load balancing via external DP rank control |
| LoRA | | Dynamic loading/unloading from S3-compatible storage |
| GB200 Support | | Container functional on main |
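
As an example of the first row, disaggregated serving runs prefill and decode as separate workers. A minimal sketch follows; the `--is-prefill-worker` flag and the GPU assignments are assumptions, and the repository's launch scripts are the maintained reference:

```bash
# Hypothetical sketch of disaggregated serving: a decode worker on GPU 0
# and a dedicated prefill worker on GPU 1 (--is-prefill-worker is assumed).
CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --model Qwen/Qwen3-0.6B &
CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --is-prefill-worker
```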

## Quick Start

Start infrastructure services for local development:

```bash
docker compose -f deploy/docker-compose.yml up -d
```
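
You can confirm the services are up before launching workers:

```bash
# Show status of the infrastructure containers started above.
docker compose -f deploy/docker-compose.yml ps
```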

Launch an aggregated serving deployment:

```bash
cd $DYNAMO_HOME/examples/backends/vllm
bash launch/agg.sh
```
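
Once the workers are ready, you can exercise the deployment through the frontend's OpenAI-compatible API. The port and model name below are assumptions (port 8000 as the frontend default, and whichever model `launch/agg.sh` serves):

```bash
# Test request against the OpenAI-compatible endpoint (port and model
# name are assumptions; adjust to match your launch script).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```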

## Next Steps