
Bump vllm from 0.11.2 to 0.15.0 #432

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-0.15.0

Conversation


dependabot[bot] commented on behalf of GitHub on Jan 29, 2026

Bumps vllm from 0.11.2 to 0.15.0.

Release notes

Sourced from vllm's releases.

v0.15.0

Highlights

This release features 335 commits from 158 contributors (39 new)!

Model Support

  • New architectures: Kimi-K2.5 (#33131), Molmo2 (#30997), Step3vl 10B (#32329), Step1 (#32511), GLM-Lite (#31386), Eagle2.5-8B VLM (#32456); a loading sketch follows this list.
  • LoRA expansion: Nemotron-H (#30802), InternVL2 (#32397), MiniMax M2 (#32763).
  • Speculative decoding: EAGLE3 for Pixtral/LlavaForConditionalGeneration (#32542), Qwen3 VL MoE (#32048), draft model support (#24322).
  • Embeddings: BGE-M3 sparse embeddings and ColBERT embeddings (#14526).
  • Model enhancements: Voxtral streaming architecture (#32861), SharedFusedMoE for Qwen3MoE (#32082), dynamic resolution for Nemotron Nano VL (#32121), Molmo2 vision backbone quantization (#32385).
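
The new architectures load through the standard offline-inference entry point. Below is a minimal sketch; the checkpoint repo ID is a placeholder assumption, since the release notes do not name exact Hugging Face repositories.

```python
# Minimal sketch: loading one of the newly supported architectures for
# offline inference. The repo ID below is a placeholder assumption; the
# release notes do not name exact checkpoints.
from vllm import LLM, SamplingParams

llm = LLM(model="moonshotai/Kimi-K2.5")  # hypothetical repo ID
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain Mamba prefix caching in one sentence."], params)
print(outputs[0].outputs[0].text)
```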

Engine Core

  • Async scheduling + Pipeline Parallelism: --async-scheduling now works with pipeline parallelism (#32359).
  • Mamba prefix caching: Block-aligned prefix caching for Mamba/hybrid models with --enable-prefix-caching --mamba-cache-mode align, achieving ~2x speedup by caching Mamba states directly (#30877); see the sketch after this list.
  • Session-based streaming input: New incremental input support for interactive workloads like ASR. Accepts async generators producing StreamingInput objects while maintaining KV cache alignment (#28973).
  • Model Runner V2: VLM support (#32546), architecture improvements.
  • LoRA: Inplace loading for memory efficiency (#31326).
  • AOT compilation: torch.compile inductor artifacts support (#25205).
  • Performance: KV cache offloading now skips redundant loads (#29087), and FlashAttn separates attention computation from cache updates (#25954).
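
As referenced in the Mamba prefix caching item, here is a minimal sketch of enabling the feature from the offline API. It assumes the CLI flags map to engine keyword arguments in the usual dashes-to-underscores way: `enable_prefix_caching` is a long-standing engine argument, while `mamba_cache_mode` and the checkpoint name are assumptions.

```python
# Sketch: block-aligned Mamba prefix caching, assuming the CLI flags
# --enable-prefix-caching and --mamba-cache-mode map to engine kwargs
# dashes-to-underscores. `mamba_cache_mode` is an assumed kwarg name.
from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # hypothetical hybrid-Mamba checkpoint
    enable_prefix_caching=True,        # flag named in the release notes
    mamba_cache_mode="align",          # assumed mapping of --mamba-cache-mode align
)
```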

Hardware & Performance

NVIDIA

  • Blackwell defaults: FlashInfer MLA is now the default MLA backend on Blackwell, with TRTLLM as the default prefill backend (#32615); a backend-override sketch follows this list.
  • MoE performance: 1.2-2% E2E throughput improvement via grouped topk kernel fusion (#32058), NVFP4 small-batch decoding improvement (#30885), faster cold start for MoEs with torch.compile (#32805).
  • FP4 kernel optimization: Up to 65% faster FP4 quantization on Blackwell (SM100F) using 256-bit loads, for a ~4% E2E throughput improvement (#32520).
  • Kernel improvements: topk_sigmoid kernel for MoE routing (#31246), atomics reduce counting for SplitK skinny GEMMs (#29843), fused cat+quant for FP8 KV cache in MLA (#32950).
  • torch.compile: SiluAndMul and QuantFP8 CustomOp compilation (#32806), Triton prefill attention performance (#32403).
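
Since FlashInfer MLA is now the default on Blackwell, pinning a different backend works the usual way, via the VLLM_ATTENTION_BACKEND environment variable. The variable is an existing vLLM override; the exact backend value strings in this release are an assumption.

```python
# Sketch: explicitly pinning an attention backend rather than relying on
# the new Blackwell defaults. VLLM_ATTENTION_BACKEND is vLLM's standard
# override mechanism; the "FLASHINFER_MLA" value string is an assumption.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER_MLA"  # set before importing vllm

from vllm import LLM

llm = LLM(model="deepseek-ai/DeepSeek-V3")  # illustrative MLA-style model
```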

AMD ROCm

  • MoRI EP: High-performance all2all backend for Expert Parallel (#28664).
  • Attention improvements: Shuffled KV cache layout and an assembly paged-attention kernel for AiterFlashAttentionBackend (#29887).
  • FP4 support: MLA projection GEMMs with dynamic quantization (#32238).
  • Consumer GPU support: Flash Attention Triton backend on RDNA3/RDNA4 (#32944).

Other Platforms

  • TPU: Pipeline parallelism support (#28506), backend option (#32438); a pipeline-parallel sketch follows this list.
  • Intel XPU: AgRsAll2AllManager for distributed communication (#32654).
  • CPU: NUMA-aware acceleration for TP/DP inference on ARM (#32792), PyTorch 2.10 support (#32869).
  • Whisper: torch.compile support (#30385).
  • WSL: Platform compatibility fix for Windows Subsystem for Linux (#32749).
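
With pipeline parallelism now available on TPU, the standard `pipeline_parallel_size` engine argument should apply there as well. A minimal sketch, assuming a correctly configured TPU environment and an illustrative model choice:

```python
# Sketch: pipeline parallelism via the standard engine argument, which
# the notes say is newly supported on TPU. Assumes a configured TPU
# environment; the model choice is illustrative only.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    pipeline_parallel_size=2,                  # split layers across 2 devices
)
```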

Quantization

  • MXFP4: W4A16 support for compressed-tensors MoE models (#32285).
  • Non-gated MoE: Quantization support with Marlin, NVFP4 CUTLASS, FP8, INT8, and compressed-tensors (#32257).
  • Intel: Quantization Toolkit integration (#31716).
  • FP8 KV cache: Per-tensor and per-attention-head quantization via llmcompressor (#30141); a serving sketch follows this list.
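
As referenced in the FP8 KV cache item, a minimal sketch using the existing `kv_cache_dtype` engine argument. Whether per-attention-head scales from an llmcompressor checkpoint are picked up automatically is an assumption based on the notes.

```python
# Sketch: FP8 KV cache quantization. kv_cache_dtype is an existing engine
# argument; automatic pickup of per-attention-head scales from an
# llmcompressor checkpoint is an assumption based on the notes.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    kv_cache_dtype="fp8",                      # quantize the KV cache to FP8
)
```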

API & Frontend

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.11.2 to 0.15.0.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.11.2...v0.15.0)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.15.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Jan 29, 2026
dependabot[bot] commented on behalf of GitHub on Feb 5, 2026

Superseded by #435.

dependabot[bot] closed this on Feb 5, 2026
dependabot[bot] deleted the dependabot/pip/vllm-0.15.0 branch on February 5, 2026 at 22:24