
Bump vllm from 0.11.2 to 0.15.0 #432

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-0.15.0

Conversation


dependabot[bot] commented on behalf of GitHub on Jan 29, 2026

Bumps vllm from 0.11.2 to 0.15.0.

Release notes

Sourced from vllm's releases.

v0.15.0

Highlights

This release features 335 commits from 158 contributors (39 new)!

Model Support

  • New architectures: Kimi-K2.5 (#33131), Molmo2 (#30997), Step3vl 10B (#32329), Step1 (#32511), GLM-Lite (#31386), Eagle2.5-8B VLM (#32456); a loading sketch follows this list.
  • LoRA expansion: Nemotron-H (#30802), InternVL2 (#32397), MiniMax M2 (#32763).
  • Speculative decoding: EAGLE3 for Pixtral/LlavaForConditionalGeneration (#32542), Qwen3 VL MoE (#32048), draft model support (#24322).
  • Embeddings: BGE-M3 sparse embeddings and ColBERT embeddings (#14526).
  • Model enhancements: Voxtral streaming architecture (#32861), SharedFusedMoE for Qwen3MoE (#32082), dynamic resolution for Nemotron Nano VL (#32121), Molmo2 vision backbone quantization (#32385).
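
The new architectures load through the standard offline-inference entry point. Below is a minimal sketch; the checkpoint repo ID is a placeholder assumption, since the release notes do not name exact Hugging Face repositories.

```python
# Minimal sketch: loading one of the newly supported architectures for
# offline inference. The repo ID below is a placeholder assumption; the
# release notes do not name exact checkpoints.
from vllm import LLM, SamplingParams

llm = LLM(model="moonshotai/Kimi-K2.5")  # hypothetical repo ID
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain Mamba prefix caching in one sentence."], params)
print(outputs[0].outputs[0].text)
```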

Engine Core

  • Async scheduling + Pipeline Parallelism: --async-scheduling now works with pipeline parallelism (#32359).
  • Mamba prefix caching: Block-aligned prefix caching for Mamba/hybrid models with --enable-prefix-caching --mamba-cache-mode align, achieving ~2x speedup by caching Mamba states directly (#30877); see the sketch after this list.
  • Session-based streaming input: New incremental input support for interactive workloads like ASR. Accepts async generators producing StreamingInput objects while maintaining KV cache alignment (#28973).
  • Model Runner V2: VLM support (#32546), architecture improvements.
  • LoRA: Inplace loading for memory efficiency (#31326).
  • AOT compilation: torch.compile inductor artifacts support (#25205).
  • Performance: KV cache offloading now skips redundant loads (#29087), and FlashAttn separates attention computation from cache updates (#25954).
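
As referenced in the Mamba prefix caching item, here is a minimal sketch of enabling the feature from the offline API. It assumes the CLI flags map to engine keyword arguments in the usual dashes-to-underscores way: `enable_prefix_caching` is a long-standing engine argument, while `mamba_cache_mode` and the checkpoint name are assumptions.

```python
# Sketch: block-aligned Mamba prefix caching, assuming the CLI flags
# --enable-prefix-caching and --mamba-cache-mode map to engine kwargs
# dashes-to-underscores. `mamba_cache_mode` is an assumed kwarg name.
from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # hypothetical hybrid-Mamba checkpoint
    enable_prefix_caching=True,        # flag named in the release notes
    mamba_cache_mode="align",          # assumed mapping of --mamba-cache-mode align
)
```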

Hardware & Performance

NVIDIA

  • Blackwell defaults: FlashInfer MLA is now the default MLA backend on Blackwell, with TRTLLM as the default prefill backend (#32615); a backend-override sketch follows this list.
  • MoE performance: 1.2-2% E2E throughput improvement via grouped topk kernel fusion (#32058), NVFP4 small-batch decoding improvement (#30885), faster cold start for MoEs with torch.compile (#32805).
  • FP4 kernel optimization: Up to 65% faster FP4 quantization on Blackwell (SM100F) using 256-bit loads, for a ~4% E2E throughput improvement (#32520).
  • Kernel improvements: topk_sigmoid kernel for MoE routing (#31246), atomics reduce counting for SplitK skinny GEMMs (#29843), fused cat+quant for FP8 KV cache in MLA (#32950).
  • torch.compile: SiluAndMul and QuantFP8 CustomOp compilation (#32806), Triton prefill attention performance (#32403).
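
Since FlashInfer MLA is now the default on Blackwell, pinning a different backend works the usual way, via the VLLM_ATTENTION_BACKEND environment variable. The variable is an existing vLLM override; the exact backend value strings in this release are an assumption.

```python
# Sketch: explicitly pinning an attention backend rather than relying on
# the new Blackwell defaults. VLLM_ATTENTION_BACKEND is vLLM's standard
# override mechanism; the "FLASHINFER_MLA" value string is an assumption.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER_MLA"  # set before importing vllm

from vllm import LLM

llm = LLM(model="deepseek-ai/DeepSeek-V3")  # illustrative MLA-style model
```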

AMD ROCm

  • MoRI EP: High-performance all2all backend for Expert Parallel (#28664).
  • Attention improvements: Shuffled KV cache layout and an assembly paged-attention kernel for AiterFlashAttentionBackend (#29887).
  • FP4 support: MLA projection GEMMs with dynamic quantization (#32238).
  • Consumer GPU support: Flash Attention Triton backend on RDNA3/RDNA4 (#32944).

Other Platforms

  • TPU: Pipeline parallelism support (#28506), backend option (#32438); a pipeline-parallel sketch follows this list.
  • Intel XPU: AgRsAll2AllManager for distributed communication (#32654).
  • CPU: NUMA-aware acceleration for TP/DP inference on ARM (#32792), PyTorch 2.10 support (#32869).
  • Whisper: torch.compile support (#30385).
  • WSL: Platform compatibility fix for Windows Subsystem for Linux (#32749).
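
With pipeline parallelism now available on TPU, the standard `pipeline_parallel_size` engine argument should apply there as well. A minimal sketch, assuming a correctly configured TPU environment and an illustrative model choice:

```python
# Sketch: pipeline parallelism via the standard engine argument, which
# the notes say is newly supported on TPU. Assumes a configured TPU
# environment; the model choice is illustrative only.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    pipeline_parallel_size=2,                  # split layers across 2 devices
)
```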

Quantization

  • MXFP4: W4A16 support for compressed-tensors MoE models (#32285).
  • Non-gated MoE: Quantization support with Marlin, NVFP4 CUTLASS, FP8, INT8, and compressed-tensors (#32257).
  • Intel: Quantization Toolkit integration (#31716).
  • FP8 KV cache: Per-tensor and per-attention-head quantization via llmcompressor (#30141); a serving sketch follows this list.
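
As referenced in the FP8 KV cache item, a minimal sketch using the existing `kv_cache_dtype` engine argument. Whether per-attention-head scales from an llmcompressor checkpoint are picked up automatically is an assumption based on the notes.

```python
# Sketch: FP8 KV cache quantization. kv_cache_dtype is an existing engine
# argument; automatic pickup of per-attention-head scales from an
# llmcompressor checkpoint is an assumption based on the notes.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    kv_cache_dtype="fp8",                      # quantize the KV cache to FP8
)
```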

API & Frontend

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.11.2 to 0.15.0.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.11.2...v0.15.0)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.15.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Jan 29, 2026
dependabot[bot] commented on behalf of GitHub on Feb 5, 2026

Superseded by #435.

dependabot[bot] closed this on Feb 5, 2026
dependabot[bot] deleted the dependabot/pip/vllm-0.15.0 branch on February 5, 2026 at 22:24