
Bump vllm from 0.11.2 to 0.13.0 #411

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-0.13.0

Conversation


dependabot[bot] commented on behalf of GitHub on Dec 19, 2025

Bumps vllm from 0.11.2 to 0.13.0.

Release notes

Sourced from vllm's releases.

v0.13.0

vLLM v0.13.0 Release Notes

Highlights

This release features 442 commits from 207 contributors (61 new contributors)!

Breaking Changes: This release includes deprecation removals, PassConfig flag renames, and attention configuration changes from environment variables to CLI arguments. Please review the breaking changes section carefully before upgrading.
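
Because the breaking changes mostly affect how vLLM is configured (flag renames, environment variables moving to CLI arguments), it can help to gate code paths on the installed version while migrating. The following is a minimal sketch, not part of this PR: it assumes vllm is installed as a regular pip package and that the packaging library is available; the branch bodies are placeholders for whatever configuration actually differs in your project.

```python
# Version-gating sketch for the 0.11.2 -> 0.13.0 upgrade (illustrative only).
from importlib.metadata import version
from packaging.version import Version

VLLM_VERSION = Version(version("vllm"))

if VLLM_VERSION >= Version("0.13.0"):
    # 0.13.x path: apply the post-upgrade configuration here, e.g. settings
    # that moved from environment variables to CLI/engine arguments.
    ...
else:
    # 0.11.x path: keep the pre-upgrade configuration.
    ...
```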

Model Support

  • New models: BAGEL (AR only) (#28439), AudioFlamingo3 (#30539), JAIS 2 (#30188), latent MoE architecture support (#30203).
  • Tool parsers: DeepSeek-V3.2 (#29848), Gigachat 3 (#29905), Holo2 reasoning (#30048).
  • Model enhancements: Qwen3-VL embeddings support (#30037), Qwen3-VL EVS (Efficient Video Sampling) (#29752), DeepSeek V3.2 proper drop_thinking logic (#30490), DeepSeek V3.2 top-k fix (#27568).
  • Task expansion: Automatic TokenClassification model conversion (#30666), Ultravox v0.7 transformer projector (#30089).
  • Quantization: BitsAndBytes for Qwen3-Omni-MoE (#29896).
  • Speculative decoding: Eagle/Eagle3 Transformers backend (#30340), Mamba selective_state_update spec decode (#29488).

Engine Core

  • Compilation: Conditional compilation via compile_ranges for selective kernel compilation (#24252).
  • Prefix caching: xxHash high-performance hash option (#29163).
  • Attention: PrefixLM support for FlexAttention (#27938) and TritonAttention (#30386), CUDA graphs for 3D Triton attention (#28306), TRITON_MLA without prefix-caching (#29125).
  • Batch invariance: FA2 and LoRA batch-invariant support (#30018).
  • Pooling: Chunked prefill for ALL pooling tasks (#27145), multi-vector retrieval API (#26686).
  • Model Runner V2: Min-p sampling (#30171), NaN detection in logits (#30187).
  • Speculative decoding: Medusa GPU-CPU sync avoidance (#29723), async spec-decode improvements (#29624).
  • Whisper: Encoder batching (#29421), FULL_DECODE_ONLY CUDA graph (#30072).
  • Performance: Fused blockwise quant RMS norm (#27883), MoE LoRA loading reduction (#30243), encoder cache optimization (#30475), CPU KV offloading streams (#29013).

Hardware & Performance

  • NVIDIA Blackwell Ultra: SM103 (GB300) support with CUDA 13 (#30484).
  • DeepSeek optimizations (benchmarked on DeepSeek-V3.1):
    • DeepEP High-Throughput CUDA graph enabled by default: 5.3% throughput, 4.4% TTFT improvement (#29558)
    • DeepGEMM fused layout kernel: 4.3% throughput, 10.7% TTFT improvement (#29546)
    • DeepGEMM experts initialization: 3.9% TTFT improvement (#30494)
    • group_topk kernel: 1.9% throughput, 2.1% TPOT improvement (#30159)
    • Sparse prefill kernel for FP8 KV-cache in DeepSeek-V3.2 (#27532)
    • MLA FP8 optimization with ReduceScatterSum (#29795), direct k_nope/k_pe copy (#29710)
  • CPU: Whisper support (#30062), Arm Optimized Routines vectorized exp (#30068), x86 CPU wheel pipeline (#28848).
  • AMD ROCm: Aiter quantization kernels (#25552), torch.compile layernorm/silu + FP8 quant (#25693), Triton ScaledMM fallback (#26668), MXFP4 w4a4 inference (#29775).
  • Intel XPU: wNa16 compressed tensors (#29484).
  • Build: CUDA 13 aarch64 wheels (#30341), Docker kernel build stage (#29452), Ascend NPU Docker (#30015).

Large Scale Serving & Disaggregated Prefill/Decode

  • KV connectors: Mooncake Transfer Engine (#24718), cache reset via /reset_prefix_cache (#27170), KV events (#28309), failure recovery config (#26813).
  • NIXL: Compatibility checking in handshake (#29503), large batch proxy support (#28782).
  • EPLB: NVFP4 support (#29804), algorithm abstraction (#26471).
  • Multi-node: External launcher mode (#29833).
  • Hybrid allocator: Optional KV connector integration (#29805).
  • Performance: silu_mul_per_token_group_quant_fp8 kernel for DP/EP (#29470).

... (truncated)

Commits
  • 72506c9 Check for truthy rope_parameters not the existence of it (#30983)
  • b2eb84d [Bugfix] Remove tile_size=64 for mm_prefix triton attention (#30973)
  • ac43367 adds jais 2 support (#30188)
  • 30fe765 [Fix][FlexAttention] return max logical block index to handle reused blocks (...
  • 2c0ee0f [BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
  • 55f1fc1 [v1] Add PrefixLM support to TritonAttention backend (#30386)
  • 17f3988 [BugFix] Workspace allocation during profile run : DeepEPHighThroughput + Dee...
  • 682c385 [CI][Bugfix] Fix flaky `tests/entrypoints/openai/test_audio.py::test_chat_str...
  • f124b56 [XPU] fix broken fp8 online quantization for XPU platform (#30831)
  • d78e128 [Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models (#30829)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
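
For automation, any of the commands above can also be posted through the GitHub REST API instead of typing them into the comment box. The sketch below uses the standard issues-comments endpoint; the PR number 411 comes from this page, while OWNER, REPO, and the GITHUB_TOKEN environment variable are placeholders you would substitute for this repository.

```python
# Sketch: trigger a Dependabot rebase by posting a PR comment via the
# GitHub REST API (POST /repos/{owner}/{repo}/issues/{number}/comments).
import os
import requests

OWNER, REPO, PR_NUMBER = "OWNER", "REPO", 411  # placeholders except the PR number

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{PR_NUMBER}/comments",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"body": "@dependabot rebase"},
)
resp.raise_for_status()
```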

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.11.2 to 0.13.0.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.11.2...v0.13.0)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.13.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Dec 19, 2025

dependabot[bot] commented on behalf of GitHub on Jan 20, 2026

Superseded by #425.

dependabot[bot] closed this on Jan 20, 2026
dependabot[bot] deleted the dependabot/pip/vllm-0.13.0 branch on January 20, 2026 at 22:24
