Skip target model mm emb computation when draft is text-only #1

Open
kkt-cohere wants to merge 102 commits into main from spec-mm-1

kkt-cohere commented Jan 29, 2026

Purpose

This PR makes the multimodal (mm) embedding gather in the draft proposal step conditional on the draft model's multimodal capabilities as well as the target model's. We don't need mm embeddings when the draft model doesn't support multimodal inputs. Specifically, when self.drafter.supports_mm_inputs is False, the mm_embed_inputs that are assigned and then passed to the drafter are not used at all, because the drafter block that consumes them is skipped.
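
The gating described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual vLLM code: only `supports_mm_inputs` and `mm_embed_inputs` come from the description, while `Runner`, `Drafter`, `gather_mm_embeddings`, `gather_calls`, and `mm_embed_inputs_for_draft` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drafter:
    # Mirrors the `supports_mm_inputs` attribute named in the PR description.
    supports_mm_inputs: bool


@dataclass
class Runner:
    # Hypothetical stand-in for the model runner; not vLLM's real class.
    supports_mm_inputs: bool
    drafter: Drafter
    gather_calls: int = 0  # counts how often the (expensive) gather runs

    def gather_mm_embeddings(self) -> list[str]:
        # Placeholder for the real embedding gather (assumed to be costly).
        self.gather_calls += 1
        return ["mm_embedding"]

    def mm_embed_inputs_for_draft(self) -> Optional[list[str]]:
        # Gather only when BOTH the target and the draft model accept
        # multimodal inputs; a text-only drafter never reads the result.
        if self.supports_mm_inputs and self.drafter.supports_mm_inputs:
            return self.gather_mm_embeddings()
        return None
```

With a multimodal target and a text-only drafter, `mm_embed_inputs_for_draft()` returns `None` and the gather is never invoked, which is the behavior this PR is aiming for.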

Test Plan

We rely on existing CI tests. In addition, we ran the following offline speculative decoding multimodal benchmark as a sanity check for backward compatibility.

VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --model-dir meta-llama/Llama-4-Scout-17B-16E-Instruct --eagle-dir morgendave/EAGLE-Llama-4-Scout-17B-16E-Instruct --num_spec_tokens 3 --tp 4 --num-prompts 12 --custom-mm-prompts

Without this change, we saw MM cache misses that led to the model crashing under production traffic.

Test Result

After this change, production traffic is stable and nothing existing breaks.

kkt-cohere and others added 30 commits January 29, 2026 14:25
Signed-off-by: kkt-cohere <komal@cohere.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: angelayi <yiangela7@gmail.com>
…t#33324)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…#32954)

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
…m-project#33326)

Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
…-project#32849)

Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…llm-project#33359)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…ect#33352)

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
…oject#33239)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…t#33282)

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
…33372)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
…roject#33396)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…oject#33187)

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…project#33323)

Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…#32286)

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
cmunley1 and others added 30 commits January 31, 2026 06:04
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…ng kv cache update to splitting ops (vllm-project#33441)

Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
…capacity (vllm-project#33110)

Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
…t#33477)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…vllm-project#33473)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
)

Signed-off-by: yang.xiao <yang.xiao@daocloud.io>
)

Signed-off-by: linhaifeng <1371675203@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: smashyalts <smashyalts@gmail.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
…oE kernels (vllm-project#33417)

Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>
…uantFP8` class. (vllm-project#33047)

Signed-off-by: maral <maralbahari.98@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ing allow inference Omni on ROCM (vllm-project#33077)

Signed-off-by: JartX <sagformas@epdcenter.es>