
Upgrade vLLM to v0.16.0 #279

Open
junya-takayama wants to merge 6 commits into main from update_vllm_0.16.0

Conversation

@junya-takayama
Collaborator

@junya-takayama junya-takayama commented Feb 26, 2026

Update the vLLM dependency to v0.16.0, adapting the code to APIs that have been deprecated or removed.

@junya-takayama junya-takayama changed the title Upgrade vLLM to v0.16.0 [WIP] Upgrade vLLM to v0.16.0 Feb 26, 2026
@junya-takayama
Collaborator Author

Since this is a substantial version jump, I checked, just to be safe, that no notable score differences appear across several models and benchmarks.

I ran the script below in both the current environment (vllm==0.10.2) and a vllm==0.16.0 environment.

Evaluation script
: ${SAVE_BASE_DIR:?"Need to set SAVE_BASE_DIR environment variable"}

eval_setups=(
    examples/sarashina_2_2_evaluation/configs/pretrained_evals/gsm8k.jsonnet
    examples/sarashina_2_2_evaluation/configs/pretrained_evals/niilcqa.jsonnet
)
models=(
    examples/sarashina_2_2_evaluation/configs/pretrained_models/Llama-3.2-3B.jsonnet
    examples/sarashina_2_2_evaluation/configs/pretrained_models/Qwen2.5-3B.jsonnet
    examples/sarashina_2_2_evaluation/configs/pretrained_models/sarashina2-7b.jsonnet
    examples/sarashina_2_2_evaluation/configs/pretrained_models/sarashina2-70b.jsonnet
    examples/sarashina_2_2_evaluation/configs/pretrained_models/sarashina2.2-3b.jsonnet
)

for eval_setup in ${eval_setups[@]}; do
    for language_model in ${models[@]}; do
        model_name=$(basename ${language_model%.*})
        eval_setup_name=$(basename ${eval_setup%.*})
        echo "Evaluating ${model_name} on ${eval_setup_name}"
        save_dir=${SAVE_BASE_DIR}/pretrained/${model_name}/${eval_setup_name}
        if [ -f ${save_dir}/metrics.json ]; then
            echo "results already exist. Skipping evaluation to avoid overwriting results."
            continue
        fi
        flexeval_lm --eval_setup ${eval_setup} --language_model ${language_model} --save_dir ${save_dir} --eval_setup.batch_size 10000 --force true
    done
done

eval_setups=(
    examples/sarashina_2_2_evaluation/configs/instruction_evals/elyza_tasks_100.jsonnet
)
models=(
    examples/sarashina_2_2_evaluation/configs/instruction_models/Llama-3.1-Swallow-8B-Instruct-v0.3.jsonnet
    examples/sarashina_2_2_evaluation/configs/instruction_models/llm-jp-3-7.2b-instruct3.jsonnet
    examples/sarashina_2_2_evaluation/configs/instruction_models/Qwen2.5-1.5B-Instruct.jsonnet
    examples/sarashina_2_2_evaluation/configs/instruction_models/Qwen2.5-7B-Instruct.jsonnet
    examples/sarashina_2_2_evaluation/configs/instruction_models/sarashina2.2-3b-instruct-v0.1.jsonnet
)

for eval_setup in ${eval_setups[@]}; do
    for language_model in ${models[@]}; do
        model_name=$(basename ${language_model%.*})
        eval_setup_name=$(basename ${eval_setup%.*})
        echo "Evaluating ${model_name} on ${eval_setup_name}"
        save_dir=${SAVE_BASE_DIR}/instruction/${model_name}/${eval_setup_name}
        if [ -f ${save_dir}/metrics.json ]; then
            echo "results already exist. Skipping evaluation to avoid overwriting results."
            continue
        fi
        flexeval_lm --eval_setup ${eval_setup} --language_model ${language_model} --save_dir ${save_dir} --eval_setup.batch_size 10000 --cleanup_after_generation true --force true
    done
done

niilcqa (pretrained)

| model | char_f1 (v0.10.2) | char_f1 (v0.16.0) | Δchar_f1 |
| --- | --- | --- | --- |
| sarashina2.2-3b | 0.7872 | 0.7872 | 0.0000 |
| sarashina2-7b | 0.7723 | 0.7723 | 0.0000 |
| Llama-3.2-3B | 0.3850 | 0.3841 | -0.0009 |
| Qwen2.5-3B | 0.3213 | 0.3175 | -0.0038 |
| sarashina2-70b | 0.8000 | 0.8000 | 0.0000 |

gsm8k (pretrained)

| model | math_verify_accuracy (v0.10.2) | math_verify_accuracy (v0.16.0) | Δmath_verify_accuracy |
| --- | --- | --- | --- |
| sarashina2.2-3b | 0.6118 | 0.6133 | 0.0015 |
| sarashina2-7b | 0.1259 | 0.1251 | -0.0008 |
| Llama-3.2-3B | 0.2995 | 0.3002 | 0.0008 |
| Qwen2.5-3B | 0.7817 | 0.7794 | -0.0023 |
| sarashina2-70b | 0.7134 | 0.7172 | 0.0038 |

elyza-tasks-100 (chat)

| model | llm_score (v0.10.2) | llm_score (v0.16.0) | Δllm_score |
| --- | --- | --- | --- |
| Qwen2.5-7B-Instruct | 3.0404 | 3.0300 | -0.0104 |
| Qwen2.5-1.5B-Instruct | 1.9200 | 1.8700 | -0.0500 |
| sarashina2.2-3b-instruct-v0.1 | 3.4600 | 3.4000 | -0.0600 |
| Llama-3.1-Swallow-8B-Instruct-v0.3 | 3.4200 | 3.4400 | 0.0200 |
| llm-jp-3-7.2b-instruct3 | 3.0000 | 2.9300 | -0.0700 |
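The Δ columns in these tables can be reproduced mechanically. A minimal sketch (`score_delta` is a hypothetical helper, not part of the PR; the metric keys and `metrics.json` layout are assumptions based on the script's save directories):

```python
import json  # used in the commented-out file-loading example
from pathlib import Path

def score_delta(old_metrics: dict, new_metrics: dict, key: str) -> float:
    """Return the metric change (new minus old), rounded as in the tables."""
    return round(new_metrics[key] - old_metrics[key], 4)

# Hypothetical usage against the save_dir layout from the script above:
# old = json.loads(Path(old_dir, "metrics.json").read_text())
# new = json.loads(Path(new_dir, "metrics.json").read_text())

# Llama-3.2-3B niilcqa row from the first table:
print(score_delta({"char_f1": 0.3850}, {"char_f1": 0.3841}, "char_f1"))  # -0.0009
```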



@pytest.fixture(scope="module")
def chat_lm_with_system_message() -> VLLM:
Collaborator Author

I don't know why this fixture was originally placed here, but since it was causing errors, I moved it to test_vllm_specific.py.

  vllm_log_probs = chat_lm.compute_log_probs(text_list)
  hf_log_probs = hf_lm.compute_log_probs(text_list)
- assert vllm_log_probs == pytest.approx(hf_log_probs, abs=1e-2)
+ assert vllm_log_probs == pytest.approx(hf_log_probs, abs=0.5)
Collaborator Author

It seems that differences of this magnitude can occur depending on the seed (or the environment), so I widened the acceptable tolerance.
The values are roughly around -33.2 (Qwen) and -47.2 (Sarashina), so an error margin of about ±0.5 seems reasonable in practice.
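For reference, `pytest.approx(expected, abs=0.5)` passes whenever the absolute difference is within 0.5 (pytest also applies a default relative tolerance, but at values around -33 that term is negligible). The same check can be sketched with the standard library:

```python
import math

def within_abs(actual: float, expected: float, abs_tol: float) -> bool:
    """Pure absolute-tolerance check, analogous to pytest.approx(expected, abs=...)."""
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)

# Log-prob sums in the ballpark mentioned above (around -33.2 for Qwen):
print(within_abs(-33.4, -33.2, abs_tol=0.5))   # True: |diff| = 0.2 <= 0.5
print(within_abs(-33.4, -33.2, abs_tol=1e-2))  # False: the old tolerance was too tight
```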

  from vllm import RequestOutput, SamplingParams
  from vllm.inputs import TokensPrompt
- from vllm.sequence import Logprob
+ from vllm.logprobs import Logprob
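If the code ever needed to support both vLLM versions at once, the moved import could be resolved at runtime. A generic sketch (`resolve_attr` is a hypothetical helper; the two vLLM module paths are taken from the diff above):

```python
import importlib

def resolve_attr(candidates):
    """Return the first attribute that can be imported from (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            return getattr(importlib.import_module(module_name), attr_name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of {candidates} could be resolved")

# Logprob moved from vllm.sequence to vllm.logprobs in newer vLLM releases:
# Logprob = resolve_attr([("vllm.logprobs", "Logprob"), ("vllm.sequence", "Logprob")])
```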
  self.llm = LLM(self.model_name, **self.model_kwargs)
  if self.model_limit_tokens == "default":
-     self.model_limit_tokens = self.llm.llm_engine.get_model_config().max_model_len
+     self.model_limit_tokens = self.llm.llm_engine.model_config.max_model_len
@junya-takayama junya-takayama changed the title [WIP] Upgrade vLLM to v0.16.0 Upgrade vLLM to v0.16.0 Feb 27, 2026
  ]

- max_length = self.llm.llm_engine.get_model_config().max_seq_len_to_capture
+ max_length = self.llm.llm_engine.model_config.max_model_len
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junya-takayama junya-takayama marked this pull request as ready for review March 2, 2026 03:10
@junya-takayama junya-takayama requested a review from a team March 2, 2026 06:33
Contributor

@hwichan0720 hwichan0720 left a comment


LGTM!
