Merged
48 commits
bf53e0c
Support torchrun and SPMD-style offline inference (#12071)
youkaichao Jan 16, 2025
92e793d
[core] LLM.collective_rpc interface and RLHF example (#12084)
youkaichao Jan 16, 2025
874f7c2
[Bugfix] Fix max image feature size for Llava-one-vision (#12104)
ywang96 Jan 16, 2025
5fd24ec
[misc] Add LoRA kernel micro benchmarks (#11579)
varun-sundar-rabindranath Jan 16, 2025
62b06ba
[Model] Add support for deepseek-vl2-tiny model (#12068)
Isotr0py Jan 16, 2025
d06e824
[Bugfix] Set enforce_eager automatically for mllama (#12127)
heheda12345 Jan 16, 2025
ebc73f2
[Bugfix] Fix a path bug in disaggregated prefill example script. (#12…
KuntaiDu Jan 17, 2025
fead53b
[CI]add genai-perf benchmark in nightly benchmark (#10704)
jikunshang Jan 17, 2025
1475847
[Doc] Add instructions on using Podman when SELinux is active (#12136)
terrytangyuan Jan 17, 2025
b8bfa46
[Bugfix] Fix issues in CPU build Dockerfile (#12135)
terrytangyuan Jan 17, 2025
d1adb9b
[BugFix] add more `is not None` check in VllmConfig.__post_init__ (#1…
heheda12345 Jan 17, 2025
d75ab55
[Misc] Add deepseek_vl2 chat template (#12143)
Isotr0py Jan 17, 2025
8027a72
[ROCm][MoE] moe tuning support for rocm (#12049)
divakar-amd Jan 17, 2025
69d765f
[V1] Move more control of kv cache initialization from model_executor…
heheda12345 Jan 17, 2025
07934cc
[Misc][LoRA] Improve the readability of LoRA error messages (#12102)
jeejeelee Jan 17, 2025
d4e6194
[CI/Build][CPU][Bugfix] Fix CPU CI (#12150)
bigPYJ1151 Jan 17, 2025
87a0c07
[core] allow callable in collective_rpc (#12151)
youkaichao Jan 17, 2025
58fd57f
[Bugfix] Fix score api for missing max_model_len validation (#12119)
wallashss Jan 17, 2025
54cacf0
[Bugfix] Mistral tokenizer encode accept list of str (#12149)
jikunshang Jan 17, 2025
b5b57e3
[AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134)
gshtras Jan 17, 2025
7b98a65
[torch.compile] disable logging when cache is disabled (#12043)
youkaichao Jan 17, 2025
2b83503
[misc] fix cross-node TP (#12166)
youkaichao Jan 18, 2025
c09503d
[AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172)
hongxiayang Jan 18, 2025
da02cb4
[core] further polish memory profiling (#12126)
youkaichao Jan 18, 2025
813f249
[Docs] Fix broken link in SECURITY.md (#12175)
russellb Jan 18, 2025
02798ec
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Isotr0py Jan 18, 2025
6d0e3d3
[core] clean up executor class hierarchy between v1 and v0 (#12171)
youkaichao Jan 18, 2025
32eb0da
[Misc] Support register quantization method out-of-tree (#11969)
ice-tong Jan 19, 2025
7a8a48d
[V1] Collect env var for usage stats (#12115)
simon-mo Jan 19, 2025
4e94951
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu (#…
madamczyk-intel Jan 19, 2025
630eb5b
[Bugfix] Fix multi-modal processors for transformers 4.48 (#12187)
DarkLight1337 Jan 19, 2025
e66faf4
[torch.compile] store inductor compiled Python file (#12182)
youkaichao Jan 19, 2025
936db11
benchmark_serving support --served-model-name param (#12109)
gujingit Jan 19, 2025
edaae19
[Misc] Add BNB support to GLM4-V model (#12184)
Isotr0py Jan 19, 2025
81763c5
[V1] Add V1 support of Qwen2-VL (#12128)
ywang96 Jan 19, 2025
bbe5f9d
[Model] Support for fairseq2 Llama (#11442)
MartinGleize Jan 19, 2025
df450aa
[Bugfix] Fix num_heads value for simple connector when tp enabled (#1…
ShangmingCai Jan 20, 2025
51ef828
[torch.compile] fix sym_tensor_indices (#12191)
youkaichao Jan 20, 2025
3ea7b94
Move linting to `pre-commit` (#11975)
hmellor Jan 20, 2025
c5c0620
[DOC] Fix typo in docstring and assert message (#12194)
terrytangyuan Jan 20, 2025
d264312
[DOC] Add missing docstring in LLMEngine.add_request() (#12195)
terrytangyuan Jan 20, 2025
0974c9b
[Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196)
terrytangyuan Jan 20, 2025
8360979
[Model] Add Qwen2 PRM model support (#12202)
Isotr0py Jan 20, 2025
59a0192
[Core] Interface for accessing model from `VllmRunner` (#10353)
DarkLight1337 Jan 20, 2025
5c89a29
[misc] add placeholder format.sh (#12206)
youkaichao Jan 20, 2025
4001ea1
[CI/Build] Remove dummy CI steps (#12208)
DarkLight1337 Jan 20, 2025
bfa00ad
Merge NPU related pr to apply_plugin
MengqingCao Jan 20, 2025
146dc8b
Merge branch 'apply_plugin' into apply_plugin0120
MengqingCao Jan 20, 2025
2 changes: 1 addition & 1 deletion .buildkite/nightly-benchmarks/scripts/nightly-annotate.sh
@@ -43,7 +43,7 @@ main() {



-  # The figures should be genereated by a separate process outside the CI/CD pipeline
+  # The figures should be generated by a separate process outside the CI/CD pipeline

   # # generate figures
   # python3 -m pip install tabulate pandas matplotlib
107 changes: 107 additions & 0 deletions .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
@@ -301,6 +301,104 @@ run_serving_tests() {
   kill_gpu_processes
 }

+run_genai_perf_tests() {
+  # run genai-perf tests
+
+  # $1: a json file specifying genai-perf test cases
+  local genai_perf_test_file
+  genai_perf_test_file=$1
+
+  # Iterate over genai-perf tests
+  jq -c '.[]' "$genai_perf_test_file" | while read -r params; do
+    # get the test name, and append the GPU type back to it.
+    test_name=$(echo "$params" | jq -r '.test_name')
+
+    # if TEST_SELECTOR is set, only run the test cases that match the selector
+    if [[ -n "$TEST_SELECTOR" ]] && [[ ! "$test_name" =~ $TEST_SELECTOR ]]; then
+      echo "Skip test case $test_name."
+      continue
+    fi
+
+    # prepend the current serving engine to the test name
+    test_name=${CURRENT_LLM_SERVING_ENGINE}_${test_name}
+
+    # get common parameters
+    common_params=$(echo "$params" | jq -r '.common_parameters')
+    model=$(echo "$common_params" | jq -r '.model')
+    tp=$(echo "$common_params" | jq -r '.tp')
+    dataset_name=$(echo "$common_params" | jq -r '.dataset_name')
+    dataset_path=$(echo "$common_params" | jq -r '.dataset_path')
+    port=$(echo "$common_params" | jq -r '.port')
+    num_prompts=$(echo "$common_params" | jq -r '.num_prompts')
+    reuse_server=$(echo "$common_params" | jq -r '.reuse_server')
+
+    # get client and server arguments
+    server_params=$(echo "$params" | jq -r ".${CURRENT_LLM_SERVING_ENGINE}_server_parameters")
+    qps_list=$(echo "$params" | jq -r '.qps_list')
+    qps_list=$(echo "$qps_list" | jq -r '.[] | @sh')
+    echo "Running over qps list $qps_list"
+
+    # check if there is enough GPU to run the test
+    if [[ $gpu_count -lt $tp ]]; then
+      echo "Required num-shard $tp but only $gpu_count GPU found. Skip testcase $test_name."
+      continue
+    fi
+
+    if [[ $reuse_server == "true" ]]; then
+      echo "Reuse previous server for test case $test_name"
+    else
+      kill_gpu_processes
+      bash "$VLLM_SOURCE_CODE_LOC/.buildkite/nightly-benchmarks/scripts/launch-server.sh" \
+        "$server_params" "$common_params"
+    fi
+
+    if wait_for_server; then
+      echo ""
+      echo "$CURRENT_LLM_SERVING_ENGINE server is up and running."
+    else
+      echo ""
+      echo "$CURRENT_LLM_SERVING_ENGINE failed to start within the timeout period."
+      break
+    fi
+
+    # iterate over different QPS
+    for qps in $qps_list; do
+      # remove the surrounding single quote from qps
+      if [[ "$qps" == *"inf"* ]]; then
+        echo "qps was $qps"
+        qps=$num_prompts
+        echo "now qps is $qps"
+      fi
+
+      new_test_name=$test_name"_qps_"$qps
+      backend=$CURRENT_LLM_SERVING_ENGINE
+
+      if [[ "$backend" == *"vllm"* ]]; then
+        backend="vllm"
+      fi
+      #TODO: add output dir.
+      client_command="genai-perf profile \
+        -m $model \
+        --service-kind openai \
+        --backend vllm \
+        --endpoint-type chat \
+        --streaming \
+        --url localhost:$port \
+        --request-rate $qps \
+        --num-prompts $num_prompts \
+      "
+
+      echo "Client command: $client_command"
+
+      eval "$client_command"
+
+      #TODO: process/record outputs
+    done
+  done
+
+  kill_gpu_processes
+
+}

 prepare_dataset() {

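The selector check and the `inf` handling inside `run_genai_perf_tests` can be tried in isolation. This is a minimal sketch that pulls that logic into two hypothetical helpers (`should_run_test` and `normalize_qps` are illustrative names, not functions in the script). It assumes, as the script does, that `jq -r '.[] | @sh'` prints numbers bare and strings single-quoted, so an `"inf"` entry in `qps_list` reaches the loop as `'inf'` and is replaced by `num_prompts`. The test file added in this PR only lists finite rates (`[4,8,16,32]`), so it does not exercise that branch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the selector check and QPS handling
# in run_genai_perf_tests; they are not part of the actual script.

# Returns 0 if the test should run: TEST_SELECTOR is unset or empty,
# or the test name matches TEST_SELECTOR as an extended regex.
should_run_test() {
  local test_name=$1
  if [[ -n "${TEST_SELECTOR:-}" ]] && [[ ! "$test_name" =~ $TEST_SELECTOR ]]; then
    return 1
  fi
  return 0
}

# Maps an "inf" entry (possibly still wrapped in @sh single quotes)
# to num_prompts and passes any other rate through unchanged.
normalize_qps() {
  local qps=$1 num_prompts=$2
  if [[ "$qps" == *"inf"* ]]; then
    qps=$num_prompts
  fi
  echo "$qps"
}
```

Keeping the regex unquoted on the right of `=~` matches the script: Bash then treats `TEST_SELECTOR` as a pattern rather than a literal string.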
@@ -328,12 +426,17 @@ main() {

   pip install -U transformers

+  pip install -r requirements-dev.txt
+  which genai-perf
+
   # check storage
   df -h

   ensure_installed wget
   ensure_installed curl
   ensure_installed jq
+  # genai-perf dependency
+  ensure_installed libb64-0d

   prepare_dataset

@@ -345,6 +448,10 @@ main() {
   # run the test
   run_serving_tests "$BENCHMARK_ROOT/tests/nightly-tests.json"

+  # run genai-perf tests
+  run_genai_perf_tests "$BENCHMARK_ROOT/tests/genai-perf-tests.json"
+  mv artifacts/ $RESULTS_FOLDER/
+
   # upload benchmark results to buildkite
   python3 -m pip install tabulate pandas
   python3 "$BENCHMARK_ROOT/scripts/summary-nightly-results.py"
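`run_genai_perf_tests` assembles the client invocation as a string and runs it with `eval`. A common alternative is to build a Bash array and expand it with `"${cmd[@]}"`, which keeps each argument intact without a second round of shell parsing. The sketch below assumes the same `genai-perf profile` flags are wanted; `build_genai_perf_cmd` and the `GENAI_PERF_CMD` array are hypothetical names, and the flags are copied verbatim from the script's `client_command`, not checked against genai-perf's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical eval-free variant of the client command in run_genai_perf_tests.
# Fills the global array GENAI_PERF_CMD; the flags are copied from the script.
build_genai_perf_cmd() {
  local model=$1 port=$2 qps=$3 num_prompts=$4
  GENAI_PERF_CMD=(
    genai-perf profile
    -m "$model"
    --service-kind openai
    --backend vllm
    --endpoint-type chat
    --streaming
    --url "localhost:$port"
    --request-rate "$qps"
    --num-prompts "$num_prompts"
  )
}
```

A call site would run `build_genai_perf_cmd "$model" "$port" "$qps" "$num_prompts"`, log `${GENAI_PERF_CMD[*]}`, and then execute `"${GENAI_PERF_CMD[@]}"` in place of `eval "$client_command"`.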
23 changes: 23 additions & 0 deletions .buildkite/nightly-benchmarks/tests/genai-perf-tests.json
@@ -0,0 +1,23 @@
+[
+  {
+    "test_name": "llama8B_tp1_genai_perf",
+    "qps_list": [4,8,16,32],
+    "common_parameters": {
+      "model": "meta-llama/Meta-Llama-3-8B-Instruct",
+      "tp": 1,
+      "port": 8000,
+      "num_prompts": 500,
+      "reuse_server": false
+    },
+    "vllm_server_parameters": {
+      "disable_log_stats": "",
+      "disable_log_requests": "",
+      "gpu_memory_utilization": 0.9,
+      "num_scheduler_steps": 10,
+      "max_num_seqs": 512,
+      "dtype": "bfloat16"
+    },
+    "genai_perf_input_parameters": {
+    }
+  }
+]
4 changes: 2 additions & 2 deletions .buildkite/run-cpu-test.sh
@@ -83,6 +83,6 @@ function cpu_tests() {
     tests/lora/test_qwen2vl.py"
 }

-# All of CPU tests are expected to be finished less than 25 mins.
+# All of CPU tests are expected to be finished less than 40 mins.
 export -f cpu_tests
-timeout 30m bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
+timeout 40m bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
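The CPU job runs the exported `cpu_tests` function in a child shell under `timeout`, so raising the budget to 40 minutes only changes the `timeout` argument. This is a minimal sketch of the same `export -f` plus `timeout ... bash -c` pattern, with a hypothetical toy function `slow_step` standing in for `cpu_tests`. It assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires before the command finishes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for cpu_tests: sleeps for the given number of seconds.
slow_step() {
  sleep "$1"
}
# export -f makes the function visible to the child bash started by timeout.
export -f slow_step

# Runs slow_step in a child shell under a time limit and prints the exit
# status; GNU timeout returns 124 if the limit expires first.
run_with_limit() {
  local limit=$1 seconds=$2 rc=0
  timeout "$limit" bash -c "slow_step $seconds" || rc=$?
  echo "$rc"
}
```

Without `export -f`, the child `bash -c` would not know the function and would fail with "command not found" (status 127) instead of running it.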
10 changes: 8 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -52,7 +52,6 @@ steps:
   - tests/worker
   - tests/standalone_tests/lazy_torch_compile.py
   commands:
-  - pip install git+https://github.com/Isotr0py/DeepSeek-VL2.git # Used by multimoda processing test
   - python3 standalone_tests/lazy_torch_compile.py
   - pytest -v -s mq_llm_engine # MQLLMEngine
   - pytest -v -s async_engine # AsyncLLMEngine
@@ -107,7 +106,7 @@ steps:
   source_file_dependencies:
   - vllm/
   commands:
-  - pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py
+  - pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py --ignore=entrypoints/llm/test_collective_rpc.py
   - pytest -v -s entrypoints/llm/test_lazy_outlines.py # it needs a clean process
   - pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
   - pytest -v -s entrypoints/llm/test_generate_multiple_loras.py # it needs a clean process
@@ -126,11 +125,15 @@ steps:
   - tests/distributed
   - tests/spec_decode/e2e/test_integration_dist_tp4
   - tests/compile
+  - examples/offline_inference/rlhf.py
   commands:
   - pytest -v -s distributed/test_utils.py
   - pytest -v -s compile/test_basic_correctness.py
   - pytest -v -s distributed/test_pynccl.py
   - pytest -v -s spec_decode/e2e/test_integration_dist_tp4.py
+  # TODO: create a dedicated test section for multi-GPU example tests
+  # when we have multiple distributed example tests
+  - python3 ../examples/offline_inference/rlhf.py

 - label: Metrics, Tracing Test # 10min
   num_gpus: 2
@@ -462,7 +465,10 @@ steps:
   - vllm/worker/worker_base.py
   - vllm/worker/worker.py
   - vllm/worker/model_runner.py
+  - entrypoints/llm/test_collective_rpc.py
   commands:
+  - pytest -v -s entrypoints/llm/test_collective_rpc.py
+  - torchrun --nproc-per-node=2 distributed/test_torchrun_example.py
   - pytest -v -s ./compile/test_basic_correctness.py
   - pytest -v -s ./compile/test_wrapper.py
   - VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep 'Same node test passed'
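In the step ending with `torchrun --nproc-per-node=4 distributed/test_same_node.py | grep 'Same node test passed'`, the pipeline's exit status is grep's by default, so the step passes or fails on whether the marker line was printed rather than on torchrun's own exit code. That changes if the step's shell enables `pipefail`, which this diff does not show. The sketch below demonstrates that Bash behaviour; `fake_test` is a hypothetical placeholder for the torchrun invocation.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the torchrun command: prints the success marker
# and then exits with a non-zero status.
fake_test() {
  echo "Same node test passed"
  return 3
}

# Prints the exit status of "fake_test | grep -q ..." with or without pipefail.
# Each variant runs in a subshell so the option does not leak to the caller.
pipeline_status() {
  local mode=$1 rc=0
  if [[ "$mode" == "pipefail" ]]; then
    ( set -o pipefail; fake_test | grep -q 'Same node test passed' ) || rc=$?
  else
    ( set +o pipefail; fake_test | grep -q 'Same node test passed' ) || rc=$?
  fi
  echo "$rc"
}
```

With default settings the pipeline reports 0 even though the stand-in command returned 3; with `pipefail` the non-zero status of the first command surfaces instead.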
40 changes: 0 additions & 40 deletions .github/workflows/actionlint.yml

This file was deleted.

53 changes: 0 additions & 53 deletions .github/workflows/clang-format.yml

This file was deleted.

45 changes: 0 additions & 45 deletions .github/workflows/codespell.yml

This file was deleted.

32 changes: 0 additions & 32 deletions .github/workflows/doc-lint.yml

This file was deleted.
