Open
Description
What is the purpose of the last three lines of the command in the README?
"--swap-space
--max-model-len
--gpu-memory-utilization 0.99"
They seem to duplicate flags that are already set earlier in the same command:
python /app/vllm/benchmarks/benchmark_throughput.py \
    --model /data/llm/Meta-Llama-3.1-405B-Instruct \
    --dtype float16 \
    --gpu-memory-utilization 0.9 \
    --num-prompts 2000 \
    --distributed-executor-backend mp \
    --num-scheduler-steps 10 \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 128 \
    --swap-space 16 \
    --max-model-len 8192 \
    --max-num-batched-tokens 65536 \
    --swap-space \
    --max-model-len \
    --gpu-memory-utilization 0.99
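
For anyone else wondering what the repeated flags would do: below is a minimal sketch, assuming the benchmark script parses its options with standard Python argparse semantics (an assumption, not verified against the script here). The parser is a hypothetical stand-in that declares only three of the flag names from the command; the types and defaults are illustrative. With plain argparse, a repeated option keeps the last value given, and an option that expects a value but is followed directly by another option is rejected.

```python
import argparse

# Hypothetical stand-in parser: only the three flag names come from the
# command above; the types and defaults here are illustrative assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--swap-space", type=float, default=4)
parser.add_argument("--max-model-len", type=int, default=None)
parser.add_argument("--gpu-memory-utilization", type=float, default=0.9)

# A repeated option: argparse stores each occurrence in turn,
# so the last value on the command line is the one that remains.
args = parser.parse_args([
    "--gpu-memory-utilization", "0.9",
    "--swap-space", "16",
    "--max-model-len", "8192",
    "--gpu-memory-utilization", "0.99",
])
print(args.gpu_memory_utilization)  # 0.99

# An option given without its value, followed directly by another option:
# argparse reports "expected one argument" and exits with status 2.
bare_flag_rejected = False
try:
    parser.parse_args(["--swap-space", "--max-model-len", "8192"])
except SystemExit:
    bare_flag_rejected = True
print(bare_flag_rejected)  # True
```

If the script behaves like this sketch, the trailing `--gpu-memory-utilization 0.99` would override the earlier `0.9`, while the value-less `--swap-space` and `--max-model-len` lines would cause a parse error rather than being ignored.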