GVM is an OS-level GPU virtualization layer that achieves hardware-like performance isolation while preserving the flexibility of software-based sharing. GVM provides cgroup-like APIs for GPU applications, so you can inspect and control GPU applications much as you would CPU applications with cgroups. For details, please check here.
| API | Description |
|---|---|
| memory.limit | Check or set the maximum amount of memory the application can allocate on the GPU |
| memory.current | Get the application's current memory usage on the GPU |
| memory.swap.current | Get the amount of the application's GPU memory currently swapped to the host |
| compute.priority | Get or set the application's compute priority on the GPU (0-15; a lower value means higher priority) |
| compute.freeze | Freeze or unfreeze the application on the GPU |
| gcgroup.stat | Get statistics about the application |
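These controls are exposed as files under debugfs, one directory per process (the trailing `0` follows the paths used in the walkthrough below). As a minimal sketch, using a hypothetical PID 12345:

cat /sys/kernel/debug/nvidia-uvm/processes/12345/0/memory.current
echo 4000000000 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/12345/0/memory.limit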
The figure shows the performance benefits of GVM when colocating the high-priority task (vllm) and the low-priority task (diffusion) on an A100-40G GPU.
GVM achieves 59x better p99 TTFT on the high-priority task than the second-best baseline, while still delivering the highest throughput on the low-priority task.
Thanks to @boyuan for the figure.

- GVM NVIDIA GPU Driver installed
- GVM CUDA Driver Intercept Layer installed
- Dependencies:
python3 python3-pip python3-venv gcc g++ make cmake cuda-toolkit nvidia-open
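On Debian/Ubuntu-style systems, for example, the dependencies can typically be installed with apt (the cuda-toolkit and nvidia-open packages assume NVIDIA's repositories are configured; adjust package names for your distribution):

sudo apt install python3 python3-pip python3-venv gcc g++ make cmake cuda-toolkit nvidia-open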
./setup {llama.cpp|diffusion|llamafactory|vllm|sglang}
Launch your diffuser:
source diffuser/bin/activate
export LD_LIBRARY_PATH=<GVM Intercept Layer install dir>:$LD_LIBRARY_PATH
python3 diffuser/diffusion.py --dataset_path=diffuser/vidprom.txt --log_file=diffuser/stats.txt
Get pid of diffuser:
export pid=<pid of diffuser shown by nvidia-smi>
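If you prefer not to copy the PID by hand, something like pgrep can capture it (a sketch; it assumes exactly one matching process, so double-check against nvidia-smi):

export pid=$(pgrep -f diffuser/diffusion.py)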
Check kernel submission stats:
cat /sys/kernel/debug/nvidia-uvm/processes/$pid/0/gcgroup.stat
Check memory stats:
cat /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.current
cat /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.swap.current
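To keep an eye on both values while the workload runs, a simple watch loop works (optional convenience):

watch -n 1 "cat /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.current /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.swap.current"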
Limit memory usage:
echo <memory limit in bytes> | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.limit
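Since memory.limit can also be read, you can confirm the new limit took effect:

cat /sys/kernel/debug/nvidia-uvm/processes/$pid/0/memory.limit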
Launch your vllm:
source vllm/bin/activate
export LD_LIBRARY_PATH=<GVM Intercept Layer install dir>:$LD_LIBRARY_PATH
vllm serve meta-llama/Llama-3.2-3B --gpu-memory-utilization 0.8 --disable-log-requests --enforce-eager
Launch your diffuser:
source diffuser/bin/activate
export LD_LIBRARY_PATH=<GVM Intercept Layer install dir>:$LD_LIBRARY_PATH
python3 diffuser/diffusion.py --dataset_path=diffuser/vidprom.txt --log_file=diffuser/stats.txt
Get pid of diffuser and vllm:
export diffuserpid=<pid of diffuser shown by nvidia-smi>
export vllmpid=<pid of vllm shown by nvidia-smi>
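As before, both PIDs can also be captured without copying them from nvidia-smi (a sketch; pgrep may match more than one process, e.g. vllm worker processes, so verify against nvidia-smi):

export diffuserpid=$(pgrep -f diffuser/diffusion.py)
export vllmpid=$(pgrep -f "vllm serve" | head -n 1)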
Check compute priority of vllm:
cat /sys/kernel/debug/nvidia-uvm/processes/$vllmpid/0/compute.priority
Set the compute priority of vllm to 2 so it gets a larger timeslice:
echo 2 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$vllmpid/0/compute.priority
Limit the diffuser's memory usage to ~6 GB to leave enough room for vllm to run:
echo 6000000000 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$diffuserpid/0/memory.limit
Generate workloads for vllm:
source vllm/bin/activate
vllm bench serve \
--model meta-llama/Llama-3.2-3B \
--dataset-name random \
--random-input-len 256 \
--random-output-len 256 \
--num-prompts 512 \
--request-rate 32
Preempt the diffuser for even higher vllm performance:
echo 1 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$diffuserpid/0/compute.freeze
After the vllm workload finishes, reschedule the diffuser:
echo 0 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$diffuserpid/0/compute.freeze
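Putting the last few steps together, a small wrapper along these lines freezes the diffuser, runs the vllm benchmark, and unfreezes it afterwards (a sketch reusing the commands above):

# freeze the diffuser so vllm has the GPU to itself
echo 1 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$diffuserpid/0/compute.freeze
# run the vllm benchmark (same invocation as above)
vllm bench serve --model meta-llama/Llama-3.2-3B --dataset-name random \
  --random-input-len 256 --random-output-len 256 --num-prompts 512 --request-rate 32
# reschedule the diffuser once the benchmark finishes
echo 0 | sudo tee /sys/kernel/debug/nvidia-uvm/processes/$diffuserpid/0/compute.freeze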