
feat: AMD GPU Support#248

Open
jiarong0907 wants to merge 10 commits into main from amd-support-init

Conversation

Collaborator

@jiarong0907 jiarong0907 commented Feb 21, 2026

Summary

Add AMD GPU (ROCm/HIP) support to kvcached by abstracting over CUDA and HIP at compile time, allowing the same codebase to build and run on both NVIDIA and AMD GPUs.

New GPU abstraction layer (csrc/inc/gpu_vmm.hpp)

  • Introduces kvcached::gpu_vmm namespace that wraps CUDA Driver API (cuMem*) and HIP VMM API (hipMem*) behind a unified interface
  • Selected at compile time via -DKVCACHED_USE_CUDA or -DKVCACHED_USE_HIP
  • Provides type aliases (allocation_handle_t, allocation_prop_t, access_desc_t), error handling, and all VMM operations (reserve, create, map, unmap, set_access, release)

C++ header cleanup

  • Replaced heavyweight #include <torch/extension.h> with targeted c10/ATen headers (c10::ScalarType, c10::Device, at::Tensor) to reduce build times
  • Removed hardcoded #include <cuda_runtime.h> / #include <cuda.h> from all files except the abstraction layer
  • Renamed cuda_utils.hpp → gpu_utils.hpp; routed error-checking macros through gpu_vmm.hpp
  • Renamed init_cuda_() → init_gpu_()

Build system (setup.py)

  • Auto-detects backend via torch.version.hip / torch.version.cuda
  • HIP builds use CppExtension (avoids PyTorch's hipify step which is unnecessary since the code handles HIP natively)
  • CUDA builds use CUDAExtension as before
  • Sets appropriate defines and link libraries per backend (-lamdhip64 for HIP, -lcuda for CUDA)
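The auto-detection logic can be sketched as a small pure function. The helper name `detect_backend` is hypothetical; the real setup.py reads `torch.version.hip` / `torch.version.cuda` directly.

```python
def detect_backend(hip_version, cuda_version):
    """Pick the GPU backend as described above: torch.version.hip wins
    if set, otherwise torch.version.cuda. Returns the compile define
    and the library to link for the chosen backend."""
    if hip_version is not None:
        # ROCm builds of PyTorch set torch.version.hip; these builds use
        # CppExtension, skipping PyTorch's hipify step.
        return ("-DKVCACHED_USE_HIP", "amdhip64")
    if cuda_version is not None:
        # CUDA builds use CUDAExtension as before.
        return ("-DKVCACHED_USE_CUDA", "cuda")
    raise RuntimeError("No supported GPU backend detected")

# A ROCm PyTorch wheel reports a hip version string, e.g.:
print(detect_backend("6.2.41133", None))  # ('-DKVCACHED_USE_HIP', 'amdhip64')
print(detect_backend(None, "12.4"))       # ('-DKVCACHED_USE_CUDA', 'cuda')
```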

Python integration

  • Updated SGLang and vLLM integration patches for device string handling ("cuda" and "hip" prefixes)
  • Added warning log when GPU is unavailable in page_allocator.py
  • Generalized assertion messages to be backend-agnostic
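The device-string handling can be illustrated with a minimal sketch. The helper `parse_device` is hypothetical; the actual handling lives in the SGLang/vLLM integration patches.

```python
def parse_device(device_str):
    """Split a device string such as 'cuda:0' or 'hip:1' into
    (backend, index), accepting both prefixes so integration code
    stays backend-agnostic. Index defaults to 0 when omitted."""
    prefix, _, idx = device_str.partition(":")
    if prefix not in ("cuda", "hip"):
        raise ValueError(f"unsupported device string: {device_str!r}")
    return prefix, int(idx) if idx else 0

print(parse_device("cuda:0"))  # ('cuda', 0)
print(parse_device("hip:1"))   # ('hip', 1)
```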

Benchmark (benchmarks/bench_vmm/)

  • Ported to use gpu_vmm.hpp abstraction — now builds for both CUDA and HIP
  • Removed local cuda_utils.hpp duplicate; uses the main project's gpu_utils.hpp
  • Updated Makefile: make for CUDA, make KVCACHED_BACKEND=hip for ROCm
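The backend switch in the Makefile might look roughly like this — a sketch of the pattern, not the actual Makefile contents (compiler and flag choices here are assumptions):

```make
# Default to CUDA; `make KVCACHED_BACKEND=hip` selects ROCm.
KVCACHED_BACKEND ?= cuda

ifeq ($(KVCACHED_BACKEND),hip)
  CXX     := hipcc
  DEFINES := -DKVCACHED_USE_HIP
  LDLIBS  := -lamdhip64
else
  CXX     := g++
  DEFINES := -DKVCACHED_USE_CUDA
  LDLIBS  := -lcuda
endif
```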

Test Plan

  • Build with CUDA (-DKVCACHED_USE_CUDA) on an NVIDIA GPU and run two vLLM instances with kvcached
  • Build with HIP (-DKVCACHED_USE_HIP) on an AMD GPU (MI300X) and run two vLLM instances with kvcached
  • Run VMM benchmark on AMD platforms (make KVCACHED_BACKEND=hip)
  • Run VMM benchmark on both platforms (make)
  • Verify SGLang integration works on both backends

@jiarong0907 jiarong0907 marked this pull request as ready for review February 22, 2026 22:41
@jiarong0907
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully introduces a GPU abstraction layer (gpu_vmm.hpp) to support both NVIDIA (CUDA) and AMD (HIP/ROCm) backends. This is a well-structured refactoring that generalizes memory management operations. The use of CppExtension for HIP builds to avoid unnecessary hipify steps is a clever choice for code that is already cross-platform. However, there are a few issues that need attention: the benchmark code was not fully abstracted and will fail to compile on AMD platforms, and there are potential API mismatches in setup.py regarding include_paths and library_paths arguments.

@jiarong0907
Collaborator Author

@ivanium @cui36 @shenrunzhang I reviewed the code myself and finished the test plan. Feel free to take a look and let me know if there are any issues.

@ivanium
Collaborator

ivanium commented Feb 22, 2026

@ivanium @cui36 @shenrunzhang I reviewed the code myself and finished the test plan. Feel free to take a look and let me know if there are any issues.

Impressive work! Will take a look shortly

@ivanium
Collaborator

ivanium commented Feb 23, 2026

Haven't finished reading the code, but one quick question: what performance do we get on AMD GPUs and their VMM APIs? Can we also attach some benchmark results (if we already have them)?

@jiarong0907
Collaborator Author

Haven't finished reading the code, but one quick question: what performance do we get on AMD GPUs and their VMM APIs? Can we also attach some benchmark results (if we already have them)?

Just made the AMD version work. @shenrunzhang will do the benchmark.
