
feat: AMD GPU Support#248

Open
jiarong0907 wants to merge 10 commits into main from amd-support-init

Conversation

Collaborator

@jiarong0907 jiarong0907 commented Feb 21, 2026

Summary

Add AMD GPU (ROCm/HIP) support to kvcached by abstracting over CUDA and HIP at compile time, allowing the same codebase to build and run on both NVIDIA and AMD GPUs.

New GPU abstraction layer (csrc/inc/gpu_vmm.hpp)

  • Introduces kvcached::gpu_vmm namespace that wraps CUDA Driver API (cuMem*) and HIP VMM API (hipMem*) behind a unified interface
  • Selected at compile time via -DKVCACHED_USE_CUDA or -DKVCACHED_USE_HIP
  • Provides type aliases (allocation_handle_t, allocation_prop_t, access_desc_t), error handling, and all VMM operations (reserve, create, map, unmap, set_access, release)

C++ header cleanup

  • Replaced heavyweight #include <torch/extension.h> with targeted c10/ATen headers (c10::ScalarType, c10::Device, at::Tensor) to reduce build times
  • Removed hardcoded #include <cuda_runtime.h> / #include <cuda.h> from all files except the abstraction layer
  • Renamed cuda_utils.hpp → gpu_utils.hpp; routed error-checking macros through gpu_vmm.hpp
  • Renamed init_cuda_() → init_gpu_()

Build system (setup.py)

  • Auto-detects backend via torch.version.hip / torch.version.cuda
  • HIP builds use CppExtension (avoids PyTorch's hipify step which is unnecessary since the code handles HIP natively)
  • CUDA builds use CUDAExtension as before
  • Sets appropriate defines and link libraries per backend (-lamdhip64 for HIP, -lcuda for CUDA)
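The auto-detection logic can be sketched as a small pure function. The helper name `detect_backend` is hypothetical; the real setup.py reads `torch.version.hip` / `torch.version.cuda` directly.

```python
def detect_backend(hip_version, cuda_version):
    """Pick the GPU backend as described above: torch.version.hip wins
    if set, otherwise torch.version.cuda. Returns the compile define
    and the library to link for the chosen backend."""
    if hip_version is not None:
        # ROCm builds of PyTorch set torch.version.hip; these builds use
        # CppExtension, skipping PyTorch's hipify step.
        return ("-DKVCACHED_USE_HIP", "amdhip64")
    if cuda_version is not None:
        # CUDA builds use CUDAExtension as before.
        return ("-DKVCACHED_USE_CUDA", "cuda")
    raise RuntimeError("No supported GPU backend detected")

# A ROCm PyTorch wheel reports a hip version string, e.g.:
print(detect_backend("6.2.41133", None))  # ('-DKVCACHED_USE_HIP', 'amdhip64')
print(detect_backend(None, "12.4"))       # ('-DKVCACHED_USE_CUDA', 'cuda')
```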

Python integration

  • Updated SGLang and vLLM integration patches for device string handling ("cuda" and "hip" prefixes)
  • Added warning log when GPU is unavailable in page_allocator.py
  • Generalized assertion messages to be backend-agnostic
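The device-string handling can be illustrated with a minimal sketch. The helper `parse_device` is hypothetical; the actual handling lives in the SGLang/vLLM integration patches.

```python
def parse_device(device_str):
    """Split a device string such as 'cuda:0' or 'hip:1' into
    (backend, index), accepting both prefixes so integration code
    stays backend-agnostic. Index defaults to 0 when omitted."""
    prefix, _, idx = device_str.partition(":")
    if prefix not in ("cuda", "hip"):
        raise ValueError(f"unsupported device string: {device_str!r}")
    return prefix, int(idx) if idx else 0

print(parse_device("cuda:0"))  # ('cuda', 0)
print(parse_device("hip:1"))   # ('hip', 1)
```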

Benchmark (benchmarks/bench_vmm/)

  • Ported to use gpu_vmm.hpp abstraction — now builds for both CUDA and HIP
  • Removed local cuda_utils.hpp duplicate; uses the main project's gpu_utils.hpp
  • Updated Makefile: make for CUDA, make KVCACHED_BACKEND=hip for ROCm
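The backend switch in the Makefile might look roughly like this — a sketch of the pattern, not the actual Makefile contents (compiler and flag choices here are assumptions):

```make
# Default to CUDA; `make KVCACHED_BACKEND=hip` selects ROCm.
KVCACHED_BACKEND ?= cuda

ifeq ($(KVCACHED_BACKEND),hip)
  CXX     := hipcc
  DEFINES := -DKVCACHED_USE_HIP
  LDLIBS  := -lamdhip64
else
  CXX     := g++
  DEFINES := -DKVCACHED_USE_CUDA
  LDLIBS  := -lcuda
endif
```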

Test Plan

  • Build with CUDA (-DKVCACHED_USE_CUDA) on an NVIDIA GPU and run two vLLM instances with kvcached
  • Build with HIP (-DKVCACHED_USE_HIP) on an AMD GPU (MI300X) and run two vLLM instances with kvcached
  • Run VMM benchmark on AMD platforms (make KVCACHED_BACKEND=hip)
  • Run VMM benchmark on both platforms (make)
  • Verify SGLang integration works on both backends

@jiarong0907 jiarong0907 marked this pull request as ready for review February 22, 2026 22:41
@jiarong0907
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully introduces a GPU abstraction layer (gpu_vmm.hpp) to support both NVIDIA (CUDA) and AMD (HIP/ROCm) backends. This is a well-structured refactoring that generalizes memory management operations. The use of CppExtension for HIP builds to avoid unnecessary hipify steps is a clever choice for code that is already cross-platform. However, there are a few issues that need attention: the benchmark code was not fully abstracted and will fail to compile on AMD platforms, and there are potential API mismatches in setup.py regarding include_paths and library_paths arguments.

@jiarong0907
Collaborator Author

@ivanium @cui36 @shenrunzhang I reviewed the code myself and finished the test plan. Feel free to take a look and let me know if there are any issues.

@ivanium
Collaborator

ivanium commented Feb 22, 2026

@ivanium @cui36 @shenrunzhang I reviewed the code myself and finished the test plan. Feel free to take a look and let me know if there are any issues.

Impressive work! Will take a look shortly

@ivanium
Collaborator

ivanium commented Feb 23, 2026

Haven't finished reading the code, but one quick question: what performance do we get on AMD GPUs and their VMM APIs? Can we also attach some benchmark results (if we already have them)?

@jiarong0907
Collaborator Author

Haven't finished reading the code, but one quick question: what performance do we get on AMD GPUs and their VMM APIs? Can we also attach some benchmark results (if we already have them)?

Just made the AMD version work. @shenrunzhang will do the benchmark.
