QEMU + gem5 MI300X Co-simulation

Co-simulation framework that pairs QEMU (host CPU/system via KVM) with gem5 (cycle-accurate MI300X GPU model) to run real AMD ROCm/HIP workloads on a simulated GPU without physical hardware.

+-----------------------------+       +----------------------------+
|  QEMU  (Q35 + KVM)          |       |  gem5  (Docker)            |
|  +-----------------------+  |       |  +----------------------+  |
|  | Guest Linux           |  |       |  | MI300X GPU Model     |  |
|  | amdgpu driver         |  |       |  |  Shader / CU / SDMA  |  |
|  | ROCm 7.0 / HIP        |  |       |  |  PM4 / Ruby caches   |  |
|  +----------+------------+  |       |  +---------+------------+  |
|  +----------v------------+  |       |  +---------v------------+  |
|  | vfio-user-pci         |<-------->|  | MI300XVfioUser       |  |
|  | (QEMU built-in)       |  |vfio-  |  | (libvfio-user)       |  |
|  +-----------------------+  |user   |  +----------------------+  |
+-----------------------------+       +----------------------------+
        |                                       |
        v                                       v
  /dev/shm/cosim-guest-ram            /dev/shm/mi300x-vram
  (shared guest RAM)                  (shared GPU VRAM)

Features

Full amdgpu driver load — DRM initialized, 7 XCP partitions, gfx942
HIP compute verified — hipMalloc, kernel dispatch, hipDeviceSynchronize
MSI-X interrupts — gem5 → QEMU interrupt forwarding, IH ring buffer
Shared memory DMA — zero-copy VRAM and guest RAM via /dev/shm
Auto driver load — systemd service for ip_block_mask=0x67 discovery=2

Prerequisites

Requirement	Details
Host OS	Linux x86_64 with KVM (tested on WSL2 6.6.x)
Docker	Running daemon, user in `docker` group
KVM	`/dev/kvm` accessible
Disk space	~120 GB (55G disk image + build artifacts)
RAM	16 GB+ recommended

Quick Start

Option A: Script-based

git clone --recurse-submodules git@github.com:zevorn/cosim.git
cd cosim

# Build gem5 + QEMU + disk image (~2h total, needs KVM + Docker + ~60GB disk)
GEM5_BUILD_IMAGE=ghcr.io/gem5/gpu-fs:latest ./scripts/run_mi300x_fs.sh build-all

# Build runtime Docker image (for running gem5 inside Docker)
cd scripts && docker build -t gem5-run:local -f Dockerfile.run . && cd ..

# Launch co-simulation
./scripts/cosim_launch.sh

Option B: Manual step-by-step

# 1. Clone with submodules
git clone --recurse-submodules git@github.com:zevorn/cosim.git
cd cosim

# 2. Build gem5 (in Docker, ~30min; use -j1 if OOM-killed during linking)
cd gem5
docker run --rm -v "$(pwd):/gem5" -w /gem5 \
    -e PYTHONPATH=/usr/lib/python3.12/lib-dynload \
    ghcr.io/gem5/gpu-fs:latest \
    bash -c "scons build/VEGA_X86/gem5.opt -j4 GOLD_LINKER=True --linker=gold"
cd ..

# 3. Build runtime Docker image (for running gem5 inside Docker)
cd scripts && docker build -t gem5-run:local -f Dockerfile.run . && cd ..

# 4. Build QEMU (stock build; vfio-user-pci is built-in since QEMU 10.0)
cd qemu && mkdir -p build && cd build
../configure --target-list=x86_64-softmmu
make -j$(nproc)
cd ../..

# 5. Pre-build m5 utility (recommended — avoids git clone inside guest VM)
docker run --rm -v "$(pwd)/gem5:/gem5" -w /gem5 \
    ghcr.io/gem5/gpu-fs:latest \
    bash -c "cd util/m5 && scons build/x86/out/m5"
cp gem5/util/m5/build/x86/out/m5 gem5-resources/src/x86-ubuntu-gpu-ml/files/

# 6. Build disk image (Ubuntu 24.04 + ROCm 7.0, ~40min, needs KVM + ~60GB disk)
./scripts/run_mi300x_fs.sh build-disk

# 7. Launch co-simulation
./scripts/cosim_launch.sh

After guest boots (auto-login as root), the GPU driver loads automatically via cosim-gpu-setup.service (~40s). Verify:

rocm-smi          # should show device 0x74a0
rocminfo          # should show gfx942

# Manual setup (if the systemd service is not installed):
dd if=/root/roms/mi300.rom of=/dev/mem bs=1k seek=768 count=128
modprobe amdgpu ip_block_mask=0x67 discovery=2 ras_enable=0

# Run a HIP test
cat > /tmp/test.cpp << 'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>
__global__ void add(int *a, int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
int main() {
    const int N = 4;
    int ha[] = {1,2,3,4}, hb[] = {10,20,30,40}, hc[4] = {};
    int *da, *db, *dc;
    hipMalloc(&da, N*4); hipMalloc(&db, N*4); hipMalloc(&dc, N*4);
    hipMemcpy(da, ha, N*4, hipMemcpyHostToDevice);
    hipMemcpy(db, hb, N*4, hipMemcpyHostToDevice);
    add<<<1, N>>>(da, db, dc, N);
    hipMemcpy(hc, dc, N*4, hipMemcpyDeviceToHost);
    printf("Result: %d %d %d %d\n", hc[0], hc[1], hc[2], hc[3]);
    printf("%s\n", (hc[0]==11&&hc[1]==22&&hc[2]==33&&hc[3]==44) ? "PASSED!" : "FAILED!");
    hipFree(da); hipFree(db); hipFree(dc);
}
EOF
/opt/rocm/bin/hipcc --offload-arch=gfx942 -o /tmp/test /tmp/test.cpp
/tmp/test
# Expected: Result: 11 22 33 44
#           PASSED!

Repository Structure

cosim/
|-- gem5/                    # gem5 simulator (submodule, cosim-gpu branch)
|   |-- src/dev/amdgpu/      # MI300X GPU device model & vfio-user bridge
|   |-- ext/libvfio-user/    # libvfio-user library (Nutanix)
|   `-- configs/example/gpufs/mi300_cosim.py  # cosim configuration
|-- qemu/                    # QEMU emulator (submodule, stock QEMU 10.0+)
|-- gem5-resources/          # disk images, kernels, GPU apps (submodule)
|-- scripts/                 # build & launch scripts
|   |-- cosim_launch.sh      # one-click cosim launcher
|   |-- run_mi300x_fs.sh     # build orchestration
|   |-- cosim_guest_setup.sh # guest-side GPU setup
|   |-- cosim_test_client.py # socket test client
|   `-- Dockerfile.run       # gem5 runtime Docker image
|-- docs/                    # technical documentation (zh + en)
|   |-- en/                  # English
|   `-- zh/                  # Chinese
|-- LICENSE                  # Apache 2.0
`-- README.md                # this file

Key Components

QEMU Side (stock `vfio-user-pci`)

Uses QEMU's built-in vfio-user-pci device (no custom QEMU code). QEMU connects to gem5's vfio-user server and maps all BARs through the standard vfio-user protocol:

BAR0+1: VRAM (64-bit, prefetchable, shared memory backed)
BAR2+3: Doorbell (64-bit, forwarded to gem5 via vfio-user)
BAR4: MSI-X (256 vectors, eventfd-based KVM injection)
BAR5: MMIO registers (32-bit, forwarded to gem5 via vfio-user)

gem5 Side (`gem5/src/dev/amdgpu/`)

MI300XVfioUser — vfio-user server (libvfio-user), BAR/config/IRQ dispatch
AMDGPUDevice — MI300X GPU device model (MMIO, doorbell, config space)
PM4PacketProcessor — command processor with VRAM-aware fence routing
SDMAEngine — DMA engine with VRAM write-back support
AMDGPUVM — GART translation with cosim fallback (shared VRAM PTE reads)

Version Matrix

Component	Version
Guest OS	Ubuntu 24.04.2 LTS
Guest kernel	6.8.0-79-generic
ROCm	7.0.0
GPU device	MI300X (gfx942, DeviceID 0x74A0)
gem5 build	VEGA_X86, GPU_VIPER coherence

Documentation

Detailed technical documentation is available in docs/:

Complete Usage Guide — build, run, test
Technical Notes — architecture, pitfalls, fixes
MI300X Memory Management — GART, address translation
GPU FS Guide — gem5 standalone GPU full-system simulation
Guest GPU Init — driver initialization flow
Memory Architecture — shared memory, VRAM routing, DMA
Debugging Pitfalls — common issues and solutions
Development Story — how this project was built in one day with Claude

License

This project is licensed under the Apache License 2.0.

Note: The gem5 and qemu submodules are governed by their own respective licenses (gem5: BSD-3-Clause, QEMU: GPL-2.0).

Acknowledgments

gem5 — modular computer architecture simulator
QEMU — open-source machine emulator
ROCm — AMD GPU computing platform

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.claude/commands		.claude/commands
.github/workflows		.github/workflows
docs		docs
gem5 @ 0018bde		gem5 @ 0018bde
gem5-resources @ b4726da		gem5-resources @ b4726da
qemu @ 357667d		qemu @ 357667d
scripts		scripts
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
README.zh.md		README.zh.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QEMU + gem5 MI300X Co-simulation

Features

Prerequisites

Quick Start

Option A: Script-based

Option B: Manual step-by-step

Repository Structure

Key Components

QEMU Side (stock `vfio-user-pci`)

gem5 Side (`gem5/src/dev/amdgpu/`)

Version Matrix

Documentation

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

QEMU + gem5 MI300X Co-simulation

Features

Prerequisites

Quick Start

Option A: Script-based

Option B: Manual step-by-step

Repository Structure

Key Components

QEMU Side (stock vfio-user-pci)

gem5 Side (gem5/src/dev/amdgpu/)

Version Matrix

Documentation

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

QEMU Side (stock `vfio-user-pci`)

gem5 Side (`gem5/src/dev/amdgpu/`)

Packages