This repository is a research and learning workspace for building a RISC-V based GPU architecture that avoids custom ISA extensions and instead relies on standard RISC-V features, most notably the RISC-V Vector extension where appropriate. The goal is to explore GPU building blocks, accelerator micro-architectures, and end-to-end flows using open tools.
Status: initial work focused on systolic-array based matrix-multiply building blocks (2x2 and 4x4 example designs), RTL testbenches, and simple simulation flows.
Contents
- Overview: goals and design philosophy.
- systolic_array/: starting point — example systolic array RTL for matmul and simulation flows.
- Roadmap: planned next steps for the GPU stack.
GPUs are complex systems built from well-understood building blocks: vector/SIMD units, memory hierarchies, data-movement engines, and computation accelerators. This repo aims to:
- Study those building blocks in isolation and in combination.
- Build hardware blocks using synthesizable RTL and open-source tools.
- Keep the ISA standard (RISC-V + Vector extension) to maximize portability.
Top-level structure (high-level):
- systolic_array/ — Systolic-array examples and simulation flows.
  - 2x2_matmul/ — 2x2 processing-element (PE) matrix multiply example.
    - rtl/ — Verilog sources for the PE and top-level systolic array.
    - simulation/ — Makefile and helper files to run iverilog/vvp simulations and produce waveforms.
  - 4x4_matmul/ — 4x4 PE example and testbench.
Prerequisites (recommended):
- iverilog + vvp (simulation)
- yosys (synthesis / linting)
- gtkwave or any VCD viewer (optional)
On macOS (Homebrew):
```sh
brew install icarus-verilog yosys gtkwave
```

Each example has a simulation/ directory with a Makefile to run the RTL tests and generate waveforms. Example:

```sh
# 2x2 example
cd systolic_array/2x2_matmul/simulation
make

# 4x4 example
cd ../../4x4_matmul/simulation
make
```

The Makefiles typically run `iverilog` to produce a `sim.vvp` binary and then run `vvp sim.vvp`, producing `wave.vcd`; open that VCD with `gtkwave wave.vcd`.
- Systolic arrays are used here as a compute substrate for dense matrix multiply (GEMM). They demonstrate dataflow-style mapping of compute onto a fixed mesh of PEs and are a common accelerator building block in GPUs (see the C dataflow sketch after this list).
- The long-term plan is to integrate these accelerators with a RISC-V CPU and expose work to them using standard mechanisms (memory-mapped queues, DMA engines, or offload semantics) while keeping the ISA itself standard-compliant.
- The RISC-V Vector extension is considered the primary ISA-level mechanism for expressing wide-data parallelism; accelerators (like systolic arrays) are complementary and can be targeted from vectorized code or from higher-level runtimes.
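To make the dataflow concrete, here is a small plain-C model of a 2x2 output-stationary systolic array computing C = A x B. It is only a behavioral sketch: the operand skew, register names, and cycle accounting are illustrative and are not lifted from the RTL in systolic_array/.

```c
/* Software model of a 2x2 output-stationary systolic array computing C = A * B.
 * A rows enter from the left, B columns enter from the top, one element per
 * cycle with a one-cycle skew between neighbors; each PE multiply-accumulates
 * and forwards its operands right and down. Illustrative only. */
#include <stdio.h>

#define N 2
#define CYCLES (3 * N - 2)  /* cycles until the last skewed product arrives */

int main(void) {
    int A[N][N] = {{1, 2}, {3, 4}};
    int B[N][N] = {{5, 6}, {7, 8}};
    int C[N][N] = {{0}};

    /* a_reg[i][j] / b_reg[i][j] hold the operands currently latched in PE(i,j). */
    int a_reg[N][N] = {{0}}, b_reg[N][N] = {{0}};

    for (int t = 0; t < CYCLES; t++) {
        /* Shift operands through the mesh: A moves right, B moves down
         * (update from the far edge backward so old values are forwarded). */
        for (int i = 0; i < N; i++)
            for (int j = N - 1; j > 0; j--) a_reg[i][j] = a_reg[i][j - 1];
        for (int j = 0; j < N; j++)
            for (int i = N - 1; i > 0; i--) b_reg[i][j] = b_reg[i - 1][j];

        /* Feed the boundary with skewed inputs: row i of A is delayed i cycles,
         * column j of B is delayed j cycles. */
        for (int i = 0; i < N; i++) {
            int k = t - i;
            a_reg[i][0] = (k >= 0 && k < N) ? A[i][k] : 0;
        }
        for (int j = 0; j < N; j++) {
            int k = t - j;
            b_reg[0][j] = (k >= 0 && k < N) ? B[k][j] : 0;
        }

        /* Every PE multiply-accumulates its current operands. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) C[i][j] += a_reg[i][j] * b_reg[i][j];
    }

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%d ", C[i][j]);   /* expect: 19 22 / 43 50 */
        printf("\n");
    }
    return 0;
}
```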
The goal is to build a GPU whose compute cores execute standard RISC-V instructions (RV32IMV or RV64GV), avoiding custom ISA extensions; the standard Vector extension covers data parallelism.
A GPU consists of:
- Many parallel execution units — SIMT lanes organized into warps
- A warp/thread scheduler — manages thousands of concurrent threads
- Memory hierarchy — registers, shared memory, caches, global memory
- Command frontend — receives work from host, dispatches to compute units
- Integrate or build a minimal RV32IM core (candidates: PicoRV32, VexRiscv, or custom)
- Add Vector extension support (study Ara, Vicuna, or build minimal vector ALU)
- Test with simple vector programs: vector add, dot product, small matmul (see the example program after this list)
- Establish simulation and verification flow
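A first test program for this phase can be plain, portable C, for example the vector add and dot product below. It needs only a basic C runtime, so it can serve both as an early bring-up test and, once recompiled with the Vector extension enabled (e.g. -march=rv64gcv), as a golden reference for the vectorized path. The helper names are illustrative.

```c
/* Minimal first test program: vector add and dot product in plain C. */
#include <stdint.h>
#include <stdio.h>

#define LEN 16

static void vec_add(const int32_t *a, const int32_t *b, int32_t *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

static int64_t dot(const int32_t *a, const int32_t *b, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return acc;
}

int main(void) {
    int32_t a[LEN], b[LEN], c[LEN];
    for (int i = 0; i < LEN; i++) { a[i] = i; b[i] = 2 * i; }

    vec_add(a, b, c, LEN);
    printf("c[5]=%d dot=%lld\n", c[5], (long long)dot(a, b, LEN));
    return 0;
}
```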
- Instantiate multiple scalar pipelines that share instruction fetch (forming a warp)
- Build warp scheduler (round-robin or scoreboard-based)
- Implement divergence handling (masked execution for branches; see the masking sketch after this list)
- Add warp-level synchronization primitives
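As a behavioral illustration of masked execution, the sketch below runs both sides of a branch across all lanes of a warp and uses a per-lane active mask to decide which results commit. The warp width, struct layout, and reconvergence handling are assumptions for the example, not a description of a specific scheduler.

```c
/* Behavioral sketch of SIMT divergence handling via masked execution. */
#include <stdint.h>
#include <stdio.h>

#define WARP_SIZE 8

typedef struct {
    int32_t  reg[WARP_SIZE];   /* one register value per SIMT lane */
    uint32_t active;           /* bit i set => lane i commits results */
} warp_t;

/* Kernel-level pseudo-branch: if (x < 0) x = -x; else x = x * 2; */
static void execute_branch(warp_t *w) {
    uint32_t taken = 0;

    /* 1. Evaluate the condition in every active lane to build a predicate mask. */
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (((w->active >> lane) & 1) && w->reg[lane] < 0)
            taken |= 1u << lane;

    /* 2. Execute the "then" path under the taken mask... */
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if ((taken >> lane) & 1)
            w->reg[lane] = -w->reg[lane];

    /* 3. ...then the "else" path under the complementary mask. */
    uint32_t not_taken = w->active & ~taken;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if ((not_taken >> lane) & 1)
            w->reg[lane] = w->reg[lane] * 2;

    /* 4. At the reconvergence point the original mask (w->active) is restored,
     *    so all lanes proceed together again. */
}

int main(void) {
    warp_t w = { .reg = {-3, 4, -1, 0, 7, -8, 2, -5}, .active = 0xFF };
    execute_branch(&w);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        printf("%d ", w.reg[lane]);   /* expect: 3 8 1 0 14 8 4 5 */
    printf("\n");
    return 0;
}
```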
- Design banked register file (GPUs need large register files; see the banking sketch after this list)
- Add shared memory / scratchpad (fast, software-managed, per-workgroup)
- Implement L1 data cache or texture-cache style access
- Build global memory interface (AXI or similar to external DRAM)
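The sketch below illustrates the banking idea behind the register-file and scratchpad items above: word addresses map to banks, and lanes that collide on a bank in the same cycle must be serialized. Bank count, warp width, and the address-to-bank function are illustrative assumptions.

```c
/* Sketch of bank selection and conflict counting for a banked register file or
 * shared-memory scratchpad. Parameters are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS 4
#define WARP_SIZE 8

static unsigned bank_of(uint32_t word_addr) {
    return word_addr % NUM_BANKS;   /* low bits of the word address pick the bank */
}

/* Cycles needed for one warp-wide access: the worst-case number of lanes that
 * collide on a single bank. */
static int cycles_for_access(const uint32_t word_addr[WARP_SIZE]) {
    int per_bank[NUM_BANKS] = {0};
    int worst = 1;
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        int b = (int)bank_of(word_addr[lane]);
        if (++per_bank[b] > worst)
            worst = per_bank[b];
    }
    return worst;
}

int main(void) {
    uint32_t unit_stride[WARP_SIZE], bad_stride[WARP_SIZE];
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        unit_stride[lane] = lane;               /* banks 0,1,2,3,0,1,2,3 -> 2 cycles */
        bad_stride[lane]  = lane * NUM_BANKS;   /* all lanes hit bank 0  -> 8 cycles */
    }
    printf("unit stride: %d cycle(s)\n", cycles_for_access(unit_stride));
    printf("stride %d:    %d cycle(s)\n", NUM_BANKS, cycles_for_access(bad_stride));
    return 0;
}
```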
- Command processor: read work descriptors from memory (see the descriptor sketch after this list)
- Thread block / workgroup dispatcher: assign blocks to compute units
- Barrier / sync support (`__syncthreads()` equivalent)
- DMA engine for bulk data movement
- Minimal runtime to launch kernels from host
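The fragment below sketches what a memory-mapped launch path could look like: the host publishes a work descriptor into a ring and writes a doorbell register for the command processor. Every field, address, and register name here is a made-up placeholder, not a defined interface of this project; it is bare-metal-style code that compiles but cannot run on a host, since the addresses are invented.

```c
/* Hypothetical kernel-launch path: host writes a work descriptor into a
 * memory-mapped ring, then rings a doorbell; the command processor reads the
 * descriptor and dispatches workgroups. All layouts/addresses are assumptions. */
#include <stdint.h>

typedef struct {
    uint64_t kernel_pc;      /* entry point of the compiled RISC-V kernel   */
    uint64_t arg_ptr;        /* pointer to the kernel argument buffer       */
    uint32_t grid_dim[3];    /* workgroups per dimension                    */
    uint32_t block_dim[3];   /* threads per workgroup                       */
    uint32_t shared_bytes;   /* per-workgroup scratchpad to reserve         */
    uint32_t flags;
} work_descriptor_t;

/* Illustrative MMIO layout (purely made up for this sketch). */
#define CMD_QUEUE_BASE   0x40000000UL  /* ring of work_descriptor_t slots      */
#define CMD_QUEUE_SLOTS  16
#define DOORBELL_REG     0x40010000UL  /* write = index of last valid slot + 1 */

static void launch_kernel(const work_descriptor_t *desc, uint32_t slot) {
    volatile work_descriptor_t *ring =
        (volatile work_descriptor_t *)CMD_QUEUE_BASE;
    volatile uint32_t *doorbell = (volatile uint32_t *)DOORBELL_REG;

    ring[slot % CMD_QUEUE_SLOTS] = *desc;   /* 1. publish the descriptor        */
    __sync_synchronize();                   /* 2. fence so the write is visible */
    *doorbell = slot + 1;                   /* 3. notify the command processor  */
}
```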
- Use LLVM RISC-V backend with vector intrinsics for compilation
- Write example GPU kernels in C with RVV intrinsics (see the SAXPY sketch after this list)
- Simple benchmarks: SAXPY, GEMM, reduction
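For reference, a SAXPY kernel written with the RVV C intrinsics (the v1.0 `__riscv_*` API from `<riscv_vector.h>`, supported by recent Clang/GCC) might look like the following sketch; it is not code from this repository.

```c
/* SAXPY (y = a*x + y) with RVV 1.0 C intrinsics. The loop is strip-mined with
 * vsetvl, so it works for any n and any hardware vector length (VLEN). */
#include <riscv_vector.h>
#include <stddef.h>

void saxpy(size_t n, float a, const float *x, float *y) {
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m8(n);              /* elements this iteration */
        vfloat32m8_t vx = __riscv_vle32_v_f32m8(x, vl);   /* load x[0..vl)           */
        vfloat32m8_t vy = __riscv_vle32_v_f32m8(y, vl);   /* load y[0..vl)           */
        vy = __riscv_vfmacc_vf_f32m8(vy, a, vx, vl);      /* vy += a * vx            */
        __riscv_vse32_v_f32m8(y, vy, vl);                 /* store back to y         */
        x += vl;
        y += vl;
        n -= vl;
    }
}
```

A kernel like this would typically be built with something like `clang --target=riscv64-unknown-elf -march=rv64gcv -O2 -c saxpy.c`; the exact target triple and arch string depend on the toolchain setup and are shown only as an illustration.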
| Project | Description |
|---|---|
| Vortex | Open-source RISC-V GPGPU — closest reference architecture |
| Ara | Full RVV 1.0 vector unit from ETH Zurich |
| Vicuna | Lightweight RVV core |
| VexRiscv | Configurable RISC-V in SpinalHDL |
| PicoRV32 | Tiny RV32 core, easy to understand |
✅ Done: Systolic array building blocks (2×2, 4×4 matmul PEs) — understanding dataflow compute
🔄 Next: Phase 1 — integrate a base RISC-V core and add vector support
Contributions are welcome. Suggested workflow:
- Open an issue describing the change or feature.
- Create a branch and send a PR with clear description and tests where applicable.
This repository does not include an explicit license file yet. If you want to add a license, consider a permissive license such as MIT or Apache-2.0.
Open an issue or create a discussion in this repository for questions, design discussions, or coordination.
This work uses open-source tools such as Icarus Verilog and Yosys for simulation and synthesis research.