A curated list of awesome open source projects, tools, and resources for AMD GPUs via ROCm — covering AI/ML, inference, creative workloads, HPC, and developer tooling.
Maintained by @mikeroysoft · PRs welcome · See CONTRIBUTING.md
## Contents

- 🏁 Getting Started
- 🤖 AI & ML Frameworks
- 🚀 Inference & Serving
- 🎨 Creative & Image Generation
- ⚡ Performance & Math Libraries
- 🔧 Developer Tools & Profiling
- 📦 Containers & Orchestration
- 🏛️ HPC & Scientific Computing
- 🔄 CUDA Migration
- 🌐 Community & Learning
## 🏁 Getting Started

New to AMD GPUs and ROCm? Start here.
- ROCm Documentation — Official docs. Start with "What is ROCm?" and the install guide.
- ROCm Installation Guide (Linux) — Supported distros, package manager install, and post-install verification.
- ROCm GPU & OS Compatibility Matrix — Check if your GPU and OS are supported before you do anything else.
- ROCm Docker Hub — Official Docker images. The fastest way to get a working PyTorch + ROCm environment without touching your system Python.
- GPUOpen — AMD's developer portal: ISA docs, SDKs, libraries, and tools from AMD and partners.
Quick smoke test after install:

```shell
rocminfo   # list detected GPUs
rocm-smi   # GPU stats (like nvidia-smi)
python3 -c "import torch; print(torch.cuda.is_available())"   # should be True
```

## 🤖 AI & ML Frameworks

Core frameworks with first-class or near-first-class ROCm support.
- PyTorch (ROCm) ⭐ 90k+ — Select "ROCm" on the install matrix. The de facto standard. AMD actively contributes upstream.
- TensorFlow-ROCm ⭐ — AMD's maintained TensorFlow fork with ROCm support.
- JAX on ROCm — Google JAX ported to ROCm. Good for research workloads and differentiable programming.
- DeepSpeed ⭐ 36k+ — Microsoft's distributed training library. ROCm support is built-in and actively tested.
- PEFT (Hugging Face) ⭐ 17k+ — Parameter-Efficient Fine-Tuning (LoRA, QLoRA, etc.). Works on ROCm via PyTorch.
- Transformers (Hugging Face) ⭐ 140k+ — The model hub CLI and training library. ROCm works via PyTorch backend.
- Unsloth ⭐ 25k+ — Fast fine-tuning (2–5x speedup, 70% less VRAM). ROCm support added in 2024.
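The most useful portability fact about the frameworks above: ROCm builds of PyTorch expose AMD GPUs through the familiar `torch.cuda` namespace, so CUDA-style device-selection code typically runs unchanged. A minimal sketch (import is guarded so the snippet also runs on machines without torch installed):

```python
def pick_device():
    """Return "cuda" when a GPU is visible to PyTorch, else "cpu".

    On ROCm builds of PyTorch, AMD GPUs also report through
    torch.cuda.* -- there is no separate "rocm" device string.
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:  # torch not installed; fall back for this sketch
        return "cpu"

print(pick_device())
```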
## 🚀 Inference & Serving

The hottest space in the ROCm ecosystem right now.
- vLLM ⭐ 50k+ — High-throughput LLM inference server. ROCm is a first-class target; AMD contributes upstream. See the AMD install guide.
- Ollama ⭐ 130k+ — Run LLMs locally. ROCm support is built in — just install and run on a supported AMD GPU.
- llama.cpp ⭐ 80k+ — Lightweight inference with a HIP/ROCm backend (`-DGGML_HIPBLAS=ON`). Broad model support.
- SGLang ⭐ 15k+ — Structured generation and high-performance serving runtime. ROCm support is active and growing.
- LMDeploy ⭐ 5k+ — Efficient inference and serving toolkit from Shanghai AI Lab. ROCm-compatible via PyTorch.
- MLC-LLM ⭐ 20k+ — Compile and deploy LLMs natively. Supports ROCm via Apache TVM backend.
- TGI (Text Generation Inference) ⭐ 9k+ — Hugging Face's production inference server. AMD/ROCm Docker image available.
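Most of the servers above speak simple HTTP APIs, so integration rarely requires GPU-specific code. As one illustration, Ollama listens on port 11434 by default and accepts generation requests at `/api/generate`; a stdlib-only sketch of building such a request (model name is just an example, and the actual POST is left commented out since it assumes a running server):

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model, prompt):
    # Payload shape for Ollama's /api/generate endpoint;
    # stream=False asks for one JSON object instead of a token stream.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

body = build_generate_request("llama3.2", "Why is the sky blue?")
# To actually send it (requires `ollama serve` running locally):
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_URL, data=body,
#                                headers={"Content-Type": "application/json"})
#   print(json.load(urllib.request.urlopen(req))["response"])
print(json.loads(body)["model"])
```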
## 🎨 Creative & Image Generation

Consumer Radeon and Instinct GPUs both shine here. Often overlooked in other ROCm lists.
- AUTOMATIC1111 Stable Diffusion WebUI ⭐ 145k+ — The classic. ROCm works on Linux; use `--precision full --no-half` if you hit precision issues on RDNA2.
- ComfyUI ⭐ 70k+ — Node-based SD UI. Generally better ROCm compatibility than A1111. Actively tested on RX 7000 series.
- InvokeAI ⭐ 23k+ — Professional-grade image generation. ROCm-compatible via PyTorch.
- Fooocus ⭐ 42k+ — Simplified SDXL image-generation UI. Works on ROCm with PyTorch.
- Flux (Black Forest Labs) ⭐ 20k+ — State-of-the-art image generation model. Runs on ROCm via standard PyTorch.
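If PyTorch refuses to see a consumer RDNA2 card for any of the UIs above, a widely used (but unofficial) community workaround is to spoof the GFX target so the ROCm runtime treats the card as a supported `gfx1030` device. Verify the correct value for your specific GPU before relying on it:

```shell
# Unofficial community workaround: present an RDNA2 card to ROCm as gfx1030.
# 10.3.0 maps to gfx1030 (e.g. RX 6800/6900); other cards need other values.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```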
## ⚡ Performance & Math Libraries

The building blocks — what makes everything above fast.
- rocBLAS — BLAS on AMD GPUs. The foundation for most linear algebra in ML workloads.
- hipBLASLt — Lightweight, flexible GEMM library. Used heavily in transformer attention layers.
- MIOpen — AMD's deep learning primitives library (convolutions, batch norm, RNN). The cuDNN equivalent.
- rocFFT — Fast Fourier Transform on ROCm. Used in scientific computing and signal processing.
- Flash Attention (ROCm) — AMD's port of Tri Dao's Flash Attention. Critical for long-context LLM training and inference.
- Triton (ROCm backend) ⭐ 14k+ — OpenAI's GPU programming language. ROCm backend is upstream and increasingly production-quality.
- hipSPARSELt — Sparse GEMM library optimized for MI300X structured sparsity.
- rocWMMA — Wave Matrix Multiply Accumulate — low-level access to AMD matrix cores.
- composable_kernel — High-performance fused kernels for ML ops. MIOpen is built on top of this.
- RCCL — ROCm Communication Collectives Library (all-reduce, broadcast, etc.). The NCCL equivalent for multi-GPU/multi-node training.
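For intuition about what the attention-focused libraries above (hipBLASLt, Flash Attention, composable_kernel) ultimately compute: softmax(QKᵀ/√d)·V. A deliberately naive pure-Python reference for the math only; the GPU libraries fuse and tile these steps so the full score matrix is never materialized in memory:

```python
import math

def matmul(a, b):
    # Textbook row-by-column product; O(n^3) and CPU-only on purpose.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V -- the operation Flash Attention fuses.
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    probs = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(probs, v)

out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)
```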
## 🔧 Developer Tools & Profiling

- ROCProfiler — GPU performance profiling and hardware counter access. The foundation for most profiling tools in the ecosystem.
- Omniperf — System-level GPU performance profiling. Designed for MI-series Instinct GPUs; great roofline analysis.
- Omnitrace — Full-system tracing (CPU + GPU + MPI). Replaces rocTracer for most workflows.
- HIPIFY — Translates CUDA source to HIP. Best first step when porting a CUDA project to ROCm.
- HIP — The C++ GPU programming API that runs on both AMD (via ROCm) and NVIDIA (via CUDA). Write once, run on both.
- hipcc — ROCm's compiler driver. Based on LLVM/Clang with AMD GPU codegen.
- rocm-cmake — CMake modules and utilities for ROCm projects. Use this to properly set up HIP builds.
- rocDecode — Hardware video decode library for AMD GPUs. Useful for CV and video ML pipelines.
## 📦 Containers & Orchestration

- ROCm Docker Images — Official images: bare ROCm, PyTorch, TensorFlow, JAX. Start here for reproducible environments.
- AMD GPU Operator (Kubernetes) — Deploy and manage AMD GPUs in k8s clusters. Based on the NVIDIA GPU Operator pattern.
- AMD Device Plugin for Kubernetes — Exposes AMD GPUs as schedulable resources in k8s.
- ROCm on WSL2 — Windows Subsystem for Linux 2 support. RDNA2/3 only; check compatibility matrix first.
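With the device plugin (or GPU operator) above installed, pods request AMD GPUs through the `amd.com/gpu` extended resource, exactly like any other Kubernetes resource limit. A minimal sketch of the scheduling side (the image is a placeholder; pin a specific tag in practice):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rocm-smoke-test
spec:
  containers:
    - name: rocm
      image: rocm/pytorch:latest   # placeholder; pin an exact tag in practice
      resources:
        limits:
          amd.com/gpu: 1           # resource name exposed by the AMD device plugin
```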
## 🏛️ HPC & Scientific Computing

- OpenMPI + ROCm — UCX and OpenMPI both support ROCm-aware MPI (GPU-to-GPU transfers without CPU round-trips).
- GROMACS (AMD build) — Molecular dynamics. ROCm-accelerated kernels for MI-series GPUs.
- LAMMPS ⭐ 2.5k+ — Molecular dynamics simulator with HIP/ROCm support via the KOKKOS package.
- CP2K ⭐ 2.5k+ — Quantum chemistry and MD. ROCm GPU offload available.
- OpenCL on ROCm — ROCm includes a full OpenCL 2.0 implementation. Legacy HPC codes can often run without porting.
- rocRAND — GPU random number generation. Used in Monte Carlo and stochastic simulation workloads.
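As a feel for the workload class rocRAND targets: Monte Carlo estimates consume enormous streams of random numbers, which is why generation gets pushed onto the GPU at all. A seeded, CPU-only toy version of the classic π estimate (rocRAND's job is to produce the `random()` stream at GPU speed):

```python
import random

def estimate_pi(samples, seed=0):
    # Draw points in the unit square; the fraction landing inside the
    # quarter circle approaches pi/4 as the sample count grows.
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```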
## 🔄 CUDA Migration

Porting from CUDA? These tools and guides are your starting point.
- HIPIFY — Automated CUDA → HIP source translation. Handles most common patterns; check the unsupported features list.
- HIP Programming Guide — The definitive guide to writing HIP code and understanding CUDA equivalencies.
- CUDA to ROCm Porting Guide — AMD's official porting walkthrough with common patterns and gotchas.
- PyTorch CUDA to ROCm Migration — Notes on CUDA/HIP compatibility in PyTorch — most CUDA PyTorch code runs on ROCm unchanged.
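The core of the porting story above is that HIP mirrors the CUDA runtime API almost symbol-for-symbol, so much of HIPIFY's work is mechanical renaming. A toy illustration of that idea only; the real hipify-perl/hipify-clang tools handle headers, library calls, and many edge cases this sketch ignores:

```python
import re

# A few real CUDA -> HIP runtime renames; the actual tools cover hundreds.
RENAMES = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(source):
    # Whole-word replacement only -- deliberately simplistic.
    pattern = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")
    return pattern.sub(lambda m: RENAMES[m.group(1)], source)

print(toy_hipify("cudaMalloc(&ptr, n); cudaDeviceSynchronize();"))
# -> hipMalloc(&ptr, n); hipDeviceSynchronize();
```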
## 🌐 Community & Learning

- ROCm GitHub Organization — The canonical source for all ROCm components.
- AMD ROCm Blogs — Technical deep-dives from the AMD engineering team.
- AMD Instinct YouTube Channel — Webinars, tutorials, and product talks.
- r/ROCm — Community troubleshooting and news. Good signal-to-noise ratio.
- AMD Developer Forums — Official support forums for Instinct and ROCm.
- ROCm Discord — Real-time community help and dev discussion.
## Contributing

See CONTRIBUTING.md. In short: open source, demonstrably ROCm-compatible, and worth including based on quality or popularity. File an issue or open a PR.
## License

To the extent possible under law, Michael Roy has waived all copyright and related or neighboring rights to this work.
