Awesome ROCm

A curated list of awesome open source projects, tools, and resources for AMD GPUs via ROCm — covering AI/ML, inference, creative workloads, HPC, and developer tooling.

Maintained by @mikeroysoft · PRs welcome · See CONTRIBUTING.md


Contents


🏁 Getting Started

New to AMD GPUs and ROCm? Start here.

  • ROCm Documentation — Official docs. Start with "What is ROCm?" and the install guide.
  • ROCm Installation Guide (Linux) — Supported distros, package manager install, and post-install verification.
  • ROCm GPU & OS Compatibility Matrix — Check if your GPU and OS are supported before you do anything else.
  • ROCm Docker Hub — Official Docker images. The fastest way to get a working PyTorch + ROCm environment without touching your system Python.
  • GPUOpen — AMD's developer portal: ISA docs, SDKs, libraries, and tools from AMD and partners.
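
The Docker route mentioned above can be sketched as follows; the device flags match AMD's published run instructions, but the image tag here is an assumption — check the ROCm Docker Hub page for current tags.

```shell
# Hedged sketch: run the official ROCm PyTorch image with GPU access.
# /dev/kfd and /dev/dri are the ROCm compute and render devices;
# the image tag is an assumption -- see hub.docker.com/u/rocm for current ones.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  rocm/pytorch:latest \
  python3 -c "import torch; print(torch.cuda.is_available())"
```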

Quick smoke test after install:

rocminfo                  # list detected GPUs
rocm-smi                  # GPU stats (like nvidia-smi)
python3 -c "import torch; print(torch.cuda.is_available())"  # should print True (torch's "cuda" API maps to HIP on ROCm builds)

🤖 AI & ML Frameworks

Core frameworks with first-class or near-first-class ROCm support.

  • PyTorch (ROCm) ⭐ 90k+ — Select "ROCm" on the install matrix. The de facto standard. AMD actively contributes upstream.
  • TensorFlow-ROCm ⭐ — AMD's maintained TensorFlow fork with ROCm support.
  • JAX on ROCm — Google JAX ported to ROCm. Good for research workloads and differentiable programming.
  • DeepSpeed ⭐ 36k+ — Microsoft's distributed training library. ROCm support is built-in and actively tested.
  • PEFT (Hugging Face) ⭐ 17k+ — Parameter-Efficient Fine-Tuning (LoRA, QLoRA, etc.). Works on ROCm via PyTorch.
  • Transformers (Hugging Face) ⭐ 140k+ — The standard model and training library. ROCm works via the PyTorch backend.
  • Unsloth ⭐ 25k+ — Fast fine-tuning (2–5x speedup, 70% less VRAM). ROCm support added in 2024.
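
Installing PyTorch for ROCm outside Docker means pulling wheels from a dedicated index. A hedged sketch — the ROCm version segment in the URL ("rocm6.2") is an assumption; copy the exact command from the pytorch.org install matrix.

```shell
# Hedged sketch: install ROCm PyTorch wheels from the dedicated index.
# The ROCm version in the URL changes with each release -- "rocm6.2" is an
# assumption; the pytorch.org install matrix gives the current URL.
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
python3 -c "import torch; print(torch.version.hip)"  # non-None on a ROCm build
```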

🚀 Inference & Serving

The hottest space in the ROCm ecosystem right now.

  • vLLM ⭐ 50k+ — High-throughput LLM inference server. ROCm is a first-class target; AMD contributes upstream. See the AMD install guide.
  • Ollama ⭐ 130k+ — Run LLMs locally. ROCm support is built in — just install and run on a supported AMD GPU.
  • llama.cpp ⭐ 80k+ — Lightweight inference with a HIP/ROCm backend (-DGGML_HIPBLAS=ON). Broad model support.
  • SGLang ⭐ 15k+ — Structured generation and high-performance serving runtime. ROCm support active and growing.
  • LMDeploy ⭐ 5k+ — Efficient inference and serving toolkit from Shanghai AI Lab. ROCm-compatible via PyTorch.
  • MLC-LLM ⭐ 20k+ — Compile and deploy LLMs natively. Supports ROCm via Apache TVM backend.
  • TGI (Text Generation Inference) ⭐ 9k+ — Hugging Face's production inference server. AMD/ROCm Docker image available.
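
The llama.cpp build flag mentioned above is used at configure time; note the flag name has changed across versions (older trees use -DGGML_HIPBLAS=ON, newer ones -DGGML_HIP=ON), so check the repo's build docs. A hedged sketch, with the GPU architecture (gfx1100, RX 7900 series) as an example:

```shell
# Hedged sketch: build llama.cpp with the HIP/ROCm backend.
# Flag names vary by version (-DGGML_HIPBLAS=ON in older trees,
# -DGGML_HIP=ON in newer ones); gfx1100 is an example target.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release -j
```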

🎨 Creative & Image Generation

Consumer Radeon and Instinct GPUs both shine here. Often overlooked in other ROCm lists.

  • AUTOMATIC1111 Stable Diffusion WebUI ⭐ 145k+ — The classic. ROCm works on Linux; use --precision full --no-half if you hit precision issues on RDNA2.
  • ComfyUI ⭐ 70k+ — Node-based SD UI. Generally better ROCm compatibility than A1111. Actively tested on RX 7000 series.
  • InvokeAI ⭐ 23k+ — Professional-grade image generation. ROCm-compatible via PyTorch.
  • Fooocus ⭐ 42k+ — Simplified image-generation UI built on SDXL. Works on ROCm with PyTorch.
  • Flux (Black Forest Labs) ⭐ 20k+ — State-of-the-art image generation model. Runs on ROCm via standard PyTorch.
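
Launching the A1111 WebUI on a consumer Radeon card often combines the precision flags above with a common community workaround; HSA_OVERRIDE_GFX_VERSION makes ROCm treat an officially-unsupported RDNA2 card as gfx1030 and is only needed on some GPUs. A hedged sketch:

```shell
# Hedged sketch: launch A1111 on a consumer RDNA2 Radeon GPU.
# HSA_OVERRIDE_GFX_VERSION=10.3.0 is a community workaround for cards
# outside the official support matrix; omit it on supported GPUs.
cd stable-diffusion-webui
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./webui.sh --precision full --no-half
```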

⚡ Performance & Math Libraries

The building blocks — what makes everything above fast.

  • rocBLAS — BLAS on AMD GPUs. The foundation for most linear algebra in ML workloads.
  • hipBLASLt — Lightweight, flexible GEMM library. Used heavily in transformer attention layers.
  • MIOpen — AMD's deep learning primitives library (convolutions, batch norm, RNN). The cuDNN equivalent.
  • rocFFT — Fast Fourier Transform on ROCm. Used in scientific computing and signal processing.
  • Flash Attention (ROCm) — AMD's port of Tri Dao's Flash Attention. Critical for long-context LLM training and inference.
  • Triton (ROCm backend) ⭐ 14k+ — OpenAI's GPU programming language. ROCm backend is upstream and increasingly production-quality.
  • hipSPARSELt — Sparse GEMM library optimized for MI300X structured sparsity.
  • rocWMMA — Wave Matrix Multiply Accumulate — low-level access to AMD matrix cores.
  • composable_kernel — High-performance fused kernels for ML ops. MIOpen is built on top of this.
  • RCCL — ROCm Communication Collectives Library (all-reduce, broadcast, etc.). The NCCL equivalent for multi-GPU/multi-node training.
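
Using these libraries directly from C++ typically means compiling with hipcc and linking the library. A hedged sketch — my_gemm.cpp is a hypothetical source file, and /opt/rocm is the default (but configurable) install prefix:

```shell
# Hedged sketch: compile a HIP program against rocBLAS.
# my_gemm.cpp is hypothetical; /opt/rocm is the default install prefix.
hipcc my_gemm.cpp -o my_gemm \
  -I/opt/rocm/include -L/opt/rocm/lib -lrocblas
./my_gemm
```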

🔧 Developer Tools & Profiling

  • ROCProfiler — GPU performance profiling and hardware counter access. The foundation for most profiling tools in the ecosystem.
  • Omniperf — System-level GPU performance profiling. Designed for MI-series Instinct GPUs; great roofline analysis.
  • Omnitrace — Full-system tracing (CPU + GPU + MPI). Replaces rocTracer for most workflows.
  • HIPIFY — Translates CUDA source to HIP. Best first step when porting a CUDA project to ROCm.
  • HIP — The C++ GPU programming API that runs on both AMD (via ROCm) and NVIDIA (via CUDA). Write once, run on both.
  • hipcc — ROCm's compiler driver. Based on LLVM/Clang with AMD GPU codegen.
  • rocm-cmake — CMake modules and utilities for ROCm projects. Use this to properly set up HIP builds.
  • rocDecode — Hardware video decode library for AMD GPUs. Useful for CV and video ML pipelines.
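
A typical first profiling pass with the tools above might look like this; my_gpu_app is a hypothetical binary, and flags vary between tool versions, so check each tool's --help.

```shell
# Hedged sketch: basic profiling passes (my_gpu_app is a hypothetical
# binary; flags may differ between tool versions).
rocprof --stats ./my_gpu_app               # per-kernel timing summary
omniperf profile -n run1 -- ./my_gpu_app   # detailed Instinct-series profiling
```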

📦 Containers & Orchestration


🏛️ HPC & Scientific Computing

  • OpenMPI + ROCm — UCX and OpenMPI both support ROCm-aware MPI (GPU-to-GPU transfers without CPU round-trips).
  • GROMACS (AMD build) — Molecular dynamics. ROCm-accelerated kernels for MI-series GPUs.
  • LAMMPS ⭐ 2.5k+ — Molecular dynamics simulator with HIP/ROCm support via the KOKKOS package.
  • CP2K ⭐ 2.5k+ — Quantum chemistry and MD. ROCm GPU offload available.
  • OpenCL on ROCm — ROCm includes a full OpenCL 2.0 implementation. Legacy HPC codes can often run without porting.
  • rocRAND — GPU random number generation. Used in Monte Carlo and stochastic simulation workloads.
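
Building a ROCm-aware MPI stack usually means configuring UCX and Open MPI against the ROCm install, as both support a --with-rocm option. A hedged sketch — the install prefixes are assumptions; see the UCX and Open MPI build docs:

```shell
# Hedged sketch: configure UCX and Open MPI with ROCm-aware transports.
# The /opt/ucx and /opt/ompi prefixes are assumptions.
cd ucx
./configure --prefix=/opt/ucx --with-rocm=/opt/rocm
make -j && make install

cd ../ompi
./configure --prefix=/opt/ompi --with-ucx=/opt/ucx --with-rocm=/opt/rocm
make -j && make install
```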

🔄 CUDA Migration

Porting from CUDA? These tools and guides are your starting point.

  • HIPIFY — Automated CUDA → HIP source translation. Handles most common patterns; check the unsupported features list.
  • HIP Programming Guide — The definitive guide to writing HIP code and understanding CUDA equivalencies.
  • CUDA to ROCm Porting Guide — AMD's official porting walkthrough with common patterns and gotchas.
  • PyTorch CUDA to ROCm Migration — Notes on CUDA/HIP compatibility in PyTorch — most CUDA PyTorch code runs on ROCm unchanged.
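
HIPIFY's translation can be previewed on a toy snippet: hipify-perl ships with ROCm and prints the converted source to stdout, so it is an easy way to see the CUDA → HIP mapping before committing to a port.

```shell
# Hedged sketch: preview HIPIFY's CUDA -> HIP translation on a toy file.
cat > demo.cu <<'EOF'
#include <cuda_runtime.h>
int main() {
    float *d = nullptr;
    cudaMalloc(&d, 1024);
    cudaFree(d);
    return 0;
}
EOF
hipify-perl demo.cu   # emits hipMalloc/hipFree and <hip/hip_runtime.h>
```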

🌐 Community & Learning


Contributing

See CONTRIBUTING.md. In short: open source, demonstrably ROCm-compatible, and worth including based on quality or popularity. File an issue or open a PR.


License

CC0

To the extent possible under law, Michael Roy has waived all copyright and related or neighboring rights to this work.
