[Experiment] ROCm backend #2300
Conversation
What an unexpected and amazing surprise! I'm absolutely thrilled.
@awni
I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing, @NripeshN; I don't mean to discourage you from working on it.
I would love to see the ROCm backend get more traction. AMD's new AI series of processors has a unified-memory advantage similar to Apple Silicon, and getting MLX to run on those processors would be neat.
Stole my idea :(
How is it even possible for such an awesome PR to be left like this?
Pull request overview
This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.
Changes:
- Added ROCm backend infrastructure with device management, memory allocation, and stream handling
- Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting (a sketch of the kernel style follows this list)
- Updated build system (CMake) to support ROCm compilation with configurable GPU architectures
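For a flavor of what a HIP port of a CUDA-style elementwise kernel looks like, here is a minimal sketch; the names (`unary_kernel`, `Abs`) are illustrative, not the PR's actual code:

```cpp
#include <hip/hip_runtime.h>

// Functor applied per element; mirrors the CUDA-style op structs.
struct Abs {
  template <typename T>
  __device__ T operator()(T x) const {
    return x < T(0) ? -x : x;
  }
};

// One thread per element; grid-stride loops are omitted for brevity.
template <typename Op, typename T>
__global__ void unary_kernel(const T* in, T* out, size_t n) {
  size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
  if (i < n) {
    out[i] = Op{}(in[i]);
  }
}

// Launch example (the stream comes from the ROCm stream management):
// unary_kernel<Abs, float><<<num_blocks, 256, 0, stream>>>(in, out, n);
```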
Reviewed changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| CMakeLists.txt | Added MLX_BUILD_ROCM option and ROCm library detection |
| mlx/CMakeLists.txt | Integrated ROCm backend build configuration |
| mlx/device.cpp | Added ROCm device availability checks |
| mlx/backend/rocm/*.hip | HIP kernel implementations for various operations |
| mlx/backend/rocm/device.* | ROCm device and stream management |
| mlx/backend/rocm/allocator.* | ROCm-specific memory allocator using HIP unified memory |
| mlx/backend/rocm/worker.* | Async task execution worker for stream synchronization |
| mlx/backend/rocm/utils.* | HIP utility functions and error handling |
| mlx/backend/rocm/jit_module.* | JIT compilation support using HIPRTC |
| mlx/backend/rocm/device/*.hpp | Device-side utility functions and type definitions |
| mlx/backend/rocm/CMakeLists.txt | ROCm backend build configuration |
…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
👑👑👑
Can anyone run:

```
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .
```

Replace `{based on your GPU}` with your GPU architecture. You can run `rocm-smi` to get your GPU information.
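For reference, the architectures that come up later in this thread are gfx1151 for Strix Halo and gfx1011 for the Radeon Pro V520, so a Strix Halo build would pass `-DMLX_ROCM_ARCHITECTURES=gfx1151`.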
I'm getting this CMake error. Running on Strix Halo (gfx1151).
Could you retry with the latest push, please? (P.S. Keep your fingers crossed while it compiles; it worked for me on the 138th try.) 😅
… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
Now what can I test? 😍
I'm getting this:
I forgot to test the Python build, my bad. Can you try it now? Unfortunately I might not be able to help after it compiles; I don't have an AMD GPU to run tests 😔. I've tried replicating most things from CUDA, so hopefully it works.
Could you maybe try: mlx-community/Meta-Llama-3.1-8B-Instruct-bf16
I will get back to this in a bit 😁
I also have a Strix Halo. I have it set up in a C++ project, and I'm unable to compile with your branch either.
The problem looks like it's stemming from the CMakeLists.txt.
I have submitted a PR to the ROCm-support branch that fixes these compile errors.
Just got my hands on a Radeon Pro V520; I should be able to test things out now 😏
- Use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR for correct path resolution
- Add GCC C++ standard library include paths for the HIP compiler (ROCm's clang needs explicit paths to libstdc++ headers)
Awesome! There is an is_available in eval.cpp that doesn't need to be there :)
- Replace rocPRIM-based sort with a custom block merge sort
- Avoids rocPRIM uninitialized_array compatibility issues with ROCm 7.x
- Mirrors the CUDA sort implementation approach
Here is some profiling information, along with the command I used.
- Add Limits struct to device/utils.hpp for sort operations
- Add missing numeric_limits specializations for int8, uint8, int16, uint16, bool
- Fix C++20 lambda syntax to be C++17 compatible
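As an illustration of the kind of specialization this adds (a sketch; the member names of this hypothetical `Limits` template are illustrative, not the file's actual code):

```cpp
#include <cstdint>

// Generic template; explicit specializations supply the bounds the
// device code needs for small integer and bool types.
template <typename T>
struct Limits;

template <>
struct Limits<int8_t> {
  static constexpr int8_t max = 127;
  static constexpr int8_t min = -128;
};

template <>
struct Limits<bool> {
  static constexpr bool max = true;
  static constexpr bool min = false;
};
```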
….cpp
- Remove mlx/backend/gpu/available.h include (doesn't exist)
- Remove is_available() function (already defined elsewhere)

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
- Implement gpu::device_info(), gpu::device_count(), gpu::is_available()
- Provides device name, architecture, UUID, PCI bus ID, memory info
- Uses hipGetDeviceProperties and hipMemGetInfo for AMD GPU info
- Mirrors the CUDA device_info.cpp implementation

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
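A sketch of how those two HIP calls yield the reported fields (an illustrative standalone program, not the PR's device_info.cpp):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  hipDeviceProp_t props;
  if (hipGetDeviceProperties(&props, /*deviceId=*/0) != hipSuccess) {
    return 1;
  }
  size_t free_mem = 0, total_mem = 0;
  (void)hipMemGetInfo(&free_mem, &total_mem);
  std::printf("name: %s\narch: %s\npci bus: %d\nmemory: %zu free / %zu total\n",
              props.name, props.gcnArchName, props.pciBusID, free_mem,
              total_mem);
  return 0;
}
```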
- Add mlx/memory.h include to ensure MLX_API visibility attributes are applied to memory function definitions
- Fixes undefined symbol errors for reset_peak_memory and other memory management functions

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
- Add (void) casts to suppress nodiscard warnings for HIP API calls (hipMalloc, hipMemcpy, hipFree, hipStreamSynchronize, etc.)
- Fix implicit float-to-bool conversion warnings in unary_ops.hpp (Erf, ErfInv, Expm1) and binary_ops.hpp (ArcTan2)
- Add explicit type checks for bool/integral types before float operations
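Illustrative examples of the two patterns (function names here are hypothetical):

```cpp
#include <hip/hip_runtime.h>

void cleanup(void* ptr, hipStream_t stream) {
  // Explicitly discard the [[nodiscard]] hipError_t return values when the
  // result is intentionally ignored.
  (void)hipStreamSynchronize(stream);
  (void)hipFree(ptr);
}

__device__ bool nonzero(float x) {
  // Instead of `return x;` (implicit float-to-bool conversion warning),
  // make the comparison explicit.
  return x != 0.0f;
}
```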
- Add (void) casts for hipMemsetAsync and hipMemcpyAsync calls in:
  - conv/gemm_conv.cpp
  - random.hip
  - reduce/init_reduce.hip
  - scaled_dot_product_attention.hip
- Add python/src/rocm.cpp with mx.rocm.is_available() function
- Add python/tests/rocm_skip.py with tests to skip for the ROCm backend
- Update mlx_tests.py to detect the ROCm backend and use the appropriate skip list
- Update CMakeLists.txt to include rocm.cpp and the rocm.pyi stub

The ROCm skip list includes:
- The same tests as CUDA (FFT, linalg, hadamard, etc.)
- ROCm-specific: grouped convolution, 1D/3D convolution, input dilation
- Quantization tests (different support level than CUDA)
I am running the Phi3 kernel I had made, which works fine on macOS, with the ROCm experimental build, and I'm hitting:

```
signal SIGSEGV: address not mapped to object (fault address: 0x0)
```
The function needs the MLX_API attribute to be exported from the shared library so it can be called from Python bindings.
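A minimal sketch of the visibility pattern, assuming MLX_API expands to a default-visibility attribute (the macro body here is illustrative, not the project's exact definition):

```cpp
// In a header, e.g. mlx/memory.h:
#ifndef MLX_API
#define MLX_API __attribute__((visibility("default")))
#endif

namespace mlx::core {
MLX_API void reset_peak_memory();
}

// The .cpp file defining the function must include this header so the
// definition picks up the attribute; otherwise, when building with
// -fvisibility=hidden, the symbol is not exported and the Python bindings
// fail with an undefined-symbol error.
```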
Some AMD GPUs (like the Radeon Pro V520) report managed memory support but hipMallocManaged fails with "out of memory" even for small allocations. This change adds a runtime check that tests if managed memory actually works, and falls back to regular hipMalloc if it doesn't.
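A sketch of that probe (hypothetical function names; the real allocator is more involved):

```cpp
#include <hip/hip_runtime.h>

// Probe once whether managed memory actually works, since some GPUs
// report support but fail at allocation time.
static bool managed_memory_works() {
  void* probe = nullptr;
  if (hipMallocManaged(&probe, 64) != hipSuccess) {
    (void)hipGetLastError();  // clear the error so later calls start clean
    return false;
  }
  (void)hipFree(probe);
  return true;
}

void* gpu_allocate(size_t size) {
  static const bool use_managed = managed_memory_works();
  void* ptr = nullptr;
  hipError_t err =
      use_managed ? hipMallocManaged(&ptr, size) : hipMalloc(&ptr, size);
  return err == hipSuccess ? ptr : nullptr;
}
```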
Yup, a lot of errors on my end too. Earlier I had just tried eyeballing the implementation, copying the structure from CUDA and checking for compilation errors through Docker. I did not have an AMD GPU before this; now that I have one, I will incrementally patch all the errors.
When hipMallocManaged fails (which happens on some AMD GPUs like the Radeon Pro V520), fall back to hipHostMalloc instead of hipMalloc. hipHostMalloc allocates pinned host memory that is accessible from both CPU and GPU, which is required because MLX's array initialization code uses std::copy to write data directly to the allocated buffer from CPU. Regular hipMalloc allocates device-only memory that cannot be accessed from CPU code, causing segfaults when std::copy tries to write to it.
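A sketch of the revised fallback (same caveats as the probe sketch above):

```cpp
#include <hip/hip_runtime.h>

void* gpu_allocate_with_fallback(size_t size) {
  void* ptr = nullptr;
  if (hipMallocManaged(&ptr, size) == hipSuccess) {
    return ptr;
  }
  (void)hipGetLastError();
  // Pinned host memory is visible to both CPU and GPU, so host-side
  // std::copy into the buffer works; device-only hipMalloc memory would
  // segfault on host writes.
  if (hipHostMalloc(&ptr, size, hipHostMallocDefault) != hipSuccess) {
    return nullptr;
  }
  return ptr;
}
```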
AMD GPUs have different wavefront (warp) sizes depending on architecture:
- CDNA/GCN (gfx9xx and earlier): 64
- RDNA (gfx10xx, gfx11xx): 32

The previous code hardcoded WARP_SIZE=64 everywhere, which caused incorrect results on RDNA GPUs like the Radeon Pro V520 (gfx1011). This change:
1. Updates device/config.h to detect the target architecture and set WARP_SIZE appropriately, using __AMDGCN_WAVEFRONT_SIZE__ or architecture detection macros
2. Updates all kernel files to use the centralized WARP_SIZE definition instead of local hardcoded values
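A sketch of what the centralized definition in device/config.h could look like, assuming the compiler-provided __AMDGCN_WAVEFRONT_SIZE__ macro is available during device compilation (the fallback is the conservative wave64 default):

```cpp
#pragma once

// Wavefront size differs by architecture: CDNA/GCN use 64 lanes, RDNA
// uses 32. Prefer the compiler-provided macro when compiling device code.
#if defined(__AMDGCN_WAVEFRONT_SIZE__)
#define WARP_SIZE __AMDGCN_WAVEFRONT_SIZE__
#else
#define WARP_SIZE 64
#endif
```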
Experiment with ROCm backend.
Install MLX with the ROCm backend using:
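```
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
```

(the same build command given earlier in the thread; add -DMLX_ROCM_ARCHITECTURES to target a specific GPU)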
closes #2556
Inspired by @zcbenz