@NripeshN
Contributor

@NripeshN NripeshN commented Jun 16, 2025

Experiment with ROCm backend.

Install MLX with the ROCm backend using:

mkdir build && cd build
cmake -DMLX_BUILD_ROCM=ON \
      -DCMAKE_PREFIX_PATH=/opt/rocm \
      -DCMAKE_HIP_ARCHITECTURES="gfx90a;gfx1100" \
      ..
make -j$(nproc)

closes #2556

Inspired by @zcbenz

@NripeshN NripeshN changed the title [Experiment] ROCm backend initial push [Experiment] ROCm backend Jun 16, 2025
@lin72h

lin72h commented Jun 17, 2025

What an unexpected and amazing surprise! I'm absolutely thrilled.

@NripeshN
Contributor Author

@awni
What do you think of this PR? Does this have the potential to be merged into main? I can turn this PR from experimental to WIP if so.

@angeloskath
Member

I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing, @NripeshN; I don't mean to discourage you from working on it.

@akshat2602

I would love to see the ROCm backend get more traction. AMD's new AI series of processors has a similar advantage to Apple Silicon with unified memory, and getting MLX to run on those processors would be neat.

@countradooku

Stole my idea :(

@goniz

goniz commented Jan 22, 2026

How is this even possible for such an awesome PR to be left like this?

Copilot AI review requested due to automatic review settings January 24, 2026 17:08

Copilot AI left a comment

Pull request overview

This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.

Changes:

  • Added ROCm backend infrastructure with device management, memory allocation, and stream handling
  • Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting
  • Updated build system (CMake) to support ROCm compilation with configurable GPU architectures

Reviewed changes

Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.

File                             Description
CMakeLists.txt                   Added MLX_BUILD_ROCM option and ROCm library detection
mlx/CMakeLists.txt               Integrated ROCm backend build configuration
mlx/device.cpp                   Added ROCm device availability checks
mlx/backend/rocm/*.hip           HIP kernel implementations for various operations
mlx/backend/rocm/device.*        ROCm device and stream management
mlx/backend/rocm/allocator.*     ROCm-specific memory allocator using HIP unified memory
mlx/backend/rocm/worker.*        Async task execution worker for stream synchronization
mlx/backend/rocm/utils.*         HIP utility functions and error handling
mlx/backend/rocm/jit_module.*    JIT compilation support using HIPRTC
mlx/backend/rocm/device/*.hpp    Device-side utility functions and type definitions
mlx/backend/rocm/CMakeLists.txt  ROCm backend build configuration


…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
@goniz

goniz commented Jan 24, 2026

👑👑👑

@NripeshN
Contributor Author

Can anyone run

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .

Replace {based on your GPU} with your GPU architecture

You can run

rocm-smi

to get your GPU information
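
For anyone who wants to script this step, here is a minimal Python sketch that pulls `gfx...` target names out of tool output and builds the `CMAKE_ARGS` value used above. It rests on several assumptions: that the tool output contains the target name as plain text (`rocminfo` usually lists names such as `gfx1151`; whether `rocm-smi` prints them may depend on the version), and that `MLX_ROCM_ARCHITECTURES` accepts a semicolon-separated list in the same way `CMAKE_HIP_ARCHITECTURES` does. The helper names `gfx_targets` and `cmake_args` are hypothetical and not part of this PR.

```python
import re


def gfx_targets(tool_output: str) -> list[str]:
    """Return unique gfx target names in order of first appearance."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]{3,5}\b", tool_output):
        if name not in seen:
            seen.append(name)
    return seen


def cmake_args(targets: list[str]) -> str:
    """Build a CMAKE_ARGS value from the options named in this thread.

    Joining several targets with ';' is an assumption, not confirmed by the PR.
    """
    args = "-DMLX_BUILD_ROCM=ON"
    if targets:
        args += " -DMLX_ROCM_ARCHITECTURES=" + ";".join(targets)
    return args


# Illustrative sample text, not real tool output.
sample = "Name: gfx1151\nMarketing Name: AMD Radeon Graphics\nName: amdgcn-amd-amdhsa--gfx1151\n"
print(cmake_args(gfx_targets(sample)))
```

If more than one target is detected, the resulting value contains a semicolon and needs to stay quoted when passed through the shell.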

@goniz

goniz commented Jan 24, 2026

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .

      -- Configuring done (4.8s)
      CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
      Please set them or make sure they are set and tested correctly in the CMake files:
      /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
         used as include directory in directory /home/goniz/Work/mlx
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      -- Generating done (0.0s)
      CMake Generate step failed.  Build files cannot be regenerated correctly.

Running on Strix Halo (gfx1151)

@NripeshN
Contributor Author

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .
     -- Configuring done (4.8s)
     CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
     Please set them or make sure they are set and tested correctly in the CMake files:
     /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
        used as include directory in directory /home/goniz/Work/mlx
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     -- Generating done (0.0s)
     CMake Generate step failed.  Build files cannot be regenerated correctly.

Running on Strix Halo (gfx1151)

Could you retry with the latest push, please? (P.S. Keep your fingers crossed while it compiles; it worked for me on the 138th try.) 😅

… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
@goniz

goniz commented Jan 25, 2026

  Created wheel for mlx: filename=mlx-0.30.4.dev20260125+cadf18c1-0.editable-cp314-cp314-linux_x86_64.whl size=4722 sha256=72c664adbfc4fb9ec317522a8d83b84f85d599d08bd691d7fec3abfdb6f3a5e9
  Stored in directory: /tmp/pip-ephem-wheel-cache-nt7w6bq0/wheels/8a/63/d1/d7d629a5ff73457822bb71aa527c083674bb19ca314735cd05
Successfully built mlx
Installing collected packages: mlx
Successfully installed mlx-0.30.4.dev20260125+cadf18c1

Now what can I test? 😍
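
One low-risk first check after a successful install, sketched in Python: import MLX, print the default device, and run a tiny matrix multiply. This is only an illustration, not a test from the PR. It assumes the standard `mlx.core` Python API (`default_device`, `ones`, `eval`, `item`) behaves the same on the ROCm build, and the function name `smoke_test` is made up for this example.

```python
def smoke_test():
    """Run a tiny matmul on the default device; return None if MLX cannot be imported."""
    try:
        import mlx.core as mx
    except ImportError as exc:  # covers a missing package or a broken shared library
        print(f"MLX import failed: {exc}")
        return None
    print("default device:", mx.default_device())
    a = mx.ones((4, 4), dtype=mx.float32)
    total = (a @ a).sum()  # every entry of a @ a is 4, so the sum is 64
    mx.eval(total)         # MLX is lazy; force the computation to run
    return total.item()


result = smoke_test()
print("result:", result)
```

Catching `ImportError` keeps the script usable as a first diagnostic, since a linking problem in the shared library surfaces at import time rather than at the first kernel launch.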

@goniz

goniz commented Jan 25, 2026

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

@NripeshN
Contributor Author

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

I forgot to test the Python build, my bad. Can you try it now?

Unfortunately, I might not be able to help after it compiles, since I don't have an AMD GPU to run tests on 😔 I've tried replicating most things from the CUDA backend, so hopefully it works.

@goniz

goniz commented Jan 26, 2026


mlx rocm-support ? ❯︎ python3 qwen3.py 
Fetching 9 files: 100%|███████| 9/9 [00:00<00:00, 42799.02it/s]
Download complete: : 0.00B [00:00, ?B/s]              ?, ?it/s]
==========
/usr/lib/python3.14/multiprocessing/resource_tracker.py:396: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/mp-ofwi__8o'}
  warnings.warn(
Segmentation fault         (core dumped) python3 qwen3.py
[Mon Jan 26 15:02:44 2026] python3[278273]: segfault at e7 ip 00007fda1b828270 sp 00007ffdbf572518 error 4 in libmlx.so[1228270,7fda1aa81000+11b0000] likely on CPU 1 (core 1, socket 0)
[Mon Jan 26 15:02:44 2026] Code: 26 ff 48 89 c3 49 39 c4 0f 85 1c ff ff ff e9 68 ff ff ff e8 c2 f1 25 ff e9 51 51 2e ff 90 90 66 66 2e 0f 1f 84 00 00 00 00 00 <48> 8b 07 48 85 c0 74 03 48 8b 00 c3 0f 1f 40 00 31 c0 c3 90 90 66
[Mon Jan 26 15:02:46 2026] amdgpu: Freeing queue vital buffer 0x7fd6c2c00000, queue evicted
[Mon Jan 26 15:02:46 2026] amdgpu: Freeing queue vital buffer 0x7fd72ee00000, queue evicted
[Mon Jan 26 15:02:46 2026] amdgpu: Freeing queue vital buffer 0x7fd773400000, queue evicted
[Mon Jan 26 15:02:46 2026] amdgpu: Freeing queue vital buffer 0x7fd777200000, queue evicted
[Mon Jan 26 15:02:46 2026] amdgpu: Freeing queue vital buffer 0x7fd778a00000, queue evicted

@NripeshN
Contributor Author

I can try other models, just randomly chose qwen3-0.6b

Could you maybe try: mlx-community/Meta-Llama-3.1-8B-Instruct-bf16

@goniz

goniz commented Jan 26, 2026


mlx rocm-support ? ❯︎ mlx_lm.chat --model mlx-community/Meta-Llama-3.1-8B-Instruct-bf16
Fetching 9 files: 100%|██████████| 9/9 [02:02<00:00, 13.58s/it]
Download complete: : 16.1GB [02:05, 128MB/s]              s/it]
[INFO] Starting chat session with mlx-community/Meta-Llama-3.1-8B-Instruct-bf16.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> hi
Traceback (most recent call last):
  File "/home/goniz/Work/mlx/venv/bin/mlx_lm.chat", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/goniz/Work/mlx/venv/lib/python3.14/site-packages/mlx_lm/chat.py", line 146, in main
    for response in stream_generate(
                    ~~~~~~~~~~~~~~~^
        model,
        ^^^^^^
    ...<12 lines>...
        prompt_cache=prompt_cache,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/home/goniz/Work/mlx/venv/lib/python3.14/site-packages/mlx_lm/generate.py", line 699, in stream_generate
    for n, (token, logprobs, from_draft) in enumerate(token_generator):
                                            ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/goniz/Work/mlx/venv/lib/python3.14/site-packages/mlx_lm/generate.py", line 689, in <genexpr>
    (token, logprobs, False) for token, logprobs in token_generator
                                                    ^^^^^^^^^^^^^^^
  File "/home/goniz/Work/mlx/venv/lib/python3.14/site-packages/mlx_lm/generate.py", line 432, in generate_step
    mx.eval([c.state for c in prompt_cache])
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cross-type copy not yet fully implemented for ROCm.

@goniz

goniz commented Jan 27, 2026

mlx/mlx-fork/mlx/backend/rocm/eval.cpp:4:10: fatal error: mlx/backend/gpu/available.h: No such file or directory
    4 | #include "mlx/backend/gpu/available.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

@NripeshN
Contributor Author

I will get back to this in a bit😁

@Geramy

Geramy commented Feb 2, 2026

I also have a Strix Halo. I have it set up in a C++ project, and I'm unable to compile with your branch either.

[main] Building folder: /home/geramyl/Documents/Programming/ryzenai-server/build 
[build] Starting build
[proc] Executing command: /usr/bin/cmake --build /home/geramyl/Documents/Programming/ryzenai-server/build --config Release --target ryzenai-server --
[build] [33/149   0% :: 0.429] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/logsumexp.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/logsumexp.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/logsumexp.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/logsumexp.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/logsumexp.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/logsumexp.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip 
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/scaled_dot_product_attention.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/scaled_dot_product_attention.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   4% :: 0.432] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary_two.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/binary_two.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary_two.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary_two.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary_two.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary_two.hip:3:10: fatal error: 'mlx/backend/common/binary.h' file not found
[build]     3 | #include "mlx/backend/common/binary.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary_two.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary_two.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   4% :: 0.432] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_input.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_input.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_input.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_input.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_input.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_input.hip:3:10: fatal error: 'mlx/backend/rocm/copy/copy.hpp' file not found
[build]     3 | #include "mlx/backend/rocm/copy/copy.hpp"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_input.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_input.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   5% :: 0.436] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/unary.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/unary.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/unary.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/unary.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/unary.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/unary.hip:3:10: fatal error: 'mlx/backend/common/unary.h' file not found
[build]     3 | #include "mlx/backend/common/unary.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/unary.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/unary.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   6% :: 0.437] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/arg_reduce.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/arg_reduce.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/arg_reduce.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/arg_reduce.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/arg_reduce.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/arg_reduce.hip:3:10: fatal error: 'mlx/backend/common/utils.h' file not found
[build]     3 | #include "mlx/backend/common/utils.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/arg_reduce.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/arg_reduce.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   6% :: 0.437] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/rms_norm.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/rms_norm.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/rms_norm.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/rms_norm.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/rms_norm.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/rms_norm.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/reduce/col_reduce.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/reduce/col_reduce.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   8% :: 0.445] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_contiguous.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_contiguous.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_contiguous.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_contiguous.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_contiguous.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_contiguous.hip:3:10: fatal error: 'mlx/backend/rocm/copy/copy.hpp' file not found
[build]     3 | #include "mlx/backend/rocm/copy/copy.hpp"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_contiguous.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_contiguous.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   8% :: 0.448] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/event.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/event.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/event.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/event.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/event.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/event.hip:3:10: fatal error: 'mlx/backend/rocm/allocator.h' file not found
[build]     3 | #include "mlx/backend/rocm/allocator.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/event.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/event.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149   9% :: 0.449] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/fp_quantize.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/fp_quantize.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/fp_quantize.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/fp_quantize.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/fp_quantize.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/fp_quantize.hip:3:10: fatal error: 'mlx/backend/rocm/quantized/quantized.h' file not found
[build]     3 | #include "mlx/backend/rocm/quantized/quantized.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/fp_quantize.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/fp_quantize.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  10% :: 0.449] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/random.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/random.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/random.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/random.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/random.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/random.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/convert_fp8.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/convert_fp8.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  11% :: 0.450] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general.hip:3:10: fatal error: 'mlx/backend/rocm/copy/copy.hpp' file not found
[build]     3 | #include "mlx/backend/rocm/copy/copy.hpp"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  12% :: 0.452] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/affine_quantize.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/affine_quantize.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/affine_quantize.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/affine_quantize.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/affine_quantize.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/affine_quantize.hip:3:10: fatal error: 'mlx/backend/rocm/quantized/quantized.h' file not found
[build]     3 | #include "mlx/backend/rocm/quantized/quantized.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/affine_quantize.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/affine_quantize.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  12% :: 0.453] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/copy.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy.hip:3:10: fatal error: 'mlx/backend/common/utils.h' file not found
[build]     3 | #include "mlx/backend/common/utils.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  13% :: 0.453] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/reduce.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/reduce.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/reduce.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/reduce.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/reduce.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/reduce.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/layer_norm.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/layer_norm.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  16% :: 0.460] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/ternary.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/ternary.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/ternary.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/ternary.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/ternary.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/ternary.hip:3:10: fatal error: 'mlx/backend/common/ternary.h' file not found
[build]     3 | #include "mlx/backend/common/ternary.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/ternary.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/ternary.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  17% :: 0.467] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_dynamic.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_dynamic.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_dynamic.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_dynamic.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_dynamic.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_dynamic.hip:3:10: fatal error: 'mlx/backend/rocm/copy/copy.hpp' file not found
[build]     3 | #include "mlx/backend/rocm/copy/copy.hpp"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/copy/copy_general_dynamic.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/copy/copy_general_dynamic.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  18% :: 0.475] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/gemms/gemv.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/gemms/gemv.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/gemms/gemv.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/gemms/gemv.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/gemms/gemv.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/gemms/gemv.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/gemms/gemv.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/gemms/gemv.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  18% :: 0.500] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/qmm.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/qmm.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/qmm.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/qmm.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/qmm.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/qmm.hip:3:10: fatal error: 'mlx/backend/rocm/quantized/quantized.h' file not found
[build]     3 | #include "mlx/backend/rocm/quantized/quantized.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/quantized/qmm.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/quantized/qmm.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  19% :: 0.532] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/binary.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary.hip:3:10: fatal error: 'mlx/backend/common/binary.h' file not found
[build]     3 | #include "mlx/backend/common/binary.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/binary.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/binary.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] [33/149  20% :: 0.562] Compiling HIP source /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/indexing.hip
[build] FAILED: _deps/mlx-build/mlx/backend/rocm/hip_objs/indexing.o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/indexing.o 
[build] cd /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm && /opt/rocm/bin/hipcc -c /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/indexing.hip -o /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/indexing.o -fPIC -DMLX_USE_ROCM --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/indexing.hip:3:10: fatal error: 'mlx/backend/rocm/device.h' file not found
[build]     3 | #include "mlx/backend/rocm/device.h"
[build]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated when compiling for gfx1030.
[build] failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -O3  -c -x hip /home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/sort.hip -o "/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/home/geramyl/Documents/Programming/ryzenai-server -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -std=c++17
[build] ninja: build stopped: subcommand failed.
[proc] The command: /usr/bin/cmake --build /home/geramyl/Documents/Programming/ryzenai-server/build --config Release --target ryzenai-server -- exited with code: 1
[driver] Build completed: 00:00:00.841
[build] Build finished with exit code 1

@Geramy

Geramy commented Feb 2, 2026

The problem looks like it's stemming from the CMakeLists.txt, at:

# Build include flags
set(HIP_INCLUDE_FLAGS "-I${CMAKE_SOURCE_DIR}" "-I${HIP_INCLUDE_DIRS}")

@Geramy

Geramy commented Feb 3, 2026

I have submitted a PR to the ROCm-support branch that fixes these compile errors.

@NripeshN
Contributor Author

NripeshN commented Feb 3, 2026

Just got my hands on Radeon Pro V520, should be able to test things out now😏

- Use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR for correct path resolution
- Add GCC C++ standard library include paths for HIP compiler
- ROCm's clang needs explicit paths to libstdc++ headers
@Geramy

Geramy commented Feb 3, 2026

> Just got my hands on Radeon Pro V520, should be able to test things out now😏

Awesome! There is an is_available() in eval.cpp that doesn't need to be there :)

- Replace rocPRIM-based sort with custom block merge sort
- Avoids rocPRIM uninitialized_array compatibility issues with ROCm 7.x
- Mirrors CUDA sort implementation approach
@Geramy

Geramy commented Feb 3, 2026

Here is some profiling information.

^CW20260203 10:08:04.353332 126298330114368 tool.cpp:3105] [PPID=908649][PID=909286][TID=909286][rocprofv3_error_signal_handler] rocprofv3 caught signal 2...
W20260203 10:08:04.353396 126298330114368 tool.cpp:3128] [PPID=908649][PID=909286][TID=909286][rocprofv3_error_signal_handler] rocprofv3 will wait for 0 children to exit
W20260203 10:08:04.353403 126298330114368 tool.cpp:3143] [PPID=908649][PID=909286][TID=909286][rocprofv3_error_signal_handler] rocprofv3 finalizing after signal 2...
W20260203 10:08:04.375341 126298330114368 correlation_id.cpp:231] retiring dangling correlation ID 21188 from thread 909312 :: remaining reference count: 1
W20260203 10:08:04.387442 126298330114368 generateRocpd.cpp:583] writing SQL database for process 909286 on node 396112738
E20260203 10:08:04.387864 126298330114368 generateRocpd.cpp:606] Opened result file: /home/geramyl/Documents/Programming/ryzenai-server/geramyl-MS-S1-MAX/909286_results.db (UUID=0000ce95-f006-7006-9abb-ffe296071d45)
W20260203 10:08:05.444378 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.103665 sec
W20260203 10:08:05.466962 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.022550 sec
W20260203 10:08:05.479344 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.012372 sec
W20260203 10:08:05.524184 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.015430 sec
W20260203 10:08:05.524194 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.000002 sec
W20260203 10:08:05.626345 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.102148 sec
W20260203 10:08:06.604940 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.978583 sec
W20260203 10:08:06.604968 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.000000 sec
W20260203 10:08:06.604970 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
W20260203 10:08:06.604973 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
W20260203 10:08:06.604976 126298330114368 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
W20260203 10:08:06.605060 126298330114368 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000081 sec
W20260203 10:08:06.607686 126298330114368 simple_timer.cpp:55] SQLite3 generation :: total                    ::     2.220244 sec

    ROCPROFV3 SUMMARY:

    |                   NAME                   |    DOMAIN    |      CALLS      | DURATION (nsec) | AVERAGE (nsec)  | PERCENT (INC) |   MIN (nsec)    |   MAX (nsec)    |     STDDEV      |
    |------------------------------------------|--------------|-----------------|-----------------|-----------------|---------------|-----------------|-----------------|-----------------|
    | hipLaunchKernel                          | HIP_API      |              20 |       147065952 |       7.353e+06 |     63.436500 |             892 |       117336876 |       2.621e+07 |
    | hipStreamCreateWithFlags                 | HIP_API      |               2 |        27579283 |       1.379e+07 |     11.896249 |         5376108 |        22203175 |       1.190e+07 |
    | hipGetDeviceCount                        | HIP_API      |              19 |        22693835 |       1.194e+06 |      9.788924 |              20 |        22684118 |       5.204e+06 |
    | hipMallocManaged                         | HIP_API      |              28 |        14909837 |       5.325e+05 |      6.431318 |            8015 |         9543318 |       1.780e+06 |
    | __hipRegisterFunction                    | HIP_API      |           20890 |         8000032 |       3.830e+02 |      3.450792 |              60 |         2054986 |       2.012e+04 |
    | hipMemcpy                                | HIP_API      |              21 |         7127456 |       3.394e+05 |      3.074409 |           20278 |         6213199 |       1.346e+06 |
    | hipMemAdvise                             | HIP_API      |               1 |         2297511 |       2.298e+06 |      0.991025 |         2297511 |         2297511 |       0.000e+00 |
    | hipMalloc                                | HIP_API      |              22 |         1336490 |       6.075e+04 |      0.576491 |             330 |         1161530 |       2.479e+05 |
    | hipEventSynchronize                      | HIP_API      |               3 |          440456 |       1.468e+05 |      0.189990 |          115887 |          203382 |       4.906e+04 |
    | hipLaunchHostFunc                        | HIP_API      |               3 |          177873 |       5.929e+04 |      0.076725 |           28072 |           76613 |       2.709e+04 |
    | hipStreamQuery                           | HIP_API      |               1 |           40275 |       4.028e+04 |      0.017373 |           40275 |           40275 |       0.000e+00 |
    | hipEventRecord                           | HIP_API      |               3 |           38542 |       1.285e+04 |      0.016625 |           10500 |           14387 |       2.066e+03 |
    | __hipRegisterFatBinary                   | HIP_API      |             107 |           27101 |       2.533e+02 |      0.011690 |              50 |            9839 |       9.755e+02 |
    | hipStreamSynchronize                     | HIP_API      |              13 |           21459 |       1.651e+03 |      0.009256 |             150 |           10259 |       2.769e+03 |
    | hipMemGetInfo                            | HIP_API      |               1 |           19797 |       1.980e+04 |      0.008539 |           19797 |           19797 |       0.000e+00 |
    | __hipPushCallConfiguration               | HIP_API      |              21 |           14900 |       7.095e+02 |      0.006427 |              30 |            6633 |       1.496e+03 |
    | hipStreamIsCapturing                     | HIP_API      |               1 |            9077 |       9.077e+03 |      0.003915 |            9077 |            9077 |       0.000e+00 |
    | __hipPopCallConfiguration                | HIP_API      |              21 |            8867 |       4.222e+02 |      0.003825 |              30 |            1643 |       5.839e+02 |
    | hipEventQuery                            | HIP_API      |               3 |            7002 |       2.334e+03 |      0.003020 |             140 |            4037 |       1.994e+03 |
    | hipEventCreateWithFlags                  | HIP_API      |               2 |            5351 |       2.676e+03 |      0.002308 |            2315 |            3036 |       5.098e+02 |
    | hipSetDevice                             | HIP_API      |               1 |            4379 |       4.379e+03 |      0.001889 |            4379 |            4379 |       0.000e+00 |
    | hipGetDevicePropertiesR0600              | HIP_API      |               2 |            2896 |       1.448e+03 |      0.001249 |             421 |            2475 |       1.452e+03 |
    | __hipRegisterVar                         | HIP_API      |               1 |            1733 |       1.733e+03 |      0.000748 |            1733 |            1733 |       0.000e+00 |
    | hipGetDevice                             | HIP_API      |               1 |            1653 |       1.653e+03 |      0.000713 |            1653 |            1653 |       0.000e+00 |

W20260203 10:08:06.609210 126298330114368 simple_timer.cpp:55] [rocprofv3] output generation ::     2.233775 sec
W20260203 10:08:06.609620 126298330114368 simple_timer.cpp:55] [rocprofv3] tool finalization ::     2.234263 sec

The command I used:

/opt/rocm/bin/rocprofv3 --hip-trace --marker-trace -S -- 

NripeshN and others added 7 commits February 3, 2026 18:44
- Add Limits struct to device/utils.hpp for sort operations
- Add missing numeric_limits specializations for int8, uint8, int16, uint16, bool
- Fix C++20 lambda syntax to be C++17 compatible
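For readers following along, here is a rough host-side sketch of why a sort needs per-type limits. It is not the code from device/utils.hpp: the `Limits` layout and the `sort_padded` helper below are assumptions made for illustration, and plain `std::sort` stands in for the block merge sort. The idea is that a block-based sort pads the input up to a multiple of the block size with the largest value of the type, so the padding lands at the end and can be dropped afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical host-side analogue of a per-type Limits helper.
// Floating-point types use +/- infinity, integer types use their extremes.
template <typename T>
struct Limits {
  static constexpr T max() {
    if constexpr (std::is_floating_point_v<T>) {
      return std::numeric_limits<T>::infinity();
    } else {
      return std::numeric_limits<T>::max();
    }
  }
  static constexpr T min() {
    if constexpr (std::is_floating_point_v<T>) {
      return -std::numeric_limits<T>::infinity();
    } else {
      return std::numeric_limits<T>::lowest();
    }
  }
};

// bool gets an explicit specialization so the intent is obvious.
template <>
struct Limits<bool> {
  static constexpr bool max() { return true; }
  static constexpr bool min() { return false; }
};

// Illustrative helper (not part of MLX): pad to a multiple of block_size
// with Limits<T>::max(), sort, then drop the padding again.
template <typename T>
std::vector<T> sort_padded(std::vector<T> values, std::size_t block_size) {
  assert(block_size > 0);
  const std::size_t n = values.size();
  const std::size_t padded = ((n + block_size - 1) / block_size) * block_size;
  values.resize(padded, Limits<T>::max()); // padding sorts to the end
  std::sort(values.begin(), values.end());
  values.resize(n); // only padding (or values equal to it) is cut off
  return values;
}
```

Calling `sort_padded<std::int8_t>({3, -1, 2}, 4)` pads one slot with 127 before sorting. Without a `Limits` entry for a small integer type there would be no well-defined padding value, which is presumably why the missing specializations mattered for sort.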
….cpp

- Remove mlx/backend/gpu/available.h include (doesn't exist)
- Remove is_available() function (already defined elsewhere)

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
- Implement gpu::device_info(), gpu::device_count(), gpu::is_available()
- Provides device name, architecture, UUID, PCI bus ID, memory info
- Uses hipGetDeviceProperties and hipMemGetInfo for AMD GPU info
- Mirrors CUDA device_info.cpp implementation

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
- Add mlx/memory.h include to ensure MLX_API visibility attributes
  are applied to memory function definitions
- Fixes undefined symbol errors for reset_peak_memory and other
  memory management functions

Co-authored-by: Geramy Loveless <geramy@users.noreply.github.com>
- Add (void) casts to suppress nodiscard warnings for HIP API calls
  (hipMalloc, hipMemcpy, hipFree, hipStreamSynchronize, etc.)
- Fix implicit float-to-bool conversion warnings in unary_ops.hpp
  (Erf, ErfInv, Expm1) and binary_ops.hpp (ArcTan2)
- Add explicit type checks for bool/integral types before float operations
- Add (void) casts for hipMemsetAsync and hipMemcpyAsync calls in:
  - conv/gemm_conv.cpp
  - random.hip
  - reduce/init_reduce.hip
  - scaled_dot_product_attention.hip
- Add python/src/rocm.cpp with mx.rocm.is_available() function
- Add python/tests/rocm_skip.py with tests to skip for ROCm backend
- Update mlx_tests.py to detect ROCm backend and use appropriate skip list
- Update CMakeLists.txt to include rocm.cpp and rocm.pyi stub

The ROCm skip list includes:
- Same tests as CUDA (FFT, linalg, hadamard, etc.)
- ROCm-specific: grouped convolution, 1D/3D convolution, input dilation
- Quantization tests (different support level than CUDA)

Geramy commented Feb 3, 2026

I am running the Phi3 kernel I made, which works fine on macOS, against the experimental ROCm build.
I'm getting a weird error here.

signal SIGSEGV: address not mapped to object (fault address: 0x0)

  if (!ptr_) {
    return nullptr;
  }
  return static_cast<rocm::RocmBuffer*>(ptr_)->data;
}


*** Aborted at 1770147905 (unix time) try "date -d @1770147905" if you are using GNU date ***
PC: @     0x59752d092c20 mlx::core::allocator::Buffer::raw_ptr()
*** SIGSEGV (@0x0) received by PID 930345 (TID 0x7ea3c49fd6c0) from PID 0; stack trace: ***
    @     0x7ea4e5ea1ed3 (unknown)
    @     0x7ea4eb34224e google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7ea4e5e45330 (unknown)
    @     0x59752d092c20 mlx::core::allocator::Buffer::raw_ptr()
    @     0x59752d34a587 mlx::core::fast::RoPE::eval_gpu(std::vector<mlx::core::array, std::allocator<mlx::core::array> > const&, std::vector<mlx::core::array, std::allocator<mlx::core::array> >&)
    @     0x59752d0ab18a mlx::core::gpu::eval(mlx::core::array&)
    @     0x59752c6ad338 mlx::core::eval_impl(std::vector<mlx::core::array, std::allocator<mlx::core::array> >, bool)
    @     0x59752c6adebb mlx::core::eval(std::vector<mlx::core::array, std::allocator<mlx::core::array> >)
    @     0x59752c4871b5 mlx::core::array::eval()
    @     0x59752c40c8e2 int mlx::core::array::item<int>()
    @     0x59752c450786 Phi3Inference::sample_token(mlx::core::array const&, MlxOgaGeneratorParams const&)
    @     0x59752c4093e4 MlxOgaGenerator::GenerateNextToken()
    @     0x59752c4815e6 ryzenai::MlxBackend::complete(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ryzenai::GenerationParams const&, ryzenai::CompletionTimingData*)
    
    MORE DETAILED STACK

mlx::core::allocator::Buffer::raw_ptr() (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/allocator.cpp:319)
mlx::core::fast::RoPE::eval_gpu(std::vector<mlx::core::array, std::allocator<mlx::core::array>> const&, std::vector<mlx::core::array, std::allocator<mlx::core::array>>&) (Unknown Source:0)
mlx::core::gpu::eval(mlx::core::array&) (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/backend/rocm/eval.cpp:28)
mlx::core::eval_impl(std::vector<mlx::core::array, std::allocator<mlx::core::array>>, bool) (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/transforms.cpp:237)
mlx::core::eval(std::vector<mlx::core::array, std::allocator<mlx::core::array>>) (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/transforms.cpp:324)
mlx::core::array::eval() (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/array.cpp:156)
int mlx::core::array::item<int>() (/home/geramyl/Documents/Programming/ryzenai-server/build/_deps/mlx-src/mlx/array.h:570)
Phi3Inference::sample_token(mlx::core::array const&, MlxOgaGeneratorParams const&) (/home/geramyl/Documents/Programming/ryzenai-server/src/mlx/models/phi3_inference.cpp:320)
MlxOgaGenerator::GenerateNextToken() (/home/geramyl/Documents/Programming/ryzenai-server/src/mlx/mlx_oga.cpp:593)
ryzenai::MlxBackend::complete(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, ryzenai::GenerationParams const&, ryzenai::CompletionTimingData*) (/home/geramyl/Documents/Programming/ryzenai-server/src/backend/mlx_backend.cpp:295)
ryzenai::InferenceEngine::complete(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, ryzenai::GenerationParams const&, ryzenai::CompletionTimingData*) (/home/geramyl/Documents/Programming/ryzenai-server/src/inference_engine.cpp:445)
ryzenai::RyzenAIServer::handleChatCompletions(httplib::Request const&, httplib::Response&) (/home/geramyl/Documents/Programming/ryzenai-server/src/server.cpp:557)
ryzenai::RyzenAIServer::setupRoutes()::$_3::operator()(httplib::Request const&, httplib::Response&) const (/home/geramyl/Documents/Programming/ryzenai-server/src/server.cpp:213)
void std::__invoke_impl<void, ryzenai::RyzenAIServer::setupRoutes()::$_3&, httplib::Request const&, httplib::Response&>(std::__invoke_other, ryzenai::RyzenAIServer::setupRoutes()::$_3&, httplib::Request const&, httplib::Response&) (/usr/include/c++/13/bits/invoke.h:61)
std::enable_if<is_invocable_r_v<void, ryzenai::RyzenAIServer::setupRoutes()::$_3&, httplib::Request const&, httplib::Response&>, void>::type std::__invoke_r<void, ryzenai::RyzenAIServer::setupRoutes()::$_3&, httplib::Request const&, httplib::Response&>(ryzenai::RyzenAIServer::setupRoutes()::$_3&, httplib::Request const&, httplib::Response&) (/usr/include/c++/13/bits/invoke.h:111)
std::_Function_handler<void (httplib::Request const&, httplib::Response&), ryzenai::RyzenAIServer::setupRoutes()::$_3>::_M_invoke(std::_Any_data const&, httplib::Request const&, httplib::Response&) (/usr/include/c++/13/bits/std_function.h:290)
std::function<void (httplib::Request const&, httplib::Response&)>::operator()(httplib::Request const&, httplib::Response&) const (/usr/include/c++/13/bits/std_function.h:591)
httplib::Server::dispatch_request(httplib::Request&, httplib::Response&, std::vector<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_delete<httplib::detail::MatcherBase>>, std::function<void (httplib::Request const&, httplib::Response&)>>, std::allocator<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_delete<httplib::detail::MatcherBase>>, std::function<void (httplib::Request const&, httplib::Response&)>>>> const&) const (/home/geramyl/Documents/Programming/ryzenai-server/external/cpp-httplib/httplib.h:8112)
httplib::Server::routing(httplib::Request&, httplib::Response&, httplib::Stream&) (/home/geramyl/Documents/Programming/ryzenai-server/external/cpp-httplib/httplib.h:8087)
httplib::Server::process_request(httplib::Stream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, int, bool, bool&, std::function<void (httplib::Request&)> const&) (/home/geramyl/Documents/Programming/ryzenai-server/external/cpp-httplib/httplib.h:8375)

The function needs the MLX_API attribute to be exported from the
shared library so it can be called from Python bindings.

Some AMD GPUs (like the Radeon Pro V520) report managed memory support
but hipMallocManaged fails with "out of memory" even for small allocations.
This change adds a runtime check that tests if managed memory actually
works, and falls back to regular hipMalloc if it doesn't.
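
The runtime check described above could look roughly like the sketch below. It is written against injected callables so it compiles without ROCm; in a real build the callables would presumably wrap `hipMallocManaged`, the fallback allocation call, and `hipFree`. `AllocFn`, `FreeFn`, `probe_allocator`, and `FallbackAllocator` are hypothetical names, not the ones used in `allocator.cpp`.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical allocator signature: returns true on success and stores the
// new pointer in *out.
using AllocFn = std::function<bool(void** out, std::size_t bytes)>;
using FreeFn = std::function<void(void* ptr)>;

// Try one small allocation to learn whether the allocator really works,
// regardless of what the device properties claim.
inline bool probe_allocator(const AllocFn& alloc, const FreeFn& release) {
  void* ptr = nullptr;
  const bool ok = alloc(&ptr, 64) && ptr != nullptr;
  if (ok) {
    release(ptr);
  }
  return ok;
}

class FallbackAllocator {
 public:
  FallbackAllocator(AllocFn preferred, AllocFn fallback, FreeFn release)
      : preferred_(std::move(preferred)),
        fallback_(std::move(fallback)),
        release_(std::move(release)),
        // Probe once at construction; every later call reuses the result.
        use_preferred_(probe_allocator(preferred_, release_)) {}

  // Returns nullptr if the selected allocator fails.
  void* allocate(std::size_t bytes) const {
    void* ptr = nullptr;
    const AllocFn& alloc = use_preferred_ ? preferred_ : fallback_;
    return alloc(&ptr, bytes) ? ptr : nullptr;
  }

  void release(void* ptr) const {
    if (ptr != nullptr) {
      release_(ptr);
    }
  }

  bool uses_preferred() const { return use_preferred_; }

 private:
  // Declaration order matters: use_preferred_ is initialized last.
  AllocFn preferred_;
  AllocFn fallback_;
  FreeFn release_;
  bool use_preferred_;
};
```

Probing once keeps the cost of a failing managed allocation out of the hot path; whether the actual change caches the result this way is an assumption.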
Contributor Author

NripeshN commented Feb 3, 2026

I am running the Phi3 kernel I made, which works fine on macOS, against the experimental ROCm build. I'm getting a weird error here.

Yup, a lot of errors on my end too. Earlier I had just eyeballed the implementation, copying the structure from the CUDA backend and checking for compilation errors through Docker. I did not have AMD GPUs before this; now that I have one, I will incrementally patch all the errors.

When hipMallocManaged fails (which happens on some AMD GPUs like the
Radeon Pro V520), fall back to hipHostMalloc instead of hipMalloc.

hipHostMalloc allocates pinned host memory that is accessible from both
CPU and GPU, which is required because MLX's array initialization code
uses std::copy to write data directly to the allocated buffer from CPU.

Regular hipMalloc allocates device-only memory that cannot be accessed
from CPU code, causing segfaults when std::copy tries to write to it.

AMD GPUs have different wavefront (warp) sizes depending on architecture:
- CDNA/GCN (gfx9xx and earlier): 64
- RDNA (gfx10xx, gfx11xx): 32

The previous code hardcoded WARP_SIZE=64 everywhere, which caused incorrect
results on RDNA GPUs like the Radeon Pro V520 (gfx1011).

This change:
1. Updates device/config.h to detect the target architecture and set
   WARP_SIZE appropriately using __AMDGCN_WAVEFRONT_SIZE__ or architecture
   detection macros
2. Updates all kernel files to use the centralized WARP_SIZE definition
   instead of local hardcoded values
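
A minimal sketch of what a centralized definition along these lines might look like. The `__GFX10__`, `__GFX11__`, and `__GFX12__` family macros and the host-side default of 64 are assumptions made for illustration; the actual `device/config.h` in this PR may use different detection logic. `warps_per_block` is a hypothetical helper showing how kernels would consume the value.

```cpp
// Prefer the compiler-provided wavefront size when it is available.
#if defined(__AMDGCN_WAVEFRONT_SIZE__)
#define WARP_SIZE __AMDGCN_WAVEFRONT_SIZE__
// Otherwise fall back to architecture family macros (assumed names).
#elif defined(__GFX10__) || defined(__GFX11__) || defined(__GFX12__)
#define WARP_SIZE 32  // RDNA
#else
#define WARP_SIZE 64  // CDNA/GCN, and the default used for host-side code here
#endif

static_assert(
    WARP_SIZE == 32 || WARP_SIZE == 64,
    "unexpected wavefront size");

// Kernels derive per-warp quantities from WARP_SIZE instead of
// hardcoding 64 locally.
constexpr int warps_per_block(int block_size) {
  return (block_size + WARP_SIZE - 1) / WARP_SIZE;
}
```

With a single definition, a 256-thread block resolves to 4 warps on a 64-wide target and 8 on a 32-wide one, without touching the kernel source.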

Development

Successfully merging this pull request may close these issues.

Add ROCm Support for AMD GPUs

7 participants