GFX950 support #13

@gyulaz-htec

🚀 Feature

I'm trying to use rocm/dgl on an MI355X and am getting errors when running the unit tests after installation. I followed the build steps from the README to build dgl inside this Docker image: rocm/7.0:rocm7.0_ubuntu22.04_py3.10_pytorch_release_2.8.0_rc1
I then tried to force CMake to target the gfx950 architecture with
cmake --preset rocm -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DGPU_TARGETS="gfx950" -DCMAKE_HIP_ARCHITECTURES="gfx950"
and got the following error when building the CUDA kernels in the repo:

In file included from /workspace/deps/dgl/src/array/cuda/csr2coo.cu:7:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/iterator/constant_iterator.h:26:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/iterator/detail/constant_iterator_base.h:21:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/iterator/counting_iterator.h:35:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/iterator/iterator_adaptor.h:36:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/iterator/iterator_facade.h:36:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/detail/type_traits.h:678:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/thrust/detail/type_traits/has_trivial_assign.h:33:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/cuda/std/type_traits:14:
In file included from /opt/rocm-7.0.0/lib/llvm/bin/../../../include/cuda/std/detail/__config:43:
/opt/rocm-7.0.0/lib/llvm/bin/../../../include/cuda/std/detail/libcxx/include/__config:2244:9: error: Timing-related utility APIs (e.g., chrono) are currently not supported on the current architecture by libhipcxx. To override this error and proceed with the build, please set the compile-time flag _LIBCUDACXX_ALLOW_UNSUPPORTED_ARCHITECTURE
 2244 | #       error Timing-related utility APIs (e.g., chrono) are currently not supported on the current architecture by libhipcxx. \
      |         ^

Full log:

dgl_build_log.log

I've checked the source code and I see that the latest architecture supported is gfx942.
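
The error message itself names a possible override, the compile-time flag _LIBCUDACXX_ALLOW_UNSUPPORTED_ARCHITECTURE. A minimal, untested sketch of how that define might be passed to the same configure command is below. It assumes the kernels are compiled through CMake's HIP language (so CMAKE_HIP_FLAGS applies) and also sets CMAKE_CXX_FLAGS in case they are compiled as C++; whether the dgl build actually honors these variables is an assumption. The define only silences the check, so timing-related libhipcxx APIs may still misbehave on gfx950.

```shell
# Sketch only: forwards the define named in the libhipcxx error message to the compiler.
# Assumption: CMAKE_HIP_FLAGS / CMAKE_CXX_FLAGS reach the kernel compile lines in this build.
ARCH="gfx950"
EXTRA_DEFINE="-D_LIBCUDACXX_ALLOW_UNSUPPORTED_ARCHITECTURE"

# Same options as the original configure command, plus the extra define.
CMAKE_ARGS="--preset rocm -DCMAKE_POLICY_VERSION_MINIMUM=3.5"
CMAKE_ARGS="${CMAKE_ARGS} -DGPU_TARGETS=${ARCH} -DCMAKE_HIP_ARCHITECTURES=${ARCH}"
CMAKE_ARGS="${CMAKE_ARGS} -DCMAKE_HIP_FLAGS=${EXTRA_DEFINE} -DCMAKE_CXX_FLAGS=${EXTRA_DEFINE}"

# Print the full command for review instead of running it directly.
echo "cmake ${CMAKE_ARGS}"
```

The printed command would then be run from the dgl source root. This is a way to get past the configuration guard for experimentation, not a substitute for actual gfx950 support.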

Motivation

I want to use rocm/dgl for RGAT training for the MLPerf graph neural network benchmark: https://github.com/mlcommons/training/tree/master/graph_neural_network

Alternatives

Pitch

Is gfx950 support planned? If so, when can I expect it?

Additional context
