sycl_benchmark crashes for N>128 on LUMI-G #3

@jamesavery

Description

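For N > 128, validation passes, but the benchmark crashes with a device-side assertion failure in dedge_ix (src/sycl/dual.cc:31), followed by an HSA hardware exception: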

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 200 200 
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 200
Success!

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 200
Dualising 1000000 triangulation graphs, each with 200 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
        NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
        NOT USING: AMD EPYC 7A53 64-Core Processor                 has 4 compute-units.
Platform: AMD HIP BACKEND
        USING    : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377767,0,0], local id: [167,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377768,0,0], local id: [168,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377769,0,0], local id: [169,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377770,0,0], local id: [170,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377771,0,0], local id: [171,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377772,0,0], local id: [172,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377773,0,0], local id: [173,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377774,0,0], local id: [174,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377776,0,0], local id: [176,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377778,0,0], local id: [178,0,0] Assertion `false` failed.
:0:rocdevice.cpp            :2652: 1910724722915 us: 1686 : [tid:0x14a7b1aef700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
Aborted

For N <= 128, both validation and the benchmark run successfully. Why?

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 128 128
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 128
Success!

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 128
Dualising 1000000 triangulation graphs, each with 128 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
        NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
        NOT USING: AMD EPYC 7A53 64-Core Processor                 has 4 compute-units.
Platform: AMD HIP BACKEND
        USING    : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
Mean Time per Graph: 26.4305 +/- 7.02391 ns
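
The assertion log carries enough information to recover the launch geometry, which may help narrow the search. The following is a minimal sketch, not code from LockstepDualisation: `WorkItem` and `decompose` are hypothetical names, and it assumes a 1D nd_range launch (global id = group index * work-group size + local id) and the 64-lane wavefront of gfx90a. Under those assumptions, every logged global/local id pair is consistent with a work-group size of 200, which suggests one work-item per triangle and one work-group per graph, and every failing work-item shown in the log sits in the third wavefront of its group (local ids 128 to 191).

```cpp
#include <cstdint>

// Assumption: 1D nd_range launch, so that
//   global id = group index * work-group size + local id.
struct WorkItem {
    std::uint64_t group;      // work-group index
    std::uint64_t local;      // local id within the work-group
    std::uint64_t wavefront;  // wavefront index within the work-group
};

// Split a global id into group, local id and wavefront index.
// wavefront_size defaults to 64, the wavefront width of gfx90a.
constexpr WorkItem decompose(std::uint64_t global_id,
                             std::uint64_t group_size,
                             std::uint64_t wavefront_size = 64) {
    return WorkItem{global_id / group_size,
                    global_id % group_size,
                    (global_id % group_size) / wavefront_size};
}

// First failing work-item in the log: global id 1377767, local id 167.
constexpr WorkItem first_failure = decompose(1377767, 200);
static_assert(first_failure.local == 167, "matches the logged local id");
static_assert(first_failure.group == 6888, "work-group 6888");
static_assert(first_failure.wavefront == 2, "third wavefront (index 2)");

// Last failing work-item in the log: global id 1377778, local id 178.
constexpr WorkItem last_failure = decompose(1377778, 200);
static_assert(last_failure.local == 178, "matches the logged local id");
static_assert(last_failure.wavefront == 2, "third wavefront (index 2)");

// With N = 128 a work-group fills exactly two wavefronts,
// so the highest wavefront index is 1 and a third wavefront never exists.
static_assert(decompose(127, 128).wavefront == 1, "N = 128 stops at index 1");
```

The block compiles on its own (for example with `-std=c++14 -fsyntax-only`), and the `static_assert`s check the arithmetic against the log. This does not identify the cause. It only shows that N = 128 is the largest size that fits in two full wavefronts, so anything in the benchmark path that differs once a work-group spans a third, partially filled wavefront would be a reasonable place to look first.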
