Open
Description
Running with N > 128 passes validation but crashes in the benchmark. Example with N = 200:
averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 200 200
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 200
Success!
averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 200
Dualising 1000000 triangulation graphs, each with 200 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
NOT USING: AMD EPYC 7A53 64-Core Processor has 4 compute-units.
Platform: AMD HIP BACKEND
USING : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377767,0,0], local id: [167,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377768,0,0], local id: [168,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377769,0,0], local id: [169,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377770,0,0], local id: [170,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377771,0,0], local id: [171,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377772,0,0], local id: [172,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377773,0,0], local id: [173,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377774,0,0], local id: [174,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377776,0,0], local id: [176,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377778,0,0], local id: [178,0,0] Assertion `false` failed.
:0:rocdevice.cpp :2652: 1910724722915 us: 1686 : [tid:0x14a7b1aef700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
Aborted
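To make the assertion lines above easier to read, here is a small sketch that pulls the global and local ids out of each message and works out which work-group they belong to. It assumes a 1-D launch whose work-group size equals N (200 here). The log does not confirm that; it only shows local ids up to 178, so the group size must be at least 179. The names `parse_ids` and `locate` are hypothetical helpers, not part of LockstepDualisation.

```python
import re

# Matches the id fields printed by the device-side assertion messages.
ID_PATTERN = re.compile(r"global id: \[(\d+),\d+,\d+\], local id: \[(\d+),\d+,\d+\]")


def parse_ids(line):
    """Return (global_id, local_id) for the first dimension, or None if absent."""
    match = ID_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def locate(global_id, local_id, group_size):
    """Return (group_index, group_start) for a 1-D range.

    ASSUMPTION: every work-group has exactly `group_size` work-items.
    Raises ValueError if the ids cannot come from such a launch.
    """
    group_start = global_id - local_id
    if local_id >= group_size or group_start % group_size != 0:
        raise ValueError("ids inconsistent with the assumed group size")
    return group_start // group_size, group_start


if __name__ == "__main__":
    # First and last assertion messages from the log, shortened to the id fields.
    samples = [
        "global id: [1377767,0,0], local id: [167,0,0] Assertion `false` failed.",
        "global id: [1377778,0,0], local id: [178,0,0] Assertion `false` failed.",
    ]
    for line in samples:
        global_id, local_id = parse_ids(line)
        print(locate(global_id, local_id, group_size=200))
```

Under that assumption, all ten reported assertions share the same work-group start, 1377600 = 6888 * 200. The printed failures therefore come from a single work-group, and every one of them has a local id above 128. This narrows down where to look but does not by itself identify the cause.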
Running with N <= 128 works for both validation and the benchmark. Why?
averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 128 128
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 128
Success!
averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 128
Dualising 1000000 triangulation graphs, each with 128 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
NOT USING: AMD EPYC 7A53 64-Core Processor has 4 compute-units.
Platform: AMD HIP BACKEND
USING : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
Mean Time per Graph: 26.4305 +/- 7.02391 ns