Task03 Данил Дорошин ITMO#927

Closed
ddddanil wants to merge 12 commits into GPGPUCourse:task03 from ddddanil:task03

ddddanil commented on Dec 23, 2025

Transpose

Local output

$ ./main_matrix_transpose 2
Found 3 GPUs in 0.328274 sec (CUDA: 0.122461 sec, OpenCL: 0.113303 sec, Vulkan: 0.0924006 sec)
Available devices:
  Device #0: API: OpenCL. CPU. Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz. Intel(R) Corporation. Total memory: 15731 Mb.
  Device #1: API: Vulkan. iGPU. Intel(R) UHD Graphics (CML GT2). Free memory: 6315/11798 Mb.
  Device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce GTX 1650 Ti (CUDA 13000). Free memory: 3654/3717 Mb.
Using device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce GTX 1650 Ti (CUDA 13000). Free memory: 3654/3717 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.145769 10%=0.145774 median=0.145834 90%=0.152064 max=0.152064)
median effective algorithm bandwidth: 6.85711 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.0132707 10%=0.0132795 median=0.0132996 90%=0.0134834 max=0.0134834)
median effective algorithm bandwidth: 75.1901 GB/s
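
The bandwidth figures in this log can be reproduced by hand. The sketch below assumes 4-byte elements (which matches the reported 512 MB for an 8192 x 16384 matrix), counts one read and one write per element, and divides by 2**30. The function name `effective_bandwidth_gib_s` is illustrative and not taken from the course code.

```python
def effective_bandwidth_gib_s(rows, cols, median_seconds, bytes_per_element=4):
    """Bytes read plus bytes written, per second, in units of 2**30 bytes.

    bytes_per_element=4 is an assumption inferred from the logged matrix size.
    """
    matrix_bytes = rows * cols * bytes_per_element
    traffic_bytes = 2 * matrix_bytes  # every element is read once and written once
    return traffic_bytes / median_seconds / 2**30


# Median times taken from the local GTX 1650 Ti run above.
naive = effective_bandwidth_gib_s(8192, 16384, 0.145834)
tiled = effective_bandwidth_gib_s(8192, 16384, 0.0132996)
print(f"naive: {naive:.4f}, local memory: {tiled:.4f}, ratio: {tiled / naive:.2f}x")
```

Under these assumptions the computed values agree with the logged 6.85711 and 75.1901 to about four significant digits, so the "GB/s" in the log appears to mean 2**30 bytes per second.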

GitHub CI output

$ ./main_matrix_transpose 0
Found 1 GPUs in 11.8432 sec (CUDA: 0.112519 sec, OpenCL: 0.706046 sec, Vulkan: 11.0246 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.0239883 10%=0.0239894 median=0.0240359 90%=0.0258209 max=0.0258209)
median effective algorithm bandwidth: 41.6045 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.00817342 10%=0.00817681 median=0.00818267 90%=0.0082992 max=0.0082992)
median effective algorithm bandwidth: 122.209 GB/s
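
For readers unfamiliar with the second algorithm, here is a CPU-side sketch of the general idea behind a transpose through a local (shared) memory tile: each tile is read row by row, held in a small buffer, and written out row by row, so both the reads and the writes are contiguous. This is a plain Python illustration of the access pattern only; the function `transpose_tiled` and the tile size are hypothetical and do not reproduce the kernel from this PR.

```python
def transpose_tiled(matrix, tile=16):
    """Transpose a list-of-lists matrix one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    result = [[0] * rows for _ in range(cols)]
    for tile_row in range(0, rows, tile):
        for tile_col in range(0, cols, tile):
            # Stage 1: contiguous (row-wise) reads into the tile buffer,
            # which plays the role of local memory.
            last_row = min(tile_row + tile, rows)
            buffer = [matrix[r][tile_col:tile_col + tile]
                      for r in range(tile_row, last_row)]
            # Stage 2: contiguous (row-wise) writes to the output; the
            # transposition happens when indexing the buffer by column.
            for c in range(len(buffer[0])):
                column = [row[c] for row in buffer]
                result[tile_col + c][tile_row:tile_row + len(buffer)] = column
    return result


small = [[r * 5 + c for c in range(5)] for r in range(3)]
print(transpose_tiled(small, tile=2))
```

On a GPU the same two stages are separated by a work-group barrier, and the strided access is confined to the fast local memory instead of global memory.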

Multiply

Local output

$ ./main_matrix_multiply 2
Found 3 GPUs in 1.03068 sec (CUDA: 0.195379 sec, OpenCL: 0.700266 sec, Vulkan: 0.134775 sec)
Available devices:
  Device #0: API: OpenCL. CPU. Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz. Intel(R) Corporation. Total memory: 15731 Mb.
  Device #1: API: Vulkan. iGPU. Intel(R) UHD Graphics (CML GT2). Free memory: 8373/11798 Mb.
  Device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce GTX 1650 Ti (CUDA 13010). Free memory: 3654/3717 Mb.
Using device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce GTX 1650 Ti (CUDA 13010). Free memory: 3654/3717 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=54.3206 10%=54.3206 median=54.3206 90%=54.3206 max=54.3206)
algorithm GFlops: 0.316114 GFlops
algorithm effective memory bandwidth: 0.00100675 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=0.219984 10%=0.220031 median=0.220104 90%=0.237713 max=0.237713)
algorithm GFlops: 78.0153 GFlops
algorithm effective memory bandwidth: 0.248462 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.21073e-07 90%=1.12363e-06 max=2.77294)
median relative difference with CPU: 2.21073e-07
99% percentile relative difference with CPU: 1.09303e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.34166 10%=0.341669 median=0.343068 90%=0.34515 max=0.34515)
algorithm GFlops: 50.0527 GFlops
algorithm effective memory bandwidth: 0.159407 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.67428e-08 median=4.71664e-07 90%=2.08005e-06 max=20.4377)
median relative difference with CPU: 4.71664e-07
99% percentile relative difference with CPU: 1.96524e-05
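
The GFlops numbers above can be checked with a few lines of arithmetic. The sketch assumes that each of the H x W output elements costs K multiplications and K - 1 additions and that "GFlops" means 1e9 operations per second; this operation count is inferred from the logged values and is not confirmed by the course code. The name `matmul_gflops` is illustrative.

```python
def matmul_gflops(h, w, k, median_seconds):
    """Operations per second in units of 1e9 for C(h x w) = A(h x k) * B(k x w)."""
    # Assumption: k multiplications and k - 1 additions per output element.
    operations = h * w * (2 * k - 1)
    return operations / median_seconds / 1e9


# Median times taken from the local GTX 1650 Ti run above.
cpu = matmul_gflops(2048, 4096, 1024, 54.3206)
naive = matmul_gflops(2048, 4096, 1024, 0.220104)
local_memory = matmul_gflops(2048, 4096, 1024, 0.343068)
print(f"CPU: {cpu:.6f}, naive: {naive:.4f}, local memory: {local_memory:.4f}")
```

With these assumptions the results match the logged 0.316114, 78.0153 and 50.0527 GFlops. They also make the regression in this run easy to see: on the GTX 1650 Ti the local-memory kernel is slower than the naive one.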

GitHub CI output

$ ./main_matrix_multiply 0
Found 1 GPUs in 0.308016 sec (CUDA: 0.124573 sec, OpenCL: 0.038077 sec, Vulkan: 0.145307 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=11.97 10%=11.97 median=11.97 90%=11.97 max=11.97)
algorithm GFlops: 1.43454 GFlops
algorithm effective memory bandwidth: 0.0045687 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=0.171345 10%=0.172939 median=0.174058 90%=0.329976 max=0.329976)
algorithm GFlops: 98.6536 GFlops
algorithm effective memory bandwidth: 0.314191 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.67401e-08 median=4.71637e-07 90%=2.07923e-06 max=3.12559)
median relative difference with CPU: 4.71637e-07
99% percentile relative difference with CPU: 1.95534e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.152764 10%=0.152767 median=0.152778 90%=0.155259 max=0.155259)
algorithm GFlops: 112.395 GFlops
algorithm effective memory bandwidth: 0.357955 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.67415e-08 median=4.71645e-07 90%=2.07943e-06 max=6.30526)
median relative difference with CPU: 4.71645e-07
99% percentile relative difference with CPU: 1.95739e-05

@GPUcourseBOT (Collaborator)

Test results for PR #927

Test logs
=== STATUS: Programs executed successfully: main_matrix_transpose, main_matrix_multiply ===
=== main_matrix_transpose stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.31939 sec (CUDA: 0.113937 sec, OpenCL: 0.707783 sec, Vulkan: 7.49761 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.0236044 10%=0.0236145 median=0.0238951 90%=0.0255099 max=0.0255099)
median effective algorithm bandwidth: 41.8495 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.00817343 10%=0.00817558 median=0.00818427 90%=0.00830092 max=0.00830092)
median effective algorithm bandwidth: 122.186 GB/s
=== main_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.304857 sec (CUDA: 0.126677 sec, OpenCL: 0.0388888 sec, Vulkan: 0.139232 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=11.4562 10%=11.4562 median=11.4562 90%=11.4562 max=11.4562)
algorithm GFlops: 1.49889 GFlops
algorithm effective memory bandwidth: 0.00477364 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=0.0590775 10%=0.0597447 median=0.0611722 90%=0.0652011 max=0.0652011)
algorithm GFlops: 280.707 GFlops
algorithm effective memory bandwidth: 0.893993 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.21073e-07 90%=1.12363e-06 max=2.77294)
median relative difference with CPU: 2.21073e-07
99% percentile relative difference with CPU: 1.09303e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.0172957 10%=0.0182773 median=0.022297 90%=0.0234924 max=0.0234924)
algorithm GFlops: 770.126 GFlops
algorithm effective memory bandwidth: 2.45269 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.33797e-07 90%=1.88501e-06 max=31106)
median relative difference with CPU: 2.33797e-07
99% percentile relative difference with CPU: 0.130007
=== main_matrix_multiply stderr (exit code: -11 (segfault after execution)) ===
Error: Assertion "54623452334232 0.130007" failed at line 199
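
The failing assertion prints 0.130007, the 99th-percentile relative difference of the local-memory kernel in this run. The 90th percentile is still about 1.9e-06, which suggests that somewhere between 1% and 10% of the output elements are badly wrong. Below is a minimal sketch of a check of this kind. The relative-difference formula, the nearest-rank percentile and the threshold of 1e-3 are all assumptions made for illustration; the actual test harness is not shown in this PR.

```python
def relative_difference(gpu, cpu):
    """|gpu - cpu| / |cpu| (assumed formula), with a guard for cpu == 0."""
    if cpu == 0:
        return 0.0 if gpu == 0 else float("inf")
    return abs(gpu - cpu) / abs(cpu)


def percentile(values, percent):
    """Nearest-rank style percentile; percent is an integer from 0 to 100."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]


def passes_accuracy_check(gpu_values, cpu_values, threshold=1e-3):
    """True when the 99th-percentile relative difference is within the threshold.

    The threshold is hypothetical, not the value used by the course tests.
    """
    diffs = [relative_difference(g, c) for g, c in zip(gpu_values, cpu_values)]
    return percentile(diffs, 99) <= threshold


cpu_values = [1.0] * 1000
few_outliers = [1.0] * 995 + [1.5] * 5     # 0.5% of the elements are wrong
many_outliers = [1.0] * 980 + [1.5] * 20   # 2% of the elements are wrong
print(passes_accuracy_check(few_outliers, cpu_values))
print(passes_accuracy_check(many_outliers, cpu_values))
```

A check on the 99th percentile tolerates a handful of outliers (such as the large `max` values in the passing runs) but fails once more than 1% of the elements are off, which is the situation the log above describes.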

View full logs

@PolarNick239 (Member)

The GitHub CI run failed; please fix it. Open the logs and scroll to the bottom (with the End key, for example; there seem to be many empty lines, so it loads slowly), work out what the problem is, and try searching the course chat for this error. If anything is unclear at any of these steps, or the search turns up nothing, or anything else comes up, don't hesitate to ask (in the chat or by direct message).

@GPUcourseBOT (Collaborator)

⚠️ Test results for PR #927

Test logs
Compilation error

=== CMAKE OUTPUT (stdout) ===
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Vulkan glslc compiler from system PATH will be used
-- Found GTest: /usr/local/lib/cmake/GTest/GTestConfig.cmake (found version "1.10.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found Threads: TRUE
-- Found X11: /usr/include
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Looking for IceConnectionNumber in ICE
-- Looking for IceConnectionNumber in ICE - found
-- Found CUDA: /usr (found version "12.0")
-- Found Vulkan: /usr/local/lib/libvulkan.so (found version "1.3.283") found components: glslc missing components: glslangValidator
-- The CUDA compiler identification is NVIDIA 12.0.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/include (found version "12.0.140")
-- Configuring done (3.7s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmph93v9bos/build

=== CMAKE OUTPUT (stderr) ===
CMake Warning (dev) at libs/utils/CMakeLists.txt:25 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning:
Manually-specified variables were not used by the project:
CUDA_SUPPORT
USE_CUDA
WITH_CUDA

=== MAKE OUTPUT (stdout) ===
[ 1%] Building CXX object libs/clew/CMakeFiles/libclew.dir/libclew/ocl_init.cpp.o
[ 2%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/gtest_utils.cpp.o
[ 2%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/omp_utils.cpp.o
[ 3%] Building CXX object libs/gpu/CMakeFiles/hexdumparray.dir/libgpu/hexdumparray.cpp.o
[ 4%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/string_utils.cpp.o
[ 4%] Linking CXX static library liblibclew.a
[ 4%] Built target libclew
[ 5%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/thread_mutex.cpp.o
[ 6%] Linking CXX executable hexdumparray
[ 6%] Built target hexdumparray
[ 7%] Linking CXX static library liblibbase.a
[ 7%] Built target libbase
[ 8%] Building CXX object libs/images/CMakeFiles/libimages.dir/libimages/debug_io.cpp.o
[ 9%] Building CXX object libs/images/CMakeFiles/libimages.dir/libimages/images.cpp.o
[ 10%] Linking CXX static library liblibimages.a
[ 10%] Built target libimages
[ 13%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/opencl/cl/generated_kernels/dummy_kernel_nospir_opencl120.h
[ 13%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/aplusb_comp_spirv_vulkan.spir
[ 13%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/write_value_at_index_comp_spirv_vulkan.spir
[ 14%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/opencl/tests/kernels/generated_kernels/aplusb_nospir_opencl120.h
[ 16%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/batched_binary_search_comp_spirv_vulkan.spir
[ 16%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/atomic_add_comp_spirv_vulkan.spir
[ 17%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_16U_comp_spirv_vulkan.spir
[ 17%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_32F_comp_spirv_vulkan.spir
[ 18%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_8U_comp_spirv_vulkan.spir
[ 19%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_16U_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_32F_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_8U_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_16U_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_32F_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_8U_comp_spirv_vulkan.spir
[ 22%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_16U_comp_spirv_vulkan.spir
[ 23%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_32F_comp_spirv_vulkan.spir
[ 24%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_8U_comp_spirv_vulkan.spir
[ 26%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_16U_comp_spirv_vulkan.spir
[ 26%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_32F_comp_spirv_vulkan.spir
[ 27%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_8U_comp_spirv_vulkan.spir
[ 28%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_blending_frag_spirv_vulkan.spir
[ 29%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_vert_spirv_vulkan.spir
[ 29%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_frag_spirv_vulkan.spir
[ 30%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/write_value_at_index_comp_spirv_vulkan.h
[ 31%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/aplusb_comp_spirv_vulkan.h
[ 31%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/atomic_add_comp_spirv_vulkan.h
[ 32%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/batched_binary_search_comp_spirv_vulkan.h
[ 33%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_16U_comp_spirv_vulkan.h
[ 33%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_32F_comp_spirv_vulkan.h
[ 34%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_8U_comp_spirv_vulkan.h
[ 35%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_16U_comp_spirv_vulkan.h
[ 37%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_32F_comp_spirv_vulkan.h
[ 37%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_8U_comp_spirv_vulkan.h
[ 38%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_16U_comp_spirv_vulkan.h
[ 39%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_32F_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_16U_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_8U_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_32F_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_8U_comp_spirv_vulkan.h
[ 41%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_16U_comp_spirv_vulkan.h
[ 42%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_32F_comp_spirv_vulkan.h
[ 43%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_8U_comp_spirv_vulkan.h
[ 43%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_blending_frag_spirv_vulkan.h
[ 44%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_frag_spirv_vulkan.h
[ 45%] Generating /tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_vert_spirv_vulkan.h
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/utils.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/engine.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/enum.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/device_info.cpp.o
[ 49%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/spirv_reflect/shader_module_info.cpp.o
[ 50%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/spirv_reflect/spirv_reflect.cpp.o
[ 50%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/vk/common_host.cpp.o
[ 51%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/data_buffer.cpp.o
[ 52%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/data_image.cpp.o
[ 53%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/device.cpp.o
[ 53%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/engine.cpp.o
[ 54%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/enum.cpp.o
[ 55%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/utils.cpp.o

=== MAKE OUTPUT (stderr) ===
In file included from /usr/include/stdio.h:980,
from /usr/include/c++/13/cstdio:42,
from /usr/include/c++/13/ext/string_conversions.h:45,
from /usr/include/c++/13/bits/basic_string.h:4109,
from /usr/include/c++/13/string:54,
from /tmp/tmph93v9bos/libs/images/libimages/images.h:6,
from /tmp/tmph93v9bos/libs/images/libimages/images.cpp:1:
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = char]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = char]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short unsigned int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short unsigned int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = float]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = float]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long unsigned int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long unsigned int]’ at /tmp/tmph93v9bos/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
/tmp/tmph93v9bos/libs/gpu/libgpu/opencl/engine.cpp:3: warning: "SHORT_FILE" redefined
3 | #define SHORT_FILE "ocl_engine.cpp"
|
In file included from /tmp/tmph93v9bos/libs/gpu/libgpu/work_size.h:3,
from /tmp/tmph93v9bos/libs/gpu/libgpu/opencl/engine.h:17,
from /tmp/tmph93v9bos/libs/gpu/libgpu/opencl/engine.cpp:1:
/tmp/tmph93v9bos/libs/gpu/libgpu/utils.h:52: note: this is the location of the previous definition
52 | #define SHORT_FILE "unknown"
|
/tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/engine.cpp: In function ‘VkDebugUtilsMessengerEXT_T* {anonymous}::setupDebugCallback(avk2::InstanceContext*)’:
/tmp/tmph93v9bos/libs/gpu/libgpu/vulkan/engine.cpp:70:47: error: invalid conversion from ‘VkBool32 (*)(vk::DebugUtilsMessageSeverityFlagBitsEXT, vk::DebugUtilsMessageTypeFlagsEXT, const vk::DebugUtilsMessengerCallbackDataEXT*, void*)’ {aka ‘unsigned int (*)(vk::DebugUtilsMessageSeverityFlagBitsEXT, vk::Flags<vk::DebugUtilsMessageTypeFlagBitsEXT>, const vk::DebugUtilsMessengerCallbackDataEXT*, void*)’} to ‘PFN_vkDebugUtilsMessengerCallbackEXT’ {aka ‘unsigned int (*)(VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const VkDebugUtilsMessengerCallbackDataEXT*, void*)’} [-fpermissive]
70 | create_info.pfnUserCallback = debugCallback;
| ^~~~~~~~~~~~~
| |
| VkBool32 (*)(vk::DebugUtilsMessageSeverityFlagBitsEXT, vk::DebugUtilsMessageTypeFlagsEXT, const vk::DebugUtilsMessengerCallbackDataEXT*, void*) {aka unsigned int (*)(vk::DebugUtilsMessageSeverityFlagBitsEXT, vk::Flags<vk::DebugUtilsMessageTypeFlagBitsEXT>, const vk::DebugUtilsMessengerCallbackDataEXT*, void*)}
make[2]: *** [libs/gpu/CMakeFiles/libgpu.dir/build.make:616: libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/engine.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:485: libs/gpu/CMakeFiles/libgpu.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

View full logs

@GPUcourseBOT (Collaborator)

Test results for PR #927

Test logs
=== STATUS: Programs executed successfully: main_matrix_transpose, main_matrix_multiply ===
=== main_matrix_transpose stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.313536 sec (CUDA: 0.122462 sec, OpenCL: 0.0383136 sec, Vulkan: 0.152698 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.0236967 10%=0.023718 median=0.0237391 90%=0.0238717 max=0.0238717)
median effective algorithm bandwidth: 42.1246 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.00817316 10%=0.0081756 median=0.00818112 90%=0.00831771 max=0.00831771)
median effective algorithm bandwidth: 122.233 GB/s
=== main_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.327598 sec (CUDA: 0.126617 sec, OpenCL: 0.0386167 sec, Vulkan: 0.162306 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=11.2695 10%=11.2695 median=11.2695 90%=11.2695 max=11.2695)
algorithm GFlops: 1.52372 GFlops
algorithm effective memory bandwidth: 0.00485272 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=0.060987 10%=0.061368 median=0.0648583 90%=0.0658256 max=0.0658256)
algorithm GFlops: 264.754 GFlops
algorithm effective memory bandwidth: 0.843185 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.21073e-07 90%=1.12363e-06 max=2.77294)
median relative difference with CPU: 2.21073e-07
99% percentile relative difference with CPU: 1.09303e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.0172814 10%=0.0189008 median=0.0231557 90%=0.0243095 max=0.0243095)
algorithm GFlops: 741.567 GFlops
algorithm effective memory bandwidth: 2.36173 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.35155e-07 90%=2.03321e-06 max=70276.7)
median relative difference with CPU: 2.35155e-07
99% percentile relative difference with CPU: 0.146045
=== main_matrix_multiply stderr (exit code: -11 (segfault after execution)) ===
Error: Assertion "54623452334232 0.146045" failed at line 199

View full logs
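The bandwidth and GFlops figures in the log above can be reproduced from the matrix sizes and median times. The sketch below is a reconstruction from the logged numbers, not the harness code: it assumes the transpose reads and writes every 4-byte element once, that the "GB/s" value is computed with 2^30 bytes per GB, and that each output element of the product costs K multiplies and K-1 additions. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Effective transpose bandwidth, assuming one read and one write per element
// and 2^30 bytes per "GB" (an assumption that matches the logged values).
double transpose_bandwidth_gb_per_s(std::size_t rows, std::size_t cols,
                                    std::size_t elem_bytes, double seconds) {
    assert(seconds > 0.0);
    const double bytes_moved = 2.0 * static_cast<double>(rows) * cols * elem_bytes;
    return bytes_moved / (1024.0 * 1024.0 * 1024.0) / seconds;
}

// Matrix-multiply throughput, assuming H*W outputs with (2K - 1) flops each.
double matmul_gflops(std::size_t h, std::size_t w, std::size_t k, double seconds) {
    assert(seconds > 0.0 && k > 0);
    const double flops = static_cast<double>(h) * w * (2.0 * k - 1.0);
    return flops / 1e9 / seconds;
}
```

For the coalesced transpose, `transpose_bandwidth_gb_per_s(8192, 16384, 4, 0.00818112)` gives about 122.23, and for the naive multiply `matmul_gflops(2048, 4096, 1024, 0.0648583)` gives about 264.75, both in line with the log.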

@PolarNick239
Member

On the Tesla T4, rassert 54623452334232 fails.
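For context, the failing check appears to involve the 99th percentile of the relative difference with the CPU result (0.146045 for the local-memory kernel, while the median is only 2.35155e-07). A minimal sketch of such a percentile computation follows. The relative-difference formula, the index rounding, and the function names are assumptions for illustration; the course harness may define them differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical relative-difference measure: |gpu - cpu| / |cpu|,
// with a small floor on the denominator to avoid division by zero.
double relative_difference(float cpu, float gpu) {
    const double denom = std::max(std::fabs(static_cast<double>(cpu)), 1e-9);
    return std::fabs(static_cast<double>(gpu) - cpu) / denom;
}

// Returns the value at the given fraction (0..1) of the sorted differences.
double relative_difference_percentile(const std::vector<float>& cpu,
                                      const std::vector<float>& gpu,
                                      double fraction) {
    assert(cpu.size() == gpu.size() && !cpu.empty());
    assert(fraction >= 0.0 && fraction <= 1.0);
    std::vector<double> diffs(cpu.size());
    for (std::size_t i = 0; i < cpu.size(); ++i)
        diffs[i] = relative_difference(cpu[i], gpu[i]);
    // Index of the requested percentile, rounded down.
    const std::size_t idx = static_cast<std::size_t>(fraction * (diffs.size() - 1));
    // Partial sort: only the element at idx needs to be in its sorted position.
    std::nth_element(diffs.begin(), diffs.begin() + idx, diffs.end());
    return diffs[idx];
}
```

A tiny median with a large 99th percentile, as in the log, is the pattern this measure produces when most elements match and somewhat more than 1% of them are far off.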

@GPUcourseBOT
Collaborator

⚠️ Test results for PR #927

Test logs (click to expand)
Compilation error

=== CMAKE OUTPUT (stdout) ===
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Vulkan glslc compiler from system PATH will be used
-- Found GTest: /usr/local/lib/cmake/GTest/GTestConfig.cmake (found version "1.10.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found Threads: TRUE
-- Found X11: /usr/include
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Looking for IceConnectionNumber in ICE
-- Looking for IceConnectionNumber in ICE - found
-- Found CUDA: /usr (found version "12.0")
-- Found Vulkan: /usr/local/lib/libvulkan.so (found version "1.3.283") found components: glslc missing components: glslangValidator
-- The CUDA compiler identification is NVIDIA 12.0.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/include (found version "12.0.140")
-- Configuring done (19.7s)
-- Generating done (0.1s)
-- Build files have been written to: /tmp/tmpjmeyg1w4/build

=== CMAKE OUTPUT (stderr) ===
CMake Warning (dev) at libs/utils/CMakeLists.txt:25 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning:
Manually-specified variables were not used by the project:
CUDA_SUPPORT
USE_CUDA
WITH_CUDA

=== MAKE OUTPUT (stdout) ===
[ 1%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/gtest_utils.cpp.o
[ 1%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/omp_utils.cpp.o
[ 3%] Building CXX object libs/gpu/CMakeFiles/hexdumparray.dir/libgpu/hexdumparray.cpp.o
[ 3%] Building CXX object libs/clew/CMakeFiles/libclew.dir/libclew/ocl_init.cpp.o
[ 3%] Linking CXX static library liblibclew.a
[ 3%] Built target libclew
[ 4%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/string_utils.cpp.o
[ 5%] Building CXX object libs/base/CMakeFiles/libbase.dir/libbase/thread_mutex.cpp.o
[ 6%] Linking CXX executable hexdumparray
[ 6%] Built target hexdumparray
[ 7%] Linking CXX static library liblibbase.a
[ 7%] Built target libbase
[ 8%] Building CXX object libs/images/CMakeFiles/libimages.dir/libimages/debug_io.cpp.o
[ 9%] Building CXX object libs/images/CMakeFiles/libimages.dir/libimages/images.cpp.o
[ 10%] Linking CXX static library liblibimages.a
[ 10%] Built target libimages
[ 14%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/tests/kernels/generated_kernels/aplusb_nospir_opencl120.h
[ 14%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/aplusb_comp_spirv_vulkan.spir
[ 14%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/write_value_at_index_comp_spirv_vulkan.spir
[ 14%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/cl/generated_kernels/dummy_kernel_nospir_opencl120.h
[ 15%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/atomic_add_comp_spirv_vulkan.spir
[ 16%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/batched_binary_search_comp_spirv_vulkan.spir
[ 16%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_16U_comp_spirv_vulkan.spir
[ 19%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_32F_comp_spirv_vulkan.spir
[ 19%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_8U_comp_spirv_vulkan.spir
[ 19%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_16U_comp_spirv_vulkan.spir
[ 20%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_32F_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_8U_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_16U_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_32F_comp_spirv_vulkan.spir
[ 21%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_8U_comp_spirv_vulkan.spir
[ 23%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_32F_comp_spirv_vulkan.spir
[ 23%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_16U_comp_spirv_vulkan.spir
[ 24%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_8U_comp_spirv_vulkan.spir
[ 25%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_16U_comp_spirv_vulkan.spir
[ 27%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_8U_comp_spirv_vulkan.spir
[ 27%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_32F_comp_spirv_vulkan.spir
[ 28%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_blending_frag_spirv_vulkan.spir
[ 29%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_frag_spirv_vulkan.spir
[ 29%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_vert_spirv_vulkan.spir
[ 30%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/write_value_at_index_comp_spirv_vulkan.h
[ 31%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/aplusb_comp_spirv_vulkan.h
[ 31%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/atomic_add_comp_spirv_vulkan.h
[ 32%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/batched_binary_search_comp_spirv_vulkan.h
[ 33%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_16U_comp_spirv_vulkan.h
[ 33%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_32F_comp_spirv_vulkan.h
[ 34%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_conversion_from_float_to_T_T_8U_comp_spirv_vulkan.h
[ 36%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_32F_comp_spirv_vulkan.h
[ 36%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_16U_comp_spirv_vulkan.h
[ 37%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_1_T_8U_comp_spirv_vulkan.h
[ 38%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_16U_comp_spirv_vulkan.h
[ 39%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_32F_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_16U_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_2_T_8U_comp_spirv_vulkan.h
[ 40%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_32F_comp_spirv_vulkan.h
[ 41%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_16U_comp_spirv_vulkan.h
[ 41%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_3_T_8U_comp_spirv_vulkan.h
[ 42%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_32F_comp_spirv_vulkan.h
[ 43%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/image_interpolation_NCHANNELS_4_T_8U_comp_spirv_vulkan.h
[ 43%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_blending_frag_spirv_vulkan.h
[ 44%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_frag_spirv_vulkan.h
[ 45%] Generating /tmp/tmpjmeyg1w4/libs/gpu/libgpu/vulkan/tests/kernels/generated_kernels/rasterize_vert_spirv_vulkan.h
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/utils.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/device_info.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/enum.cpp.o
[ 48%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/opencl/engine.cpp.o
[ 49%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/spirv_reflect/shader_module_info.cpp.o
[ 50%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/spirv_reflect/spirv_reflect.cpp.o
[ 50%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/vk/common_host.cpp.o
[ 51%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/data_buffer.cpp.o
[ 52%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/data_image.cpp.o
[ 53%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/device.cpp.o
[ 53%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/engine.cpp.o
[ 54%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/enum.cpp.o
[ 55%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/utils.cpp.o
[ 56%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/vulkan/vulkan_api_headers.cpp.o
[ 56%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/context.cpp.o
[ 57%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/device.cpp.o
[ 58%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/device_memory_pool.cpp.o
[ 59%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/gold_helpers.cpp.o
[ 60%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/shared_device_buffer.cpp.o
[ 60%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/shared_device_image.cpp.o
[ 61%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/shared_host_buffer.cpp.o
[ 62%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/utils.cpp.o
[ 63%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/cuda/cuda_api.cpp.o
[ 63%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/cuda/enum.cpp.o
[ 64%] Building CXX object libs/gpu/CMakeFiles/libgpu.dir/libgpu/cuda/utils.cpp.o
[ 65%] Linking CXX static library liblibgpu.a
[ 65%] Built target libgpu
[ 65%] Building CXX object libs/utils/CMakeFiles/libutils.dir/libutils/misc.cpp.o
[ 66%] Building CXX object libs/utils/CMakeFiles/libutils.dir/__/base/libbase/string_utils.cpp.o
[ 67%] Linking CXX static library liblibutils.a
[ 67%] Built target libutils
[ 67%] Generating /tmp/tmpjmeyg1w4/src/kernels/cl/generated_kernels/aplusb_nospir_opencl120.h
[ 68%] Generating /tmp/tmpjmeyg1w4/src/kernels/cl/generated_kernels/matrix_01_transpose_naive_nospir_opencl120.h
[ 68%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_05_multiply_cooperative_matrix_comp_spirv_vulkan.spir
[ 69%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/opencl/tests/kernels/kernels.cpp.o
[ 69%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/opencl/tests/aplusb_test.cpp.o
[ 70%] Generating /tmp/tmpjmeyg1w4/src/kernels/cl/generated_kernels/matrix_02_transpose_coalesced_via_local_memory_nospir_opencl120.h
[ 71%] Generating /tmp/tmpjmeyg1w4/src/kernels/cl/generated_kernels/matrix_03_multiply_naive_nospir_opencl120.h
[ 71%] Generating /tmp/tmpjmeyg1w4/src/kernels/cl/generated_kernels/matrix_04_multiply_via_local_memory_nospir_opencl120.h
[ 72%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/aplusb_comp_spirv_vulkan.spir
[ 72%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_01_transpose_naive_comp_spirv_vulkan.spir
[ 73%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_02_transpose_coalesced_via_local_memory_comp_spirv_vulkan.spir
[ 74%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_03_multiply_naive_comp_spirv_vulkan.spir
[ 75%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_04_multiply_via_local_memory_comp_spirv_vulkan.spir
[ 76%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_05_multiply_cooperative_matrix_comp_spirv_vulkan.h
[ 77%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/kernels/kernels.cpp.o
[ 78%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/aplusb_comp_spirv_vulkan.h
[ 79%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_01_transpose_naive_comp_spirv_vulkan.h
[ 80%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_02_transpose_coalesced_via_local_memory_comp_spirv_vulkan.h
[ 80%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_03_multiply_naive_comp_spirv_vulkan.h
[ 81%] Generating /tmp/tmpjmeyg1w4/src/kernels/vk/generated_kernels/matrix_04_multiply_via_local_memory_comp_spirv_vulkan.h
[ 82%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/aplusb_test.cpp.o
[ 83%] Building CXX object CMakeFiles/GPGPUTasks_core.dir/src/kernels/kernels.cpp.o
[ 84%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/aplusb.cu.o
[ 85%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/atomic_add_test.cpp.o
[ 85%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/batched_binary_search_test.cpp.o
[ 86%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/buffers_magic_guards_test.cpp.o
[ 87%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/matrix_01_transpose_naive.cu.o
[ 87%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/matrix_02_transpose_coalesced_via_local_memory.cu.o
[ 88%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/image_conversion_from_float_to_T_test.cpp.o
[ 89%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/matrix_03_multiply_naive.cu.o
[ 90%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/matrix_04_multiply_via_local_memory.cu.o
[ 91%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/interpolation_test.cpp.o
[ 91%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/main.cpp.o
[ 92%] Building CUDA object CMakeFiles/GPGPUTasks_core.dir/src/kernels/cu/matrix_05_multiply_wmma.cu.o
[ 93%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/rasterization_blending_test.cpp.o
[ 94%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/vulkan/tests/rasterization_test.cpp.o
[ 95%] Building CXX object libs/gpu/CMakeFiles/libgpu_test.dir/libgpu/device_test.cpp.o
[ 95%] Linking CXX static library libGPGPUTasks_core.a
[ 95%] Built target GPGPUTasks_core
[ 96%] Building CXX object CMakeFiles/main_aplusb.dir/src/main_aplusb.cpp.o
[ 97%] Building CXX object CMakeFiles/main_matrix_transpose.dir/src/main_01_matrix_transpose.cpp.o
[ 98%] Building CXX object CMakeFiles/main_matrix_multiply.dir/src/main_02_matrix_multiply.cpp.o
[ 98%] Linking CXX executable main_aplusb
[ 98%] Built target main_aplusb
[ 99%] Linking CXX executable main_matrix_transpose
[ 99%] Built target main_matrix_transpose
[ 99%] Linking CXX executable libgpu_test
[ 99%] Built target libgpu_test

=== MAKE OUTPUT (stderr) ===
In file included from /usr/include/stdio.h:980,
from /usr/include/c++/13/cstdio:42,
from /usr/include/c++/13/ext/string_conversions.h:45,
from /usr/include/c++/13/bits/basic_string.h:4109,
from /usr/include/c++/13/string:54,
from /tmp/tmpjmeyg1w4/libs/images/libimages/images.h:6,
from /tmp/tmpjmeyg1w4/libs/images/libimages/images.cpp:1:
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = char]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = char]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short unsigned int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = short unsigned int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = float]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = float]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long unsigned int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58211:29:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
In function ‘int snprintf(char*, size_t, const char*, ...)’,
inlined from ‘cimg_library::CImgList<T>& cimg_library::CImgList<T>::_load_gif_external(const char*, bool) [with T = long long unsigned int]’ at /tmp/tmpjmeyg1w4/libs/images/libimages/CImg.h:58210:48:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:54:35: warning: null destination pointer [-Wformat-truncation=]
54 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.cpp:3: warning: "SHORT_FILE" redefined
3 | #define SHORT_FILE "ocl_engine.cpp"
|
In file included from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h:3,
from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.h:17,
from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.cpp:1:
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/utils.h:52: note: this is the location of the previous definition
52 | #define SHORT_FILE "unknown"
|
In file included from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/context.h:4,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:6,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.cpp:1:
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.h:3: warning: "CL_TARGET_OPENCL_VERSION" redefined
3 | #define CL_TARGET_OPENCL_VERSION 210
|
In file included from /tmp/tmpjmeyg1w4/libs/clew/CL/cl.h:20,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:4:
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:23: note: this is the location of the previous definition
23 | #define CL_TARGET_OPENCL_VERSION 220
|
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
| ^
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/../../base/libbase/math.h(11): warning #186-D: pointless comparison of unsigned integer with zero
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
detected during instantiation of "T div_ceil(T, T) [with T=size_t]"
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/work_size.h(63): here
In file included from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/context.h:4,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:6,
from /tmp/tmpjmeyg1w4/src/main_aplusb.cpp:2:
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.h:3: warning: "CL_TARGET_OPENCL_VERSION" redefined
3 | #define CL_TARGET_OPENCL_VERSION 210
|
In file included from /tmp/tmpjmeyg1w4/libs/clew/CL/cl.h:20,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:4:
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:23: note: this is the location of the previous definition
23 | #define CL_TARGET_OPENCL_VERSION 220
|
In file included from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/context.h:4,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:6,
from /tmp/tmpjmeyg1w4/src/main_01_matrix_transpose.cpp:2:
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.h:3: warning: "CL_TARGET_OPENCL_VERSION" redefined
3 | #define CL_TARGET_OPENCL_VERSION 210
|
In file included from /tmp/tmpjmeyg1w4/libs/clew/CL/cl.h:20,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:4:
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:23: note: this is the location of the previous definition
23 | #define CL_TARGET_OPENCL_VERSION 220
|
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
| ^
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
| ^
In file included from /tmp/tmpjmeyg1w4/libs/gpu/libgpu/context.h:4,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:6,
from /tmp/tmpjmeyg1w4/src/main_02_matrix_multiply.cpp:2:
/tmp/tmpjmeyg1w4/libs/gpu/libgpu/opencl/engine.h:3: warning: "CL_TARGET_OPENCL_VERSION" redefined
3 | #define CL_TARGET_OPENCL_VERSION 210
|
In file included from /tmp/tmpjmeyg1w4/libs/clew/CL/cl.h:20,
from /tmp/tmpjmeyg1w4/libs/utils/libutils/misc.h:4:
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:23: note: this is the location of the previous definition
23 | #define CL_TARGET_OPENCL_VERSION 220
|
/tmp/tmpjmeyg1w4/libs/clew/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
| ^
/tmp/tmpjmeyg1w4/src/main_02_matrix_multiply.cpp: In function ‘void run(int, char**)’:
/tmp/tmpjmeyg1w4/src/main_02_matrix_multiply.cpp:187:39: error: ‘results’ was not declared in this scope
187 | float gpu_value = results[j * w + i];
| ^~~~~~~
make[2]: *** [CMakeFiles/main_matrix_multiply.dir/build.make:76: CMakeFiles/main_matrix_multiply.dir/src/main_02_matrix_multiply.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:326: CMakeFiles/main_matrix_multiply.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:91: all] Error 2

View full logs

@GPUcourseBOT
Collaborator

Test results for PR #927

Test logs
=== STATUS: Programs executed successfully: main_matrix_transpose, main_matrix_multiply ===
=== main_matrix_transpose stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.65443 sec (CUDA: 0.116201 sec, OpenCL: 0.707217 sec, Vulkan: 7.83095 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.0235627 10%=0.0235686 median=0.0239527 90%=0.0288704 max=0.0288704)
median effective algorithm bandwidth: 41.749 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.00812178 10%=0.00812347 median=0.00813146 90%=0.00824007 max=0.00824007)
median effective algorithm bandwidth: 122.979 GB/s
=== main_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.301851 sec (CUDA: 0.12824 sec, OpenCL: 0.0419848 sec, Vulkan: 0.131569 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=11.9697 10%=11.9697 median=11.9697 90%=11.9697 max=11.9697)
algorithm GFlops: 1.43458 GFlops
algorithm effective memory bandwidth: 0.00456883 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=1.24433 10%=1.24486 median=1.24952 90%=1.35516 max=1.35516)
algorithm GFlops: 13.7424 GFlops
algorithm effective memory bandwidth: 0.0437667 GB/s
relative differences with CPU: 8388608 values (min=0 10%=0 median=2.21073e-07 90%=1.12363e-06 max=2.77294)
median relative difference with CPU: 2.21073e-07
99% percentile relative difference with CPU: 1.09303e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.152937 10%=0.152941 median=0.152948 90%=0.154574 max=0.154574)
algorithm GFlops: 112.27 GFlops
algorithm effective memory bandwidth: 0.357557 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.6743e-08 median=4.71658e-07 90%=2.07979e-06 max=9.13368)
median relative difference with CPU: 4.71658e-07
99% percentile relative difference with CPU: 1.9618e-05

View full logs

} else if (context.type() == gpu::Context::TypeCUDA) {
if (algorithm == "01 naive") {
- cuda::matrix_multiply_naive(gpu::WorkSize(GROUP_SIZE, 1, w, h), matrix_a_gpu, matrix_b_gpu, matrix_c_gpu, w, h, k);
+ cuda::matrix_multiply_naive(gpu::WorkSize(1, 1, w, h), matrix_a_gpu, matrix_b_gpu, matrix_c_gpu, w, h, k);
Member

@PolarNick239 Jan 21, 2026

Please don't launch a 1x1 work group on the GPU, otherwise 31 Lilliputians are left sad somewhere.

Author

Yes, I somehow missed that while debugging. Fixed.
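
The "31 Lilliputians" in the review comment above can be read as the idle lanes of a warp: a work group with a single thread still occupies a whole warp. The sketch below counts those idle lanes for a given work-group shape. It assumes a warp width of 32 lanes, which holds for NVIDIA GPUs such as the Tesla T4 in the logs. The function names are hypothetical and are not part of libgpu.

```cpp
#include <cstddef>

// Assumption: 32 lanes per warp (NVIDIA hardware).
constexpr std::size_t kWarpSize = 32;

// Warps needed to hold one work group of group_x * group_y threads.
constexpr std::size_t warps_per_group(std::size_t group_x, std::size_t group_y) {
    return (group_x * group_y + kWarpSize - 1) / kWarpSize;
}

// Lanes in those warps that have no thread assigned to them.
constexpr std::size_t idle_lanes(std::size_t group_x, std::size_t group_y) {
    return warps_per_group(group_x, group_y) * kWarpSize - group_x * group_y;
}

// Fraction of the occupied lanes that carry a thread.
constexpr double lane_utilization(std::size_t group_x, std::size_t group_y) {
    return static_cast<double>(group_x * group_y) /
           static_cast<double>(warps_per_group(group_x, group_y) * kWarpSize);
}
```

Under this assumption `idle_lanes(1, 1)` is 31 and `lane_utilization(1, 1)` is 1/32, while any shape whose thread count is a multiple of 32 (for example a hypothetical 16 x 16 group) leaves no lane idle. The value of `GROUP_SIZE` in the diff is not shown in this thread, so it is not used here.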

@GPUcourseBOT
Collaborator

Test results for PR #927

Test logs
=== STATUS: Programs executed successfully: main_matrix_transpose, main_matrix_multiply ===
=== main_matrix_transpose stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 11.8432 sec (CUDA: 0.112519 sec, OpenCL: 0.706046 sec, Vulkan: 11.0246 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Matrix size: rows=H=8192 x cols=W=16384 (512 MB)
______________________________________________________
Evaluating algorithm #1/2: 01 naive transpose (non-coalesced)
algorithm times (in seconds) - 10 values (min=0.0239883 10%=0.0239894 median=0.0240359 90%=0.0258209 max=0.0258209)
median effective algorithm bandwidth: 41.6045 GB/s
______________________________________________________
Evaluating algorithm #2/2: 02 transpose via local memory (coalesced)
algorithm times (in seconds) - 10 values (min=0.00817342 10%=0.00817681 median=0.00818267 90%=0.0082992 max=0.0082992)
median effective algorithm bandwidth: 122.209 GB/s
=== main_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.308016 sec (CUDA: 0.124573 sec, OpenCL: 0.038077 sec, Vulkan: 0.145307 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
C = A x B, matrices size: C (rows=H=2048 x cols=W=4096) = A (rows=H=2048 x cols=K=1024) x B (rows=K=1024 x cols=W=4096)
matrices data size: A - 8 MB, B - 16 MB, C - 16 MB
______________________________________________________
Evaluating algorithm #1/3: CPU with OpenMP
algorithm times (in seconds) - 1 values (min=11.97 10%=11.97 median=11.97 90%=11.97 max=11.97)
algorithm GFlops: 1.43454 GFlops
algorithm effective memory bandwidth: 0.0045687 GB/s
______________________________________________________
Evaluating algorithm #2/3: 01 naive
algorithm times (in seconds) - 10 values (min=0.171345 10%=0.172939 median=0.174058 90%=0.329976 max=0.329976)
algorithm GFlops: 98.6536 GFlops
algorithm effective memory bandwidth: 0.314191 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.67401e-08 median=4.71637e-07 90%=2.07923e-06 max=3.12559)
median relative difference with CPU: 4.71637e-07
99% percentile relative difference with CPU: 1.95534e-05
______________________________________________________
Evaluating algorithm #3/3: 02 using local memory
algorithm times (in seconds) - 10 values (min=0.152764 10%=0.152767 median=0.152778 90%=0.155259 max=0.155259)
algorithm GFlops: 112.395 GFlops
algorithm effective memory bandwidth: 0.357955 GB/s
relative differences with CPU: 8388608 values (min=0 10%=8.67415e-08 median=4.71645e-07 90%=2.07943e-06 max=6.30526)
median relative difference with CPU: 4.71645e-07
99% percentile relative difference with CPU: 1.95739e-05

View full logs

@PolarNick239
Member

9/10 points 👍 (because of the deadline)

3 participants