Task07 Панов Антон Александрович ITMO #1039

Closed
LargonG wants to merge 1 commit into GPGPUCourse:task07 from LargonG:task07

@LargonG commented Jan 30, 2026

Local:

Found 3 GPUs in 0.38049 sec (CUDA: 0.0309575 sec, OpenCL: 0.0786745 sec, Vulkan: 0.269651 sec)
Available devices:
  Device #0: API: OpenCL. GPU. AMD Radeon(TM) Graphics (gfx1035). Free memory: 6073/6153 Mb.
  Device #1: API: OpenCL. CPU. AMD Ryzen 7 6800H with Radeon Graphics         . Intel(R) Corporation. Total memory: 15556 Mb.
  Device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce RTX 3060 Laptop GPU (CUDA 13000). Free memory: 5120/6143 Mb.
Using device #2: API: CUDA+OpenCL+Vulkan. GPU. NVIDIA GeForce RTX 3060 Laptop GPU (CUDA 13000). Free memory: 5120/6143 Mb.
Using CUDA API...
Evaluating CSR matrix nrows x ncols=1000000x1000000 with values in range [0; 1000]
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 32], median NNZ per row=32, total NNZ=32000000...
CPU (multi-threaded via OpenMP) finished in 0.0947064 sec
CPU effective bandwidth: 1.32289 GB/s (332.874 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0108161 10%=0.0108181 median=0.0112597 90%=0.153456 max=0.153456)
GPU SpMV median effective VRAM bandwidth: 11.249 GB/s (2841.99 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [128; 128], median NNZ per row=128, total NNZ=128000000...
CPU (multi-threaded via OpenMP) finished in 0.252042 sec
CPU effective bandwidth: 1.91538 GB/s (505.172 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0151182 10%=0.0151359 median=0.0158202 90%=0.0165051 max=0.0165051)
GPU SpMV median effective VRAM bandwidth: 30.612 GB/s (8090.92 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 32], median NNZ per row=17, total NNZ=16499998...
CPU (multi-threaded via OpenMP) finished in 0.0570123 sec
CPU effective bandwidth: 1.19578 GB/s (275.84 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0101825 10%=0.0101991 median=0.0105835 90%=0.0753446 max=0.0753446)
GPU SpMV median effective VRAM bandwidth: 6.51182 GB/s (1511.79 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 128], median NNZ per row=64, total NNZ=64499934...
CPU (multi-threaded via OpenMP) finished in 0.183316 sec
CPU effective bandwidth: 1.34476 GB/s (346.278 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0121328 10%=0.0121331 median=0.0121675 90%=0.0122886 max=0.0122886)
GPU SpMV median effective VRAM bandwidth: 20.3601 GB/s (5259.91 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 128], median NNZ per row=80, total NNZ=79933808...
CPU (multi-threaded via OpenMP) finished in 0.214026 sec
CPU effective bandwidth: 1.4213 GB/s (366.935 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0129436 10%=0.0129472 median=0.0129765 90%=0.0131224 max=0.0131224)
GPU SpMV median effective VRAM bandwidth: 23.5215 GB/s (6087.93 uint millions/s)

D:\dev\gpu\GPGPUTasks2025\out\build\default\main_sparse_matrix_multiply.exe (process 5580) exited with code 0 (0x0).
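
For readers unfamiliar with the benchmark, the operation timed above is y = A·x with A stored in CSR form. The following is a minimal single-threaded reference sketch, assuming the conventional three-array CSR layout. The names `row_offsets`, `columns`, `values`, and `spmv_csr` are illustrative and are not taken from this PR's code, and Python integers do not wrap the way the benchmark's 32-bit unsigned values would.

```python
def spmv_csr(row_offsets, columns, values, x):
    """Return y = A @ x for a matrix A given in CSR form (illustrative sketch)."""
    nrows = len(row_offsets) - 1
    y = [0] * nrows
    for row in range(nrows):
        acc = 0
        # Nonzeros of this row occupy the half-open index range
        # [row_offsets[row], row_offsets[row + 1]).
        for k in range(row_offsets[row], row_offsets[row + 1]):
            acc += values[k] * x[columns[k]]
        y[row] = acc
    return y


# Example 3x4 matrix:
# [[5, 0, 0, 2],
#  [0, 0, 0, 0],
#  [0, 3, 4, 0]]
row_offsets = [0, 2, 2, 4]
columns = [0, 3, 1, 2]
values = [5, 2, 3, 4]
x = [1, 10, 100, 1000]

y = spmv_csr(row_offsets, columns, values, x)
print(y)  # [2005, 0, 430]
```

A GPU version typically assigns one thread or one warp to each row. Which of these the PR uses is not visible from the logs.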

GitHub CI:

Found 2 GPUs in 0.049046 sec (CUDA: 7.6513e-05 sec, OpenCL: 0.0225917 sec, Vulkan: 0.0263291 sec)
Available devices:
  Device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15994 Mb.
  Device #1: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15994/15994 Mb.
Using device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15994 Mb.
Device AMD EPYC 7763 64-Core Processor                 doesn't support CUDA
Error: Device doesn't support requested API

@GPUcourseBOT (Collaborator)

Test results for PR #1039

Test logs
=== STATUS: Programs executed successfully: main_sparse_matrix_multiply ===
=== main_sparse_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.55535 sec (CUDA: 0.115358 sec, OpenCL: 0.706201 sec, Vulkan: 7.73372 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using CUDA API...
Evaluating CSR matrix nrows x ncols=1000000x1000000 with values in range [0; 1000]
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 32], median NNZ per row=32, total NNZ=32000000...
CPU (multi-threaded via OpenMP) finished in 0.043668 sec
CPU effective bandwidth: 2.89859 GB/s (732.297 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0218915 10%=0.0218929 median=0.0219004 90%=0.0247298 max=0.0247298)
GPU SpMV median effective VRAM bandwidth: 5.78346 GB/s (1461.16 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [128; 128], median NNZ per row=128, total NNZ=128000000...
CPU (multi-threaded via OpenMP) finished in 0.168708 sec
CPU effective bandwidth: 2.87002 GB/s (758.556 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0168651 10%=0.0172695 median=0.025286 90%=0.0253646 max=0.0253646)
GPU SpMV median effective VRAM bandwidth: 19.1524 GB/s (5062.08 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 32], median NNZ per row=17, total NNZ=16499998...
CPU (multi-threaded via OpenMP) finished in 0.0225671 sec
CPU effective bandwidth: 3.0495 GB/s (707.936 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0109355 10%=0.0109367 median=0.0109378 90%=0.0110137 max=0.0110137)
GPU SpMV median effective VRAM bandwidth: 6.30086 GB/s (1462.81 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 128], median NNZ per row=64, total NNZ=64499934...
CPU (multi-threaded via OpenMP) finished in 0.0847006 sec
CPU effective bandwidth: 2.92367 GB/s (755.301 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0214121 10%=0.0214138 median=0.0214202 90%=0.0229551 max=0.0229551)
GPU SpMV median effective VRAM bandwidth: 11.5653 GB/s (2987.83 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 128], median NNZ per row=80, total NNZ=80011495...
CPU (multi-threaded via OpenMP) finished in 0.105854 sec
CPU effective bandwidth: 2.88548 GB/s (755.558 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0140884 10%=0.0142801 median=0.0236615 90%=0.0237399 max=0.0237399)
GPU SpMV median effective VRAM bandwidth: 12.912 GB/s (3381.02 uint millions/s)
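
The "effective VRAM bandwidth" figures in these logs can be reproduced from the median time and the total NNZ. The sketch below is an inference from the printed numbers, not something the log or the PR states. The GPU lines are consistent with counting 4 bytes per nonzero plus 8 bytes per row, divided by the median time, with "GB" meaning 2^30 bytes. The function names are hypothetical. The CPU lines do not reproduce as closely from their printed times, so only the GPU lines are covered.

```python
GIB = 1024 ** 3          # the log's "GB" appears to be 2**30 bytes (assumption)
NROWS = 1_000_000        # from "nrows x ncols=1000000x1000000"


def estimated_gb_per_s(nnz, seconds, nrows=NROWS):
    """Bandwidth under the inferred byte count 4 * (nnz + 2 * nrows)."""
    bytes_moved = 4 * (nnz + 2 * nrows)  # assumption inferred from the log
    return bytes_moved / seconds / GIB


def uint_millions_per_s(nnz, seconds):
    """Nonzeros processed per second, in millions."""
    return nnz / seconds / 1e6


# NNZ per row = 32, median GPU times copied from the two logs.
rtx3060 = estimated_gb_per_s(32_000_000, 0.0112597)  # log reports 11.249 GB/s
tesla_t4 = estimated_gb_per_s(32_000_000, 0.0219004)  # log reports 5.78346 GB/s
rate = uint_millions_per_s(32_000_000, 0.0112597)     # log reports 2841.99

print(f"{rtx3060:.3f} GB/s, {tesla_t4:.3f} GB/s, {rate:.2f} uint millions/s")
```

If this reading is right, the metric does not count the column-index array or the reads of the input vector, so the true memory traffic is higher than the reported figure.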

@PolarNick239 changed the title from "Task 07 Панов Антон Александрович ITMO" to "Task07 Панов Антон Александрович ITMO" on Feb 4, 2026
@PolarNick239 (Member)

4/5 points 👍 (because of the deadline)
