Task02 Максим Ярош ITMO#1010

Closed
mrsfer1 wants to merge 2 commits into GPGPUCourse:task02 from mrsfer1:task02

@mrsfer1 commented Jan 19, 2026

Local output

$ ./main_mandelbrot 1
Found 3 GPUs in 0.102268 sec (OpenCL: 0.0499321 sec, Vulkan: 0.0522292 sec)
Available devices:
  Device #0: API: Vulkan. iGPU. AMD Radeon Graphics (RADV RENOIR). Free memory: 4974/5462 Mb.
  Device #1: API: OpenCL. CPU. AMD Ryzen 7 7730U with Radeon Graphics         . Intel(R) Corporation. Total memory: 15364 Mb.
  Device #2: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15364/15364 Mb.
Using device #1: API: OpenCL. CPU. AMD Ryzen 7 7730U with Radeon Graphics         . Intel(R) Corporation. Total memory: 15364 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=6.88174 10%=6.88174 median=6.88174 90%=6.88174 max=6.88174)
Mandelbrot effective algorithm GFlops: 1.45312 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x16 threads
algorithm times (in seconds) - 10 values (min=0.354709 10%=0.358455 median=0.475545 90%=0.551221 max=0.551221)
Mandelbrot effective algorithm GFlops: 21.0285 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.108898 seconds
algorithm times (in seconds) - 10 values (min=0.0386772 10%=0.0387416 median=0.0389675 90%=0.149253 max=0.149253)
Mandelbrot effective algorithm GFlops: 256.624 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

$ ./main_sum 1
Found 3 GPUs in 0.102431 sec (OpenCL: 0.049709 sec, Vulkan: 0.0526249 sec)
Available devices:
  Device #0: API: Vulkan. iGPU. AMD Radeon Graphics (RADV RENOIR). Free memory: 4980/5462 Mb.
  Device #1: API: OpenCL. CPU. AMD Ryzen 7 7730U with Radeon Graphics         . Intel(R) Corporation. Total memory: 15364 Mb.
  Device #2: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15364/15364 Mb.
Using device #1: API: OpenCL. CPU. AMD Ryzen 7 7730U with Radeon Graphics         . Intel(R) Corporation. Total memory: 15364 Mb.
Using OpenCL API...
PCI-E median bandwidth - 0.372529 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.181423 10%=0.181442 median=0.182399 90%=0.184426 max=0.184426)
sum median effective algorithm bandwidth: 2.04239 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0214587 10%=0.021471 median=0.0215453 90%=0.0251869 max=0.0251869)
sum median effective algorithm bandwidth: 17.2905 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.178815 seconds
algorithm times (in seconds) - 10 values (min=1.06969 10%=1.07143 median=1.07601 90%=2.40566 max=2.40566)
sum median effective algorithm bandwidth: 0.346213 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0248694 seconds
algorithm times (in seconds) - 10 values (min=0.540925 10%=0.54126 median=0.541665 90%=0.564745 max=0.564745)
sum median effective algorithm bandwidth: 0.687748 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0378023 seconds
algorithm times (in seconds) - 10 values (min=0.0168567 10%=0.0169098 median=0.0174152 90%=0.0560182 max=0.0560182)
sum median effective algorithm bandwidth: 21.391 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0351819 seconds
algorithm times (in seconds) - 10 values (min=0.0711733 10%=0.0713272 median=0.0719104 90%=0.106359 max=0.106359)
sum median effective algorithm bandwidth: 5.18046 GB/s

I also did the theory assignment and sent it by email. My email is maxim.ja54@gmail.com

GitHub CI output

Run ./main_mandelbrot 0
Found 2 GPUs in 0.0476947 sec (CUDA: 7.9167e-05 sec, OpenCL: 0.0225674 sec, Vulkan: 0.0250032 sec)
Available devices:
  Device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15995 Mb.
  Device #1: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15995/15995 Mb.
Using device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15995 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=1.99978 10%=1.99978 median=1.99978 90%=1.99978 max=1.99978)
Mandelbrot effective algorithm GFlops: 5.00056 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.603731 10%=0.606879 median=0.607061 90%=0.608474 max=0.608474)
Mandelbrot effective algorithm GFlops: 16.4728 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.15827 seconds
algorithm times (in seconds) - 10 values (min=0.151715 10%=0.156907 median=0.162415 90%=0.312005 max=0.312005)
Mandelbrot effective algorithm GFlops: 61.5706 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

Run ./main_sum 0
Found 2 GPUs in 0.0494065 sec (CUDA: 8.1382e-05 sec, OpenCL: 0.0232529 sec, Vulkan: 0.0260293 sec)
Available devices:
  Device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15995 Mb.
  Device #1: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15995/15995 Mb.
Using device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15995 Mb.
Using OpenCL API...
PCI-E median bandwidth - 0.372529 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0324789 10%=0.0325819 median=0.0328541 90%=0.0331745 max=0.0331745)
sum median effective algorithm bandwidth: 11.3389 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0213247 10%=0.0213637 median=0.0215077 90%=0.0221741 max=0.0221741)
sum median effective algorithm bandwidth: 17.3207 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.111614 seconds
algorithm times (in seconds) - 10 values (min=1.53227 10%=1.53845 median=1.54005 90%=1.65099 max=1.65099)
sum median effective algorithm bandwidth: 0.241894 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0309507 seconds
algorithm times (in seconds) - 10 values (min=0.769907 10%=0.771234 median=0.771519 90%=0.802409 max=0.802409)
sum median effective algorithm bandwidth: 0.482852 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0553541 seconds
algorithm times (in seconds) - 10 values (min=0.0573202 10%=0.0573213 median=0.0573733 90%=0.113336 max=0.113336)
sum median effective algorithm bandwidth: 6.49308 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0435418 seconds
algorithm times (in seconds) - 10 values (min=0.270416 10%=0.270514 median=0.270817 90%=0.314753 max=0.314753)
sum median effective algorithm bandwidth: 1.37557 GB/s

@GPUcourseBOT
Collaborator

Test results for PR #1010

Test logs (click to expand)
=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.66802 sec (CUDA: 0.117864 sec, OpenCL: 0.706237 sec, Vulkan: 7.84385 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.10105 10%=3.10105 median=3.10105 90%=3.10105 max=3.10105)
Mandelbrot effective algorithm GFlops: 3.22471 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.953028 10%=0.954393 median=0.960289 90%=0.965054 max=0.965054)
Mandelbrot effective algorithm GFlops: 10.4135 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 4.45138 seconds
algorithm times (in seconds) - 10 values (min=0.00427509 10%=0.00427582 median=0.00427973 90%=4.45572 max=4.45572)
Mandelbrot effective algorithm GFlops: 2336.6 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.295086 sec (CUDA: 0.127871 sec, OpenCL: 0.03798 sec, Vulkan: 0.129175 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E median bandwidth - 0.372529 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0335375 10%=0.0335759 median=0.0336883 90%=0.0342653 max=0.0342653)
sum median effective algorithm bandwidth: 11.0581 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0153573 10%=0.0154189 median=0.0156364 90%=0.0163363 max=0.0163363)
sum median effective algorithm bandwidth: 23.8244 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0498866 seconds
algorithm times (in seconds) - 10 values (min=0.00275093 10%=0.00275146 median=0.00275357 90%=0.0527548 max=0.0527548)
sum median effective algorithm bandwidth: 135.289 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0700375 seconds
algorithm times (in seconds) - 10 values (min=0.00146321 10%=0.00146355 median=0.00146513 90%=0.0716105 max=0.0716105)
sum median effective algorithm bandwidth: 254.264 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0994171 seconds
algorithm times (in seconds) - 10 values (min=0.0106701 10%=0.0106709 median=0.010678 90%=0.11019 max=0.11019)
sum median effective algorithm bandwidth: 34.8876 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.139517 seconds
algorithm times (in seconds) - 10 values (min=0.0177374 10%=0.0177383 median=0.0279139 90%=0.167551 max=0.167551)
sum median effective algorithm bandwidth: 13.3456 GB/s

View full logs

@PolarNick239
Member

What PCI-E bandwidth would you expect on your computer, given its technical specifications? And on the node with the Tesla T4? And what value did we discuss in the lecture? How well does that match the logs?
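
For reference, a theoretical PCI-E figure can be estimated from the link parameters: per-lane transfer rate, times encoding efficiency, times lane count. The sketch below is only an illustration. The function name and the choice of a PCIe 3.0 x16 link are assumptions, since neither the laptop's nor the Tesla T4 node's link configuration appears in the logs.

```python
def pcie_bandwidth_gb_s(rate_gt_s, lanes, payload_bits=128, encoded_bits=130):
    """Theoretical one-direction PCI-E bandwidth in GB/s (10^9 bytes per second)."""
    # Each transfer carries one bit per lane; line encoding spends some bits on framing.
    bits_per_s_per_lane = rate_gt_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s_per_lane / 8 * lanes / 1e9


# Assumption: PCIe 3.0 x16 (8 GT/s per lane, 128b/130b encoding).
gen3_x16 = pcie_bandwidth_gb_s(8.0, 16)
print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s")
```

Measured transfers come in below such an estimate because of protocol overhead. The 0.372529 GB/s printed in all three logs above (laptop, GitHub CI, Tesla T4) is far lower still, and a value that is identical on three different machines is unlikely to be a real measurement.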

@mrsfer1
Author

mrsfer1 commented Jan 21, 2026

Same mistake here: I forgot to divide by the median time, even though I had measured the time 5 times right before that.
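
The fix described above can be sketched as follows. This is not the code from the PR; the names are hypothetical. The array size of 100,000,000 4-byte values and the use of GiB as "GB" are inferences from the logs: that size is 0.372529 GiB, which matches the bogus PCI-E figure, and dividing it by the local CPU median of 0.182399 s gives the logged 2.04239 GB/s.

```python
from statistics import median

GIB = 1024 ** 3  # assumption: the logs report "GB" as 2^30 bytes

# Assumption inferred from the logs: 100,000,000 elements of 4 bytes each.
n_bytes = 100_000_000 * 4


def bandwidth_gb_s(n_bytes, times_s):
    """Effective bandwidth: bytes moved divided by the median measured time."""
    return n_bytes / GIB / median(times_s)


# The bug: printing the data size in GB without dividing by the time.
size_gb = n_bytes / GIB
print(f"size only (bug): {size_gb:.6f}")

# Consistency check against the local CPU sum run (median 0.182399 s).
cpu_sum = bandwidth_gb_s(n_bytes, [0.182399])
print(f"CPU sum: {cpu_sum:.4f} GB/s")

# Median transfer time implied by the corrected 8.45462 GB/s on the Tesla T4 node.
implied_time_s = size_gb / 8.45462
print(f"implied transfer time: {implied_time_s:.3f} s")
```

Under these assumptions, the corrected value of 8.45462 GB/s corresponds to a median transfer time of about 0.044 s.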

@GPUcourseBOT
Collaborator

Test results for PR #1010

Test logs (click to expand)
=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.54059 sec (CUDA: 0.115785 sec, OpenCL: 0.70738 sec, Vulkan: 7.71736 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.40858 10%=3.40858 median=3.40858 90%=3.40858 max=3.40858)
Mandelbrot effective algorithm GFlops: 2.93378 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=1.0495 10%=1.0496 median=1.04992 90%=1.06077 max=1.06077)
Mandelbrot effective algorithm GFlops: 9.52457 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 3.58923 seconds
algorithm times (in seconds) - 10 values (min=0.00427744 10%=0.00428059 median=0.00428987 90%=3.59359 max=3.59359)
Mandelbrot effective algorithm GFlops: 2331.07 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.334489 sec (CUDA: 0.127823 sec, OpenCL: 0.0386513 sec, Vulkan: 0.167954 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E median bandwidth - 8.45462 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0364211 10%=0.036488 median=0.0368963 90%=0.0372249 max=0.0372249)
sum median effective algorithm bandwidth: 10.0967 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.016925 10%=0.0169256 median=0.0172708 90%=0.0178131 max=0.0178131)
sum median effective algorithm bandwidth: 21.5699 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.067585 seconds
algorithm times (in seconds) - 10 values (min=0.0027527 10%=0.00275295 median=0.00275533 90%=0.0704515 max=0.0704515)
sum median effective algorithm bandwidth: 135.203 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0512642 seconds
algorithm times (in seconds) - 10 values (min=0.00146345 10%=0.001464 median=0.00146541 90%=0.0528408 max=0.0528408)
sum median effective algorithm bandwidth: 254.215 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0498652 seconds
algorithm times (in seconds) - 10 values (min=0.010679 10%=0.0106874 median=0.0110821 90%=0.0606583 max=0.0606583)
sum median effective algorithm bandwidth: 33.6153 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0483242 seconds
algorithm times (in seconds) - 10 values (min=0.0239867 10%=0.0239924 median=0.024605 90%=0.0908858 max=0.0908858)
sum median effective algorithm bandwidth: 15.1404 GB/s

View full logs

@PolarNick239
Member

5/6 points 👍 (because of the deadline)
