Task01 Максим Синицын HSE #1045

Open
c0ldzy17 wants to merge 1 commit into GPGPUCourse:task01 from c0ldzy17:task01

c0ldzy17 commented Feb 25, 2026

Local output

Found 1 GPUs in 0.654228 sec (OpenCL: 0.066856 sec, Vulkan: 0.58732 sec)
Available devices:
  Device #0: API: OpenCL+Vulkan. GPU. Apple M3 Pro. Free memory: 27648/27648 Mb.
Using device #0: API: OpenCL+Vulkan. GPU. Apple M3 Pro. Free memory: 27648/27648 Mb.
Using OpenCL API...
matrixes size: 16384x8192 = 3 * 512 MB

Running BAD matrix kernel...
Kernels compilation done in 0.10887 seconds
BAD kernel times, s10 values (min=0.012569 10%=0.012577 median=0.012618 90%=0.164671 max=0.164671)
BAD kernel median bandwidth, gb/s118.878

Running GOOD matrix kernel...
Kernels compilation done in 0.003939 seconds
GOOD kernel times, s10 values (min=0.012919 10%=0.012939 median=0.013008 90%=0.046022 max=0.046022)
GOOD kernel median bandwidth, gb/s115.314
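
The bandwidth figures in these logs appear to be the total data volume divided by the median kernel time. A minimal sketch of that arithmetic follows, assuming 4-byte elements and three buffers (two inputs and one output), which is consistent with the logged "16384x8192 = 3 * 512 MB". The function name and constants are illustrative and are not taken from the course code.

```python
# Recompute "median bandwidth" from the logged median kernel times.
# Assumptions (not confirmed by the PR): 4-byte elements, three buffers
# (two inputs, one output), and GB meaning 2**30 bytes.

WIDTH, HEIGHT = 16384, 8192
BYTES_PER_ELEMENT = 4  # assumed element size
BUFFERS = 3            # a, b and the result


def bandwidth_gb_s(median_seconds: float) -> float:
    """Total bytes moved by one kernel launch, in GB per second."""
    total_bytes = WIDTH * HEIGHT * BYTES_PER_ELEMENT * BUFFERS
    return total_bytes / 2**30 / median_seconds


# Median times copied from the logs in this PR.
medians = {
    "M3 Pro BAD": 0.012618,
    "M3 Pro GOOD": 0.013008,
    "Tesla T4 BAD": 0.020397,
    "Tesla T4 GOOD": 0.006439,
}

for name, seconds in medians.items():
    print(f"{name}: {bandwidth_gb_s(seconds):.3f} GB/s")
```

Under these assumptions the computed values agree with the logged bandwidths to within rounding, which suggests the median (not the minimum or the first, slower launch) is what the benchmark reports.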

P.S. As far as I understand, there is little difference between the two kernels on the MacBook M3 Pro because the chip itself already handles memory accesses smartly.
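
The kernels themselves are not shown in this conversation, so the following is only a guess at what separates BAD from GOOD: a common pattern in such exercises is that adjacent work-items touch adjacent elements in the GOOD kernel and elements a full row apart in the BAD one. The sketch below, with hypothetical index mappings `good_offset` and `bad_offset`, shows how large the address stride between neighbouring work-items becomes in each case.

```python
# Hypothetical index mappings; the PR's actual kernels are not shown.
# Assumption: row-major storage, 4-byte elements, row width from the logs.

WIDTH = 16384


def good_offset(work_item: int, fixed: int) -> int:
    # Adjacent work-items walk along one row: consecutive elements.
    return fixed * WIDTH + work_item


def bad_offset(work_item: int, fixed: int) -> int:
    # Adjacent work-items walk down one column: one full row apart.
    return work_item * WIDTH + fixed


def byte_stride(offset_fn, bytes_per_element: int = 4) -> int:
    """Distance in bytes between the elements read by work-items 0 and 1."""
    return (offset_fn(1, 0) - offset_fn(0, 0)) * bytes_per_element


print("GOOD stride, bytes:", byte_stride(good_offset))
print("BAD stride, bytes:", byte_stride(bad_offset))
```

If the kernels do differ this way, the BAD variant forces each neighbouring work-item onto a different memory segment. How much that costs depends on the hardware's memory system, so the near-identical M3 Pro timings are compatible with the P.S. above without confirming it.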

GitHub CI output

Logs for PR #1045 (2026-02-25T17:53:30.986255+00:00):

=== STATUS: Programs executed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.58803 sec (CUDA: 0.118098 sec, OpenCL: 0.718217 sec, Vulkan: 7.75166 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrixes size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.47536 seconds
BAD kernel times, s10 values (min=0.02029 10%=0.020293 median=0.020397 90%=3.496 max=3.496)
BAD kernel median bandwidth, gb/s73.5402
Running GOOD matrix kernel...
Kernels compilation done in 0.084948 seconds
GOOD kernel times, s10 values (min=0.006437 10%=0.006437 median=0.006439 90%=0.091471 max=0.091471)
GOOD kernel median bandwidth, gb/s232.955
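
To put the two machines side by side, the ratio of the BAD and GOOD median times can be computed directly from the logs. This is a small sketch using only the logged medians; the helper name `speedup` is illustrative.

```python
# Compare BAD and GOOD kernels using the median times logged in this PR.

def speedup(bad_seconds: float, good_seconds: float) -> float:
    """How many times faster the GOOD kernel is than the BAD one."""
    return bad_seconds / good_seconds


tesla_t4 = speedup(0.020397, 0.006439)  # CI run, Tesla T4
m3_pro = speedup(0.012618, 0.013008)    # local run, Apple M3 Pro

print(f"Tesla T4: GOOD is {tesla_t4:.2f}x faster than BAD")
print(f"M3 Pro:   GOOD is {m3_pro:.2f}x faster than BAD")
```

On the Tesla T4 the GOOD kernel is roughly three times faster, while on the M3 Pro the ratio is slightly below one, meaning the two kernels are effectively tied there.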

GPUcourseBOT (Collaborator) commented

Test results for PR #1045

Test logs
=== STATUS: Programs executed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.58803 sec (CUDA: 0.118098 sec, OpenCL: 0.718217 sec, Vulkan: 7.75166 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrixes size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.47536 seconds
BAD kernel times, s10 values (min=0.02029 10%=0.020293 median=0.020397 90%=3.496 max=3.496)
BAD kernel median bandwidth, gb/s73.5402
Running GOOD matrix kernel...
Kernels compilation done in 0.084948 seconds
GOOD kernel times, s10 values (min=0.006437 10%=0.006437 median=0.006439 90%=0.091471 max=0.091471)
GOOD kernel median bandwidth, gb/s232.955
