Closed
Collaborator
✅ Test results for PR #1010
Test logs (click to expand)
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.66802 sec (CUDA: 0.117864 sec, OpenCL: 0.706237 sec, Vulkan: 7.84385 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.10105 10%=3.10105 median=3.10105 90%=3.10105 max=3.10105)
Mandelbrot effective algorithm GFlops: 3.22471 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.953028 10%=0.954393 median=0.960289 90%=0.965054 max=0.965054)
Mandelbrot effective algorithm GFlops: 10.4135 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 4.45138 seconds
algorithm times (in seconds) - 10 values (min=0.00427509 10%=0.00427582 median=0.00427973 90%=4.45572 max=4.45572)
Mandelbrot effective algorithm GFlops: 2336.6 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.295086 sec (CUDA: 0.127871 sec, OpenCL: 0.03798 sec, Vulkan: 0.129175 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E median bandwidth - 0.372529 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0335375 10%=0.0335759 median=0.0336883 90%=0.0342653 max=0.0342653)
sum median effective algorithm bandwidth: 11.0581 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0153573 10%=0.0154189 median=0.0156364 90%=0.0163363 max=0.0163363)
sum median effective algorithm bandwidth: 23.8244 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0498866 seconds
algorithm times (in seconds) - 10 values (min=0.00275093 10%=0.00275146 median=0.00275357 90%=0.0527548 max=0.0527548)
sum median effective algorithm bandwidth: 135.289 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0700375 seconds
algorithm times (in seconds) - 10 values (min=0.00146321 10%=0.00146355 median=0.00146513 90%=0.0716105 max=0.0716105)
sum median effective algorithm bandwidth: 254.264 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0994171 seconds
algorithm times (in seconds) - 10 values (min=0.0106701 10%=0.0106709 median=0.010678 90%=0.11019 max=0.11019)
sum median effective algorithm bandwidth: 34.8876 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.139517 seconds
algorithm times (in seconds) - 10 values (min=0.0177374 10%=0.0177383 median=0.0279139 90%=0.167551 max=0.167551)
sum median effective algorithm bandwidth: 13.3456 GB/s
Member
What PCI-E bandwidth would you expect on your own computer, given its technical specifications? And on the node with the Tesla T4? And what value did we discuss in the lecture? How well does that match the logs?
Author
Same mistake here: I forgot to divide by the median time, even though I had measured the time five times right before that.
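For reference, a minimal Python sketch of the corrected calculation (not the actual code from this PR). It assumes the transfer is 4·10^8 bytes and that "GB" means 2^30 bytes; both are inferred from the logs, where 4·10^8 / 2^30 ≈ 0.372529 is exactly the value the first run printed as "GB/s". The function name and the five timings are hypothetical.

```python
from statistics import median

# Assumption: the logs appear to use 2**30 bytes per "GB".
BYTES_PER_GB = 1 << 30


def median_bandwidth_gb_per_s(n_bytes, times_s):
    """Return transferred gigabytes divided by the median transfer time."""
    if not times_s:
        raise ValueError("need at least one timing")
    return (n_bytes / BYTES_PER_GB) / median(times_s)


# Assumption: 10**8 four-byte values, inferred from the logged figures.
n_bytes = 100_000_000 * 4

# The bug: printing the size alone, without dividing by time.
size_gb = n_bytes / BYTES_PER_GB  # about 0.372529

# Hypothetical timings of five host-to-device transfers, in seconds.
times_s = [0.0442, 0.0440, 0.0441, 0.0439, 0.0443]
bandwidth = median_bandwidth_gb_per_s(n_bytes, times_s)
print(f"PCI-E median bandwidth - {bandwidth:.4f} GB/s")
```

The same size divided by the CPU median time from the first log (0.0336883 s) gives roughly the 11.0581 GB/s reported for algorithm #1/6, which supports the inferred data size.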
Collaborator
✅ Test results for PR #1010
Test logs (click to expand)
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.54059 sec (CUDA: 0.115785 sec, OpenCL: 0.70738 sec, Vulkan: 7.71736 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.40858 10%=3.40858 median=3.40858 90%=3.40858 max=3.40858)
Mandelbrot effective algorithm GFlops: 2.93378 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=1.0495 10%=1.0496 median=1.04992 90%=1.06077 max=1.06077)
Mandelbrot effective algorithm GFlops: 9.52457 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 3.58923 seconds
algorithm times (in seconds) - 10 values (min=0.00427744 10%=0.00428059 median=0.00428987 90%=3.59359 max=3.59359)
Mandelbrot effective algorithm GFlops: 2331.07 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.334489 sec (CUDA: 0.127823 sec, OpenCL: 0.0386513 sec, Vulkan: 0.167954 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E median bandwidth - 8.45462 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0364211 10%=0.036488 median=0.0368963 90%=0.0372249 max=0.0372249)
sum median effective algorithm bandwidth: 10.0967 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.016925 10%=0.0169256 median=0.0172708 90%=0.0178131 max=0.0178131)
sum median effective algorithm bandwidth: 21.5699 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.067585 seconds
algorithm times (in seconds) - 10 values (min=0.0027527 10%=0.00275295 median=0.00275533 90%=0.0704515 max=0.0704515)
sum median effective algorithm bandwidth: 135.203 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0512642 seconds
algorithm times (in seconds) - 10 values (min=0.00146345 10%=0.001464 median=0.00146541 90%=0.0528408 max=0.0528408)
sum median effective algorithm bandwidth: 254.215 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0498652 seconds
algorithm times (in seconds) - 10 values (min=0.010679 10%=0.0106874 median=0.0110821 90%=0.0606583 max=0.0606583)
sum median effective algorithm bandwidth: 33.6153 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0483242 seconds
algorithm times (in seconds) - 10 values (min=0.0239867 10%=0.0239924 median=0.024605 90%=0.0908858 max=0.0908858)
sum median effective algorithm bandwidth: 15.1404 GB/s
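A note on reading these timings: for the GPU runs, the max (and 90%) value appears to include the one-off kernel compilation. In the Mandelbrot GPU run, max=3.59359 s is close to the 3.58923 s compilation time plus one ordinary run. The small Python sketch below shows why the median is the safer statistic in that situation. The timing list is hypothetical, built from one slow first run and nine typical runs.

```python
from statistics import mean, median

# Hypothetical timings in seconds: the first run includes kernel
# compilation, the remaining nine are ordinary warm runs.
times_s = [3.59359] + [0.00428] * 9

robust = median(times_s)  # unaffected by the single slow first run
naive = mean(times_s)     # dominated by the compilation outlier

print(f"median = {robust:.5f} s, mean = {naive:.5f} s")
```

With this shape of data the mean is almost two orders of magnitude larger than the median, so a throughput figure derived from the mean would badly understate steady-state performance.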
Member
5/6 points 👍 (because of the deadline)
Local output
I also did the theory assignment and sent it by email. My email is maxim.ja54@gmail.com
GitHub CI output