ahmadrezarazian/OpenCL_MultiDevice_Bandwidth_Analyzer

OpenCL Multi‑Device Memory Bandwidth Analyzer

Overview

OpenCL Multi‑Device Memory Bandwidth Analyzer is a C++ benchmarking tool that measures memory bandwidth on every OpenCL device available on a system.

The program automatically detects all OpenCL platforms and devices (GPU, CPU, accelerators) and performs several tests to evaluate:

• Host → Device memory bandwidth
• Device → Host memory bandwidth
• Kernel global memory throughput

This allows developers and researchers to quickly identify which compute device provides the best OpenCL memory performance.

The project is lightweight, dependency‑minimal, and designed for reproducible benchmarking.


Screenshot

Example execution on a laptop GPU system.

Benchmark Screenshot


Example Output

Below is a real example output produced by the program.

====================================================================================================
PLATFORM #0
====================================================================================================
Name    : NVIDIA CUDA
Vendor  : NVIDIA Corporation
Version : OpenCL 3.0 CUDA 13.1.86
Devices : 1

  DEVICE #0
    Name           : NVIDIA GeForce RTX 4070 Laptop GPU
    Type           : GPU
    Version        : OpenCL 3.0 CUDA
    Driver         : 591.44
    Compute Units  : 36
    Global Mem     : 8187 MB
    Write BW       : 11.39 GB/s
    Read  BW       : 12.27 GB/s
    Kernel BW      : 13343.97 GB/s
    Status         : PASS

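The program prints one such block for every detected device, reporting the measured bandwidths and a status. Note that the Kernel BW figure above is far higher than the DRAM bandwidth a laptop GPU can physically deliver, so it is best read as an effective throughput (repeated accesses are probably served from on-chip caches) rather than raw DRAM bandwidth.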


What is OpenCL

OpenCL (Open Computing Language) is an open standard for parallel computing across heterogeneous hardware.

OpenCL allows programs to run compute workloads on:

• GPUs
• CPUs
• integrated GPUs
• FPGAs
• accelerators

OpenCL separates programs into two parts:

Host Program

Runs on the CPU and is responsible for:

• discovering OpenCL platforms and devices
• allocating memory buffers
• compiling kernels
• launching compute kernels

Kernel Program

Runs on the compute device (GPU / CPU) and performs massively parallel operations.


Software and Tools Using OpenCL

OpenCL is used in many real‑world applications and frameworks.

Examples include:

| Software | Use case |
| --- | --- |
| Blender (Cycles, before version 3.0) | GPU rendering |
| DaVinci Resolve | Video processing |
| Darktable | Photo processing |
| OpenCV | Image processing |
| Intel oneAPI | Heterogeneous computing |
| AMD ROCm | GPU compute |
| Scientific HPC tools | Simulations |

Benchmark Tests

This tool performs three different measurements.

1 Host Write Bandwidth

Measures transfer speed from:

Host → Device

Implemented with:

clEnqueueWriteBuffer

2 Host Read Bandwidth

Measures transfer speed from:

Device → Host

Implemented with:

clEnqueueReadBuffer
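
Both transfer tests come down to timing a blocking copy and dividing the bytes moved by the elapsed time. The sketch below shows that arithmetic in plain C++. It is not the project's code: the function names are hypothetical, 1 GB is assumed to mean 1e9 bytes, and a host-side `std::memcpy` stands in for the OpenCL transfer so the example runs without a device.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Convert a byte count and an elapsed time to GB/s (assumption: 1 GB = 1e9 bytes).
double bandwidth_gbps(std::size_t bytes, double seconds) {
    return seconds > 0.0 ? static_cast<double>(bytes) / seconds / 1e9 : 0.0;
}

// Time `repeats` calls of `transfer` and return the average bandwidth.
// `transfer` must block until the copy has finished, for example a blocking
// clEnqueueWriteBuffer / clEnqueueReadBuffer call, or an enqueue plus clFinish.
template <typename Transfer>
double measure_transfer_gbps(std::size_t bytes, int repeats, Transfer transfer) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        transfer();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return bandwidth_gbps(bytes * static_cast<std::size_t>(repeats), elapsed.count());
}

// Host-only stand-in (hypothetical helper): a memcpy between two host buffers
// plays the role of the host/device transfer.
double host_memcpy_gbps(std::size_t bytes, int repeats) {
    std::vector<unsigned char> src(bytes, 0xAB), dst(bytes, 0);
    return measure_transfer_gbps(bytes, repeats, [&] {
        std::memcpy(dst.data(), src.data(), bytes);
    });
}
```

In a real measurement the lambda passed to `measure_transfer_gbps` would wrap the OpenCL call. Running one untimed warm-up transfer first is a common way to keep one-time driver setup out of the result.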

3 Kernel Memory Throughput

A custom OpenCL kernel repeatedly reads and writes global memory.

Kernel signature (body not shown):

__kernel void memory_copy_test(__global const uchar* src,
                               __global uchar* dst,
                               const uint iterations)

This generates heavy global memory traffic on the device.
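
Since only the kernel signature is shown, the following is a host-side C++ reference for one plausible body, not the project's actual kernel. It assumes each work-item copies a single byte from `src` to `dst` on every iteration, and that throughput is credited as one read plus one write per element per iteration. All function names are hypothetical, and 1 GB is taken as 1e9 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference for an ASSUMED kernel body: work-item `gid` copies
// src[gid] to dst[gid] `iterations` times.
// Precondition: dst.size() >= src.size().
void memory_copy_reference(const std::vector<std::uint8_t>& src,
                           std::vector<std::uint8_t>& dst,
                           std::uint32_t iterations) {
    for (std::size_t gid = 0; gid < src.size(); ++gid) {  // one step per work-item
        for (std::uint32_t i = 0; i < iterations; ++i) {
            dst[gid] = src[gid];                           // 1 byte read, 1 byte written
        }
    }
}

// Bytes credited to the kernel: one read and one write per element per iteration.
std::uint64_t kernel_bytes_moved(std::size_t elements, std::uint32_t iterations) {
    return 2ull * elements * iterations;
}

// Effective throughput in GB/s (assumption: 1 GB = 1e9 bytes).
double kernel_throughput_gbps(std::size_t elements, std::uint32_t iterations,
                              double seconds) {
    if (seconds <= 0.0) return 0.0;
    return static_cast<double>(kernel_bytes_moved(elements, iterations)) / seconds / 1e9;
}
```

Under this accounting every access in the loop counts toward the total, whether it reaches DRAM or is served from a cache, so the resulting figure is an effective throughput and can exceed the device's DRAM bandwidth.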


Libraries Used

The project intentionally uses minimal dependencies.

OpenCL

Main API used:

CL/cl.h

Used for:

• platform enumeration
• device discovery
• memory allocation
• kernel compilation
• kernel execution

Standard C++ Libraries

| Library | Purpose |
| --- | --- |
| iostream | Console output |
| vector | Data containers |
| string | Device information |
| algorithm | Sorting results |
| numeric | Averaging |
| chrono | Performance timing |
| iomanip | Formatted printing |
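
As a rough illustration of how three of these headers could work together, the sketch below averages repeated samples with `<numeric>`, ranks devices with `<algorithm>`, and prints a report line with `<iomanip>`. The `DeviceResult` struct and all function names are hypothetical and not taken from the project. The column width of 15 is inferred from the example output.

```cpp
#include <algorithm>
#include <iomanip>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record; field names are illustrative only.
struct DeviceResult {
    std::string name;
    std::vector<double> kernel_runs_gbps;  // one sample per benchmark repetition
};

// <numeric>: arithmetic mean of the samples (0 for an empty list).
double average(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// <algorithm>: order devices from highest to lowest average kernel bandwidth.
void rank_by_kernel_bandwidth(std::vector<DeviceResult>& results) {
    std::sort(results.begin(), results.end(),
              [](const DeviceResult& a, const DeviceResult& b) {
                  return average(a.kernel_runs_gbps) > average(b.kernel_runs_gbps);
              });
}

// <iomanip>: left-aligned 15-character label column and two decimals.
std::string format_bandwidth_line(const std::string& label, double gbps) {
    std::ostringstream out;
    out << "    " << std::left << std::setw(15) << label << ": "
        << std::fixed << std::setprecision(2) << gbps << " GB/s";
    return out.str();
}
```

For example, `format_bandwidth_line("Write BW", 11.39)` yields a line laid out like the `Write BW` row in the example output.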

Project Structure

opencl-multidevice-bandwidth-analyzer
│
├── src
│   └── main.cpp
│
├── doc
│   └── image1.png
│
├── include
│   └── CL
│       └── cl.h
│
├── lib
│   └── OpenCL.lib
│
├── README.md
├── LICENSE
└── .gitignore

src/

Contains the C++ benchmark implementation.

Main responsibilities:

• OpenCL platform discovery
• device enumeration
• memory transfer benchmarks
• kernel execution
• device ranking

doc/

Contains documentation assets such as screenshots used in the README.


Installation

1 Install OpenCL Drivers

Install OpenCL drivers appropriate for your hardware.

NVIDIA

Install the latest GPU driver.

https://developer.nvidia.com/opencl

Intel

Install the Intel oneAPI Base Toolkit.

https://www.intel.com/oneapi

AMD

Install ROCm or AMD GPU drivers.

https://rocm.docs.amd.com


2 Clone the Repository

git clone https://github.com/YOUR_USERNAME/opencl-multidevice-bandwidth-analyzer.git
cd opencl-multidevice-bandwidth-analyzer

3 Build

Linux / WSL

g++ src/main.cpp -O2 -lOpenCL -o bandwidth_analyzer

Windows (MSVC)

cl /EHsc /O2 /I include src\main.cpp /Fe:bandwidth_analyzer.exe /link /LIBPATH:lib OpenCL.lib

4 Run

./bandwidth_analyzer

or

bandwidth_analyzer.exe

The program will automatically detect all OpenCL devices and run the benchmark.


Limitations

Current limitations:

• Only global memory bandwidth is tested
• No local/shared memory benchmarks
• No compute FLOPS test
• No multi‑GPU concurrent benchmarking
• Results may vary due to PCIe bandwidth or driver differences


Future Improvements

Possible future extensions:

• GPU compute FLOPS benchmark
• shared/local memory benchmark
• OpenCL event profiling
• CSV export of results
• graphical charts for comparison
• CUDA vs OpenCL comparison mode
• multi‑GPU parallel testing


Author

Sayed Ahmadreza Razian, PhD

LinkedIn
https://www.linkedin.com/in/ahmadrezarazian/

Google Scholar
https://scholar.google.com/citations?user=Dh9Iy2YAAAAJ

Email
AhmadrezaRazian@gmail.com

Feel free to contact me for collaboration or questions.
