A lightweight Python OpenCL benchmarking tool for measuring memory bandwidth performance across CPUs, GPUs, and other OpenCL compute devices.
The tool automatically detects available OpenCL platforms and devices and evaluates:
- Host → Device transfer bandwidth
- Device → Host transfer bandwidth
- Device global memory throughput inside OpenCL kernels
- Comparative performance between multiple devices
This repository is useful for hardware validation, GPU benchmarking, OpenCL experimentation, and performance analysis.
An example run on a laptop system is shown in `doc/image1.png`.
OpenCL (Open Computing Language) is an open standard for parallel computing across heterogeneous hardware.
It allows programs to run on:
- CPUs
- GPUs
- Integrated GPUs
- FPGAs
- AI accelerators
using a single, unified programming model.
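To illustrate the "single programming model" point, the snippet below holds a minimal OpenCL C copy kernel as a Python string, the way PyOpenCL programs embed kernel source. This is a hypothetical sketch of the kind of kernel a bandwidth benchmark might use, not the exact kernel in `src/main.py`; the same source would run unchanged on a CPU, GPU, or any other OpenCL device.

```python
# A minimal OpenCL C kernel embedded as a Python string. Each work-item
# copies exactly one float, so the device scheduler decides how the copy
# is parallelized on whatever hardware compiles the kernel.
COPY_KERNEL_SRC = """
__kernel void copy_buffer(__global const float *src,
                          __global float *dst)
{
    int gid = get_global_id(0);   /* unique index of this work-item */
    dst[gid] = src[gid];          /* one element per work-item */
}
"""
```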
- Hardware portability -- same code runs on many devices
- Vendor-neutral standard -- not locked to a single company
- High-performance parallel computing
- Access to GPU memory bandwidth and compute cores
- Scalable from embedded systems to supercomputers
Because of these advantages, OpenCL is widely used in scientific computing, image processing, machine learning pipelines, and hardware validation tools.
OpenCL is supported and used by major hardware and software companies:
- Intel -- integrated GPU computing and performance libraries
- NVIDIA -- GPU computing and an OpenCL runtime shipped with its drivers
- AMD -- GPU compute platforms and ROCm/OpenCL support
- Apple -- previously used in macOS GPU compute stack
- ARM -- mobile GPU compute support
- Qualcomm -- mobile compute acceleration
- Xilinx / AMD Adaptive -- FPGA compute acceleration
OpenCL is also used in scientific software, rendering engines, signal processing systems, and GPU benchmarking tools.
This tool measures three key bandwidth metrics:
| Test | Description |
| --- | --- |
| Write Bandwidth | Host → Device memory transfer speed |
| Read Bandwidth | Device → Host memory transfer speed |
| Kernel Bandwidth | Device global memory throughput inside an OpenCL kernel |
The program runs tests on all detected OpenCL devices and prints a final comparison table.
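Each bandwidth figure reduces to bytes moved divided by elapsed seconds. The sketch below shows that arithmetic with a host-side copy standing in for a device transfer; the function name and buffer size are illustrative, not the ones used in `src/main.py`.

```python
import time
import numpy as np

def bandwidth_gb_s(n_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time to GB/s (1 GB = 1e9 bytes)."""
    return n_bytes / seconds / 1e9

# Time a plain host-side copy as a stand-in for a device transfer.
data = np.ones(1_000_000, dtype=np.float32)   # ~4 MB test buffer
start = time.perf_counter()
copied = data.copy()
elapsed = time.perf_counter() - start
print(f"{bandwidth_gb_s(data.nbytes, elapsed):.2f} GB/s")
```

In the real benchmark the timed region is the OpenCL transfer or kernel, but the GB/s conversion is the same.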
The project uses two Python libraries:
PyOpenCL provides Python bindings to the OpenCL runtime.
It allows Python code to:
- detect OpenCL devices
- compile kernels
- allocate GPU memory
- run compute kernels
- measure device performance
Without PyOpenCL, Python cannot directly access OpenCL hardware.
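Device detection can be sketched in a few lines of PyOpenCL. This is a minimal illustration, not the detection code in `src/main.py`; the guard lets it degrade gracefully on machines without PyOpenCL or a working OpenCL runtime.

```python
def list_opencl_devices() -> list:
    """Return 'platform :: device' strings for every visible OpenCL device,
    or an empty list if PyOpenCL or an OpenCL runtime is unavailable."""
    try:
        import pyopencl as cl
    except ImportError:
        return []
    found = []
    try:
        for platform in cl.get_platforms():          # installed OpenCL drivers
            for device in platform.get_devices():    # devices each driver exposes
                found.append(f"{platform.name} :: {device.name}")
    except Exception:
        pass  # runtime installed but no usable platform
    return found

for entry in list_opencl_devices():
    print(entry)
```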
NumPy is used for:
- creating host memory buffers
- generating test data
- efficient numerical operations
- validating memory transfer results
NumPy is the standard numerical computing library in Python and integrates efficiently with OpenCL buffers.
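For example, the host side of a transfer test can be sketched with NumPy alone: build a patterned source buffer, receive data back into an empty buffer, and validate it element-wise. The buffer size and names below are illustrative; the sketch substitutes a plain copy for the device round trip so it runs without an OpenCL runtime.

```python
import numpy as np

N = 1 << 20                                  # 1 Mi float32 elements = 4 MiB
host_src = np.arange(N, dtype=np.float32)    # patterned test data to send
host_dst = np.empty_like(host_src)           # destination for the read-back

# In the real benchmark the Host -> Device -> Host round trip happens here.
host_dst[:] = host_src

# Validation: every element must survive the round trip bit-for-bit.
assert np.array_equal(host_src, host_dst)
print(f"validated {host_src.nbytes / 1e6:.1f} MB transfer buffer")
```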
Install Python dependencies:
```
pip install -r requirements.txt
```
You must also have an OpenCL runtime installed on your system.
Examples:
| Hardware | OpenCL Runtime |
| --- | --- |
| NVIDIA GPU | CUDA OpenCL |
| Intel CPU/GPU | Intel OpenCL Runtime |
| AMD GPU | ROCm / AMD OpenCL |
| Any CPU | POCL (Portable OpenCL) |
```
python src/main.py
```
The program will automatically:
- Detect OpenCL platforms
- Detect all available devices
- Run memory bandwidth tests
- Print a comparison table
Example benchmark output can be found here:
results/example_output1.txt
Typical output summary:
| Device | Type | Write GB/s | Read GB/s | Kernel GB/s |
| --- | --- | --- | --- | --- |
| NVIDIA GPU | GPU | 11.39 | 12.27 | 13343 |
| Intel iGPU | GPU | 6.49 | 13.96 | 8200 |
```
OpenCL-MultiDevice-Memory-Bandwidth-Benchmark-Python/
│
├── src/
│   └── main.py
│
├── doc/
│   └── image1.png
│
├── results/
│   └── example_output1.txt
│
├── requirements.txt
├── README.md
├── LICENSE
├── .gitignore
└── repo_info.txt
```
This project can be useful for:
- GPU performance benchmarking
- Hardware validation
- OpenCL experimentation
- Parallel computing education
- GPU memory stress testing
Sayed Ahmadreza Razian, PhD
- LinkedIn: https://www.linkedin.com/in/ahmadrezarazian/
- Google Scholar: https://scholar.google.com/citations?user=Dh9Iy2YAAAAJ
- Email: AhmadrezaRazian@gmail.com
Feel free to contact me for collaboration or questions.
