This repository contains CUDA- and PyTorch-based implementations of matrix operations and performance comparisons. It includes:
- Matrix multiplication implemented with CUDA cores.
- A comparison of GPU and CPU performance.
- A demonstration of CUDA kernels for parallel computation.
- A comparison of matrix multiplication on CUDA cores (FP32) and Tensor Cores (FP16).
- A demonstration of the performance benefit of Tensor Cores for half-precision computation.
- Result validation to check agreement between the FP32 and FP16 computations.
- NVIDIA GPU with CUDA support.
- CUDA Toolkit installed.
- C++ toolchain and `nvcc` (the CUDA compiler driver, included in the CUDA Toolkit).
- PyTorch installed with CUDA support.
- NVIDIA GPU with Tensor Core support (for FP16 acceleration).
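A quick way to confirm the PyTorch-side prerequisites is to query the CUDA device's compute capability (Tensor Cores first appeared in the Volta architecture, sm_70). A small sketch (the function name is illustrative, not part of this repository) that degrades gracefully when PyTorch or a GPU is missing:

```python
def describe_cuda_environment():
    """Return a one-line summary of CUDA/Tensor Core availability.

    Never raises: reports a missing PyTorch install or a missing
    CUDA device instead of failing.
    """
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "PyTorch installed, but no CUDA device detected"
    major, minor = torch.cuda.get_device_capability()
    # Tensor Cores require compute capability 7.0 (Volta) or newer.
    tensor_cores = "with" if major >= 7 else "without"
    return f"CUDA device sm_{major}{minor}, {tensor_cores} Tensor Cores"

print(describe_cuda_environment())
```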
- Compile and run with `nvcc`: `nvcc -o MatMul MatMul.cu`, then `./MatMul`
- Run the Python script: `python matmul_pytorch.py`
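The CPU/GPU timing comparison presumably follows the usual benchmark pattern: a warm-up run, then the best time over several repeats. Below is a minimal CPU-only sketch of that harness; NumPy's BLAS-backed `@` stands in for the fast path, and a naive triple loop stands in for the slow baseline. The real script would instead call `torch.matmul` on CUDA tensors and call `torch.cuda.synchronize()` before reading the clock, since GPU kernels launch asynchronously.

```python
import time
import numpy as np

def time_matmul(matmul, a, b, warmup=1, repeats=5):
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    for _ in range(warmup):          # warm-up: caches, lazy initialization
        matmul(a, b)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best

def naive_matmul(a, b):
    """Triple-loop reference, analogous to an unoptimized CPU baseline."""
    n, k = a.shape
    m = b.shape[1]
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)
t_naive = time_matmul(naive_matmul, a, b, repeats=2)
t_fast = time_matmul(lambda x, y: x @ y, a, b)
```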
- Demonstrates GPU acceleration for matrix operations.
- Compares CPU and GPU performance for matrix multiplication.
- Highlights the use of Tensor cores for FP16 computations in PyTorch.
- Ensure your system has the required hardware and software for CUDA and PyTorch.
- The code includes result validation to ensure correctness of GPU computations.
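The FP32-vs-FP16 validation presumably compares the two products within a loose relative tolerance, since half precision accumulates noticeable rounding error. A minimal NumPy sketch of such a check (the function name, matrix size, and tolerance are illustrative, not taken from this repository):

```python
import numpy as np

def validate_fp16_against_fp32(n=256, rel_tol=5e-2, seed=0):
    """Multiply two random matrices in FP32 and FP16 and compare.

    Returns (rel_err, passed): the Frobenius-norm relative error of the
    FP16 product against the FP32 reference, and whether it falls within
    rel_tol. FP16 rounding makes only a loose tolerance realistic.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)

    ref = a @ b  # FP32 reference result
    approx = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

    rel_err = float(np.linalg.norm(approx - ref) / np.linalg.norm(ref))
    return rel_err, rel_err < rel_tol
```

A norm-based error is used rather than an elementwise relative error because individual entries of the product can be near zero, which would inflate elementwise ratios arbitrarily.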