GPU-Computing

This repository contains code examples and resources for GPU computing using CUDA and OpenCL.

Introduction

GPU computing uses Graphics Processing Units (GPUs) to perform computations in parallel, significantly speeding up tasks that can be split into many independent pieces. The goal is to parallelize many simple operations across GPU threads and to apply techniques such as memory coalescing, where adjacent threads access adjacent memory addresses so the hardware can combine their loads and stores into fewer memory transactions. This repository provides personal example code from a GPU Computing course that used NVIDIA A40 GPUs on a supercomputer's GPU compute node, accessed through a remote connection.

Local Setup

Install the necessary drivers for your GPU.
Install the CUDA Toolkit (for NVIDIA GPUs) or an OpenCL SDK (for AMD and Intel GPUs).
Set up your IDE or text editor for CUDA or OpenCL development.
Clone this repository to your local machine.

git clone
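After installing the toolchain and cloning the repository, a quick script like the following can confirm that the tools are on PATH and compile one example. It is a minimal sketch under several assumptions. nvidia-smi ships with the NVIDIA driver, nvcc with the CUDA Toolkit, and clinfo is a separate OpenCL utility that may need its own install. The check_tool and build_dir helpers are hypothetical names. The .cu file names inside "Vector Addition" are not assumed, so the script compiles whatever .cu files it finds there. -arch=sm_86 matches the NVIDIA A40 (compute capability 8.6), so other GPUs need a different value.

```shell
#!/usr/bin/env bash
# Sketch: check the GPU toolchain, then compile one example folder.
# Run from the root of the cloned repository.

check_tool() {
  # Report whether a command is available on PATH.
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

build_dir() {
  # Compile every .cu file in a folder into a binary of the same name.
  # -arch=sm_86 targets the NVIDIA A40 (compute capability 8.6).
  local dir="$1" built=0 src
  shopt -s nullglob            # an empty folder yields no iterations
  for src in "$dir"/*.cu; do
    if nvcc -O2 -arch=sm_86 -o "${src%.cu}" "$src"; then
      built=$((built + 1))
    fi
  done
  shopt -u nullglob
  echo "built ${built} file(s) in ${dir}"
}

# nvidia-smi: NVIDIA driver; nvcc: CUDA Toolkit; clinfo: OpenCL platforms.
for tool in nvidia-smi nvcc clinfo; do
  check_tool "$tool"
done

# Only try to compile when the CUDA compiler is actually available.
if command -v nvcc >/dev/null 2>&1; then
  build_dir "Vector Addition"
fi
```

On a cluster, nvcc may only appear on PATH after loading a site-specific module, so a "nvcc: missing" line on a login node does not necessarily mean CUDA is absent from the compute nodes.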

To request an interactive session on a GPU compute node through the SLURM job scheduler, run:

srun --account=<ACCOUNT_NAME> \
     --partition=<GPU_PARTITION_NAME> \
     --nodes=1 \
     --gpus-per-node=1 \
     --ntasks=1 \
     --tasks-per-node=16 \
     --cpus-per-task=1 \
     --mem=20g \
     --pty bash

Alternatively, if your account and partition names are stored in the environment variables ACCOUNT and PARTITION, you can use:

srun --account=$ACCOUNT \
     --partition=$PARTITION \
     --nodes=1 \
     --gpus-per-node=1 \
     --ntasks=1 \
     --tasks-per-node=16 \
     --cpus-per-task=1 \
     --mem=20g \
     --pty bash
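For longer or unattended runs, roughly the same resources can be requested in a batch script submitted with sbatch instead of an interactive srun session. This is a sketch, not a cluster-specific template. The <ACCOUNT_NAME> and <GPU_PARTITION_NAME> placeholders must be replaced literally, because sbatch does not expand shell variables inside #SBATCH lines. The 30-minute --time limit and the gpu_job_%j.out output name are assumptions, and --tasks-per-node is left out because only one task is requested.

```shell
#!/usr/bin/env bash
#SBATCH --account=<ACCOUNT_NAME>
#SBATCH --partition=<GPU_PARTITION_NAME>
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=20g
#SBATCH --time=00:30:00
#SBATCH --output=gpu_job_%j.out

# Assumed values: --time and the --output pattern (%j becomes the job ID).
# Outside SLURM the #SBATCH lines are plain comments, so the script can
# also be run directly with bash as a quick local test.
job_id="${SLURM_JOB_ID:-none (not running under SLURM)}"
echo "Job ID: ${job_id}"
echo "Host: ${HOSTNAME:-unknown}"

# Show the GPU assigned to this job, if the NVIDIA driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
fi
```

Submit it with sbatch gpu_job.sh (the file name is arbitrary). Options given on the sbatch command line override the matching #SBATCH directives, so sbatch --account=$ACCOUNT --partition=$PARTITION gpu_job.sh works with the environment variables used above.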

Explanation of SLURM options, and other tools to consider

--account=<ACCOUNT_NAME>: Specifies the account to be charged for the job.
--partition=<GPU_PARTITION_NAME>: Specifies the partition (queue) to submit the job to. This should be a GPU-enabled partition.
--nodes=1: Requests one node for the job.
--gpus-per-node=1: Requests one GPU per node.
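--ntasks=1: Requests one task (process) for the job.
--tasks-per-node=16: Sets the number of tasks per node. When --ntasks is also given, --ntasks takes precedence and this value is treated only as a maximum per node, so a single task is launched here.
--cpus-per-task=1: Allocates one CPU core per task.
--mem=20g: Allocates 20 GB of memory per node (here, the single allocated node).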
--pty bash: Starts an interactive bash session on the allocated resources.
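Once the interactive shell starts, the environment variables SLURM sets for the job can confirm that the options above took effect. This is a minimal sketch with some caveats. show_var is a hypothetical helper. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE are only set when --cpus-per-task and --mem are given. CUDA_VISIBLE_DEVICES is typically, though not on every cluster, set to the allocated GPU.

```shell
#!/usr/bin/env bash
# Sketch: run inside the shell started by "srun ... --pty bash" to see
# what SLURM actually allocated.

show_var() {
  # Print NAME=value, or NAME=unset when the variable is not defined.
  local name="$1"
  echo "${name}=${!name:-unset}"
}

for v in SLURM_JOB_ID SLURM_JOB_NODELIST SLURM_NTASKS \
         SLURM_CPUS_PER_TASK SLURM_MEM_PER_NODE CUDA_VISIBLE_DEVICES; do
  show_var "$v"
done

# List the GPUs this session can see (requires the NVIDIA driver).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L
fi

# Full job record from the scheduler, when running inside a SLURM job.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v scontrol >/dev/null 2>&1; then
  scontrol show job "$SLURM_JOB_ID"
fi
```

With the srun command above, nvidia-smi -L should list exactly one GPU, matching --gpus-per-node=1.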

Ray, SkyPilot, Kubernetes, Transformer Lab GPU Orchestration, NVIDIA Base Command Platform, LLsub, and other tools can also manage GPU resources, but SLURM is widely used in academic and research environments. Kubernetes is more common in production environments, while Ray (a framework for distributed Python workloads) and SkyPilot (which launches jobs across cloud providers and clusters) are used for distributed computing tasks.

Folders and files

Matrix Multiplication Tiled
Matrix Multiplication
Multi-Dimensional Convolution
Nsight Systems Practice
Numerical Methods with cuBLAS M-M Multiplication
Vector Addition
README.md
