GPU computing uses Graphics Processing Units (GPUs) to perform parallel computations, significantly speeding up tasks that can be parallelized. The goal is to run many simple operations in parallel, using techniques such as memory coalescing (arranging accesses so that neighboring threads read neighboring memory addresses) to speed up computation. This repository contains personal example code from a GPU Computing course, which used NVIDIA A40 GPUs on a GPU compute node accessed through a virtual connection to a supercomputer.
- Install the necessary drivers for your GPU.
- Install CUDA Toolkit (for NVIDIA GPUs) or OpenCL SDK (for AMD and Intel GPUs).
- Set up your IDE or text editor for CUDA or OpenCL development.
- Clone this repository to your local machine.
git clone
The command below requests an interactive session on a GPU compute node through the SLURM job scheduler:
srun --account=<ACCOUNT_NAME> \
--partition=<GPU_PARTITION_NAME> \
--nodes=1 \
--gpus-per-node=1 \
--tasks=1 \
--tasks-per-node=16 \
--cpus-per-task=1 \
--mem=20g \
--pty bash
Or, if you have environment variables set for your account and partition, you can use:
srun --account=$ACCOUNT \
--partition=$PARTITION \
--nodes=1 \
--gpus-per-node=1 \
--tasks=1 \
--tasks-per-node=16 \
--cpus-per-task=1 \
--mem=20g \
--pty bash
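The second form above assumes that ACCOUNT and PARTITION are already exported in your shell. A minimal sketch of setting and checking them follows. The values `my_account` and `gpu` are placeholders and `check_slurm_env` is a hypothetical helper name, so substitute whatever your cluster actually uses.

```bash
# Hypothetical values; replace them with your own account and GPU partition.
export ACCOUNT="my_account"
export PARTITION="gpu"

# Report an error if either variable is unset or empty,
# otherwise print the values that srun would receive.
check_slurm_env() {
    if [ -z "${ACCOUNT:-}" ] || [ -z "${PARTITION:-}" ]; then
        echo "error: set ACCOUNT and PARTITION first" >&2
        return 1
    fi
    echo "srun will use --account=$ACCOUNT --partition=$PARTITION"
}

check_slurm_env
```

Placing the two export lines in a shell startup file such as ~/.bashrc is one way to make them available in every login session.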
- --account=<ACCOUNT_NAME>: Specifies the account to be charged for the job.
- --partition=<GPU_PARTITION_NAME>: Specifies the partition (queue) to submit the job to. This should be a GPU-enabled partition.
- --nodes=1: Requests one node for the job.
- --gpus-per-node=1: Requests one GPU per node.
- --tasks=1: Requests one task for the job.
- --tasks-per-node=16: Specifies the number of tasks to run on each node. Because --tasks=1 is also given, this acts as an upper limit on tasks per node rather than a request for 16 tasks.
- --cpus-per-task=1: Allocates one CPU core per task.
- --mem=20g: Allocates 20 GB of memory per node for the job.
- --pty bash: Starts an interactive bash session on the allocated resources.
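Once the interactive session from srun starts, a quick check can confirm that the driver and toolkit from the setup steps are visible on the node. The sketch below assumes an NVIDIA setup: `nvidia-smi` is installed with the NVIDIA driver and `nvcc` with the CUDA Toolkit. `check_tool` is a hypothetical helper name, not part of either package.

```bash
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not part of the driver or the toolkit.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1 ($(command -v "$1"))"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the driver utility and the CUDA compiler; keep going if one is missing.
for tool in nvidia-smi nvcc; do
    check_tool "$tool" || true
done
```

If both tools are found, running `nvidia-smi` by itself lists the GPUs visible to the session. On some clusters the toolkit only appears on PATH after a site-specific environment module is loaded, so a "missing" result for `nvcc` does not necessarily mean the toolkit is absent.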
Ray, SkyPilot, Kubernetes, Transformer Lab GPU Orchestration, NVIDIA Base Command Platform, LLsub, and other options exist for managing GPU resources, but SLURM is widely used in academic and research environments. Kubernetes is more common in production environments, while Ray and SkyPilot are used for distributed computing tasks.