
HPC Profiler Bootcamp Deployment Guide

Prerequisites

To run this bootcamp you will need a machine with NVIDIA GPUs. The profiling tools require the container and tool versions listed under Tested Environment below.

Tested Environment

We tested and ran all labs on a DGX machine equipped with A100 and H100 GPUs (80GB).

| Tool/Environment | Version  | Details                                  |
|------------------|----------|------------------------------------------|
| HPC Docker Image | 2025.11  | nvhpc:25.11-devel-cuda_multi-ubuntu22.04 |
| Nsight Systems   | 2025.5.1 | 2025.5.1.121-255136380782v0              |
| Nsight Compute   | 2025.2.1 | 2025.2.1.0                               |
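Before building anything, it can help to verify that the host sees the GPUs and that Docker can use the NVIDIA runtime. This is an optional sanity check, not part of the labs; the CUDA base image tag below is only an example and any recent tag works.

```shell
# Check that the NVIDIA driver sees the GPUs (model and memory should match
# the tested environment above, e.g. A100/H100 80GB).
nvidia-smi

# Check that Docker is installed and can pass GPUs through to a container.
docker --version
docker run --rm --gpus=all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If the last command prints the same GPU table as the host-side `nvidia-smi`, the Docker GPU passthrough is working.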

Deploying the Labs

Running Docker Container

To run the labs, you will need access to a single GPU. Build a Docker container by following these steps:

  1. Open a terminal window and navigate to the directory containing the Dockerfile (e.g., cd ~/HPC-Profiler).

  2. To build the Docker container, run:

sudo docker build -t hpc-profiler:latest .

  3. To run the built container:
docker run --rm -it --gpus=all -p 8888:8888 \
    --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v /path/to/_profiler:/workspace/_profiler \
    -w /workspace/_profiler \
    hpc-profiler:latest

Flag descriptions:

  • --rm automatically removes the container and its filesystem when it exits
  • -it runs the container interactively, so the jupyter server can be stopped with ctrl-c
  • --gpus=all enables all NVIDIA GPUs during container runtime
  • --cap-add=SYS_ADMIN and --cap-add=SYS_PTRACE grant necessary permissions for profiling tools
  • --security-opt seccomp=unconfined allows profiling operations that require system calls
  • -v mounts local directories in the container filesystem
  • -w sets the working directory inside the container
  • -p explicitly maps port 8888
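If port 8888 is already taken on the host, the -p flag can map a different host port to the container's port 8888. The host port 9999 below is an arbitrary example:

```shell
docker run --rm -it --gpus=all -p 9999:8888 \
    --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v /path/to/_profiler:/workspace/_profiler \
    -w /workspace/_profiler \
    hpc-profiler:latest
# Then browse to http://localhost:9999 instead of http://localhost:8888.
```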

Once the container is running, you can browse to the serving machine on port 8888 from any web browser to access the labs. For instance, if running on the local machine, point the web browser to http://localhost:8888.

  4. Once inside the container, open jupyter lab in a browser at http://localhost:8888 and start the lab by clicking on the _start_profiling.ipynb notebook.

  5. When you are done with the labs, shut down jupyter lab by selecting File > Shut Down, and exit the container by typing exit or pressing ctrl + d in the terminal window.
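Once inside the container, you can optionally confirm that the profilers are on the PATH and match the versions in the Tested Environment table:

```shell
# Run inside the container. The version strings should match the
# Tested Environment table (Nsight Systems 2025.5.1, Nsight Compute 2025.2.1).
nsys --version
ncu --version
```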

Running Singularity Container

  1. Build the labs Singularity container with:
sudo singularity build _profiler.simg Singularity

If you do not have sudo rights, you can build the singularity container with the --fakeroot option:

singularity build --fakeroot _profiler.simg Singularity
  2. Copy the files to your local machine to ensure changes are stored locally:
singularity run _profiler.simg cp -rT /labs ~/labs
  3. To run the built container:
singularity run --nv _profiler.simg jupyter-lab --notebook-dir=~/labs

The --nv flag enables NVIDIA GPU support.
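As an optional sanity check, you can confirm that the GPUs are visible inside the Singularity container when --nv is passed:

```shell
# GPUs should appear only when the --nv flag is used.
singularity exec --nv _profiler.simg nvidia-smi
```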

  4. Once inside the container, open jupyter lab in a browser at http://localhost:8888 and start the lab by clicking on the _start_profiling.ipynb notebook.

  5. When you finish these notebooks, shut down jupyter lab by selecting File > Shut Down in the top-left corner, then exit the Singularity container by typing exit or pressing ctrl + d in the terminal window.

Troubleshooting

ERR_NVGPUCTRPERM: Permission Issue with GPU Performance Counters

If you encounter an ERR_NVGPUCTRPERM error when profiling, ensure the container was started with --cap-add=SYS_ADMIN. For a permanent fix, enable non-admin access to GPU performance counters on the host with sudo sh -c 'echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" > /etc/modprobe.d/nvidia-profiling.conf' and then reboot.
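On recent drivers you can check the current state of this restriction on the host before and after rebooting; the parameter is exposed as RmProfilingAdminOnly, though the exact name may differ on older drivers:

```shell
# A value of 1 means profiling is restricted to admin users
# (non-root profiling will fail with ERR_NVGPUCTRPERM).
# After applying the modprobe option and rebooting, it should read 0.
grep RmProfilingAdminOnly /proc/driver/nvidia/params
```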

See NVIDIA's solutions guide for details.