HPC Computing Guide

This guide covers running AlphaFast on HPC clusters using Singularity/Apptainer and SLURM.

Prerequisites

NVIDIA GPU with CUDA compute capability >= 6.0
CUDA driver >= 560.28.03 (required for CUDA 12.6 runtime)
Singularity >= 3.7 or Apptainer >= 1.0
SLURM workload manager

Pulling the Container

Convert the Docker image to a Singularity SIF file:

singularity pull alphafast.sif docker://romerolabduke/alphafast:latest

Set SINGULARITY_CACHEDIR to scratch storage to avoid filling your home directory:

export SINGULARITY_CACHEDIR=/scratch/$USER/.singularity
mkdir -p $SINGULARITY_CACHEDIR
singularity pull alphafast.sif docker://romerolabduke/alphafast:latest

The resulting alphafast.sif file is approximately 10 GB.

Filesystem Considerations

HPC filesystems vary in performance characteristics. The data pipeline writes many small temporary files during MMseqs2 searches, which can be very slow on network-attached storage.

Filesystem	Temp Files	Databases	Recommendations
Lustre	Slow (metadata-heavy)	Good (large sequential reads)	Use `--temp_dir` on local storage
GPFS	Slow (metadata-heavy)	Good	Use `--temp_dir` on local storage
NFS	Slow	Adequate	Use `--temp_dir` on local storage
Local SSD / /scratch	Fast	Fast	Preferred for temp files

Recommendation: Always set --temp_dir to fast local storage (e.g., /scratch or /tmp) when databases reside on network-attached storage. This can provide a 10-13x speedup for MMseqs2 searches. Typical temporary space usage is 1-5 GB per batch.

Bind Mounts

Singularity requires explicit bind mounts for directories outside the container. The run_alphafast.sh script handles this automatically for --db_dir, --weights_dir, --input_dir, --output_dir, --temp_dir, and --jax_compilation_cache_dir. If you need to pass additional paths, you may need to add bind mounts manually or ensure the paths are under an already-bound parent directory.

CUDA Driver Compatibility

AlphaFast uses CUDA 12.6 inside the container. The host system must have a compatible NVIDIA driver:

CUDA Version	Minimum Driver
CUDA 12.6	>= 560.28.03

Check your driver version:

nvidia-smi | head -3

If your driver is too old, contact your HPC system administrator.

Single-GPU SLURM Job

#!/bin/bash
#SBATCH --job-name=alphafast
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=4:00:00
#SBATCH --output=alphafast_%j.out
#SBATCH --error=alphafast_%j.err

# Set paths
INPUT_DIR=/path/to/inputs
OUTPUT_DIR=/path/to/outputs
DB_DIR=/path/to/databases
WEIGHTS_DIR=/path/to/weights
CONTAINER=/path/to/alphafast.sif

# Use local scratch for temporary files (recommended)
export SINGULARITY_CACHEDIR=/scratch/$USER/.singularity
mkdir -p /scratch/$USER/alphafast_tmp /scratch/$USER/alphafast_jax_cache

./scripts/run_alphafast.sh \
    --input_dir $INPUT_DIR \
    --output_dir $OUTPUT_DIR \
    --db_dir $DB_DIR \
    --weights_dir $WEIGHTS_DIR \
    --temp_dir /scratch/$USER/alphafast_tmp \
    --jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
    --container $CONTAINER

Multi-GPU SLURM Job

Multi-GPU mode uses a phase-separated parallel architecture: all N GPUs run MSA search in parallel (Phase 1), then all N GPUs run inference in parallel (Phase 2).

#!/bin/bash
#SBATCH --job-name=alphafast-multi
#SBATCH --partition=gpu
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=256G
#SBATCH --time=8:00:00
#SBATCH --output=alphafast_multi_%j.out
#SBATCH --error=alphafast_multi_%j.err

# Set paths
INPUT_DIR=/path/to/inputs
OUTPUT_DIR=/path/to/outputs
DB_DIR=/path/to/databases
WEIGHTS_DIR=/path/to/weights
CONTAINER=/path/to/alphafast.sif

# Use local scratch for temporary files (recommended)
export SINGULARITY_CACHEDIR=/scratch/$USER/.singularity
mkdir -p /scratch/$USER/alphafast_tmp /scratch/$USER/alphafast_jax_cache

./scripts/run_alphafast.sh \
    --input_dir $INPUT_DIR \
    --output_dir $OUTPUT_DIR \
    --db_dir $DB_DIR \
    --weights_dir $WEIGHTS_DIR \
    --container $CONTAINER \
    --temp_dir /scratch/$USER/alphafast_tmp \
    --jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
    --gpu_devices 0,1,2,3

With 4 GPUs, all 4 GPUs run MSA search simultaneously in Phase 1. After all MSA jobs complete, the results are re-distributed and all 4 GPUs run inference simultaneously in Phase 2.

Resource Sizing

Workload	GPUs	CPUs	Memory	Wall Time (est.)
1 protein, single-GPU	1	8	64 GB	30-60 min
32 proteins, single-GPU	1	8	64 GB	4-8 hours
32 proteins, 4 GPUs	4	32	256 GB	1-3 hours
512 proteins, 4 GPUs	4	32	256 GB	12-24 hours

Memory requirements depend on protein size. For sequences longer than 2000 residues, increase to 128 GB (single-GPU) or 512 GB (multi-GPU).

Troubleshooting

Container fails to start with GPU errors: Verify the CUDA driver version is >= 560.28.03. Run nvidia-smi to check.

Slow MSA searches: Set --temp_dir to local scratch storage. Network filesystems add significant overhead for the many small I/O operations in MMseqs2.

Out of memory during inference: Reduce XLA_CLIENT_MEM_FRACTION (default 0.95) or use --flash_attention_implementation xla for lower memory usage on older GPUs.

First inference batch is much slower: Use --jax_compilation_cache_dir on fast storage so later runs can reuse compiled JAX executables. A fully cold multi-GPU launch can still spend extra time compiling on the first worker batch before the cache is populated.

Singularity permission errors: Ensure SINGULARITY_CACHEDIR is writable and on a filesystem that supports file locking.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HPC Computing Guide

Prerequisites

Pulling the Container

Filesystem Considerations

Bind Mounts

CUDA Driver Compatibility

Single-GPU SLURM Job

Multi-GPU SLURM Job

Resource Sizing

Troubleshooting

FilesExpand file tree

hpc.md

Latest commit

History

hpc.md

File metadata and controls

HPC Computing Guide

Prerequisites

Pulling the Container

Filesystem Considerations

Bind Mounts

CUDA Driver Compatibility

Single-GPU SLURM Job

Multi-GPU SLURM Job

Resource Sizing

Troubleshooting