A collection of shell scripts and utilities for working with the SLURM BigPurple High-Performance Computing (HPC) environment at NYU Langone Health.
```
Shtools/
├── bin/                    # CLI tools (add to PATH)
│   ├── bpcn                # Request and connect to compute nodes
│   ├── bpdownload          # Download data from BigPurple
│   ├── bpupload            # Upload data to BigPurple
│   └── bpinfo              # Show cluster and job queue status
└── scripts/                # Templates to copy and customize
    ├── bioinformatics/     # SLURM array jobs and Perl utilities
    ├── general/            # Utility scripts
    └── parallel/           # Spark on SLURM setup
```
Standalone executables for interacting with BigPurple from your local machine:
| Tool | Description |
|---|---|
| `bpcn` | Request a compute node and configure SSH ProxyJump for easy access via `ssh cn` |
| `bpdownload` | `rsync` wrapper to download files from BigPurple |
| `bpupload` | `rsync` wrapper to upload files to BigPurple |
| `bpinfo` | Display cluster node info and your job queue |
Copy these templates into each project and customize them there; don't run them directly from the repo:
`scripts/bioinformatics/`:

- `fastp.sh` - SLURM array job for FASTQ quality control (see the example workflow after this list)
- `fastq_functions.sh` - Functions for FASTQ file handling
- `featureCounts_functions.sh` - Process featureCounts output files
- `gather_fastqs.pl` - Perl script to collect and organize FASTQ files
- `get_references.pl` - Perl script to generate reference genome settings

`scripts/general/`:

- `download_urls.sh` - Download files from a list of URLs
- `join-many.sh` - Join multiple tab/comma-separated files

`scripts/parallel/`:

- `spark.sh` - Configure Spark cluster on SLURM (source, don't execute)
- `spark_jupyter.sh` - SLURM job for Jupyter with Spark
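For example, a typical workflow with the `fastp.sh` template might look like the sketch below. The project path and sample count are hypothetical placeholders, and the template may already set its own `#SBATCH` directives, so adjust to match your customized copy:

```sh
# Copy the template into your project and edit it there
cp ~/src/Shtools/scripts/bioinformatics/fastp.sh /gpfs/data/yourlab/project/
cd /gpfs/data/yourlab/project

# After editing paths and resources in fastp.sh, submit it as a SLURM
# array job, one task per sample (24 samples assumed here)
sbatch --array=1-24 fastp.sh
```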
```sh
git clone https://github.com/mattmuller0/Shtools.git
```

Add the `bin/` directory to your PATH in `~/.zshrc`:
```sh
export PATH="$HOME/src/Shtools/bin:$PATH"
```
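To confirm the tools are available, reload your shell config and check the PATH (standard shell commands, nothing Shtools-specific):

```sh
# Pick up the new PATH, then verify bpcn resolves
source ~/.zshrc
command -v bpcn   # should print the path to bpcn under Shtools/bin
```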
The tools require an `hpc` host alias in your SSH config. Add to `~/.ssh/config`:

```
Host hpc
    HostName bigpurple.hpc.nyumc.org
    User your_username
    IdentityFile ~/.ssh/your_key
```
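Once the alias is in place, you can sanity-check it before using the tools (this assumes your key is already authorized on BigPurple):

```sh
ssh hpc hostname   # should log in and print the login node's hostname
```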
The `bpcn` script will automatically:

- Create the `~/.ssh/config.d/` directory if needed
- Add `Include ~/.ssh/config.d/*` to your SSH config
- Write the compute node config to `~/.ssh/config.d/bpcn` (sketched below)
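The generated file is roughly of this shape. This is a sketch, not the script's exact output: the node name comes from your job allocation, and the exact fields may differ.

```
Host cn
    HostName cn-0001    # hypothetical compute node from your allocation
    User your_username
    ProxyJump hpc
```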
To use a different SSH host alias:

```sh
export BPCN_SSH_HOST=my_hpc_alias
```

Typical `bpcn` usage:

```sh
# Request a compute node (8GB per CPU, 4 CPUs = 32GB total, 4-hour limit)
bpcn -m 8GB -c 4 -t 4:00:00

# Dry run - show what would be submitted
bpcn -n -m 8GB -c 4

# Cancel your current compute node job
bpcn -k

# Show help
bpcn -h
```

After the job starts, connect with:

```sh
ssh cn
```
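Because `cn` is an ordinary SSH host alias, other SSH-based tools can use it as well. For example, a one-off copy to the compute node (the remote path here is illustrative):

```sh
scp results.csv cn:/gpfs/data/yourlab/project/
```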
```sh
# Download from BigPurple
bpdownload -r /gpfs/data/yourlab/project -l ./local_dir

# Upload to BigPurple
bpupload -l ./local_dir -r /gpfs/data/yourlab/project
```

To display cluster node info and your job queue:

```sh
bpinfo
```

Requirements:

- SSH access to BigPurple HPC with the `hpc` host alias configured
- macOS or Linux
MIT License