These instructions assume a UNIX-based environment. In a SLURM environment, the shell scripts can be submitted with sbatch, provided nodes with the needed resources exist.
- Clone the repo from GitHub.
- From a terminal, make sure the current directory is the root of the cloned repo.
- Set up a self-contained Conda (Micromamba) installation and environment (analysis) in Scripts/mm, based on the YAML file Scripts/base.mm.yaml, by running mmSetup.sbatch:
To run locally (from the base dir of the cloned repo):
./Scripts/mmSetup.sbatch ./Scripts/base.mm.yaml
To submit to a SLURM cluster (specifying the partition to run on as <PART>):
sbatch --partition <PART> ./Scripts/mmSetup.sbatch ./Scripts/base.mm.yaml
Note: This is not a minimal environment; it is a generic environment that includes packages not needed here.
The ./Scripts/mm directory now contains a bin/micromamba executable and an envs/analysis environment.
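As a quick sanity check after setup, the layout described above can be verified from the shell. This is a minimal sketch; the function name check_mm_setup is hypothetical and is not part of the repo.

```shell
# Hypothetical helper (not part of the repo): confirm that the setup step
# produced the micromamba executable and the analysis environment.
check_mm_setup() {
    mm_root="${1:-./Scripts/mm}"   # default matches the path described above
    if [ ! -x "$mm_root/bin/micromamba" ]; then
        echo "missing executable: $mm_root/bin/micromamba" >&2
        return 1
    fi
    if [ ! -d "$mm_root/envs/analysis" ]; then
        echo "missing environment: $mm_root/envs/analysis" >&2
        return 1
    fi
    echo "micromamba setup looks complete in $mm_root"
}
```

Calling check_mm_setup with no arguments from the root of the cloned repo checks ./Scripts/mm.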
The directory ./RawData/ has a sampleMeta.tsv file that describes the samples used in the analysis, which correspond to the samples uploaded to GEO as GSE289043.
The gene expression matrix file GSE289043_salmon_gene_matrix.tsv.gz must be downloaded, un-gzipped, and saved to the ./RawData/ directory.
The TPM (transcripts per million) version of the gene expression matrix file GSE289043_salmon_gene_matrix_tpm.tsv.gz must be downloaded, un-gzipped, and saved to the ./RawData/ directory.
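The un-gzip step for the two matrix files could look roughly like the sketch below. It assumes the .gz files have already been downloaded to the current directory; the function name unpack_matrix is hypothetical, and gzip -dc is used so the downloaded archive is left untouched.

```shell
# Hypothetical helper (not part of the repo): decompress a downloaded .gz file
# into a target directory, keeping the original archive.
unpack_matrix() {
    src="$1"                      # path to the downloaded .gz file
    dest_dir="${2:-./RawData}"    # directory for the un-gzipped file
    out="$dest_dir/$(basename "$src" .gz)"
    mkdir -p "$dest_dir" || return 1
    # Remove a partial output file if decompression fails.
    gzip -dc "$src" > "$out" || { rm -f "$out"; return 1; }
    echo "$out"
}
```

For example, from the root of the cloned repo:

```shell
unpack_matrix GSE289043_salmon_gene_matrix.tsv.gz
unpack_matrix GSE289043_salmon_gene_matrix_tpm.tsv.gz
```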
The c2 (curated), c5 (GO), and h (hallmark) MSigDB gene set files with HUGO symbols must be downloaded and saved to the ./RawData/ directory. The v2023.1 versions of these files are available from the MSigDB archive.
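Before running the analysis, it may help to confirm that the expected inputs are in place. The sketch below checks only the sample metadata and the two matrix files; it assumes the un-gzipped files keep their GEO names minus the .gz suffix. The gene set files are not checked because their exact names are not stated here. The function name check_raw_data is hypothetical.

```shell
# Hypothetical helper (not part of the repo): report missing or empty inputs.
# File names are assumed to be the GEO names with the .gz suffix removed.
check_raw_data() {
    raw="${1:-./RawData}"
    status=0
    for f in sampleMeta.tsv \
             GSE289043_salmon_gene_matrix.tsv \
             GSE289043_salmon_gene_matrix_tpm.tsv; do
        if [ ! -s "$raw/$f" ]; then
            echo "missing or empty: $raw/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_raw_data from the root of the cloned repo returns a non-zero status if any of the three files is absent or empty.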
Scripts can be knit individually in order, or all of them can be run by executing the analysis script ./Scripts/runAnalysis.sbatch.
To run locally (from the base dir of the cloned repo):
./Scripts/runAnalysis.sbatch
To submit to a SLURM cluster (specifying the partition to run on as <PART>):
sbatch --partition <PART> ./Scripts/runAnalysis.sbatch
Note: To avoid overwriting previous runs, the working directory (./Working) and the results directory (./Results) must be missing or empty. Rename or delete them to rerun.
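One way to handle the rename step is to move any non-empty output directory aside with a timestamp suffix. This is a sketch only; both function names are hypothetical, and the timestamp naming scheme is an assumption, not something the repo's scripts require.

```shell
# Hypothetical helpers (not part of the repo).

# Succeeds if the path is missing or is an empty directory.
dir_missing_or_empty() {
    [ ! -e "$1" ] && return 0
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Move a non-empty directory aside, e.g. ./Results -> ./Results.20250101-120000.
# Pass the path without a trailing slash.
archive_if_used() {
    if ! dir_missing_or_empty "$1"; then
        backup="$1.$(date +%Y%m%d-%H%M%S)"
        mv "$1" "$backup" || return 1
        echo "moved $1 to $backup"
    fi
}
```

For example, before a rerun from the root of the cloned repo:

```shell
archive_if_used ./Working
archive_if_used ./Results
```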
As is the default behavior when knitting, the *.html output for each Rmd script is written to the same directory as the script. Output data files and important plots are saved to subdirectories in the ./Results directory. Intermediate files are saved to subdirectories in the ./Working directory.