RNAseq Analysis

Environment set up

Assumes a UNIX-based environment. On a SLURM cluster, the shell scripts can be submitted with sbatch, provided nodes with the needed resources exist.

Clone the repo from GitHub.
From a terminal, make sure the current directory is the root of the cloned repo.
Set up a self-contained Conda installation (Micromamba) and an environment (analysis) in Scripts/mm, based on the YAML file Scripts/base.mm.yaml, by running mmSetup.sbatch:

To run locally (from the base dir of the cloned repo):

./Scripts/mmSetup.sbatch ./Scripts/base.mm.yaml

To submit to a SLURM cluster (specifying the partition to run on as <PART>):

sbatch --partition <PART> ./Scripts/mmSetup.sbatch ./Scripts/base.mm.yaml

Note: This is not a minimal environment; it is a generic environment that includes packages not needed here.

The ./Scripts/mm directory now contains a bin/micromamba executable and an envs/analysis environment.
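A quick way to confirm the setup finished is to check for the two paths named above. The following is a minimal sketch, not part of the repository: check_mm_setup is a hypothetical helper, and it assumes it is given the root of the cloned repo (defaulting to the current directory).

```shell
# Sketch: confirm the Micromamba setup produced the two expected paths.
# check_mm_setup is a hypothetical helper, not a script shipped in the repo.
check_mm_setup() {
    local root="${1:-.}"            # repo root; defaults to the current directory
    local mm="$root/Scripts/mm"
    local status=0
    if [ ! -x "$mm/bin/micromamba" ]; then
        echo "missing or not executable: $mm/bin/micromamba" >&2
        status=1
    fi
    if [ ! -d "$mm/envs/analysis" ]; then
        echo "missing environment directory: $mm/envs/analysis" >&2
        status=1
    fi
    return "$status"
}
```

Calling check_mm_setup from the repo root returns 0 when both paths exist and prints the missing path otherwise.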

Get needed data

The directory ./RawData/ contains a sampleMeta.tsv file that describes the samples used in the analysis, which correspond to the samples uploaded to GEO as GSE289043.

The gene expression matrix file GSE289043_salmon_gene_matrix.tsv.gz must be downloaded, un-gzipped, and saved to the ./RawData/ directory.

The TPM (transcripts per million) version of the gene expression matrix file GSE289043_salmon_gene_matrix_tpm.tsv.gz must be downloaded, un-gzipped, and saved to the ./RawData/ directory.
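The un-gzip step for the two matrix files can be sketched as below. This is only an illustration: stage_gz is a hypothetical helper, and where the downloaded .gz file lives is an assumption passed in as the first argument. The download itself is not shown.

```shell
# Sketch: un-gzip a downloaded matrix into RawData, keeping the original .gz.
# stage_gz is a hypothetical helper; the download location is an assumption.
stage_gz() {
    local src="$1"                      # path to the downloaded *.gz file
    local dest_dir="${2:-./RawData}"    # target directory, ./RawData by default
    local out
    out="$dest_dir/$(basename "$src" .gz)"
    mkdir -p "$dest_dir" || return 1
    # gunzip -c writes the decompressed stream to stdout and leaves $src untouched
    gunzip -c "$src" > "$out" || { rm -f "$out"; return 1; }
    echo "$out"                         # report the path of the un-gzipped file
}
```

It would be called once per file, for example with the path to GSE289043_salmon_gene_matrix.tsv.gz and then with the path to GSE289043_salmon_gene_matrix_tpm.tsv.gz.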

The c2 (curated), c5 (ontology, including GO), and h (hallmark) MSigDB gene set files with HUGO symbols need to be downloaded and saved to the ./RawData/ directory. The v2023.1 versioned files can be downloaded from the MSigDB archive.

Run the analysis.

Scripts can be knit individually in order, or all of them can be run by executing the analysis script ./Scripts/runAnalysis.sbatch.

To run locally (from the base dir of the cloned repo):

./Scripts/runAnalysis.sbatch

To submit to a SLURM cluster (specifying the partition to run on as <PART>):

sbatch --partition <PART> ./Scripts/runAnalysis.sbatch

Note: To avoid overwriting previous runs, the working directory (./Working) and the results directory (./Results) must be missing or empty. Rename or delete them to rerun.
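The precondition in the note above can be checked before launching a run. The sketch below is an assumption about how one might test it, not the check that runAnalysis.sbatch itself performs; dir_missing_or_empty and check_rerun_dirs are hypothetical helper names.

```shell
# Sketch: succeed only if a path is missing or is an empty directory.
# Both functions are hypothetical helpers, not scripts shipped in the repo.
dir_missing_or_empty() {
    local d="$1"
    if [ ! -e "$d" ]; then
        return 0                    # missing counts as safe
    fi
    if [ ! -d "$d" ]; then
        return 1                    # exists but is not a directory
    fi
    # ls -A lists every entry except . and .., so empty output means empty dir
    [ -z "$(ls -A "$d")" ]
}

# Check both run directories under the repo root (default: current directory).
check_rerun_dirs() {
    local root="${1:-.}"
    local status=0
    local d
    for d in "$root/Working" "$root/Results"; do
        if ! dir_missing_or_empty "$d"; then
            echo "not empty, rename or delete before rerunning: $d" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_rerun_dirs from the repo root returns 0 only when both ./Working and ./Results are missing or empty. Hidden files count as content.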

Results

The *.html outputs of the Rmd scripts are written to the same directory as the scripts, which is the default behavior. Output data files and important plots are saved to subdirectories in the ./Results directory. Intermediate files are saved to subdirectories in the ./Working directory.
