Analysis code for Molecular convergence of risk variants for congenital heart defects leveraging a regulatory map of the human fetal heart

The snakemake pipelines were designed to run on Sherlock, an HPC platform at Stanford University. The config files are located in [dir]/config. The Methods section of the corresponding manuscript provides detailed explanations of each step.

Envrionments

environment for snakemake runs:

envs/snakemake.yml

enivronemnt for running R scripts:

envs/R.yml

scRNA-seq alignment and background removal and Sample demultiplexing

RNA_alignment_snakemake

The code for aligning the scRNA portion of single cell multiomic sequencing to a reference assembly using Starsolo and demultiplexing samples using Souporcell. The input sample table requires information about the location of FASTQ files, the corresponding library name as found in the filenames, and the number of samples pooled in the library.

CellBender_Snakemake

The code for removing ambient RNA (background) using CellBender.

scATAC-seq alignment

scATAC_pipeline_lite

The code for aligning the scATAC portion of single cell multiomic sequencing to a reference assembly. The codebase was forked from https://github.com/austintwang/scATAC_pipeline_lite (commit: 76a4bc545318b5685eb9843e94000dde0d3394e7) with modifications for file and result organization.

scATAC-seq quality control

multiomic_ATAC_QC_Snakemake

The code for performing quality control on the scATAC portion of the multiome libraries. As defined in the config file, the default filters are minTSS >= 6, and minFrags >= 1000, unless overriden in the sample sheet. Only cells deemed valid by STARsolo were considered for this analysis. Both "addDoubletScores" from ArchR and Amulet were used to identify doublets.

scRNA-seq quality control and normalization

RNA_QC_Snakemake

The code for generating the necessary QC metrics of scRNA-seq using the outputs from ambient RNA removal.

Normalization_Snakemake

The code for filtering cells based on the thresholds determined by manual examination of the QC metrics generated by RNA_QC_Snakemake. The filtered scRNA-seq count matrices are then normalized and Scublet is used to identify doublets.

Doublet removal

Doublet_removal_Snakemake

The code for rerunning Scrublet if specified in the SampleSheet (when "Doublet_threshold" is not "Default"), removing the union of doublets identified by Scrublet, Archr, Amulet, and Souporcell.

Clustering and cell type annotation

evaluation_Snakemake

The code for merging filtered Seurat objects, normalizing the resulting count matrices, and performing cell clustering at various resolutions. The pipeline generates heatmaps reflecting marker gene expression in the clusters and tentatively labels clusters by mapping the cells to an annotated earlier version of the heart map.

clustering_Snakemake

The code for performing clustering on each major cell type group. The code performs count spliting (splitting the count matrix into training and test sets), normalization, and clustering of the training set at various resolutions. Subsequently, the test set is labeled using the clustering results from the training set. Finally, the expression of marker genes in these clusters is highlighted in both the test dataset and the original dataset.

annotation_Snakemake

The code for faciliating the annotation of fine-grained clusters. It generates heatmaps visualizaing the top 10 differentially expressed genes in each cluster at the resolution specified in the sample sheet. It also generates dot plots and feature plots of specified marker genes. It calculates the TPM of genes in each cluster.

Defining gene programs with consensus non-negative matrix factorization

cNMF_run_and_analysis/Snakefile_cNMF

The code for running cNMF on count matrices specified in the SampleSheet, using the array of K values specified in the config file.

cNMF_run_and_analysis/Snakefile_cNMF_analysis

The code for performing further analysis on cNMF results, including: rerunning the consensus step with an updated local density filter; organizing output files; summarizing the distribution of gene programs across various features (e.g., cell annotation, sex); identifying Gene Ontology (GO) terms associated with gene programs; quantifying the variance in gene expression explained by the gene programs; and generating additional plots for K selection.

Identifying transcription factors regulating gene programs

ChromVar_Snakemake

The code for using chromVar to calculate per-cell motif accessibility and output the Z-scores of bias corrected deviations of motif accessibility in each cell.

get_motif_enrichment

This directory contains code for calculating the correlation bewteen per-cell motif accessibility and the expression of gene programs.

Assessing cell type heritability enrichment with stratified LD score regression

LDSC_Snakemake

This directory contains code for LD score regression.

Preparing variants for overlapping with scE2G-predicted enhancers

LD_clumping_expansion

The code for calculating beta and odds ratios for variants in each dataset. It also performs LD clumping and expansion to identify the lead variants and variants in LD with the lead variants at each locus.

Linking GWAS variants to enhancers, target genes, cell types, and gene programs

V2G_v2

The code for linking GWAS variants to cell-type specific enhancers, target genes and gene programs.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
CHD_enrichment		CHD_enrichment
CellBender_Snakemake		CellBender_Snakemake
ChromVar_Snakemake		ChromVar_Snakemake
Doublet_removal_Snakemake		Doublet_removal_Snakemake
LDSC_Snakemake		LDSC_Snakemake
LD_clumping_expansion		LD_clumping_expansion
Normalization_Snakemake		Normalization_Snakemake
RNA_QC_Snakemake		RNA_QC_Snakemake
RNA_alignment_snakemake		RNA_alignment_snakemake
V2G_v2		V2G_v2
VIC_27_enrichment		VIC_27_enrichment
annotation_Snakemake		annotation_Snakemake
cNMF_run_and_analysis		cNMF_run_and_analysis
clustering_Snakemake		clustering_Snakemake
envs		envs
evaluation_Snakemake		evaluation_Snakemake
get_motif_enrichment		get_motif_enrichment
multiomic_ATAC_QC_Snakemake		multiomic_ATAC_QC_Snakemake
scATAC_pipeline_lite		scATAC_pipeline_lite
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Analysis code for Molecular convergence of risk variants for congenital heart defects leveraging a regulatory map of the human fetal heart

Envrionments

environment for snakemake runs:

enivronemnt for running R scripts:

scRNA-seq alignment and background removal and Sample demultiplexing

RNA_alignment_snakemake

CellBender_Snakemake

scATAC-seq alignment

scATAC_pipeline_lite

scATAC-seq quality control

multiomic_ATAC_QC_Snakemake

scRNA-seq quality control and normalization

RNA_QC_Snakemake

Normalization_Snakemake

Doublet removal

Doublet_removal_Snakemake

Clustering and cell type annotation

evaluation_Snakemake

clustering_Snakemake

annotation_Snakemake

Defining gene programs with consensus non-negative matrix factorization

cNMF_run_and_analysis/Snakefile_cNMF

cNMF_run_and_analysis/Snakefile_cNMF_analysis

Identifying transcription factors regulating gene programs

ChromVar_Snakemake

get_motif_enrichment

Assessing cell type heritability enrichment with stratified LD score regression

LDSC_Snakemake

Preparing variants for overlapping with scE2G-predicted enhancers

LD_clumping_expansion

Linking GWAS variants to enhancers, target genes, cell types, and gene programs

V2G_v2

Testing disease genes for preferential expression in particular cell types

CHD_enrichment

Assessing enrichment of genes associated with valve traits in VIC program 27

VIC_27_enrichment

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages