A Nextflow-based pipeline for single-cell RNA sequencing data preprocessing, specifically designed for BCP (Billion Cell Program) analysis. The pipeline performs comprehensive single-cell data processing including alignment, ambient RNA removal, doublet detection, and basic quality control metrics.
- Single-cell alignment using STAR
- Ambient RNA removal for cleaner expression profiles
- Doublet detection to filter multiplets
- Comprehensive QC metrics with MultiQC reporting
- QC plots embedded in the MultiQC report
- Automated preprocessing with minimal manual intervention
Parameters will be optimized and made configurable for different species and organ types to enhance pipeline flexibility and accuracy.
The pipeline currently requires manual conda environment configuration for testing and development purposes.
Singularity container support will be implemented upon completion of the development phase to ensure better reproducibility and easier deployment across different computing environments.

    # Clone the repository
    git clone https://github.com/PhrenoVermouth/BCP_analysis.git
    cd BCP_analysis

    # Create conda environment
    mamba env create -f bin/environment.yml

    # Run the pipeline
    nextflow run ~/BCP_analysis/main.nf -profile standard

- GeneFull only (default):

      nextflow run ~/BCP_analysis/main.nf --run_mode genefull

- Velocity using prior GeneFull outputs: rerun in the same project directory after a completed GeneFull run. The pipeline will reuse `${outdir}/soupx/<sample>/<sample>_corrected.h5ad` by default.

      nextflow run ~/BCP_analysis/main.nf --run_mode velocity

  If GeneFull outputs live elsewhere, optionally add a `counts_h5ad` column in `samples.csv` to point to custom `.h5ad` locations.
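Before launching a velocity run, it can help to confirm that the corrected GeneFull outputs are where the pipeline expects them. The following is a minimal sketch, not part of the pipeline: the helper names `corrected_h5ad_path` and `missing_genefull_outputs` are hypothetical, and the only thing taken from the documentation is the default path pattern `${outdir}/soupx/<sample>/<sample>_corrected.h5ad`.

```python
from pathlib import Path


def corrected_h5ad_path(outdir, sample):
    """Build the default path the velocity run is documented to reuse."""
    return Path(outdir) / "soupx" / sample / f"{sample}_corrected.h5ad"


def missing_genefull_outputs(outdir, samples):
    """Return the sample IDs whose corrected .h5ad file is absent from the default path."""
    return [s for s in samples if not corrected_h5ad_path(outdir, s).is_file()]
```

Any sample returned by `missing_genefull_outputs` would presumably need either a completed GeneFull run or a `counts_h5ad` entry in `samples.csv`.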
- By default, Scrublet uses its automatic threshold.
- To force a manual cutoff, pass `--scrublet_manual_threshold` when launching Nextflow; this value is forwarded to `run_scrublet.py --manual_threshold`.

      nextflow run ~/BCP_analysis/main.nf --scrublet_manual_threshold 0.25
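To illustrate how an optional manual cutoff can take precedence over an automatic one, here is a minimal sketch. It is not the actual contents of `run_scrublet.py`: the functions `parse_args` and `call_doublets` and the `auto_threshold` argument are assumptions made for illustration; only the `--manual_threshold` flag name comes from the text above.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional manual threshold; None means 'use the automatic one'."""
    parser = argparse.ArgumentParser(description="Hypothetical doublet-calling wrapper")
    parser.add_argument("--manual_threshold", type=float, default=None,
                        help="Doublet score cutoff that overrides the automatic threshold")
    return parser.parse_args(argv)


def call_doublets(scores, auto_threshold, manual_threshold=None):
    """Flag cells whose doublet score is above the effective threshold."""
    threshold = auto_threshold if manual_threshold is None else manual_threshold
    return [score > threshold for score in scores]
```

With this sketch, `call_doublets(scores, auto_threshold=0.4)` uses the automatic value, while passing `manual_threshold=0.25` replaces it.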
- Global threshold (default): `--max_mito` sets a single mitochondrial percentage cutoff for all samples (default: `0.2`).
- Per-sample overrides: provide `--mito_max_map /path/to/file` where each line maps one or more sample IDs to a cutoff using the format `sample1, sample2 = value`. Blank lines and lines starting with `#` are ignored. Any sample not listed will fall back to `--max_mito`.

  Example (`resource/AC.mito`):

      efm, em, fatfm, fatm, fbfm, fbm, hbfm, hbm, kdfm, kdm, lvfm, lvm, mbfm, mbm = 0.2
      hfm, hm, ifm, im, pcf, pcm, skfm, skm = 0.6
      lf, lm, smf, smm, spm, spfm = 0.4
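The map format described above can be sketched as a small parser. This is an illustrative reading of the documented rules (comma-separated sample IDs, `=`, a numeric cutoff, blank and `#` lines skipped, fallback to the global default), not the pipeline's own implementation; the names `parse_mito_map` and `mito_cutoff` are hypothetical.

```python
def parse_mito_map(text):
    """Parse lines of the form 'sample1, sample2 = value' into {sample: cutoff}."""
    cutoffs = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        names, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_number}: expected 'samples = value'")
        cutoff = float(value)
        for name in names.split(","):
            name = name.strip()
            if name:
                cutoffs[name] = cutoff
    return cutoffs


def mito_cutoff(sample, cutoffs, max_mito=0.2):
    """Look up a sample's cutoff, falling back to the global default."""
    return cutoffs.get(sample, max_mito)


cutoffs = parse_mito_map("hfm, hm = 0.6\nlf, lm = 0.4\n")
print(mito_cutoff("hm", cutoffs))   # 0.6 (listed in the map)
print(mito_cutoff("efm", cutoffs))  # 0.2 (not listed, falls back to max_mito)
```

Raising an error on a line without `=` is a design choice of this sketch; how the pipeline handles malformed lines is not stated here.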
The pipeline generates:
- Processed single-cell count matrices
- Quality control reports via MultiQC
- Doublet detection results
- Ambient RNA removal metrics