Paulo Jannig - paulo.jannig@ki.se | paulo.jannig@su.se | GitHub account
Hong Jiang - hong.jiang@ki.se | GitHub account
This repository contains the Deng Lab's pipeline for analyzing Prime-seq libraries sequenced on the DNBSEQ-G400 platform or Novogene-Illumina platform, using zUMIs.
This workflow covers:
- Quality control (QC) of raw and processed FASTQ files
- Preparation of sample-specific barcodes and zUMIs configuration
- Running zUMIs with Prime-seq specific parameters
1. Install Pixi (to manage environment)
curl -fsSL https://pixi.sh/install.sh | sh
pixi self-update
mkdir ~/github_resources
cd ~/github_resources
# Clone this Prime-seq pipeline repository
git clone https://github.com/paulojannig/Prime-seq_analysis.git
# Clone zUMIs repository
git clone https://github.com/sdparekh/zUMIs.git
cd ~/github_resources/Prime-seq_analysis
tmux new -s primeseq
pixi install
pixi shell -e default --manifest-path pixi.toml
- Edit config.sh to match your paths and project variables using VS Code (or with nano config.sh):
Example:
EXPERIMENT=PJ101_TEMPLATE
PATH_EXPERIMENT=/mnt/run/paulo/${EXPERIMENT}
PATH_RAW_DATA=/mnt/storage/paulo/PJ101_TEMPLATE/
FLOWCELL=V350293965
BARCODE=IDTi51i7N701
Raw sequencing data for each user should be stored under /mnt/storage/USER/
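Before launching the pipeline, it can help to confirm that every variable in config.sh is actually set. The following is a minimal sketch, not part of the repository: the function name check_config is hypothetical, and it assumes config.sh defines the five variables from the example above as plain shell assignments.

```shell
# Hypothetical helper (not part of the repository): load a config file and
# report any expected variable that is unset or empty.
check_config() {
  config_file="$1"
  missing=0
  [ -r "$config_file" ] || { echo "Cannot read $config_file" >&2; return 2; }
  . "$config_file"
  for var in EXPERIMENT PATH_EXPERIMENT PATH_RAW_DATA FLOWCELL BARCODE; do
    # Indirect lookup: fetch the value of the variable whose name is in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "Missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, from the repository folder: check_config ./config.sh && echo "config.sh looks complete"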
- Run QC script:
This script will:
- Create the full project folder structure under your ${PATH_EXPERIMENT}
- Copy raw FASTQ files and sequencing run reports from the server (/mnt/storage/USER/)
- Merge data from multiple sequencing lanes
- Run initial quality control (FastQC + MultiQC) on the untrimmed reads
- Trim Prime-seq specific adapter and unwanted regions (from Read 1 and Read 2)
- Run quality control again on the trimmed reads
- Organize logs, config files, and R scripts needed for the next steps
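To illustrate the lane-merging step listed above: gzip-compressed FASTQ files can be concatenated directly, without decompressing them first. The sketch below shows the idea only. The function name merge_lanes and the file names in the usage line are hypothetical, and the repository script may implement this step differently.

```shell
# Hypothetical sketch of lane merging (the repository script may differ).
# Usage: merge_lanes OUTPUT.fq.gz LANE1.fq.gz LANE2.fq.gz ...
merge_lanes() {
  output="$1"
  shift
  # gzip streams can be concatenated directly; the result decompresses to
  # the lane files' contents in the order given.
  cat "$@" > "$output"
}
```

For example: merge_lanes merged_R1.fq.gz L01_R1.fq.gz L02_R1.fq.gz. When merging this way, list the lanes in the same order for Read 1 and Read 2 so that read pairs stay in sync.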
Run the script like this:
nohup ./scripts/01.primeseq_QC.sh >> log.01.primeseq_QC.txt
This keeps the script running if your terminal disconnects and appends its progress to log.01.primeseq_QC.txt.
Expected runtime: Approximately 1–5 hours, depending on the number of samples and lanes in the flowcell.
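Note that nohup by itself only protects a job from hangups; a trailing & is what places it in the background and returns the prompt. Inside a tmux session either form survives a disconnect. If you prefer to get the prompt back, a wrapper along these lines could be used. It is a sketch under assumptions: run_detached and BG_PID are hypothetical names, not part of the pipeline.

```shell
# Hypothetical wrapper: run a command detached from the terminal,
# appending stdout and stderr to a log file.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
  log="$1"
  shift
  # nohup ignores the hangup signal; the trailing & backgrounds the job.
  nohup "$@" >> "$log" 2>&1 &
  BG_PID=$!   # PID of the background job, usable with `kill -0` or `wait`
}
```

For example: run_detached log.01.primeseq_QC.txt ./scripts/01.primeseq_QC.sh, then kill -0 "$BG_PID" succeeds for as long as the job is still running.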
- Check QC Reports:
cd $PATH_EXPERIMENT/Data/00.reports
Open the untrimmed MultiQC report and go to Per Base Sequence Content:
$PATH_EXPERIMENT/Data/00.reports/Untrimmed/MultiQC_untrimmed_output/multiqc_report.html
✅ QC Expectations:
- Read 1: Contains Cell Barcodes, UMIs, and potentially some insert sequence.
  - Barcodes = noisy base distribution
  - UMIs = smoother, constant bases
  - Downstream insert (after BC/UMI) = T-rich (expected for Prime-seq)
- Read 2: Actual cDNA fragment
  - Check correct read length (e.g., 100 bp or 150 bp)
  - Note: zUMIs will typically skip bases 1-14 of Read 2 during mapping (to avoid adapter/low-quality sequence).
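Besides reading the MultiQC report, the Read 2 length can be checked directly from a FASTQ file. The helper below is a minimal sketch, assuming standard four-line FASTQ records in a gzipped file; the name read_length_table is hypothetical.

```shell
# Hypothetical helper: tabulate read lengths in a gzipped FASTQ file.
# Prints one "LENGTH COUNT" pair per line, sorted by length.
read_length_table() {
  gzip -dc "$1" |
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len, n[len] }' |
    sort -n
}
```

For untrimmed PE100 data one would expect essentially all Read 2 reads to be reported at length 100; a broad spread of lengths suggests the file has already been trimmed or is not the one you intended.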
- Edit sample barcode file using VS Code (or nano):
nano ~/github_resources/Prime-seq_analysis/Primeseq_barcodes_samples.tsv
- Edit zUMIs YAML config using VS Code (or nano):
nano ~/github_resources/Prime-seq_analysis/primeseq_zUMIs_$EXPERIMENT.yaml
✅ Check and adjust the following in your YAML config file (primeseq_zUMIs_$EXPERIMENT.yaml):
- Paths to FASTQ files
- Project, flowcell and barcode/index info
- Oligo-Barcodes file path (typically Primeseq_barcodes_samples.tsv)
- Output directory (/mnt/run/USER/$EXPERIMENT/)
- STAR index path and GTF file for the correct species
  - Double-check STAR index compatibility with your read length:
    - For PE100: Use STAR_index_85 and set base_definition: cDNA(15-100)
    - For PE150: Use STAR_index_135 and set base_definition: cDNA(15-150)
- Number of threads (adjust based on available CPUs on the server)
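The two STAR index cases above appear to follow a single rule: the cDNA range starts at base 15, so the mapped length is the read length minus 14, and the index suffix is one less than that, in line with the STAR manual's advice to use read length minus 1 as the splice-junction overhang. The sketch below encodes this inferred rule. Only the PE100 and PE150 values are confirmed by this README, and the function name star_settings is hypothetical.

```shell
# Hypothetical helper: print the STAR index name and base_definition
# implied by a given Read 2 length, assuming bases 1-14 are skipped.
star_settings() {
  read_length="$1"
  cdna_start=15
  mapped_length=$((read_length - cdna_start + 1))   # e.g. 100 -> 86
  overhang=$((mapped_length - 1))                   # e.g. 86 -> 85
  echo "STAR_index_${overhang} cDNA(${cdna_start}-${read_length})"
}
```

Running star_settings 100 prints STAR_index_85 cDNA(15-100), and star_settings 150 prints STAR_index_135 cDNA(15-150). For any other read length, check that a matching index actually exists on the server before relying on the computed name.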
cd ~/github_resources/Prime-seq_analysis
nohup ./scripts/02.primeseq_zUMIs.sh >> log.02.primeseq_zUMIs.txt
- The R Markdown templates for downstream analysis are available in $PATH_EXPERIMENT/scripts/
- You can either:
  - transfer the $EXPERIMENT folder to your local machine (recommended for RStudio Desktop), or
  - run the analysis directly on our workstation (recommended for large datasets or for VS Code).
- Note that large files are stored in $PATH_EXPERIMENT/Data/. If you download the experiment to your local machine, avoid syncing this folder.
- When the analysis is completed, move at least the $PATH_EXPERIMENT/Data/ directory to /mnt/storage/USER/ for long-term storage. Do not keep raw data under /mnt/run/.
- Always monitor your log files for errors (log.01.primeseq_QC.txt and log.02.primeseq_zUMIs.txt)
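One simple way to monitor a log is to search it for common failure keywords. The helper below is a sketch only: the name scan_log is hypothetical, and the keyword list is an assumption that may miss tool-specific messages or flag harmless lines.

```shell
# Hypothetical helper: print log lines that mention a likely failure,
# prefixed with their line numbers.
# Returns 0 if any such line is found, 1 if none is found.
scan_log() {
  grep -n -i -E 'error|exception|failed|killed' "$1"
}
```

For example: scan_log log.01.primeseq_QC.txt. To follow a log live while a script is running, tail -f log.02.primeseq_zUMIs.txt also works.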
- New session: tmux new -s session_name or simply tmux
- List sessions: tmux ls
- Attach: tmux attach -t session_name
- Detach: Ctrl-b d