- The format of the input manifest file
- Available parameter options in global_config files
- Popular toolsets in different workflows
The input manifest is the file that lists the input files for a study. Its path is set in the study_config file through the parameter fastq_list (Fastq pipelines) or bam_list (Bam pipelines).
| parameterType | shortName | Parameter1 | Parameter2 (optional column) | sample_type (optional column) | match_control (optional column) |
|---|---|---|---|---|---|
| fastqFile | sample1 | sample1.R1.fastq1.gz | sample1.R2.fastq2.gz (if paired reads) | tumor / normal | sample1_N |
A shortName can be shared among technical replicates of the same sample ID, but it must not be reused for different biological samples.
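To illustrate how a shared shortName groups technical replicates, here is a minimal Python sketch. It assumes the manifest is a tab-delimited text file with a header row matching the table above. The delimiter, the helper name `group_fastq_by_sample`, and the example file names are assumptions made for illustration and are not part of Fonda.

```python
import csv
import io
from collections import defaultdict


def group_fastq_by_sample(manifest_text):
    """Group fastq rows by shortName (assumes a tab-delimited manifest with a header row)."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        # Parameter2 is optional: it is only present for paired reads.
        reads = [row["Parameter1"]]
        if row.get("Parameter2"):
            reads.append(row["Parameter2"])
        groups[row["shortName"]].append(reads)
    return dict(groups)


# Hypothetical manifest: two technical replicates of sample1, one run of sample2.
example = (
    "parameterType\tshortName\tParameter1\tParameter2\n"
    "fastqFile\tsample1\tsample1.L1.R1.fastq.gz\tsample1.L1.R2.fastq.gz\n"
    "fastqFile\tsample1\tsample1.L2.R1.fastq.gz\tsample1.L2.R2.fastq.gz\n"
    "fastqFile\tsample2\tsample2.R1.fastq.gz\tsample2.R2.fastq.gz\n"
)
groups = group_fastq_by_sample(example)
for name, runs in groups.items():
    print(name, len(runs), "fastq row(s)")
```

Rows that share a shortName end up in the same group, which is how technical replicates of one sample would be expected to appear in the manifest.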
| parameterType | shortName | Parameter1 | sample_type (optional column) | match_control (optional column) |
|---|---|---|---|---|
| bamFile | sample1 | sample1.bam | tumor / normal | sample1_N |
Each shortName must be unique.
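The uniqueness requirement for bam manifests can be checked before a run. The sketch below is one possible way to do that, assuming a tab-delimited manifest with a header row as in the table above. The function name `find_duplicate_short_names` and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicate_short_names(manifest_text):
    """Return the shortName values that occur more than once (assumes a tab-delimited manifest)."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    counts = Counter(row["shortName"] for row in reader)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical bam manifest in which sample1 is listed twice by mistake.
example = (
    "parameterType\tshortName\tParameter1\n"
    "bamFile\tsample1\tsample1.bam\n"
    "bamFile\tsample2\tsample2.bam\n"
    "bamFile\tsample1\tsample1_rerun.bam\n"
)
duplicates = find_duplicate_short_names(example)
print(duplicates)
```

An empty list means every shortName in the manifest is unique.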
| parameterType | shortName | Parameter1 | Parameter2 (optional column) |
|---|---|---|---|
| fastqFile | sample1 | sample1.R1.fastq1.gz | sample1.R2.fastq2.gz (if paired reads) |
A shortName can be shared among technical replicates of the same sample ID, but it must not be reused for different biological samples.
| parameterType | shortName | Parameter1 |
|---|---|---|
| bamFile | sample1 | sample1.bam |
Each shortName must be unique.
Example input manifest files are available here.
Users only need to set up the tools and databases that their specific pipeline requires; it is not necessary to install all tools. When setting parameter names in the global_config file, make sure they exactly match the names in the table. The names are case sensitive.
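Because parameter names are case sensitive, a quick check for misspelled or wrongly cased names can save a failed run. The following is a minimal sketch, assuming the global_config file uses an INI-like `[Section]` / `NAME = value` layout that Python's `configparser` can read; that layout, the function name `find_misnamed_parameters`, and the example values are assumptions. Only the [Queue_Parameters] names are included here; the other sections would be added in the same way.

```python
import configparser

# Names copied from the [Queue_Parameters] row of the parameter table.
EXPECTED = {"Queue_Parameters": {"NUMTHREADS", "MAXMEM", "QUEUE", "PE"}}


def find_misnamed_parameters(config_text, expected=EXPECTED):
    """Return (section, name) pairs for names that do not exactly match the expected ones."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases option names by default; keep them as written.
    parser.optionxform = str
    parser.read_string(config_text)
    problems = []
    for section, names in expected.items():
        if not parser.has_section(section):
            problems.append((section, None))  # the whole section is missing
            continue
        for key in parser.options(section):
            if key not in names:
                problems.append((section, key))
    return problems


# Hypothetical config fragment with one wrongly cased name (maxmem).
example = """
[Queue_Parameters]
NUMTHREADS = 4
maxmem = 8g
QUEUE = all.q
PE = -pe threaded
"""
problems = find_misnamed_parameters(example)
print(problems)
```

Setting `optionxform = str` matters here: without it, `configparser` would lowercase every name and hide exactly the kind of mismatch this check is meant to find.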
| Section | Parameters |
|---|---|
| [Queue_Parameters] | NUMTHREADS (4) MAXMEM (8g) QUEUE (all.q/c32.q) PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | star, hisat2, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, lofreq, strelka2, freebayes, sequenza, exomecnv, samtools, picard, transvar, snpsift, xenome, contest, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse) BED BED_WITH_HEADER BED_FOR_COVERAGE SNPSIFTDB (for snpsift) MOUSEXENOMEINDEX (for xenome) CONTEST_POPAF (for contest) CANONICAL_TRANSCRIPT GENOME GENOME_BUILD (hg19/GRCh38/mm10) KNOWN_INDELS_MILLS (for gatk_realign) KNOWN_INDELS_PHASE1 (for gatk_realign) DBSNP (for gatk_realign) COSMIC (for gatk_realign) NOVOINDEX (for novoalign) ADAPTER_SEQ (for seqpurge) ADAPTER_FWD (for trimmomatic) ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow toolset flag_xenome (yes/no) read_type (paired/single) |
| Section | Parameters |
|---|---|
| [Queue_Parameters] | NUMTHREADS (4) MAXMEM (8g) QUEUE (all.q/c32.q) PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | star, hisat2, seqpurge, trimmomatic, java, rnaseqc_java, python, Rscript, cufflinks, rsem, stringtie, feature_count, samtools, picard, rnaseqc, xenome, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse) ANNOTGENE GENOME GENOME_BUILD (hg19/GRCh38) TRANSCRIPTOME ANNOTGENESAF STARINDEX (for star) MOUSEXENOMEINDEX (for xenome) ADAPTER_SEQ (for seqpurge) ADAPTER_FWD (for trimmomatic) ADAPTER_REV (for trimmomatic) GENOME_LOAD (STAR option that controls how the genome is loaded into memory) |
| [Pipeline_Info] | workflow toolset flag_xenome (yes/no) read_type (paired/single) |
| Section | Parameters |
|---|---|
| [Queue_Parameters] | NUMTHREADS (4) MAXMEM (8g) QUEUE (all.q/c32.q) PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | bwa, novoalign, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, mutect2, lofreq, strelka2, freebayes, sequenza, exomecnv, samtools, picard, transvar, snpsift, xenome, contest, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse) BED BED_WITH_HEADER BED_FOR_COVERAGE SNPSIFTDB (for snpsift) MOUSEXENOMEINDEX (for xenome) CONTEST_POPAF (for contest) CANONICAL_TRANSCRIPT GENOME GENOME_BUILD (hg19/GRCh38/mm10) KNOWN_INDELS_MILLS (for gatk_realign) KNOWN_INDELS_PHASE1 (for gatk_realign) DBSNP (for gatk_realign) COSMIC (for gatk_realign) NOVOINDEX (for novoalign) ADAPTER_SEQ (for seqpurge) ADAPTER_FWD (for trimmomatic) ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow toolset flag_xenome (yes/no) read_type (paired/single) |
| Section | Parameters |
|---|---|
| [Queue_Parameters] | NUMTHREADS (4) MAXMEM (8g) QUEUE (all.q/c32.q) PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | bwa, novoalign, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, mutect2, lofreq, strelka2, freebayes, samtools, picard, transvar, snpsift, xenome, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse) BED BED_WITH_HEADER BED_FOR_COVERAGE SNPSIFTDB (for snpsift) MOUSEXENOMEINDEX (for xenome) CONTEST_POPAF (for contest) CANONICAL_TRANSCRIPT GENOME GENOME_BUILD (hg19/GRCh38/mm10) KNOWN_INDELS_MILLS (for gatk_realign) KNOWN_INDELS_PHASE1 (for gatk_realign) DBSNP (for gatk_realign) COSMIC (for gatk_realign) NOVOINDEX (for novoalign) ADAPTER_SEQ (for seqpurge) ADAPTER_FWD (for trimmomatic) ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow toolset flag_xenome (yes/no) read_type (paired/single) |
| Section | Parameters |
|---|---|
| [Queue_Parameters] | NUMTHREADS (4) MAXMEM (8g) QUEUE (all.q/c32.q) PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | cellranger, java, python, Rscript, samtools, picard, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse) GENOME_BUILD (hg19/GRCh38/mm10) GENOME TRANSCRIPTOME |
| [cellranger] | cellranger_EXPECTED_CELLS cellranger_FORCED_CELLS cellranger_NOSECONDARY cellranger_CHEMISTRY cellranger_R1-LENGTH cellranger_R2-LENGTH cellranger_LANES cellranger_INDICES |
| [Pipeline_Info] | workflow toolset flag_xenome (yes/no) read_type (paired/single) |
A toolset is the set of tools that users want to run in a specific pipeline version. The combination of tools represents the components that users want Fonda to execute for a particular study dataset.
As mentioned previously, any change in the global_config generates a new pipeline version. Therefore, toolsets that contain different combinations of software steps result in different pipeline versions.
Below are a few popular toolsets for different workflows.
Note: make sure each individual tool executes properly before you use it in the Fonda context.
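As a first sanity check before running Fonda, one can verify that each configured tool path points to something executable. The sketch below only tests that an executable exists at the given path (or on `PATH`); it does not prove that the tool runs correctly, so a manual test run of each tool is still advisable. The function name `find_unusable_tools` and the example paths are hypothetical.

```python
import shutil
import sys


def find_unusable_tools(tool_paths):
    """Return the tool names whose configured path is not an executable file or a command on PATH."""
    unusable = []
    for name, path in tool_paths.items():
        # shutil.which handles both full paths and bare command names.
        if shutil.which(path) is None:
            unusable.append(name)
    return unusable


# Hypothetical mapping of tool names to the paths set in global_config.
tools = {
    "python": sys.executable,
    "samtools": "/nonexistent/fonda_tools/samtools",
}
unusable = find_unusable_tools(tools)
print(unusable)
```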
Available tools for each analytic step:

- mouse sequence detection: xenome
- sequence trimming: trimmomatic, seqpurge
- sequence alignment: star, hisat2
- expression estimation: cufflinks, stringtie, rsem
- read count: feature_count
- qc: qc, rnaseqc
- data processing: samtools, picard
- expression data combination: conversion

Popular toolset options:

- toolset=star+qc+featureCount+cufflinks+conversion
- toolset=star+qc+featureCount+rsem+conversion
- toolset=hisat2+qc+featureCount+stringtie
- toolset=star+qc (specific for bam reads QC examination)
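A toolset value is a list of component names joined with `+`. The sketch below shows one way to split such a value and flag components that are not in a known list, which catches typos and wrong casing early. How Fonda itself parses the value is not described here, so this is only an illustration; the function name `parse_toolset` is hypothetical, and the `KNOWN` set contains only the component names that appear in the popular toolset options above.

```python
# Component names taken from the popular toolset options listed above.
KNOWN = {"star", "hisat2", "qc", "featureCount", "cufflinks", "rsem", "stringtie", "conversion"}


def parse_toolset(toolset, known_tools=KNOWN):
    """Split a '+'-joined toolset value and return (components, unknown components)."""
    # Accept both "toolset=a+b" and the bare value "a+b".
    value = toolset.split("=", 1)[-1]
    tools = [t.strip() for t in value.split("+") if t.strip()]
    unknown = [t for t in tools if t not in known_tools]
    return tools, unknown


tools, unknown = parse_toolset("toolset=star+qc+featurecount+rsem+conversion")
print(tools)
print(unknown)
```

In this example `featurecount` is reported as unknown because the comparison is case sensitive and the listed options spell it `featureCount`.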
Available tools for each analytic step:

- mouse sequence detection: xenome
- sequence trimming: trimmomatic, seqpurge
- sequence alignment: bwa, novoalign
- sequence realignment: abra2, gatk
- variant detection: gatk, mutect, mutect2, vardict, lofreq, strelka2, freebayes, scalpel
- CNV detection: sequenza, exomecnv
- variant annotation: snpsift (associate with transvar), oncotation
- qc: qc
- data processing: samtools, picard

Popular toolset options:

- toolset=bwa+abra_realign+picard+vardict+mutect2+qc
- toolset=novoalign+gatk_realign+picard+strelka2+snpsift+qc
- toolset=bwa+abra_realign+picard+qc (specific for bam reads QC examination)
Available tools for each analytic step:

- mouse sequence detection: xenome
- sequence trimming: trimmomatic, seqpurge
- sequence alignment: bwa, novoalign
- sequence realignment: abra2, gatk
- variant detection: gatk, mutect, mutect2, vardict, lofreq, strelka2, freebayes, scalpel
- CNV detection: sequenza, exomecnv
- variant annotation: snpsift (associate with transvar), oncotation
- qc: qc
- data processing: samtools, picard

Popular toolset options:

- toolset=novoalign+abra_realign+picard+vardict+mutect1+snpsift
- toolset=bwa+abra_realign+picard+vardict+strelka2+snpsift
- toolset=bwa+abra_realign+picard+qc (specific for bam reads QC examination)
Available tools for each analytic step:

- mouse sequence detection: xenome
- sequence trimming: trimmomatic, seqpurge
- sequence alignment: star, hisat2
- sequence realignment: abra2, gatk
- variant detection: gatk, mutect, vardict, lofreq, strelka2, freebayes, scalpel
- variant annotation: snpsift (associate with transvar), oncotation
- qc: qc
- data processing: samtools, picard

Popular toolset options:

- toolset=star+abra_realign+picard+vardict+qc
- toolset=star+gatk_realign+picard+strelka2+snpsift+qc
- toolset=star+abra_realign+picard+qc (specific for bam reads QC examination)
Available tools for each analytic step:

- sequence alignment and analysis: count
- doublet detection: doubletdetection, scrublet
- qc: qc
- data processing: samtools, picard, python, Rscript

Popular toolset options:

- toolset=count
- toolset=count+qc
- toolset=count+doubletdetection
- toolset=count+scrublet
- toolset=count+doubletdetection+scrublet