Additional sections

The format of the input manifest file for batch processing

The input manifest file lists the input files for a batch run. Its path is set with the fastq_list parameter (Fastq pipelines) or the bam_list parameter (Bam pipelines) in the study_config file.

DNA sequencing data from FASTQ

| parameterType | shortName | Parameter1 | Parameter2 (optional column) | sample_type (optional column) | match_control (optional column) |
| --- | --- | --- | --- | --- | --- |
| fastqFile | sample1 | sample1.R1.fastq1.gz | sample1.R2.fastq2.gz (if paired reads) | tumor / normal | sample1_N |

Technical replicates of the same sample ID may share a shortName, but different biological samples must not use the same shortName.
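As a rough illustration of the layout above, the following Python sketch assembles a small FASTQ manifest. It assumes the manifest is tab-delimited and that unused optional columns can be left empty; neither is stated in this section, so compare the result with the example manifests before relying on it. The helper name `fastq_manifest_line` and the normal-sample file names are hypothetical.

```python
def fastq_manifest_line(short_name, read1, read2="", sample_type="", match_control=""):
    """Build one manifest row.

    The tab delimiter and the empty optional columns are assumptions,
    not behaviour confirmed by the Fonda documentation.
    """
    columns = ["fastqFile", short_name, read1, read2, sample_type, match_control]
    return "\t".join(columns)


header = "\t".join(
    ["parameterType", "shortName", "Parameter1", "Parameter2", "sample_type", "match_control"]
)

rows = [
    header,
    # Tumor sample with paired reads, matched to the normal sample sample1_N.
    fastq_manifest_line(
        "sample1", "sample1.R1.fastq1.gz", "sample1.R2.fastq2.gz", "tumor", "sample1_N"
    ),
    # Hypothetical matched normal sample; it has no control of its own.
    fastq_manifest_line(
        "sample1_N", "sample1_N.R1.fastq1.gz", "sample1_N.R2.fastq2.gz", "normal"
    ),
]

manifest_text = "\n".join(rows) + "\n"
print(manifest_text)
```

Keeping the column order in one helper makes it harder to shift a value into the wrong column when optional fields are omitted.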

DNA sequencing data from BAM

| parameterType | shortName | Parameter1 | sample_type (optional column) | match_control (optional column) |
| --- | --- | --- | --- | --- |
| bamFile | sample1 | sample1.bam | tumor / normal | sample1_N |

shortName shall be unique.
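The example row suggests that match_control holds the shortName of a normal sample. Under that assumption (it is not spelled out here), a quick consistency check before submitting a batch could look like the sketch below. The function name `unmatched_controls` and the sample rows are hypothetical.

```python
def unmatched_controls(rows):
    """Return shortNames whose match_control does not name a normal sample.

    Each row is a dict keyed by the manifest column names. Treating
    match_control as the shortName of a normal sample is an assumption
    drawn from the example row, not a documented rule.
    """
    normals = {r["shortName"] for r in rows if r.get("sample_type") == "normal"}
    return [
        r["shortName"]
        for r in rows
        if r.get("match_control") and r["match_control"] not in normals
    ]


rows = [
    {"parameterType": "bamFile", "shortName": "sample1", "Parameter1": "sample1.bam",
     "sample_type": "tumor", "match_control": "sample1_N"},
    {"parameterType": "bamFile", "shortName": "sample1_N", "Parameter1": "sample1_N.bam",
     "sample_type": "normal", "match_control": ""},
    # sample2 points at a control that is missing from the manifest.
    {"parameterType": "bamFile", "shortName": "sample2", "Parameter1": "sample2.bam",
     "sample_type": "tumor", "match_control": "sample2_N"},
]

print(unmatched_controls(rows))  # ['sample2']
```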

RNA sequencing data from FASTQ

| parameterType | shortName | Parameter1 | Parameter2 (optional column) |
| --- | --- | --- | --- |
| fastqFile | sample1 | sample1.R1.fastq1.gz | sample1.R2.fastq2.gz (if paired reads) |

Technical replicates of the same sample ID may share a shortName, but different biological samples must not use the same shortName.

RNA sequencing data from BAM

| parameterType | shortName | Parameter1 |
| --- | --- | --- |
| bamFile | sample1 | sample1.bam |

shortName shall be unique.
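Because BAM manifests require unique shortName values, it can help to scan a manifest for repeats before launching a run. The sketch below assumes a tab-delimited file with a single header row; the function name `duplicate_short_names` and the sample content are hypothetical.

```python
from collections import Counter


def duplicate_short_names(manifest_text):
    """Return shortName values that occur more than once (header row skipped).

    Assumes tab-delimited rows with shortName in the second column.
    """
    lines = [line for line in manifest_text.splitlines() if line.strip()]
    names = [line.split("\t")[1] for line in lines[1:]]
    return sorted(name for name, count in Counter(names).items() if count > 1)


manifest_text = "\n".join([
    "parameterType\tshortName\tParameter1",
    "bamFile\tsample1\tsample1.bam",
    "bamFile\tsample2\tsample2.bam",
    "bamFile\tsample1\tsample1_rerun.bam",  # repeats the shortName sample1
])

print(duplicate_short_names(manifest_text))  # ['sample1']
```

For FASTQ manifests the same check would flag technical replicates, which are allowed to share a shortName, so it applies only to BAM inputs.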

Example input manifest files are available here.

Available parameter options in global_config for major workflows

Set only the tools and databases that your specific pipeline needs; you do not need to install every tool. When setting parameter names in the global_config file, make sure they exactly match the names in the tables below, because the names are case sensitive.
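Since parameter names are case sensitive, a name such as `MaxMem` would not be recognised as `MAXMEM`. The sketch below flags such near-misses. It assumes that global_config uses INI-style `[section]` headers with `NAME = value` lines that Python's `configparser` can read; that file syntax is an assumption, and `misspelled_names` and the `EXPECTED` table are hypothetical helpers that cover only one section for brevity.

```python
import configparser

# Expected names for one section, copied from the parameter tables.
EXPECTED = {"Queue_Parameters": {"NUMTHREADS", "MAXMEM", "QUEUE", "PE"}}


def misspelled_names(config_text, expected=EXPECTED):
    """Return (section, found_name, suggested_name) for names not in the table."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases option names by default; keep the original case.
    parser.optionxform = str
    parser.read_string(config_text)

    problems = []
    for section, allowed in expected.items():
        if not parser.has_section(section):
            continue
        by_lower = {name.lower(): name for name in allowed}
        for name in parser[section]:
            if name not in allowed:
                # Suggest the correctly cased name when only the case differs.
                problems.append((section, name, by_lower.get(name.lower())))
    return problems


sample = """
[Queue_Parameters]
NUMTHREADS = 4
MaxMem = 8g
QUEUE = all.q
PE = -pe threaded
"""

print(misspelled_names(sample))  # [('Queue_Parameters', 'MaxMem', 'MAXMEM')]
```

Overriding `optionxform` matters here: without it, `configparser` would lowercase every name and hide exactly the kind of mistake the check is meant to catch.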

RnaCaptureVar_Fastq

| Section | Parameters |
| --- | --- |
| [Queue_Parameters] | NUMTHREADS (4), MAXMEM (8g), QUEUE (all.q/c32.q), PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | star, hisat2, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, lofreq, strelka2, freebayes, sequenza, exomecnv, samtools, picard, transvar, snpsift, xenome, contest, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse), BED, BED_WITH_HEADER, BED_FOR_COVERAGE, SNPSIFTDB (for snpsift), MOUSEXENOMEINDEX (for xenome), CONTEST_POPAF (for contest), CANONICAL_TRANSCRIPT, GENOME, GENOME_BUILD (hg19/GRCh38/mm10), KNOWN_INDELS_MILLS (for gatk_realign), KNOWN_INDELS_PHASE1 (for gatk_realign), DBSNP (for gatk_realign), COSMIC (for gatk_realign), NOVOINDEX (for novoalign), ADAPTER_SEQ (for seqpurge), ADAPTER_FWD (for trimmomatic), ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow, toolset, flag_xenome (yes/no), read_type (paired/single) |

RnaExpression_Fastq

| Section | Parameters |
| --- | --- |
| [Queue_Parameters] | NUMTHREADS (4), MAXMEM (8g), QUEUE (all.q/c32.q), PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | star, hisat2, seqpurge, trimmomatic, java, rnaseqc_java, python, Rscript, cufflinks, rsem, stringtie, feature_count, samtools, picard, rnaseqc, xenome, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse), ANNOTGENE, GENOME, GENOME_BUILD (hg19/GRCh38), TRANSCRIPTOME, ANNOTGENESAF, STARINDEX (for star), MOUSEXENOMEINDEX (for xenome), ADAPTER_SEQ (for seqpurge), ADAPTER_FWD (for trimmomatic), ADAPTER_REV (for trimmomatic), GENOME_LOAD (STAR option that controls how the genome is loaded into memory) |
| [Pipeline_Info] | workflow, toolset, flag_xenome (yes/no), read_type (paired/single) |

DnaCaptureVar_Fastq

| Section | Parameters |
| --- | --- |
| [Queue_Parameters] | NUMTHREADS (4), MAXMEM (8g), QUEUE (all.q/c32.q), PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | bwa, novoalign, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, mutect2, lofreq, strelka2, freebayes, sequenza, exomecnv, samtools, picard, transvar, snpsift, xenome, contest, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse), BED, BED_WITH_HEADER, BED_FOR_COVERAGE, SNPSIFTDB (for snpsift), MOUSEXENOMEINDEX (for xenome), CONTEST_POPAF (for contest), CANONICAL_TRANSCRIPT, GENOME, GENOME_BUILD (hg19/GRCh38/mm10), KNOWN_INDELS_MILLS (for gatk_realign), KNOWN_INDELS_PHASE1 (for gatk_realign), DBSNP (for gatk_realign), COSMIC (for gatk_realign), NOVOINDEX (for novoalign), ADAPTER_SEQ (for seqpurge), ADAPTER_FWD (for trimmomatic), ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow, toolset, flag_xenome (yes/no), read_type (paired/single) |

DnaAmpliconVar_Fastq

| Section | Parameters |
| --- | --- |
| [Queue_Parameters] | NUMTHREADS (4), MAXMEM (8g), QUEUE (all.q/c32.q), PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | bwa, novoalign, seqpurge, trimmomatic, java, python, Rscript, gatk, abra2, vardict, mutect1, mutect2, lofreq, strelka2, freebayes, samtools, picard, transvar, snpsift, xenome, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse), BED, BED_WITH_HEADER, BED_FOR_COVERAGE, SNPSIFTDB (for snpsift), MOUSEXENOMEINDEX (for xenome), CONTEST_POPAF (for contest), CANONICAL_TRANSCRIPT, GENOME, GENOME_BUILD (hg19/GRCh38/mm10), KNOWN_INDELS_MILLS (for gatk_realign), KNOWN_INDELS_PHASE1 (for gatk_realign), DBSNP (for gatk_realign), COSMIC (for gatk_realign), NOVOINDEX (for novoalign), ADAPTER_SEQ (for seqpurge), ADAPTER_FWD (for trimmomatic), ADAPTER_REV (for trimmomatic) |
| [Pipeline_Info] | workflow, toolset, flag_xenome (yes/no), read_type (paired/single) |

scRnaExpression_CellRanger_Fastq

| Section | Parameters |
| --- | --- |
| [Queue_Parameters] | NUMTHREADS (4), MAXMEM (8g), QUEUE (all.q/c32.q), PE (-pe threaded) |
| [all_tools] (must be installed properly before running the Fonda pipeline) | cellranger, java, python, Rscript, samtools, picard, src_scripts |
| [Databases] (must be downloaded/prepared properly before running the Fonda pipeline) | SPECIES (human/mouse), GENOME_BUILD (hg19/GRCh38/mm10), GENOME, TRANSCRIPTOME |
| [cellranger] | cellranger_EXPECTED_CELLS, cellranger_FORCED_CELLS, cellranger_NOSECONDARY, cellranger_CHEMISTRY, cellranger_R1-LENGTH, cellranger_R2-LENGTH, cellranger_LANES, cellranger_INDICES |
| [Pipeline_Info] | workflow, toolset, flag_xenome (yes/no), read_type (paired/single) |

Popular toolsets in different workflows

A toolset is the set of tools that a user wants to run in a specific pipeline version. The combination of tools represents the components that Fonda executes for a particular study dataset.
As mentioned previously, any change in the global_config generates a new pipeline version. Toolsets that combine different software steps therefore result in different pipeline versions.
A few popular toolsets for different workflows are listed below.
Note: make sure each individual tool runs properly on its own before you use it within Fonda.
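In the toolset options listed in this section, the tools are joined with `+` after `toolset=`. The sketch below splits such a line into its component names, for example to review which steps a given configuration requests. It only mirrors the notation shown here; how Fonda itself parses the value is not described in this document, and both the function name `parse_toolset` and the tolerance for surrounding spaces are assumptions.

```python
def parse_toolset(line):
    """Split 'toolset=a+b+c' into ['a', 'b', 'c'], ignoring surrounding spaces."""
    key, sep, value = line.partition("=")
    if key.strip() != "toolset" or not sep:
        raise ValueError(f"not a toolset line: {line!r}")
    tools = [tool.strip() for tool in value.split("+")]
    if any(not tool for tool in tools):
        raise ValueError(f"empty tool name in: {line!r}")
    return tools


print(parse_toolset("toolset=star+qc+featureCount+rsem+conversion"))
# ['star', 'qc', 'featureCount', 'rsem', 'conversion']
```

Two toolsets that differ in any component, such as `rsem` versus `cufflinks`, parse to different lists, which matches the point above that each combination corresponds to its own pipeline version.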

RnaExpression_Fastq

Available tools for each analytic step:
mouse sequence detection: xenome
sequence trimming: trimmomatic, seqpurge
sequence alignment: star, hisat2
expression estimation: cufflinks, stringtie, rsem
read count: feature_count
qc: qc, rnaseqc
data processing: samtools, picard
expression data combination: conversion

Popular toolset options:
toolset=star+qc+featureCount+cufflinks+conversion
toolset=star+qc+featureCount+rsem+conversion
toolset=hisat2+qc+featureCount+stringtie
toolset=star+qc (specifically for QC examination of BAM reads)

DnaCaptureVar_Fastq

Available tools for each analytic step:
mouse sequence detection: xenome
sequence trimming: trimmomatic, seqpurge
sequence alignment: bwa, novoalign
sequence realignment: abra2, gatk
variant detection: gatk, mutect, mutect2, vardict, lofreq, strelka2, freebayes, scalpel
CNV detection: sequenza, exomecnv
variant annotation: snpsift (used together with transvar), oncotation
qc: qc
data processing: samtools, picard

Popular toolset options:
toolset=bwa+abra_realign+picard+vardict+mutect2+qc
toolset=novoalign+gatk_realign+picard+strelka2+snpsift+qc
toolset=bwa+abra_realign+picard+qc (specifically for QC examination of BAM reads)

DnaAmpliconVar_Fastq

Available tools for each analytic step:
mouse sequence detection: xenome
sequence trimming: trimmomatic, seqpurge
sequence alignment: bwa, novoalign
sequence realignment: abra2, gatk
variant detection: gatk, mutect, mutect2, vardict, lofreq, strelka2, freebayes, scalpel
CNV detection: sequenza, exomecnv
variant annotation: snpsift (used together with transvar), oncotation
qc: qc
data processing: samtools, picard

Popular toolset options:
toolset=novoalign+abra_realign+picard+vardict+mutect1+snpsift
toolset=bwa+abra_realign+picard+vardict+strelka2+snpsift
toolset=bwa+abra_realign+picard+qc (specifically for QC examination of BAM reads)

RnaCaptureVar_Fastq

Available tools for each analytic step:
mouse sequence detection: xenome
sequence trimming: trimmomatic, seqpurge
sequence alignment: star, hisat2
sequence realignment: abra2, gatk
variant detection: gatk, mutect, vardict, lofreq, strelka2, freebayes, scalpel
variant annotation: snpsift (used together with transvar), oncotation
qc: qc
data processing: samtools, picard

Popular toolset options:
toolset=star+abra_realign+picard+vardict+qc
toolset=star+gatk_realign+picard+strelka2+snpsift+qc
toolset=star+abra_realign+picard+qc (specifically for QC examination of BAM reads)

scRnaExpression_CellRanger_Fastq

Available tools for each analytic step:
sequence alignment and analysis: count
doublet detection: doubletdetection, scrublet
qc: qc
data processing: samtools, picard, python, Rscript

Popular toolset options:
toolset=count
toolset=count+qc
toolset=count+doubletdetection
toolset=count+scrublet
toolset=count+doubletdetection+scrublet