
Proposed Pipeline #7

@averagehat


Inputs:

  • 0-x unpaired fastq/sff files
  • 0-y pairs of illumina paired fastq files
  • [0-z pacbio files]

Transformation:

  • sff -> fastq

Filter:

  • drop reads with Ns
  • drop illumina reads whose index read has a minimum quality score below a chosen threshold
  • De novo: run host_filter
  • Note: individual files may be lost or empty after this process
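
The first two filter rules could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (`parse_fastq`, `min_phred`, `filter_reads`) are hypothetical, qualities are assumed to be Phred+33, and each read is assumed to be matched to its index read by position in a separate index fastq file.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain four-line-per-record fastq file."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:], seq, qual)


def min_phred(qual: str, offset: int = 33) -> int:
    """Smallest quality score in a quality string (assumes Phred+33)."""
    return min((ord(c) - offset for c in qual), default=0)


def filter_reads(
    reads: Iterable[FastqRecord],
    index_reads: Iterable[FastqRecord],
    min_index_q: int,
) -> Iterator[FastqRecord]:
    """Drop reads containing N and reads whose index read dips below min_index_q."""
    # Assumption: reads and index reads are in the same order.
    for read, index in zip(reads, index_reads):
        if "N" in read.seq.upper():
            continue
        if min_phred(index.qual) < min_index_q:
            continue
        yield read


reads_fq = "@r1\nACGT\n+\nIIII\n@r2\nACNT\n+\nIIII\n@r3\nGGCC\n+\nIIII\n"
index_fq = "@r1\nAAAA\n+\nIIII\n@r2\nAAAA\n+\nIIII\n@r3\nAAAA\n+\nII#I\n"
kept = list(
    filter_reads(
        parse_fastq(io.StringIO(reads_fq)),
        parse_fastq(io.StringIO(index_fq)),
        min_index_q=20,
    )
)
kept_names = [r.name for r in kept]
```

Because `filter_reads` is a generator, a file whose reads are all dropped simply yields nothing, which matches the note that individual files may end up empty.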

Transformation (trimming):

  • cut 3' and 5' adapters -a, -A, -g, -G
  • cut 3' and 5' primers
  • trim all N's from end of reads --trim-n
  • trim low quality bases from 5' and 3' -q <five>,<three> or, for the 3' end only, -q <three> (in cutadapt a single value applies to the 3' end)
  • remove X bases from beginning/end of reads -u <X>
  • [run something on pacbio reads separately]
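
The flags listed above match cutadapt's command line, so one way to assemble the trimming step is sketched below. This is an assumption rather than something the proposal states: the function `build_trim_command` and its parameter names are hypothetical, `-U`, `-o` and `-p` are cutadapt options not mentioned in the list, and the command is only built, not executed.

```python
from typing import List, Optional


def build_trim_command(
    r1: str,
    out1: str,
    r2: Optional[str] = None,
    out2: Optional[str] = None,
    adapter_3p: Optional[str] = None,
    adapter_5p: Optional[str] = None,
    quality_5p: Optional[int] = None,
    quality_3p: Optional[int] = None,
    cut: Optional[int] = None,
    trim_n: bool = True,
) -> List[str]:
    """Build a cutadapt argument list for single-end or paired-end input."""
    paired = r2 is not None
    if paired and out2 is None:
        raise ValueError("paired input needs a second output path")
    if quality_5p is not None and quality_3p is None:
        raise ValueError("a 5' cutoff needs a 3' cutoff as well")

    cmd = ["cutadapt"]
    if adapter_3p:
        cmd += ["-a", adapter_3p]
        if paired:
            cmd += ["-A", adapter_3p]  # same 3' adapter on the second read
    if adapter_5p:
        cmd += ["-g", adapter_5p]
        if paired:
            cmd += ["-G", adapter_5p]  # same 5' adapter on the second read
    if quality_3p is not None:
        # A single value is the 3' cutoff; "five,three" sets both ends.
        if quality_5p is not None:
            cmd += ["-q", f"{quality_5p},{quality_3p}"]
        else:
            cmd += ["-q", str(quality_3p)]
    if cut is not None:
        cmd += ["-u", str(cut)]
        if paired:
            cmd += ["-U", str(cut)]  # -U is the second-read counterpart of -u
    if trim_n:
        cmd.append("--trim-n")
    cmd += ["-o", out1]
    if paired:
        cmd += ["-p", out2]
    cmd.append(r1)
    if paired:
        cmd.append(r2)
    return cmd


paired_cmd = build_trim_command(
    "r1.fq", "t1.fq", r2="r2.fq", out2="t2.fq",
    adapter_3p="AGATCGGAAGAGC", quality_5p=20, quality_3p=25, cut=5,
)
single_cmd = build_trim_command("r.fq", "t.fq", quality_3p=20, trim_n=False)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once cutadapt is installed. Primers could be passed the same way as adapters, since both are removed by sequence.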

Reduction (mapping/assembly): -- This is where the de novo and mapping paths branch

  • Mapping:
    • compile the paired reads and the unpaired reads into two sets
    • run bwa on each compiled set separately
    • [run bwa/other mapper on pacbio reads, then merge bam files]
    • merge paired and unpaired bam files, if they exist
    • sort, index bam file
    • tag the bam file via tagreads
    • run freebayes on the bam file
    • create a consensus fasta
  • DeNovo:
    • compile paired and unpaired (?)
    • run a De Novo assembler (sga and spades are both in bioconda, but not ray)
    • try to figure out the number of reads for each contig
    • maybe simulate iterative_blast
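
For the "number of reads for each contig" step, one possible approach is to map the reads back to the assembled contigs and count primary alignments per contig. The sketch below assumes that mapping has already produced SAM text; the function name `count_reads_per_contig` is hypothetical, and the proposal does not say which method it intends.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def count_reads_per_contig(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped alignments per reference name in SAM text."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        contig = fields[2]
        if flag & UNMAPPED or contig == "*":
            continue
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read once, at its primary alignment
        counts[contig] += 1
    return counts


sam = [
    "@SQ\tSN:contig1\tLN:100",
    "@SQ\tSN:contig2\tLN:80",
    "r1\t0\tcontig1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tcontig1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tcontig2\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tcontig1\t9\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
contig_counts = count_reads_per_contig(sam)
```

In paired-end data each mate has its own SAM line, so this counts reads rather than pairs.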
