marcbux/variant-call

 
 

Variant Calling

A variant calling workflow.

Variant calling is a typical use case in Next-Generation Sequencing. The workflow takes a genetic sample from a person as input and determines how it differs from a human reference genome. Variants identified as significant are summarized in a table and annotated.

First, the sample, provided in FastQ format, undergoes quality control. This step allows an a priori assessment of the sample's suitability. Low read quality or the presence of barcodes or adapter sequences may reduce the validity of the workflow results. Quality control with FastQC helps identify and rule out these error sources.

The computationally most demanding step is aligning each read in the sample to the human reference genome. This step is performed by the read mapper Bowtie2. The tool first creates an index for each chromosome of the reference genome, which is provided in FastA format. Then, for each reference index and each sample file in FastQ format, read mapping is performed. The result is a table in BAM format that states, for each mapped read, the location it aligns to in the reference genome.
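The fan-out described above (one index per chromosome, then one mapping job per index and sample file) can be sketched as a dry run that only prints the commands. This is a minimal illustration, not the workflow's actual Cuneiform code: the chromosome names, sample names, and directory layout are hypothetical, and it assumes unpaired reads (`-U`) and a conversion of Bowtie2's SAM output to BAM via `samtools view`.

```shell
#!/bin/sh
# Dry run: print the Bowtie2 commands for every (chromosome, sample) pair.
# All names and paths below are hypothetical placeholders.
CHROMOSOMES="chr21 chr22"
SAMPLES="sample_1 sample_2"

plan_mapping() {
    # One index per chromosome FastA file.
    for chr in $CHROMOSOMES; do
        echo "bowtie2-build ref/$chr.fa idx/$chr"
    done
    # One mapping job per (index, sample file) pair.
    # Assumption: unpaired reads; SAM output is piped into a BAM conversion.
    for chr in $CHROMOSOMES; do
        for s in $SAMPLES; do
            echo "bowtie2 -x idx/$chr -U fastq/$s.fastq | samtools view -b -o bam/$chr.$s.bam -"
        done
    done
}

plan_mapping
```

With two chromosomes and two sample files, the plan contains two indexing commands and four mapping commands, which shows why this step dominates the runtime as the number of chromosomes and sample files grows.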

The BAM files are sorted individually and then merged so that exactly one BAM file per chromosome results. A multiple pile-up table is then created for each of these BAM files. SAMtools performs all of these transformations. The multiple pile-up is the input to the actual variant detection algorithm.
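The sort, merge, and pile-up sequence can be sketched in the same dry-run style. The file names are hypothetical, and the commands assume SAMtools 1.x syntax (`samtools sort -o`); the tool version installed by the cookbook may use a different invocation.

```shell
#!/bin/sh
# Dry run: print the SAMtools commands that turn per-sample BAM files
# into one pile-up per chromosome. Names and paths are hypothetical.
CHROMOSOMES="chr21 chr22"
SAMPLES="sample_1 sample_2"

plan_pileup() {
    for chr in $CHROMOSOMES; do
        sorted=""
        # Sort each BAM file individually.
        for s in $SAMPLES; do
            echo "samtools sort -o sorted/$chr.$s.bam bam/$chr.$s.bam"
            sorted="$sorted sorted/$chr.$s.bam"
        done
        # Merge all sorted BAM files of this chromosome into one.
        echo "samtools merge merged/$chr.bam$sorted"
        # Create the pile-up against the chromosome's reference sequence.
        echo "samtools mpileup -f ref/$chr.fa merged/$chr.bam > pileup/$chr.mpileup"
    done
}

plan_pileup
```

Merging reduces the number of files from one per chromosome and sample file to one per chromosome, so the later steps run once per chromosome.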

Variant detection is performed using VarScan, which consumes a multiple pile-up and produces a variant table in VCF format. A significance level of 1% is applied to identify variants.
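As a rough sketch, the 1% significance level could map to VarScan's `--p-value` option as shown below. Which VarScan subcommand the workflow actually calls is not stated here; `mpileup2snp` is an assumption, as are the jar location and file names.

```shell
#!/bin/sh
# Dry run: print one VarScan call per chromosome pile-up.
# Assumptions: mpileup2snp subcommand, hypothetical jar path and file names.
CHROMOSOMES="chr21 chr22"
P_VALUE="0.01"              # 1% significance level
VARSCAN_JAR="VarScan.jar"   # hypothetical location of the VarScan jar

plan_varscan() {
    for chr in $CHROMOSOMES; do
        # --output-vcf 1 requests VCF output instead of VarScan's native table.
        echo "java -jar $VARSCAN_JAR mpileup2snp pileup/$chr.mpileup --p-value $P_VALUE --output-vcf 1 > vcf/$chr.vcf"
    done
}

plan_varscan
```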

Finally, the variant table is annotated using Annovar, which classifies variants according to their presumed effect and whether they appear in coding or non-coding regions of the human genome.
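A gene-based annotation run might look like the following dry run. The genome build (`hg19`), the database directory, and the per-chromosome file names are assumptions made for illustration; the workflow may annotate a single combined table instead.

```shell
#!/bin/sh
# Dry run: print the Annovar commands for each per-chromosome VCF file.
# Assumptions: hg19 build, hypothetical database directory and file names.
CHROMOSOMES="chr21 chr22"
BUILD="hg19"
ANNOVAR_DB="humandb/"

plan_annotation() {
    for chr in $CHROMOSOMES; do
        # Convert the VCF file into Annovar's input format.
        echo "convert2annovar.pl -format vcf4 vcf/$chr.vcf > anno/$chr.avinput"
        # Gene-based annotation: functional effect and coding vs. non-coding region.
        echo "annotate_variation.pl -geneanno -buildver $BUILD anno/$chr.avinput $ANNOVAR_DB"
    done
}

plan_annotation
```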

The workflow is specified using the Cuneiform functional workflow language.

This repository takes the form of a Chef cookbook to ensure reproducibility of the workflow itself as well as of its data and tools.

Prerequisites

Install the following packages:

  • virtualbox
  • vagrant

The Chef DK can be downloaded from the Chef download page. To install it, enter the following on the command line:

sudo dpkg -i chefdk_*.deb

Initialize host machine

To build the VM specified in this cookbook for the first time, change to your git base directory and enter the following:

git clone https://github.com/joergen7/variant-call.git
cd variant-call
berks install
kitchen converge

You can log into the newly built VM by entering

kitchen login

You can drop the VM by entering

kitchen destroy

Workflow execution

Log into the test machine by typing

kitchen login

Now, execute the workflow script by entering

cuneiform -w /opt/data /opt/wf/variant-call.cf
