This repository was archived by the owner on Nov 28, 2020. It is now read-only.

GenotypeQuality

agaszmurlo edited this page Dec 14, 2018 · 4 revisions

How the HaplotypeCaller's reference confidence model works:

https://software.broadinstitute.org/gatk/documentation/article.php?id=4042

GenotypeLikelihoods:

https://software.broadinstitute.org/gatk/documentation/article.php?id=4442

What is a gVCF:

https://software.broadinstitute.org/gatk/documentation/article.php?id=4017

TODO:

  • check out the Avocado and Guacamole projects
  • check out GATK Spark functionality

AVOCADO: For highest accuracy, Avocado is run as a two-phase tool. In the first phase, we reassemble or realign our reads around INDEL variants. In the second phase, we apply a probabilistic biallelic model to the reads to identify variants.
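To make the idea of a probabilistic biallelic model concrete, here is a minimal sketch of the standard diploid genotype likelihood calculation for a single site with one reference and one alternate allele. It is not taken from Avocado; the function names, the `(is_alt, base_quality)` input format, and the cap of 99 on the genotype quality are assumptions made for illustration.

```python
import math

def genotype_log_likelihoods(observations):
    """Return log10 likelihoods for 0, 1 and 2 copies of the alt allele.

    observations: list of (is_alt, base_quality) pairs, one per read
    overlapping the site (hypothetical input format); base_quality is
    Phred-scaled.
    """
    log_liks = [0.0, 0.0, 0.0]
    for is_alt, base_quality in observations:
        err = 10 ** (-base_quality / 10)  # probability the base call is wrong
        for alt_copies in range(3):
            frac_alt = alt_copies / 2  # chance the read was drawn from the alt haplotype
            if is_alt:
                p = frac_alt * (1 - err) + (1 - frac_alt) * err
            else:
                p = frac_alt * err + (1 - frac_alt) * (1 - err)
            log_liks[alt_copies] += math.log10(p)
    return log_liks

def call_genotype(observations, max_gq=99):
    """Pick the most likely genotype and a Phred-scaled genotype quality.

    GQ is computed here as the Phred-scaled gap between the best and the
    second-best genotype likelihood, capped at max_gq (an assumed convention).
    """
    log_liks = genotype_log_likelihoods(observations)
    best = max(range(3), key=lambda g: log_liks[g])
    second = max(l for g, l in enumerate(log_liks) if g != best)
    gq = min(max_gq, round(10 * (log_liks[best] - second)))
    return best, gq
```

For example, `call_genotype([(False, 30)] * 10 + [(True, 30)] * 10)` picks the heterozygous genotype (one alt copy), because ten reference and ten alternate observations are far more likely under a 50/50 mix than under either homozygous genotype.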

Our approach does not rely on the input reads being sorted, and as such, is not unduly impacted by variations in coverage across the genome. This point is critical in a parallel approach, as coverage can vary dramatically across the genome.

We then use Apache Spark’s reduceByKey functionality to compute the number of times each variant was observed with high quality. We do this to discard sequence variants that were observed in a read but represent sequencing errors rather than true variants. (Why don't they filter out such reads right away?) Source: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-204.pdf [chapter 7]
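The counting step described above can be sketched in plain Python by emulating what reduceByKey does: values that share a key are merged pairwise with a combining function. This is only an illustration, not Avocado's code; the `Observation` record, the `reduce_by_key` helper, and both thresholds (`min_quality`, `min_count`) are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one variant seen in one read.
Observation = namedtuple(
    "Observation", ["contig", "pos", "ref", "alt", "base_quality"]
)

def reduce_by_key(pairs, func):
    """Emulate reduceByKey semantics locally: merge values sharing a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

def high_quality_variant_counts(observations, min_quality=20):
    """Count how often each variant was observed with high base quality."""
    pairs = (
        ((o.contig, o.pos, o.ref, o.alt), 1)
        for o in observations
        if o.base_quality >= min_quality  # keep only high-quality observations
    )
    return reduce_by_key(pairs, lambda a, b: a + b)

def candidate_variants(observations, min_quality=20, min_count=2):
    """Keep variants with enough high-quality support; drop likely errors."""
    counts = high_quality_variant_counts(observations, min_quality)
    return {variant for variant, n in counts.items() if n >= min_count}
```

With PySpark, the same shape would be a `filter`, a `map` to `(variant, 1)` pairs, and `reduceByKey` with addition. The sketch also hints at one possible answer to the question above: a single read cannot show on its own whether a mismatch is an error, whereas the per-variant count across all reads can.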
