Tutorials and scripts for plethodontid capture kit

Warning

Manuscript in review. Some of this might be out of date - I'll update this when I have time.

The scripts and tutorials here are intended to work with the DNA capture bait kit I designed for Desmognathus salamanders (family: Plethodontidae). The capture kit also works, with varying degrees of success, on the other plethodontid genera I’ve tested it against (Plethodon, Eurycea, and Gyrinophilus). The scripts and analyses here should work for any capture kit or similar type of data. I’ll be adding more scripts and analyses in the near future.

How the baits were built

The baits were built using ddRAD loci, primarily from Desmognathus fuscus and D. quadramaculatus (northern and southern lineages). The baits are 80 bp long with an overlap of 50% and are intended to map back to reference sequences that are about 300 bp long. The included test data were run on the original kit; afterwards I removed the baits that weren’t mapping successfully. The final kit contains:

9,756 loci
28,256 baits
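
As a rough illustration of the tiling arithmetic described above, the sketch below lists where 80 bp baits with 50% overlap would start on a single 300 bp reference sequence. The function name `tile_starts` and the assumption of uniform tiling from position 0 are hypothetical; this is not the procedure actually used to design the kit. The final kit averages about 2.9 baits per locus (28,256 / 9,756), so real loci carry fewer baits than this idealised case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 0-based start positions of baits tiled
# across one reference locus. Only baits that fit entirely are listed.
# Usage: tile_starts LOCUS_LEN BAIT_LEN OVERLAP_PERCENT
tile_starts() {
    local locus_len=$1 bait_len=$2 overlap_pct=$3
    # Distance between consecutive bait starts: 80 bp at 50% overlap -> 40 bp.
    local step=$(( bait_len * (100 - overlap_pct) / 100 ))
    # Guard against a zero step (100% overlap), which would loop forever.
    (( step > 0 )) || return 1
    local start
    for (( start = 0; start + bait_len <= locus_len; start += step )); do
        echo "$start"
    done
}

# Example: a 300 bp locus, 80 bp baits, 50% overlap.
tile_starts 300 80 50
```

With these numbers, six baits fit, and the last 20 bp of the locus are left without a full-length bait.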

Lab work

Sort-of tutorial on how to get raw DNA ready for sequencing. Includes list of reagents and recommendations for changes to standard protocols.

Mapping capture reads

Script and tutorial for taking raw reads through to clean bam files ready for variant calling.

Quality control and read mapping statistics

Tutorial for producing read mapping statistics and thinking about quality control measures to implement.

Variant calling and VCF output

Script and tutorial for producing a single or multi-sample VCF. Covers how to use whitelists, blacklists, and other tools to implement further quality control measures on the output, and how to get quick statistics from a VCF.
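
For a sense of what "quick statistics" can look like, here is a minimal sketch that counts samples, loci with at least one variant, and variant records using only awk. The function name `vcf_quick_stats` is hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF in which each CHROM value is one capture locus. It is not the approach taken in the tutorial; dedicated tools such as `bcftools stats` give much fuller reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise an uncompressed VCF read from stdin.
# Assumes each CHROM value corresponds to one capture locus.
# Usage: vcf_quick_stats < samples.vcf
vcf_quick_stats() {
    awk -F'\t' '
        /^##/     { next }                                    # skip meta-information lines
        /^#CHROM/ { samples = (NF > 9) ? NF - 9 : 0; next }   # sample columns follow FORMAT
        {
            variants++                                        # one record per variant site
            if (!seen[$1]++) loci++                           # count each locus once
        }
        END {
            printf "samples\t%d\nloci\t%d\nvariants\t%d\n", samples, loci, variants
        }
    '
}
```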

Automated fastq to VCF shell script

The script starts with a directory of trimmed (paired) fastqs and outputs a multi-sample VCF for all the samples in the directory, with one random SNP per locus. It determines max depth based on average depth across all sites, removes loci with indels, and keeps only sites where 50% or more of individuals have data. Requires: bwa, samtools, picard, bedtools, bcftools, vcftools.
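
The "one random SNP per locus" step can be sketched with awk alone. The example below is an assumption-laden illustration, not the code in the repository's script: the function name `one_snp_per_locus` is hypothetical, the input is assumed to be an uncompressed tab-delimited VCF, and each CHROM value is treated as one locus. It keeps the header and uses reservoir sampling so that every record within a locus has an equal chance of being kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF header lines plus one randomly chosen
# record per locus (CHROM column). Reads stdin, writes stdout.
# Usage: one_snp_per_locus SEED < in.vcf > out.vcf
one_snp_per_locus() {
    awk -F'\t' -v seed="${1:-1}" '
        BEGIN { srand(seed) }                      # fixed seed makes the draw repeatable
        /^#/  { print; next }                      # pass header lines through unchanged
        {
            if (++n[$1] == 1) order[++k] = $1      # remember loci in first-seen order
            # Reservoir sampling: the i-th record of a locus replaces the
            # kept record with probability 1/i (always true for the first).
            if (rand() < 1 / n[$1]) pick[$1] = $0
        }
        END { for (i = 1; i <= k; i++) print pick[order[i]] }
    '
}
```

Passing a different seed gives a different random draw, which is useful for checking that downstream results do not depend on which SNP was kept.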

To do:

Intersecting bed files to create a whitelist of loci and positions
Output a fasta file with variants for phylogenetic analyses
Check for linkage between loci/finish mapping to transcriptome
Create single output file for mapping statistics (from individual flagstat files)
Optimize for parallelizing on computing cluster
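
One possible shape for the single mapping-statistics file mentioned in the list above is sketched here. It is only a starting point under stated assumptions: the function name `summarise_flagstat` and the `.flagstat` file suffix are hypothetical, the input is assumed to be the default text output of `samtools flagstat`, and only QC-passed counts (the first number on each line) are reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine per-sample samtools flagstat reports into
# one tab-separated table. Sample names are taken from the file names.
# Usage: summarise_flagstat *.flagstat > mapping_summary.tsv
summarise_flagstat() {
    printf 'sample\ttotal_reads\tmapped_reads\tpercent_mapped\n'
    local f
    for f in "$@"; do
        awk -v sample="$(basename "$f" .flagstat)" '
            $4 == "in" && $5 == "total" { total = $1 }    # "N + M in total (...)"
            $4 == "mapped"               { mapped = $1 }   # "N + M mapped (...)", not "primary mapped"
            END {
                pct = (total > 0) ? 100 * mapped / total : 0
                printf "%s\t%d\t%d\t%.2f\n", sample, total, mapped, pct
            }
        ' "$f"
    done
}
```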
