The PCP pipeline takes the FASTQ files that a sequencing facility produces with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) and automatically outputs fully aligned BAM files mapped to the commonly used revised Cambridge Reference Sequence (rCRS).
We use a Snakemake-based workflow on a Linux system, with the following tools:
- Awk, for SAM file editing;
- BEDTools, for BAM to FASTQ conversion;
- BWA-MEM, for read alignment;
- Pycision, for amplicon delimitation and selection;
- RtN!, for NUMT removal;
- SAMtools, for BAM conversion, sorting, indexing, and merging;
- Trimmomatic, for read quality control and trimming.
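To see which of these programs are already available before installing anything, a small POSIX shell function can list the commands missing from PATH. This is a hypothetical helper, not part of PCP, and the command names are assumptions: Trimmomatic is assumed to run through `java`, pycision.py through `python3`, and RtN is left out because how it is invoked is not stated here.

```shell
# Hypothetical helper (not part of PCP): report commands missing from PATH.
# Assumed command names: Trimmomatic via `java -jar`, pycision.py via `python3`.

# Print every argument that is not an available command, one per line.
# Returns 0 when all commands are found and 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example (activate the Snakemake conda environment first, if you use one):
#   missing_tools awk bedtools bwa samtools java python3 snakemake bunzip2
```

An empty output means every listed command was found; anything printed still needs to be installed or put on PATH.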
Install the software above and clone this repo to your directory of choice:
```shell
git clone https://github.com/filcfig/PCP.git
```
Then add pycision.py, trimmomatic-0.39.jar, and the RtN folder to the tools folder. For RtN, don't forget to decompress and index humans.fa:
```shell
bunzip2 humans.fa.bz2 && bwa index humans.fa
```
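Before the first run, a quick check that the tools folder holds the three items named above can save a failed Snakemake job. The sketch below is a hypothetical helper: it assumes you run it from the repository root (so the folder is `./tools` unless another path is passed) and it does not verify the BWA index of humans.fa, since its exact location inside the RtN folder is not specified here.

```shell
# Hypothetical helper (not part of PCP): check the tools folder layout.
# Assumes the default folder is ./tools relative to the repository root.
check_tools_dir() {
  local dir="${1:-tools}" f status=0
  # The two single files named in the installation instructions.
  for f in pycision.py trimmomatic-0.39.jar; do
    if [ ! -f "$dir/$f" ]; then
      printf 'missing file: %s\n' "$dir/$f"
      status=1
    fi
  done
  # The RtN folder itself (its contents are not checked).
  if [ ! -d "$dir/RtN" ]; then
    printf 'missing folder: %s\n' "$dir/RtN"
    status=1
  fi
  return "$status"
}

# Example, from the repository root:
#   check_tools_dir && echo "tools folder looks complete"
```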
Start by adding the FASTQ files to the sequencing/selected_fastqfiles folder. Then make run_FASTQ.sh executable and run it. Make sure Snakemake is activated first (if you use conda, type conda activate snakemake):
```shell
chmod +x run_FASTQ.sh
./run_FASTQ.sh
```
Because RtN takes some time per sample and needs a good amount of RAM, you can also process the FASTQ files without RtN by running Snakefile_noRtN instead:
```shell
snakemake -s Snakefile_noRtN -j
```
The final BAM files will be available in the sequencing/merged folder.
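To confirm that a run finished, one option is to check that sequencing/merged contains BAM files and that each one has an index. The helper below is hypothetical and assumes samtools-style index names (sample.bam.bai or sample.bai); whether PCP writes its merged indexes under exactly these names is an assumption.

```shell
# Hypothetical helper (not part of PCP): list merged BAMs lacking an index.
# Assumes samtools-style index names: sample.bam.bai or sample.bai.
# Returns 0 if every BAM is indexed, 1 if some are not, 2 if no BAMs exist.
unindexed_bams() {
  local dir="${1:-sequencing/merged}" bam status=0 found=0
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue   # the glob matched nothing
    found=1
    if [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
      printf '%s\n' "$bam"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    printf 'no BAM files found in %s\n' "$dir" >&2
    return 2
  fi
  return "$status"
}
```

If SAMtools is installed, samtools quickcheck -v sequencing/merged/*.bam can additionally list BAM files that fail its basic integrity checks, such as a missing header or end-of-file marker.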
The data generated from samples previously sequenced within the 1000 Genomes Project are openly available on Zenodo.
Our manuscript is published at:
Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.
Distributed under the MIT License. See LICENSE for more information.