The PrecisionCallerPipeline (PCP)

PCP automatically takes FASTQ files from a sequencing facility, generated with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA), and outputs aligned BAM files mapped to the commonly used revised Cambridge Reference Sequence (rCRS).

Prerequisites

The workflow is based on Snakemake and runs on a Linux-based system with the following tools:

Awk, for SAM file editing;
BEDTools, for BAM to FASTQ conversion;
BWA-MEM, for read alignment;
Pycision, for amplicon delimitation and selection;
RtN!, for NUMT removal;
SAMtools, for BAM conversion, sorting, indexing, and merging;
Trimmomatic, for read quality control and trimming.
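
As a quick sanity check before installation, the sketch below reports which commands are missing from your PATH. The command names passed to the hypothetical check_tools helper (bedtools, bwa, samtools, snakemake, java) are assumptions about how these tools are typically exposed; Pycision, RtN, and Trimmomatic are left out because they are placed in the tools folder rather than on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command that is not found on PATH.
# Returns 0 when all commands are found, 1 otherwise.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names are assumptions; adjust them to your installation.
check_tools awk bedtools bwa samtools snakemake java \
    || echo "Install the missing tools before running PCP."
```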

Installation

Install the software above and clone this repo to your directory of choice:

git clone https://github.com/filcfig/PCP.git

Add pycision.py, trimmomatic-0.39.jar, and the RtN folder to the tools folder. For RtN, remember to decompress and index its reference beforehand by running bunzip2 humans.fa.bz2 && bwa index humans.fa.
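
To confirm that the tools folder is complete, a check along these lines may help. It is only a sketch: check_tools_folder is a hypothetical helper, and it assumes the RtN folder is named RtN and sits directly inside tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a tools folder contains the items
# listed in the installation step. Prints one line per missing item
# and returns 1 if anything is missing.
check_tools_folder() {
    local dir="$1"
    local status=0
    for f in pycision.py trimmomatic-0.39.jar; do
        [ -f "$dir/$f" ] || { echo "missing file: $dir/$f"; status=1; }
    done
    # Assumption: the RtN folder is called "RtN" and lives directly in tools.
    [ -d "$dir/RtN" ] || { echo "missing folder: $dir/RtN"; status=1; }
    return "$status"
}

# Example call, guarded so it only runs from a directory that has a tools folder.
if [ -d tools ]; then
    check_tools_folder tools || true
fi
```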

Usage

Start by adding the FASTQ files to the sequencing/selected_fastqfiles folder. Then make run_FASTQ.sh executable and run it. Make sure Snakemake is available first; if you installed it in a conda environment, run conda activate snakemake beforehand:

chmod +x run_FASTQ.sh
./run_FASTQ.sh
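
Before launching the run, it can be useful to confirm that the input folder actually contains FASTQ files. The sketch below counts them with a hypothetical count_fastq helper; the accepted file extensions are an assumption and may not match the names your sequencing facility uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ files directly inside a folder.
# Assumption: inputs end in .fastq, .fq, .fastq.gz, or .fq.gz.
count_fastq() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
        | wc -l | tr -d ' '
}

# Guarded so the example only runs from the repository root.
input_dir="sequencing/selected_fastqfiles"
if [ -d "$input_dir" ]; then
    echo "FASTQ files found: $(count_fastq "$input_dir")"
fi
```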

Because RtN takes considerable time per sample and a substantial amount of RAM, you can process the FASTQ files without it by running Snakefile_noRtN instead:

snakemake -s Snakefile_noRtN -j

The final BAM files will be available in the sequencing/merged folder.

Data

The data generated from samples previously sequenced within the 1000 Genomes Project are openly available on Zenodo.

Citation

Our manuscript is published at:

Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.

License

Distributed under the MIT License. See LICENSE for more information.
