Containerized VCF Quality Control Pipeline using PLINK and VCFtools
This containerized QC pipeline performs comprehensive quality control analysis on VCF files including relatedness analysis, gender checking, IBD analysis, and homozygosity detection.

Build the image:

```bash
docker build -t quality-control .
```

Run QC on a VCF file in the current directory:

```bash
docker run -v "$(pwd)":/data quality-control QC --input_vcf /data/input.vcf.gz
```

Run with a read-only input mount and a separate output mount:

```bash
docker run \
  -v /home/user/vcf_files:/input:ro \
  -v /home/user/qc_results:/output \
  quality-control QC \
  --input_vcf /input/sample.vcf.gz \
  --output_directory /output \
  --vcf_prefix SampleBatch
```

The QC script (Scaffold_QC.sh) accepts the following parameters:

| Flag | Required | Default | Description |
|---|---|---|---|
| `--input_vcf` | Yes | - | Path to input VCF file (must be gzipped) |
| `--output_directory` | No | `.` | Output directory for results |
| `--vcf_prefix` | No | `MyBatchPrefix` | Prefix for output files |

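
The mount-and-flag pattern above can be wrapped in a small host-side helper. The following is a minimal sketch, not part of the pipeline: the function name `qc_run` and the `DRY_RUN` variable are hypothetical, and it assumes the image is tagged `quality-control` as in the build command. It mounts the input file's directory read-only, defaults the host output directory to the current directory, and uses the documented default prefix.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with the container): build the
# `docker run` command for a VCF on the host.
qc_run() {
    local input_vcf="${1:-}"
    local output_dir="${2:-$PWD}"        # host directory mounted at /output
    local prefix="${3:-MyBatchPrefix}"   # documented default for --vcf_prefix

    if [ -z "$input_vcf" ]; then
        echo "usage: qc_run INPUT_VCF [OUTPUT_DIR] [PREFIX]" >&2
        return 1
    fi

    # Split the host path into the directory to mount and the file name.
    local input_dir input_name
    input_dir="$(dirname "$input_vcf")"
    input_name="$(basename "$input_vcf")"

    local cmd=(docker run
        -v "$input_dir:/input:ro"
        -v "$output_dir:/output"
        quality-control QC
        --input_vcf "/input/$input_name"
        --output_directory /output
        --vcf_prefix "$prefix")

    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"    # print the command instead of running it
    else
        "${cmd[@]}"
    fi
}
```

Calling `DRY_RUN=1 qc_run /home/user/vcf_files/sample.vcf.gz /home/user/qc_results SampleBatch` prints the command without starting a container. Pass absolute host paths, since Docker may interpret a relative `-v` source as a named volume.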
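Minimal invocation, using the defaults:

```bash
QC --input_vcf /data/samples.vcf.gz
```

With an explicit output directory and prefix:

```bash
QC --input_vcf /data/samples.vcf.gz --output_directory /results --vcf_prefix PopulationStudy
```

The pipeline generates multiple analysis files:
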
- Relatedness: `.relatedness` and `.relatedness2` files
- Gender Check: `.sex` and `.sex2` files
- IBD Analysis: `.IBD.genome` file
- Heterozygosity: `.HET.het` file
- Inbreeding Coefficient: `.IBC.ibc` file
- Homozygosity: `.HOM.hom` files
- Cleaned VCF: `.C.VCF.vcf.gz` with quality filters applied

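
After a run, it can be useful to confirm that each of these files was produced. The sketch below is illustrative only. It assumes every output is named `<output_directory>/<vcf_prefix><suffix>` with the suffixes listed above, which this README does not state explicitly, and the function name `missing_qc_outputs` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check. Assumption: outputs are written as
# <output_directory>/<vcf_prefix><suffix>; adjust if the real names differ.
missing_qc_outputs() {
    local output_dir="${1:-.}"
    local prefix="${2:-MyBatchPrefix}"
    local suffixes=(.relatedness .relatedness2 .sex .sex2 .IBD.genome
                    .HET.het .IBC.ibc .HOM.hom .C.VCF.vcf.gz)
    local suffix missing=0

    for suffix in "${suffixes[@]}"; do
        # -s is true only if the file exists and is not empty.
        if [ ! -s "$output_dir/$prefix$suffix" ]; then
            echo "$prefix$suffix"
            missing=1
        fi
    done
    return "$missing"    # 0 if everything was found, 1 otherwise
}
```

For example, `missing_qc_outputs /home/user/qc_results SampleBatch` prints the name of each missing or empty file and returns a non-zero status if any were listed.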
The container includes:
- PLINK (v1.90b6.21)
- VCFtools (v0.1.16)
- HTSlib/SAMtools/BCFtools (v1.22)

Requirements:
- Input VCF file must be bgzip-compressed (`.vcf.gz`)
- Sufficient disk space for intermediate and output files
- Docker with appropriate permissions for volume mounts
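
The input requirement can be checked on the host before starting a container. This is a rough sketch with a hypothetical function name, `check_vcf_input`. It verifies the `.vcf.gz` extension and the two-byte gzip signature, which bgzip output also carries. It does not distinguish bgzip from plain gzip, so passing this check is necessary but not sufficient.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of the pipeline).
check_vcf_input() {
    local vcf="${1:-}"

    case "$vcf" in
        *.vcf.gz) ;;
        *) echo "error: expected a .vcf.gz file, got '$vcf'" >&2; return 1 ;;
    esac

    if [ ! -r "$vcf" ]; then
        echo "error: cannot read '$vcf'" >&2
        return 1
    fi

    # Every gzip stream, bgzip output included, starts with bytes 1f 8b.
    local magic
    magic="$(head -c 2 "$vcf" | od -An -tx1 | tr -d ' \n')"
    if [ "$magic" != "1f8b" ]; then
        echo "error: '$vcf' is not gzip-compressed" >&2
        return 1
    fi
}
```

A typical use is `check_vcf_input /home/user/vcf_files/sample.vcf.gz && docker run ...`, so that an uncompressed or misnamed file is rejected before any volumes are mounted.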