Releases: MorrellLAB/sequence_handling
Release v3.0.0: SNP calling with GATK 4.1 includes Slurm compatibility
This release includes the following changes.
Slurm workload manager is supported for all handlers.
GATK v4.1.2 on the Slurm queueing system is supported for the following handlers:
- Haplotype_Caller
- Genomics_DB_Import (new handler; combines GVCF files prior to running the Genotype_GVCFs handler)
- Genotype_GVCFs
- Create_HC_Subset (preparation steps for GATK Variant Recalibrator)
- Variant_Recalibrator
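As a rough illustration of how GVCFs are combined before joint genotyping, the sketch below prints (without running) the generic GATK 4 command pair for one region. It is not taken from the sequence_handling handlers: the function name `build_joint_calling_cmds`, the file names, the region name, and the workspace path are all hypothetical placeholders, and the handlers themselves may pass additional options.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of sequence_handling): print the two
# GATK 4 commands that combine per-sample GVCFs for one region and then
# genotype them jointly. Nothing is executed; the commands are only echoed.
build_joint_calling_cmds() {
    local reference="$1" region="$2" workspace="$3"
    shift 3
    # Step 1: import every GVCF for this region into a GenomicsDB workspace.
    local import_cmd="gatk GenomicsDBImport --genomicsdb-workspace-path ${workspace} -L ${region}"
    local gvcf
    for gvcf in "$@"; do
        import_cmd+=" -V ${gvcf}"
    done
    echo "${import_cmd}"
    # Step 2: joint genotyping reads the workspace through a gendb:// URI.
    echo "gatk GenotypeGVCFs -R ${reference} -V gendb://${workspace} -O ${region}.vcf.gz"
}

# Example call with placeholder inputs.
build_joint_calling_cmds ref.fasta chr1H chr1H_db sample1.g.vcf sample2.g.vcf
```

Printing the commands first makes it easy to check the inputs for each region before submitting real jobs.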
GATK v4.1.2 on PBS queueing systems is supported for the following handlers:
- Haplotype_Caller
- Genotype_GVCFs
- Variant_Filtering
Additional changes:
- VCF annotation visualization to assist filtering has also been added.
- A Jupyter Notebook template for exploring VCF files prior to the variant recalibration/filtering steps is now available in the HelperScripts directory.
- Realigner_Target_Creator and Indel_Realigner handlers have been separated from the main pipeline because the functionality is only available in GATK 3 or earlier and we still need indel realignment for other downstream tools. Please fill out Config_Indel_Realign for indel realignment steps.
- The main Config file has been updated to match the handler changes. A few new variables have been added.
- Haplotype_Caller, Genomics_DB_Import, and Genotype_GVCFs now parallelize across regions using job arrays.
- This version allows you to re-run specific job array numbers with an optional -t custom_array_indices argument from the command line (instead of having to re-create your sample list for failed/aborted jobs). You can now run it like this:
  ./sequence_handling SAM_Processing /path/to/config -t 1-5,10,12
  Without the -t flag, all samples in your list are run by default, so you can still run sequence_handling like this:
  ./sequence_handling SAM_Processing /path/to/config
  This works for any handler that uses job arrays.
- Create_HC_Subset can now handle very large VCF files (>1 TB) in a reasonable manner
- Variant_Recalibrator now has additional features:
  - Can specify a recalibration "mode" to recalibrate both indels and SNPs, indels only, or SNPs only
  - Allows specification of a custom set of annotations in the config file
  - Allows specification of additional options/flags to include
  - Allows more control over setting resource datasets as known, training, or truth sets
  - Automatically indexes the raw VCF file and resource files if they are not already indexed
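To make the -t custom_array_indices format concrete, here is a minimal sketch of how a specification such as 1-5,10,12 expands into individual job array indices. The function `expand_array_indices` is a hypothetical illustration and is not how sequence_handling itself parses the argument. Schedulers such as Slurm accept the same range syntax directly (for example, `sbatch --array=1-5,10,12`).

```shell
#!/bin/bash
# Hypothetical helper (not taken from sequence_handling): expand an array
# index specification such as "1-5,10,12" into one index per line.
expand_array_indices() {
    local spec="$1" part start end i
    local -a parts
    # Split the specification on commas.
    IFS=',' read -r -a parts <<< "${spec}"
    for part in "${parts[@]}"; do
        if [[ "${part}" =~ ^([0-9]+)-([0-9]+)$ ]]; then
            # A range "start-end": force base 10 so leading zeros are safe.
            start=$(( 10#${BASH_REMATCH[1]} ))
            end=$(( 10#${BASH_REMATCH[2]} ))
            for (( i = start; i <= end; i++ )); do
                echo "${i}"
            done
        elif [[ "${part}" =~ ^[0-9]+$ ]]; then
            # A single index.
            echo "${part}"
        else
            echo "Invalid array index: ${part}" >&2
            return 1
        fi
    done
}

# Prints the indices 1 2 3 4 5 10 12 on one line.
expand_array_indices "1-5,10,12" | tr '\n' ' '
```

Each printed index corresponds to one entry (sample or region) in the list the handler was given, so only those entries are re-run.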
Release v2.1.0: Last supported GATK 3.8 version.
This is the most complete version to use with GATK 3.8.
Release v2.0: SNP calling with GATK 3.8
The sequence_handling wiki is fully up to date with this release.
This release adds the following handlers:
- Haplotype_Caller
- Genotype_GVCFs
- Create_HC_Subset
- Variant_Recalibrator
- Variant_Filtering
- Variant_Analysis
- Realigner_Target_Creator
- Indel_Realigner
10x Genomics linked reads and Nanopore long reads processing support is planned for future versions.
Release v1.0: FastQ to BAM pipeline
The sequence_handling wiki is fully up to date with this release.
This release includes the following functional handlers:
- Quality_Assessment
- Adapter_Trimming
- Quality_Trimming
- Read_Mapping
- SAM_Processing
- Coverage_Mapping
This release also includes nonfunctional code for the following:
- GBS_Demultiplexing
- Coverage_Mapping plots with R