Morphling is an ultra-fast, model-free tool for detecting genomic structural variants (SVs). It also provides a machine-learning-based method that classifies SVs by the sequential features of their mutational signatures, as well as a scoring method for complex SVs.
Morphling requires Java JDK (>= v1.6) and Python 2.7 to run, along with the following libraries:
- htsjdk (https://github.com/samtools/htsjdk): a Java API for high-throughput sequencing (HTS) data formats.
- NumPy: used by the Python script for BAM parameter estimation.
$ git clone https://github.com/jiadong324/Morphling.git
To run Morphling, first run bamConfig.py to generate a BAM configuration file. You must specify how many standard deviations (-X) away from the mean an insert size must lie to be considered discordant, and the number of read pairs (-N) to use for the estimation. The resulting bam.cfg file contains the read length and the estimated library mean insert size and standard deviation.
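Conceptually, the estimation works along the lines of the Python sketch below. It is not the code of bamConfig.py: the function name estimate_insert_stats is hypothetical, and the filtering rules (properly paired, primary alignments with a positive TLEN), the sample standard deviation, and the mean +/- X * SD cutoff are assumptions made for illustration.

```python
import math


def estimate_insert_stats(sam_lines, num_pairs=30000, num_sd=3.0):
    """Hypothetical sketch: estimate read length and insert-size mean/SD
    from SAM text lines (e.g. the output of `samtools view`).

    Assumptions (not taken from bamConfig.py): only properly paired,
    primary alignments with a positive TLEN are used, so each pair is
    counted once, and pairs outside mean +/- num_sd * SD are discordant.
    """
    inserts = []
    read_len = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines if present
        fields = line.rstrip("\n").split("\t")
        flag, tlen, seq = int(fields[1]), int(fields[8]), fields[9]
        # 0x2 = proper pair; 0x100 / 0x800 = secondary / supplementary
        if not flag & 0x2 or flag & 0x900:
            continue
        if tlen <= 0:
            continue  # keep only the leftmost mate of each pair
        read_len = max(read_len, len(seq))
        inserts.append(tlen)
        if len(inserts) >= num_pairs:
            break  # stop after the requested number of pairs (-N)
    if len(inserts) < 2:
        raise ValueError("need at least two read pairs")
    mean = sum(inserts) / float(len(inserts))
    var = sum((x - mean) ** 2 for x in inserts) / (len(inserts) - 1)
    sd = math.sqrt(var)
    return {
        "read_len": read_len,
        "mean": mean,
        "sd": sd,
        "min_concordant": mean - num_sd * sd,  # -X lower cutoff
        "max_concordant": mean + num_sd * sd,  # -X upper cutoff
    }
```

To try it on real data, pipe samtools view your.bam into a script that calls estimate_insert_stats(sys.stdin, num_pairs=30000, num_sd=3).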
$ samtools view your.bam | python bamConfig.py -X 3 -N 30000
To get help information:
$ java -jar /path/to/Morphling/dist/MorphReleaseV1.jar
Run mode one example: run on a BAM file. You can either write all SuperItems to a file or keep them in memory. Writing them to a file is recommended, because later runs with different parameters then do not need to go through the BAM file again.
$ java -jar /path/to/Morphling/dist/MorphRealeaseV1.jar bamFile=file.bam faFile=file.fa bamCfg=bam.cfg
Run mode two example: run directly on SuperItem files, without a BAM file
$ java -jar /path/to/Morphling/dist/MorphRealeaseV1.jar faFile=file.fa itemOut=item.txt
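If you script several runs, for example re-running mode two on the same SuperItem file with different parameters, a small wrapper can assemble the key=value arguments. The helpers morphling_cmd and run_morphling below are hypothetical and not part of Morphling; they only format the options that appear in the commands above and do not validate them. The sketch assumes Python 3.7+, where keyword arguments keep the order in which they are passed.

```python
import subprocess


def morphling_cmd(jar, **options):
    """Build a Morphling command line from key=value options.

    Hypothetical helper: it formats whatever options it is given
    (e.g. bamFile, faFile, bamCfg, itemOut) without checking them.
    """
    cmd = ["java", "-jar", jar]
    # Morphling takes its options as key=value tokens on the command line.
    cmd.extend("%s=%s" % (key, value) for key, value in options.items())
    return cmd


def run_morphling(jar, **options):
    """Run Morphling; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(morphling_cmd(jar, **options))
```

For instance, morphling_cmd("/path/to/Morphling/dist/MorphRealeaseV1.jar", faFile="file.fa", itemOut="item.txt") reproduces the mode two command shown above as an argument list.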
The SV output file contains the predicted position of each SV on the genome. Additional fields include SupType, Pattern, Region (the genome region spanned by the pattern), weights, ratio (the allele fraction of each Super-Item), and orientation (the orientation of reads in the Super-Item). A single SV can be supported by more than one piece of evidence; more supporting evidence indicates a more confident call.
- SupType=ARP_Span: the SV is formed by two patterns that are linked together through read pairs. Each pattern of the SV may be a breakpoint. The number of read pairs supporting this link is reported.
- SupType=Self: a pattern is self-linked through read pairs, and the potential breakpoint is estimated from the abnormal read pairs. The number, quality, and weight of these supporting read pairs are reported.
- SupType=Split: the SV is discovered from split alignments. Additional information includes the number of supporting split reads and their mapping quality.
- SupType=Cross: the SV is discovered from local sequence cross-links. Additional information includes the number of reads supporting the cross and the maximum length of the cross-matched sequence.
- SupType=Realign: regions with multiple clipped Super-Items are usually realigned, which helps discover INDELs and small SVs. The number of supporting reads on the minus and plus strands is reported.
- SupType=OEM: a cluster of one-end-unmapped (OEM) reads may indicate a potential insertion breakpoint near the OEM Super-Item. This is not very confident evidence, but such abnormalities are still reported.
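Since more evidence means a more confident call, one simple way to prioritize calls downstream is to count how many distinct SupType values support each SV. The sketch below is a hypothetical post-processing step, not Morphling's own scoring: it assumes the output has already been parsed into (chrom, start, end, suptype) tuples (the real file layout is not spelled out here), and it treats calls with identical coordinates as the same SV.

```python
from collections import defaultdict

# SupType values described above; OEM is the least confident evidence.
SUPTYPES = {"ARP_Span", "Self", "Split", "Cross", "Realign", "OEM"}


def rank_calls(records):
    """Rank SV calls by the number of distinct evidence types supporting them.

    Hypothetical sketch: `records` is an iterable of
    (chrom, start, end, suptype) tuples; calls are grouped by exact position.
    """
    support = defaultdict(set)
    for chrom, start, end, suptype in records:
        if suptype not in SUPTYPES:
            raise ValueError("unknown SupType: %s" % suptype)
        support[(chrom, start, end)].add(suptype)
    # Most distinct evidence types first; ties broken by genomic position.
    return sorted(
        ((sv, sorted(types)) for sv, types in support.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Counting distinct types rather than raw records keeps a call that is reported twice by the same kind of evidence from outranking one confirmed by two independent signals; that weighting is a design choice of this sketch.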
Todo...
If you have questions or encounter problems, please feel free to contact: jiadong324@gmail.com, ccxtbut@gmail.com.
License: MIT
Free Software, Hell Yeah!