Hi,
We use somaticseq only to merge variants from Mutect2 and HMF Tools SAGE (the latter as "arbitrary" VCFs); we do not currently use the classification module. However, some multi-nucleotide variants (MNVs) were missing from the somaticseq output, so I looked into the somaticseq code to see how they are handled. It seems that any variant in an input VCF where both REF and ALT are longer than one base is ignored.
I see the following division into SNVs and indels in both modify_ssMuTect2.py and splitVcf.py (the latter prepares arbitrary VCFs):
    if len(vcf_i.refbase) == 1 and len(vcf_i.altbase) == 1:
        snv_out.write( new_line + '\n' )
    elif len(vcf_i.refbase) == 1 or len(vcf_i.altbase) == 1:
        indel_out.write( new_line + '\n' )

Any other variants, i.e. those with len(vcf_i.refbase) > 1 and len(vcf_i.altbase) > 1, will be skipped.
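To make the gap concrete, here is a minimal, self-contained sketch of the same branching as I understand it. The function name `classify_variant` and the example alleles are hypothetical and not part of somaticseq; the sketch only mirrors the length checks quoted above and labels the case that falls through.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Mirror the length checks quoted above (hypothetical helper, not somaticseq code)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    elif len(ref) == 1 or len(alt) == 1:
        return "indel"
    # Neither branch matches: REF and ALT are both longer than one base,
    # so nothing would be written to either output file.
    return "skipped"


# Made-up example alleles covering each case.
examples = [
    ("A", "T"),     # single-base substitution
    ("A", "ATG"),   # insertion
    ("ATG", "A"),   # deletion
    ("AC", "GT"),   # MNV: same length, more than one base
    ("ACG", "TT"),  # complex variant: different lengths, both longer than one base
]

for ref, alt in examples:
    print(f"{ref}>{alt}: {classify_variant(ref, alt)}")
```

In this sketch, the last two examples reach neither branch, which matches the MNVs and complex variants we see missing from the output.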
Is it correct that MNVs and complex variants are ignored? What was the reasoning behind this design? Is there any way to work around it?
We do not want to miss these types of variants, and will have to look into other tools if we cannot avoid this behaviour with somaticseq.
Best regards
Rebecka