Are multi-nucleotide and complex variants ignored? #129

@rebber

Hi,

We use somaticseq only to merge variants from Mutect2 and HMF Tools SAGE (the latter as "arbitrary" VCFs); the classification module is currently not used. However, we were missing some multi-nucleotide variants (MNVs) in the somaticseq output, so I looked into the somaticseq code to see how they are handled. It seems that any variant in the input VCFs whose REF and ALT are both longer than 1 base is ignored.

I see the following division into SNVs and indels, in both modify_ssMuTect2.py and splitVcf.py (for preparation of arbitrary VCFs):

if len(vcf_i.refbase) == 1 and len(vcf_i.altbase) == 1:
    snv_out.write( new_line + '\n' )
elif len(vcf_i.refbase) == 1 or len(vcf_i.altbase) == 1:
    indel_out.write( new_line + '\n' )

Any other variant, i.e. one with len(vcf_i.refbase) > 1 and len(vcf_i.altbase) > 1, is skipped.
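To make the fall-through concrete, here is a minimal standalone sketch of the same length checks. It is not somaticseq code: `classify_variant` is a hypothetical helper, and the "mnv" / "complex" labels (equal vs. unequal allele lengths) are my own working definitions. It also assumes a single ALT allele per record.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Mirror the REF/ALT length checks quoted above and name the skipped cases.

    Hypothetical helper for illustration only; assumes one ALT allele.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    elif len(ref) == 1 or len(alt) == 1:
        return "indel"
    # Neither branch above matches: both alleles are longer than one base.
    # In the quoted code there is no else branch, so these records are dropped.
    elif len(ref) == len(alt):
        return "mnv"      # assumption: equal-length substitution
    return "complex"      # assumption: unequal lengths, both > 1


for ref, alt in [("A", "T"), ("A", "ATG"), ("ATG", "A"), ("AT", "GC"), ("ATG", "CA")]:
    print(ref, alt, classify_variant(ref, alt))
```

The last two records (AT>GC and ATG>CA) are the kind that match neither the SNV nor the indel branch in the quoted code.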

Is it correct that MNVs and complex variants are ignored? What was the reasoning behind setting it up like this? Is there any way to work around it?

We do not want to miss these types of variants, and will have to look into other tools if we can't avoid this behaviour with somaticseq.

Best regards
Rebecka
