
Handling ‘*’ Alleles in MSMC2 Input Files #58

@GNWEN

Hello @stschiff ,

First of all, thank you for developing and maintaining such a great tool!

I have a question about how MSMC2 handles the ‘*’ allele in input files. My VCF file was originally generated by the GATK pipeline and contains some ‘*’ alleles, so the MSMC2 input derived from it does too, as shown in the example below:

Chr01    126405    2855    C*  
Chr01    126850    233    C*  
Chr01    128017    449    TC  
Chr01    128686    540    GA  
Chr01    128723    37    A*  
Chr01    129065    35    AT  

In the context of GATK's HaplotypeCaller, the ‘*’ allele represents a spanning deletion, indicating that a deletion in one sample overlaps a variant site in another sample. This notation is used to maintain consistency in variant calls across multiple samples.

I am wondering how MSMC2 processes such sites. In my case, when running MSMC2 on individual samples, inputs that contain ‘*’ alleles somewhere other than the first line seem to run normally and produce results. However, for samples whose input starts with a ‘*’ allele (as in the example above), the program fails with the following error message:

error in parsing command line: object.Exception@model/data.d(57): could not parse line: Chr01      126405  2855    C*
----------------
??:? pure @safe void std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, const(char[])) [0x52ec51]
??:? @safe std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch std.exception.enforce!(Exception, std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch).enforce(std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch, lazy const(char)[], immutable(char)[], ulong) [0x548729]
??:? void model.data.checkDataLine(const(char[])) [0x51f3d2]
??:? ulong model.data.getNrHaplotypesFromFile(immutable(char)[]) [0x51f4f0]
??:? void msmc2.parseCommandLine(immutable(char)[][]) [0x572d41]
??:? _Dmain [0x5728d3]
??:? _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv [0x5cbc56]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x5cbbac]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll() [0x5cbc12]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x5cbbac]
??:? _d_run_main [0x5cbb09]
??:? main [0x57bbc5]
??:? __libc_start_main [0xdb524554]
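
The trace above shows the exception raised in `checkDataLine`, called from `getNrHaplotypesFromFile`, which may explain why only files whose first line carries a ‘*’ fail; that reading is an inference from the trace, not something confirmed against the MSMC2 source. To see which lines of an input file are affected, a minimal Python sketch could look like the following. The function name `find_star_allele_lines` is hypothetical, and the sketch assumes whitespace-separated columns with the alleles in the fourth field, as in the example above.

```python
def find_star_allele_lines(lines):
    """Return 1-based line numbers whose allele column (4th field) contains '*'.

    Assumes whitespace-separated columns: chromosome, position,
    number of called sites, alleles (as in the example in this issue).
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) >= 4 and "*" in fields[3]:
            hits.append(lineno)
    return hits


# The six example lines quoted in this issue.
example = [
    "Chr01    126405    2855    C*",
    "Chr01    126850    233    C*",
    "Chr01    128017    449    TC",
    "Chr01    128686    540    GA",
    "Chr01    128723    37    A*",
    "Chr01    129065    35    AT",
]

print(find_star_allele_lines(example))  # [1, 2, 5]
```

If line 1 appears in the result, the file matches the failing case described above.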

Could you please advise on the correct way to handle these ‘*’ alleles when preparing input files for MSMC2? Any guidance would be greatly appreciated!
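
One possible workaround, which the MSMC2 author has not confirmed, is to drop every line whose allele column contains ‘*’ before running MSMC2. The sketch below does that under two loudly stated assumptions: the third column counts called sites since the previous listed site, including the current one, and a dropped site should be treated as uncalled, so only `count - 1` of its sites are carried over to the next retained line. The function name `drop_star_alleles` is hypothetical.

```python
def drop_star_alleles(lines):
    """Drop lines whose allele column contains '*', folding their
    called-site counts into the next retained line.

    Assumptions (not confirmed by the MSMC2 author):
      * columns are chromosome, position, called-site count, alleles;
      * the count includes the current site;
      * a dropped site is treated as uncalled, so count - 1 is carried over.
    Intended for a single-chromosome file; counts carried past the last
    retained line are discarded.
    """
    out = []
    carry = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, pos, count, alleles = fields[0], fields[1], int(fields[2]), fields[3]
        if "*" in alleles:
            carry += count - 1  # keep the called sites before the dropped one
            continue
        out.append("\t".join([chrom, pos, str(count + carry), alleles]))
        carry = 0
    return out


# The six example lines quoted in this issue.
example_input = [
    "Chr01    126405    2855    C*",
    "Chr01    126850    233    C*",
    "Chr01    128017    449    TC",
    "Chr01    128686    540    GA",
    "Chr01    128723    37    A*",
    "Chr01    129065    35    AT",
]

for row in drop_star_alleles(example_input):
    print(row)
```

Whether the dropped site should instead count as a called homozygous site (carrying the full `count`) depends on how the spanning deletion should be interpreted, so the carry rule is the part most worth checking.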

Best regards,
Guannan
