Hello @stschiff ,
First of all, thank you for developing and maintaining such a great tool!
I have a question about how MSMC2 handles the ‘*’ allele in input files. My VCF file was originally generated by the GATK pipeline and contains some ‘*’ alleles, as shown in the example below:
Chr01 126405 2855 C*
Chr01 126850 233 C*
Chr01 128017 449 TC
Chr01 128686 540 GA
Chr01 128723 37 A*
Chr01 129065 35 AT
In the context of GATK's HaplotypeCaller, the ‘*’ allele represents a spanning deletion, indicating that a deletion in one sample overlaps a variant site in another sample. This notation is used to maintain consistency in variant calls across multiple samples.
I am wondering how MSMC2 processes such sites. In my case, when I run MSMC2 on individual samples, those with ‘*’ alleles in their input seem to run normally and produce results. However, for samples whose input starts with a ‘*’ allele (as in the example above), the program fails with the following error message:
error in parsing command line: object.Exception@model/data.d(57): could not parse line: Chr01 126405 2855 C*
----------------
??:? pure @safe void std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, const(char[])) [0x52ec51]
??:? @safe std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch std.exception.enforce!(Exception, std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch).enforce(std.regex.__T10RegexMatchTAxaS453std5regex8internal8thompson15ThompsonMatcherZ.RegexMatch, lazy const(char)[], immutable(char)[], ulong) [0x548729]
??:? void model.data.checkDataLine(const(char[])) [0x51f3d2]
??:? ulong model.data.getNrHaplotypesFromFile(immutable(char)[]) [0x51f4f0]
??:? void msmc2.parseCommandLine(immutable(char)[][]) [0x572d41]
??:? _Dmain [0x5728d3]
??:? _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv [0x5cbc56]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x5cbbac]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll() [0x5cbc12]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x5cbbac]
??:? _d_run_main [0x5cbb09]
??:? main [0x57bbc5]
??:? __libc_start_main [0xdb524554]
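One workaround I have been considering is to drop the ‘*’ sites from the input before running MSMC2. Below is a minimal sketch of that idea, not a confirmed or recommended procedure. The function name `drop_star_sites` is hypothetical, and the sketch rests on two assumptions of mine: that the third column counts called sites since the previous listed site, including the listed site itself, and that a dropped site should be treated as uncalled, so its preceding called sites are carried over to the next retained line.

```python
def drop_star_sites(lines):
    """Yield MSMC-style input lines, skipping sites whose allele column contains '*'.

    Assumption (not confirmed by the MSMC2 authors): column 3 is the number of
    called sites since the previous listed site, including the listed site.
    A dropped site is treated as uncalled, so its count minus one is added to
    the next retained line.
    """
    carried = 0  # called sites inherited from dropped lines
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        chrom, pos, called, alleles = fields
        called = int(called)
        if "*" in alleles:
            # Keep the called sites before this position, but not the site itself.
            carried += max(called - 1, 0)
            continue
        yield f"{chrom}\t{pos}\t{called + carried}\t{alleles}"
        carried = 0
```

Applied to the six example lines above, this would keep only the TC, GA and AT sites and fold the counts of the dropped lines into the next retained line. Carried counts are not reset at chromosome boundaries, so the sketch assumes one chromosome per file. I am not sure whether treating these sites as uncalled is appropriate, which is part of my question.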
Could you please advise on the correct way to handle these ‘*’ alleles when preparing input files for MSMC2? Any guidance would be greatly appreciated!
Best regards,
Guannan