It seems that ProPhex still keeps truncating the output for very long reads (e.g., ~250 kbp). As a result, ProPhyle reports errors and doesn't finish the computation.

Here's an example:
```
$ prophex query -u -k 18 index.fa too_long_read.fq
klcp_loading 0.01s
U 1550652309_EX62d3b6e97_RDd732c4de_CH118 0 272044 0:36 GCGS0288-up1:25 0:19 GCGS0288-up1:15 0:19
...
0:16 GCGS0006,GCGS0034,GCGS0036,GCGS0052,GCGS0082,GCGS0112,GCGS0152,GCGS0196,GCGS0248,GCGS0254,GCGS0288,GCGS0292,GCGS0306,GCGS0358,GCGS0377,GCGS0385,GCGS0417,GCGS0423,GCGS0444,GCGS0474,GCGS0491,GCGS0496,GCGS0507,GCGS0511,GCGS0513,G
[prophex:query] match time: 8.28 sec
[prophex::query] Processed 1 reads in 8.193 CPU sec, 8.281 real sec
```
@salikhov-kamil Do you have any idea what's going wrong?
In this case, the length of the field is 4,499,072 characters (4,499,123 for the entire line). Is it possible that there's a hard-coded limit on the string size?
As a quick-and-dirty solution, ProPhex could detect the buffer overflow and print the results in a parsable way: find the last complete block (in this case `0:16`) and then append one block with the remaining bases, marked as unclassified.
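Until that's fixed upstream, truncated lines could also be repaired in a post-processing step. Below is a minimal sketch of that heuristic; the `repair_line` helper is hypothetical, and it *assumes* a Kraken-like output format (tab-separated columns `code, read_id, taxid, read_len, blocks`, with `blocks` a space-separated list of `taxa:count` entries covering the read's `read_len - k + 1` k-mers), which hasn't been confirmed for ProPhex. Note that a block truncated mid-number (e.g., `0:1` cut from `0:16`) would slip past this check.

```python
import re

# One space-separated assignment block: "taxa:count",
# where taxa may be a comma-separated list of names.
BLOCK = re.compile(r"^[\w.,-]+:\d+$")

def repair_line(line, k=18):
    """Quick-and-dirty repair of a truncated output line (hypothetical helper).

    Keeps blocks up to the last complete "taxa:count" entry, then pads the
    rest of the read's k-mers with a single unclassified "0:N" block.
    """
    code, read_id, taxid, read_len, blocks = line.rstrip("\n").split("\t", 4)
    total_kmers = int(read_len) - k + 1

    kept, covered = [], 0
    for b in blocks.split(" "):
        if not BLOCK.match(b):
            break                    # first malformed block = truncation point
        kept.append(b)
        covered += int(b.rsplit(":", 1)[1])

    missing = total_kmers - covered
    if missing > 0:
        kept.append(f"0:{missing}")  # mark the remaining k-mers as unclassified
    return "\t".join([code, read_id, taxid, read_len, " ".join(kept)])
```

For example, for a read of length 100 (83 k-mers at k=18), a line whose blocks field ends in a partial block `B,C` after `0:36 A:25 0:19` would be rewritten to end in `0:3`, keeping the line parsable.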