
Truncated output in case of very long reads (probably buffer overflow) #23

@karel-brinda

Description

It seems that ProPhex still keeps truncating the output for very long reads (e.g., ~250 kbp). As a result, ProPhyle reports errors and doesn't finish the computation.

Here's an example:

```
$ prophex query -u -k 18 index.fa too_long_read.fq
klcp_loading	0.01s
U	1550652309_EX62d3b6e97_RDd732c4de_CH118	0	272044	0:36 GCGS0288-up1:25 0:19 GCGS0288-up1:15 0:19 
.
.
.
 0:16 GCGS0006,GCGS0034,GCGS0036,GCGS0052,GCGS0082,GCGS0112,GCGS0152,GCGS0196,GCGS0248,GCGS0254,GCGS0288,GCGS0292,GCGS0306,GCGS0358,GCGS0377,GCGS0385,GCGS0417,GCGS0423,GCGS0444,GCGS0474,GCGS0491,GCGS0496,GCGS0507,GCGS0511,GCGS0513,G
[prophex:query] match time: 8.28 sec
[prophex::query] Processed 1 reads in 8.193 CPU sec, 8.281 real sec
```

@salikhov-kamil Do you have any idea what's going wrong?

In this case, the length of the field is 4,499,072 characters (4,499,123 for the entire line). Is it possible that there's a hard-coded limit on the string size?
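I haven't checked the ProPhex sources for this, but if the assignment string is assembled in a fixed-size char array, anything past its capacity would be silently dropped, which would match the symptom. A minimal sketch of the usual fix, with purely illustrative names (`outbuf_t` and `outbuf_append` are not ProPhex functions), would be to grow the buffer with `realloc` as blocks are appended:

```c
/* Hypothetical sketch, not ProPhex code: a dynamically growing output
 * buffer instead of a char[] with a hard-coded size. Allocation error
 * checks are omitted for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	char  *data;  /* NUL-terminated contents */
	size_t len;   /* current length, excluding the NUL */
	size_t cap;   /* allocated capacity */
} outbuf_t;

static void outbuf_init(outbuf_t *b) {
	b->cap = 1024;
	b->len = 0;
	b->data = malloc(b->cap);
	b->data[0] = '\0';
}

/* Append a string, doubling the capacity whenever it would overflow. */
static void outbuf_append(outbuf_t *b, const char *s) {
	size_t n = strlen(s);
	while (b->len + n + 1 > b->cap) {
		b->cap *= 2;
		b->data = realloc(b->data, b->cap);
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

int main(void) {
	outbuf_t b;
	outbuf_init(&b);
	/* Even hundreds of thousands of blocks stay intact, unlike a fixed buffer. */
	for (int i = 0; i < 200000; i++)
		outbuf_append(&b, "0:16 ");
	printf("assignment field length: %zu\n", b.len);
	free(b.data);
	return 0;
}
```

With geometric growth the total cost of the appends stays linear in the output size, so even multi-megabyte assignment fields like the one above would be handled.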

As a quick-and-dirty solution, ProPhex could detect the buffer overflow and print the results in a parsable way: find the last full correct block (in this case 0:16) and then add one block with the remaining bases, marked as unclassified. A rough sketch of that repair is below.
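For illustration only, here is one way such a repair could look. `repair_assignment` is a hypothetical helper, the space-separated `ref:count` block format is inferred from the example output above, and "0" is assumed to denote the unclassified reference:

```c
/* Hypothetical repair sketch, not ProPhex code: given a possibly truncated
 * assignment field of space-separated "ref:count" blocks, cut it back to
 * the last complete block and append one "0:<remaining>" block so the
 * counts still sum to the expected total (assuming "0" = unclassified). */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, repaired copy of the assignment field. */
static char *repair_assignment(const char *field, long total_kmers) {
	char *copy = strdup(field);

	/* Drop the trailing, possibly incomplete block: cut at the last space. */
	char *last_space = strrchr(copy, ' ');
	if (last_space) *last_space = '\0';

	/* Sum the counts of the remaining complete blocks. */
	long covered = 0;
	for (char *p = strchr(copy, ':'); p; p = strchr(p + 1, ':'))
		covered += strtol(p + 1, NULL, 10);

	/* Attribute everything that was cut off to the unclassified block. */
	long missing = total_kmers - covered;
	size_t n = strlen(copy) + 32;
	char *repaired = malloc(n);
	if (missing > 0)
		snprintf(repaired, n, "%s 0:%ld", copy, missing);
	else
		snprintf(repaired, n, "%s", copy);
	free(copy);
	return repaired;
}

int main(void) {
	/* Truncated toy example: the last block is cut off mid-name. */
	const char *field = "0:36 GCGS0288-up1:25 0:19 GCGS0513,G";
	char *fixed = repair_assignment(field, 120);
	puts(fixed);   /* -> "0:36 GCGS0288-up1:25 0:19 0:40" */
	free(fixed);
	return 0;
}
```

If the truncation happens to fall exactly on a block boundary, this cuts one complete block too many, but since the dropped bases are re-counted as unclassified, the totals still add up and the line stays parsable.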
