segfault with high depth samples #6
Description
Hi,
Thanks for your tool. We've been using it successfully for a while now. However, we have recently started sequencing samples at higher depth (~400k merged read pairs), and Rbec fails in two distinct ways on these samples. As a test, I downsampled one such file in 100k increments, producing files with 100k, 200k, 300k, and 400k amplicons. Rbec runs fine up to and including the 200k sample. With the 300k sample, I get the error message:
Error in toupper(seqs) : invalid input 'T<AF>U' in 'utf8towcs'
Calls: Rbec -> consis_err -> toupper
Running the 400k sample, I get the error message:
*** caught segfault ***
address 0x7f708b7d513f, cause 'invalid permissions'
Traceback:
1: vroom_(file, delim = "\001", col_names = "V1", col_types = cols(col_character()), id = NULL, skip = skip, col_select = col_select, name_repair = "minimal", na = na, quote = "", trim_ws = FALSE, escape_double = FALSE, escape_backslash = FALSE, comment = "", skip_empty_rows = skip_empty_rows, locale = locale, guess_max = 0, n_max = n_max, altrep = vroom_altrep(altrep), num_threads = num_threads, progress = progress)
The errors are rather cryptic, but I think the failure happens in the "calculation of error generating probabilities" step.
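The `toupper` error reports a non-ASCII byte (`<AF>`) inside a sequence string, so one thing worth ruling out is that such bytes are already present in the input file. Below is a minimal sketch of that check, assuming the input is an uncompressed or gzipped four-line-per-record FASTQ file. The function name `find_bad_sequences` and the allowed alphabet are my own assumptions, not part of Rbec.

```python
import gzip

# Assumed alphabet for amplicon reads; extend it if IUPAC codes are expected.
VALID = set(b"ACGTNacgtn")

def find_bad_sequences(path, limit=10):
    """Return (record_number, raw_bytes) for sequence lines with unexpected bytes."""
    opener = gzip.open if path.endswith(".gz") else open
    bad = []
    with opener(path, "rb") as handle:
        for line_no, line in enumerate(handle):
            # In four-line FASTQ, the sequence is the 2nd line of each record.
            if line_no % 4 != 1:
                continue
            seq = line.rstrip(b"\r\n")
            if not set(seq) <= VALID:
                bad.append((line_no // 4 + 1, seq))
                if len(bad) >= limit:
                    break
    return bad
```

If this returns an empty list for the 300k file, the stray byte is probably introduced while the file is being read or processed rather than being present on disk.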
One workaround for others encountering a similar error is to downsample your amplicons. In my test, downsampling to 200k merged amplicons works. The ceiling is somewhere between 200k and 300k, so a somewhat higher count may also work.
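For anyone who wants to try the workaround, here is a minimal sketch of downsampling with reservoir sampling, assuming an uncompressed four-line-per-record FASTQ file. The function name `downsample_fastq`, its parameters, and the file names in the usage line are hypothetical. This is not necessarily how the test files above were produced.

```python
import random

def downsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random subset of at most n_reads records; return the count written."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir = []
    with open(in_path, "rb") as handle:
        index = 0
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if index < n_reads:
                reservoir.append((index, record))
            else:
                # Reservoir sampling: keep this record with probability n_reads / (index + 1).
                j = rng.randint(0, index)
                if j < n_reads:
                    reservoir[j] = (index, record)
            index += 1
    reservoir.sort(key=lambda item: item[0])  # restore original file order
    with open(out_path, "wb") as out:
        for _, record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

For example, `downsample_fastq("merged.fastq", "merged_200k.fastq", 200_000)` would keep 200k records, which is the largest size that ran cleanly in my test.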
Happy to share problematic files if it helps.
Thanks!
-shane