The GraphAligner 1.0.20 release fixed several assertion failures (warnings) and a crash that I was getting with the previous release, but I can still get it to fail an assertion. Since the previous release went on to crash after assertion failures, I'm not sure it's really safe to continue when one comes up, so I'd like to get this fixed.
I ran:
/usr/bin/time -v singularity run -B /private:/private docker://quay.io/biocontainers/graphaligner:1.0.20--h06902ac_0 GraphAligner -t 62 -g /private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v2.0-mc-chm13-eval.d46.gfa -f /private/groups/patenlab/anovak/projects/hprc/lr-giraffe/reads/real/r10y2025/HG002/HG002_PAW70337.full.fq.gz --seeds-mxm-length 30 --seeds-mem-count 10000 --bandwidth 15 --multimap-score-fraction 0.99 --precise-clipping 0.85 --min-alignment-score 100 --clip-ambiguous-ends 100 --overlap-incompatible-cutoff 0.15 --max-trace-count 5 --mem-index-no-wavelet-tree -a ./output/graphaligner.gam 2>&1 | tee ./output/graphaligner.log
(Most of the parameters came from the suggestion to my collaborator @xchang1 in #106 (comment). I'm using Singularity and Biocontainers here because I'm not certain about having the required licences to use Conda.)
I let it run overnight and I got this:
INFO: Using cached SIF image
GraphAligner bioconda 1.0.20-
GraphAligner bioconda 1.0.20-
Load graph from /private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v2.0-mc-chm13-eval.d46.gfa
Build MUM/MEM seeder from the graph
Build alignment graph
MEM seeds, min length 30, max count 10000
Seed cluster size 1
Extend up to 5 seed clusters
Alignment bandwidth 15
Clip alignment ends with identity < 85%
X-drop DP score cutoff 33333
Backtrace from 5 highest scoring local maxima per cluster
write alignments to ./output/graphaligner.gam
Align
src/GraphAlignerBitvectorCommon.h:1134: Assertion 'previous.node(neighbor).endSlice.scoreEnd >= scoreHere-(eq?0:1)' failed. Read: 14e0a73d-736e-4c42-abac-dbfd965f07d6. Seed: 0+,0,0,0
At that point I stopped the run to report the issue.
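For anyone checking their own logs for the same problem, here is a rough sketch of how the affected read names could be pulled out of a log like the one above. It assumes the assertion line format shown in this report (`... Assertion '...' failed. Read: <name>. Seed: ...`), which may differ in other GraphAligner versions; `failed_reads` is just a hypothetical helper name.

```shell
# List the unique read names that appear in assertion-failure lines of a log.
# Assumption: lines look like the one in this report, i.e.
#   <file>:<line>: Assertion '<expr>' failed. Read: <name>. Seed: <seed>
failed_reads() {
    # $1: path to the captured log (e.g. the file written by tee)
    sed -n "s/.*Assertion '.*' failed\. Read: \(.*\)\. Seed: .*/\1/p" "$1" | sort -u
}
```

Running `failed_reads ./output/graphaligner.log` on the log from the command above should print one read name per line.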
I've uploaded the GFA file (gzip-compressed) and the offending read as a single-read FASTQ to:
https://public.gi.ucsc.edu/~anovak/outbox/tracks/big/graphaligner_assert/
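If it helps with reproducing similar failures on other reads, below is a minimal sketch of one way to cut a single read out of a gzipped FASTQ by name. It is not necessarily how the uploaded file was made. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and matches on the header up to the first whitespace; `extract_read` is a hypothetical helper name.

```shell
# Print the 4-line FASTQ record whose name matches exactly.
# Assumption: strict 4-line records, so header lines are lines 1, 5, 9, ...
extract_read() {
    # $1: read name (without the leading '@'), $2: gzipped FASTQ file
    gzip -cd "$2" | awk -v name="$1" '
        # On each header line, compare the name (text after "@", up to whitespace)
        NR % 4 == 1 { split(substr($0, 2), f, " "); keep = (f[1] == name) }
        # Print every line of a record whose header matched
        keep { print }
    '
}
```

For example, `extract_read 14e0a73d-736e-4c42-abac-dbfd965f07d6 HG002_PAW70337.full.fq.gz > offending_read.fq` would write the read named in the assertion message to `offending_read.fq` (an output name chosen here only for illustration). Keying on line position rather than on a leading `@` avoids mistaking a quality line that starts with `@` for a header.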
Our security certificate expired last week, but it should hopefully be renewed soon.
I'm working with 2025-era R10 reads, and a prototype Human Pangenome Reference Consortium v2.0 graph.