allpairs output not congruent with sphere clustering output? #13
Hi Eduard,
I ran the latest starcode version from the allpairs branch on an input file of distinct sequences (the same file referenced in the "cluster size" bug report) with the following command line:
starcode -d8 -t22 -i sort_uniqed_reads.txt -o starcode_edges.out
I count 96,071 distinct sequences (nodes) in the output file. There are 134,492 distinct sequences in the input file, so I infer that 38,421 (134,492 - 96,071) sequences are singletons, i.e. they have no tau-match with any other sequence in the input.
To check that, I looked at the output of a run with the newest (i.e. cluster-size-corrected) version of starcode from the master branch, created with the following command:
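For reference, the node count and the inferred singleton count can be reproduced with a sketch like the one below. It assumes that each line of the allpairs output describes one edge and that the two matched sequences are the first two whitespace-separated fields. That layout is my assumption, not a documented format, and the function names are made up for illustration.

```python
def count_nodes(edge_lines):
    """Count distinct sequences that appear in at least one edge.

    Assumption: each line is one edge and its first two
    whitespace-separated fields are the two matched sequences.
    """
    nodes = set()
    for line in edge_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        nodes.update(fields[:2])
    return len(nodes)


def inferred_singletons(n_input, n_nodes):
    """Sequences in the input that never show up in any edge."""
    return n_input - n_nodes
```

With the numbers from this report, `inferred_singletons(134492, 96071)` gives 38421. To run it on the real file, pass an open file handle, e.g. `count_nodes(open("starcode_edges.out"))`.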
starcode -d8 -t22 -s --print-clusters -i sort_uniqed_reads.txt -o starcode.out
I assume that with sphere clustering, any sequence that does not have a tau-match will be put in a cluster of its own. Counting the number of "clusters" with only one sequence, I get 41,908. Shouldn't this rather be 38,421?
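The singleton-cluster count can be obtained roughly as sketched below. The sketch assumes that each line of the `--print-clusters` output is tab-separated, with the centroid in the first column, the cluster count in the second, and a comma-separated list of member sequences in the third. The column layout and the function name are assumptions for illustration. Because the input contains only distinct sequences, a cluster is treated as a singleton when its member list holds exactly one sequence.

```python
def count_singleton_clusters(cluster_lines):
    """Count clusters whose member list contains exactly one sequence.

    Assumption: tab-separated columns are
    centroid, count, comma-separated member sequences.
    """
    singletons = 0
    for line in cluster_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # no member list on this line, skip it
        members = [seq for seq in fields[2].split(",") if seq]
        if len(members) == 1:
            singletons += 1
    return singletons
```

On the real output this would be called as `count_singleton_clusters(open("starcode.out"))`.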
many thanks for your help,
claudius