allpairs output not congruent with sphere clustering output? #13

@claudiuskerth

Hi Eduard,

I have run the latest starcode version from the allpairs branch on an input file of distinct sequences (the same file as referenced in the "cluster size" bug report) with the command line:

starcode -d8 -t22 -i sort_uniqed_reads.txt -o starcode_edges.out

I count 96,071 distinct sequences (nodes) in the output file. There are 134,492 distinct sequences in the input file. From that I infer that 38,421 (134,492 - 96,071) sequences are singletons, i.e. have no tau-match with any other sequence in the input.
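A minimal sketch of how this count could be reproduced, assuming each line of the allpairs output holds one edge as two whitespace-separated sequences (the exact output format of the allpairs branch is an assumption here, and the function names are hypothetical):

```python
def distinct_nodes(edge_lines):
    """Collect every sequence that appears in at least one edge.

    Assumes one edge per line, with the two matching sequences in the
    first two whitespace-separated fields (assumed format).
    """
    nodes = set()
    for line in edge_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        nodes.update(fields[:2])
    return nodes


def inferred_singletons(n_input, edge_lines):
    """Input sequences that never appear in any edge."""
    return n_input - len(distinct_nodes(edge_lines))


# The arithmetic from the report: 134,492 input sequences, 96,071 nodes.
print(134492 - 96071)  # 38421
```

With the real files, `edge_lines` would be the open handle on `starcode_edges.out`, and `n_input` the line count of `sort_uniqed_reads.txt`.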

To check that, I looked at the output of a run with the newest (i.e. cluster-size-corrected) version of starcode from the master branch, created with the following command:

starcode -d8 -t22 -s --print-clusters -i sort_uniqed_reads.txt -o starcode.out

I assume that with sphere clustering, any sequence that does not have a tau-match will be put in a cluster on its own. Counting the number of "clusters" with only one sequence I get 41,908. Shouldn't this rather be 38,421?
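A sketch of the singleton count and of a cross-check against the edge list, assuming the `--print-clusters` output is tab-separated with the centroid in the first column, the count in the second, and a comma-separated member list in the third (this column layout is an assumption, and the helper names are hypothetical):

```python
def singleton_sequences(cluster_lines):
    """Return the sequences that sit alone in their cluster.

    Assumes tab-separated lines: centroid, count, comma-separated members.
    """
    singles = set()
    for line in cluster_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines without a member list
        members = [m for m in fields[2].split(",") if m]
        if len(members) == 1:
            singles.add(members[0])
    return singles


def singletons_with_edges(cluster_lines, edge_nodes):
    """One-sequence clusters whose sequence still appears in the edge list."""
    return singleton_sequences(cluster_lines) & set(edge_nodes)
```

If `singletons_with_edges` comes back non-empty, those sequences would be the ones that account for the difference between the two counts.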

many thanks for your help,

claudius
