Open
Labels: enhancement (New feature or request)
Description
Speeding up hierarchical clustering
The Python version of hierarchical clustering could potentially be faster. The code in `max_indices` calculates the similarity with `gen_sim_dict`, which computes every similarity index in the dictionary using the default value k=1. If I am not mistaken, `calculate_isim` returns the same values as `gen_sim_dict` for the RR, JT, and SM indices in this case.
Proposed change
Replace line 67 in iSIM/iSIM/clustering.py with:
s = calculate_isim(data=fp1+fp2, n_objects=n, n_ary=n_ary)
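For context, here is a minimal sketch of why a direct calculation of a single index can be cheap. It assumes that `fp1` and `fp2` are the per-bit column sums of the two clusters being merged (my reading of the call above) and that the indices follow the usual iSIM pair-counting formulas. `isim_from_column_sums` is a hypothetical stand-in written for illustration, not the repository's `calculate_isim`.

```python
import numpy as np

def isim_from_column_sums(col_sums, n_objects, n_ary="JT"):
    """Hypothetical helper: one similarity index from per-bit column sums.

    col_sums[q] is the number of fingerprints with bit q set;
    n_objects is the number of fingerprints that were summed.
    """
    k = np.asarray(col_sums, dtype=float)
    n = n_objects
    a = np.sum(k * (k - 1) / 2)              # pairs sharing a 1 in a column
    d = np.sum((n - k) * (n - k - 1) / 2)    # pairs sharing a 0 in a column
    total = k.size * n * (n - 1) / 2         # all (pair, column) combinations
    mismatches = total - a - d               # pairs that differ in a column
    if n_ary == "RR":
        return a / total
    if n_ary == "JT":
        # Undefined (division by zero) if no bit is set in any fingerprint.
        return a / (a + mismatches)
    if n_ary == "SM":
        return (a + d) / total
    raise ValueError(f"unsupported index: {n_ary!r}")

# Illustrative data only: two small random "clusters" of binary fingerprints.
rng = np.random.default_rng(0)
cluster1 = rng.integers(0, 2, size=(20, 2048))
cluster2 = rng.integers(0, 2, size=(30, 2048))

# Merging clusters only requires adding their column sums and sizes.
fp1, fp2 = cluster1.sum(axis=0), cluster2.sum(axis=0)
n = len(cluster1) + len(cluster2)
s = isim_from_column_sums(fp1 + fp2, n_objects=n, n_ary="JT")
print(f"JT of merged cluster: {s:.4f}")
```

The cost is linear in the fingerprint length and independent of how many similarity indices exist, which is consistent with the speedup reported below if `gen_sim_dict` spends most of its time on indices the clustering never uses.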
Initial timing results
This is for 50 fingerprints (size 2048):
- `gen_sim_dict` took 5.3 seconds
- `calculate_isim` took 1.2 seconds
Potential issues
We would not be able to do hierarchical clustering with the other similarities in gen_sim_dict and/or try different k values.