Diversity picker bug #1

@stefdoerr

There is a bug in the diversity picker: in practice it just returns random conformations.

dm = scispc.distance.pdist(aevs, 'sqeuclidean')
picker = rdSimDivPickers.MaxMinPicker()
seed_list = [i for i in range(Ngen)]
np.random.shuffle(seed_list)
ids = list(picker.Pick(dm, Ngen, Nkep, firstPicks=list(seed_list[0:5])))

Here you pass the result of pdist directly to MaxMinPicker. However, if you read the RDKit blog post on that class, you will see this funny paragraph:

Note that there are some examples of using this approach floating around on the web that calculate the distance matrix in the wrong order. I've done my best to find and either fix or remove them, but there are no doubt still some bad ones out there.

If you look at their code, they pass the lower triangle of the distance matrix, built row by row:

for i in range(1, N):
   for j in range(0, i):
      dists.append(d[i, j])

whereas the pdist documentation says it returns the distances in the order i < j < m (m being the number of observations), i.e. the upper triangle row by row, in this form:

for i in range(N):
   for j in range(i+1, N):
      dists.append(d[i, j])
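
The two loops visit the same pairs in a different order once there are four or more points, so the picker reads each distance as belonging to the wrong pair. One possible fix is to reorder the pdist output before passing it to the picker. The following is only a minimal sketch in plain Python: the helper names `pdist_index` and `to_lower_triangle_order` are hypothetical and not part of SciPy or RDKit, and the target ordering is assumed to be the one from the blog post loop above.

```python
def pdist_index(i, j, n):
    """Position of pair (i, j), with i < j, in a pdist-style condensed vector
    built from n observations (upper triangle, row by row)."""
    return n * i + j - ((i + 2) * (i + 1)) // 2


def to_lower_triangle_order(condensed, n):
    """Reorder a pdist-style condensed vector into lower-triangle,
    row-by-row order: (1,0), (2,0), (2,1), (3,0), ..."""
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(f"expected {expected} distances, got {len(condensed)}")
    reordered = []
    for i in range(1, n):
        for j in range(i):
            # d[i, j] == d[j, i]; pdist stores the pair with the smaller index first
            reordered.append(condensed[pdist_index(j, i, n)])
    return reordered


# Label each distance by its pair so the reordering is visible.
upper = ["01", "02", "03", "12", "13", "23"]   # pdist order for 4 points
lower = to_lower_triangle_order(upper, 4)
print(lower)
```

In the snippet from the issue, the reordered vector (rather than the raw `dm`) would then be what gets passed to `picker.Pick`. With NumPy available, `scipy.spatial.distance.squareform(dm)[np.tril_indices(Ngen, k=-1)]` should produce the same ordering, at the cost of materializing the full square matrix.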
