Vectorize/Parallelize the clustering algorithm for the fit results  #44

@RodenLuo

Hi Tom @tomgoddard,

DiffFit has been calling the same clustering algorithm as the fitmap command in ChimeraX. The algorithm relies on the Binned_Transforms class and its close_transforms and add_transform methods from chimerax.geometry.bins. This step has turned out to be rate-limiting in quite a few cases, and I'm tempted to speed it up.

Before I knew about the built-in chimerax.geometry.bins, I implemented an algorithm that solves the problem approximately:

  1. Cluster the shifts (x, y, z)
  2. For each cluster from step 1, use the quaternions to transform two orthogonal unit vectors and get a 6-dimensional vector (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z)
  3. For each cluster from step 1, cluster all the 6-dimensional vectors
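
To make step 2 concrete, here is a minimal NumPy sketch of how the 6-dimensional vectors could be built. It assumes unit quaternions stored as (w, x, y, z) and uses the x and y axes as the two orthogonal unit vectors; the function names `quat_rotate` and `six_d_vectors` are made up for illustration and are not part of DiffFit or ChimeraX. The clustering in steps 1 and 3 is left out, since any method could be plugged in there.

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate one 3-vector v by each unit quaternion in q.

    q has shape (N, 4) with components ordered (w, x, y, z) (an assumption).
    Returns an (N, 3) array of rotated vectors.
    """
    w = q[:, :1]                      # scalar part, shape (N, 1)
    u = q[:, 1:]                      # vector part, shape (N, 3)
    v = np.broadcast_to(v, u.shape)
    # Standard identity for unit quaternions:
    # v' = v + 2 w (u x v) + 2 u x (u x v)
    uv = np.cross(u, v)
    return v + 2.0 * (w * uv + np.cross(u, uv))

def six_d_vectors(q):
    """Map (N, 4) quaternions to (N, 6) vectors (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z)."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)  # guard against drift from unit length
    v1 = quat_rotate(q, np.array([1.0, 0.0, 0.0]))
    v2 = quat_rotate(q, np.array([0.0, 1.0, 0.0]))
    return np.hstack([v1, v2])
```

One property of this representation is that q and -q, which describe the same rotation, map to the same 6-dimensional vector, so the sign ambiguity of quaternions does not split clusters.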

It was buggy and generated more clusters than chimerax.geometry.bins. Often, two clusters were actually within the threshold of each other and thus should have been a single cluster. (Unfortunately, I could not figure out why...)

I wonder if you have any ideas on how to speed up the clustering. Ideally, the required computation would be vectorized so that it can run in PyTorch. Or, at the very least, it should be possible to parallelize it with, for example, OpenMP.
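
As a rough illustration of what "vectorized" could look like, here is a NumPy sketch of an all-pairs closeness test followed by a simple greedy grouping. This is not what Binned_Transforms does internally; it is a brute-force stand-in under my own assumptions: rotations are given as (N, 3, 3) matrices, shifts as (N, 3) arrays, two fits count as close when both the rotation angle between them and the distance between their shifts are under a threshold, and the names `pairwise_close` and `greedy_clusters` are hypothetical. The array operations used here (einsum, arccos, norm) have direct PyTorch counterparts, but the matrix needs O(N^2) memory.

```python
import numpy as np

def pairwise_close(R, t, angle_deg, shift):
    """Boolean (N, N) matrix: True where two transforms are within both thresholds."""
    # trace(R_i^T R_j) equals the elementwise product of R_i and R_j, summed.
    tr = np.einsum('iab,jab->ij', R, R)
    # For a rotation by theta, the trace is 1 + 2 cos(theta).
    cos_theta = np.clip((tr - 1.0) / 2.0, -1.0, 1.0)
    angles = np.degrees(np.arccos(cos_theta))
    # Euclidean distance between every pair of shift vectors.
    dist = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1)
    return (angles <= angle_deg) & (dist <= shift)

def greedy_clusters(close):
    """Assign labels in input order: each unlabeled fit claims its unlabeled close neighbors."""
    n = close.shape[0]
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue                       # already belongs to an earlier cluster
        members = close[i] & (labels < 0)  # includes i itself, since close[i, i] is True
        labels[members] = next_label
        next_label += 1
    return labels
```

The only remaining Python loop runs once per cluster seed rather than once per pair, so the heavy work stays in array operations. If the fits are sorted by score first, each cluster is seeded by its best-scoring member.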

Many thanks!
