dimorphite_dl.protonate_smiles returns variants with inconsistent order #11

@haldami

First of all, thank you for the great work you are doing with dimorphite_dl! It is simple to use and configure, and easy to incorporate into custom Python workflows.

My issue concerns the order in which dimorphite_dl returns the protonated variants of a molecule. I noticed it when running dimorphite_dl from a Python environment on methotrexate:

import dimorphite_dl

smiles = 'CN(Cc1cnc2[nH+]c(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1'
smiles_variants = dimorphite_dl.protonate_smiles(
    smiles, ph_min=6.0, ph_max=9.0, max_variants=512
)
with open('MT1.smi', "w") as f:
    for i, smi in enumerate(smiles_variants, 1):
        f.write(f"{smi} {i}\n")
    f.write(f"{smiles} orig\n")

I ran it twice, and the variants came back in a different order each time. The set of variants was, however, the same: when I sorted them (simply using sorted(smiles_variants)), the output files were identical.
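The sort-based normalization described above can be sketched as follows. This is only a workaround sketch, not a statement about how dimorphite_dl works internally: `stable_variants` is a hypothetical helper name, and the short SMILES strings are toy placeholders standing in for two runs of `protonate_smiles` that return the same variants in a different order.

```python
def stable_variants(variants):
    """Return the unique SMILES strings in lexicographic order.

    Hypothetical helper: it makes no assumption about why the input
    order varies, it only makes the output order reproducible.
    """
    return sorted(set(variants))


# Toy placeholders for two runs that yield the same variants
# in a different order (not real dimorphite_dl output).
run_1 = ["CN(C)C", "C[NH+](C)C", "CC(=O)[O-]"]
run_2 = ["CC(=O)[O-]", "CN(C)C", "C[NH+](C)C"]

print(run_1 == run_2)                                    # order differs
print(stable_variants(run_1) == stable_variants(run_2))  # same content
```

Writing `stable_variants(smiles_variants)` to the file instead of the raw list should give byte-identical output files across runs, provided the set of variants itself does not change.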

I am using dimorphite_dl v2.0.2.

Is this a feature or a bug? What is the reason for this behavior?

Thank you very much for your answer and for your ongoing work!

P.S.: Since the output files are rather long, I am pasting only the first 10 lines from each run here.

Try no. 1:

C[NH+](Cc1c[nH+]c2nc(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 1
C[NH+](Cc1c[nH+]c2[nH+]c(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 2
C[NH+](Cc1c[nH+]c2[nH+]c(N)[nH+]c(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 3
CN(Cc1c[nH+]c2nc(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 4
CN(Cc1cnc2[nH+]c(N)nc(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 5
CN(Cc1cnc2nc(N)[nH+]c(N)c2[nH+]1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 6
CN(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 7
CN(Cc1c[nH+]c2nc(N)[nH+]c(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 8
C[NH+](Cc1c[nH+]c2nc(N)nc(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 9
C[NH+](Cc1cnc2nc(N)nc(N)c2[nH+]1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 10

Try no. 2:

CN(Cc1cnc2nc(N)[nH+]c(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 1
C[NH+](Cc1c[nH+]c2nc(N)nc(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 2
C[NH+](Cc1c[nH+]c2[nH+]c(N)nc(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 3
C[NH+](Cc1c[nH+]c2[nH+]c(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 4
CN(Cc1cnc2[nH+]c(N)nc(N)c2[nH+]1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 5
C[NH+](Cc1cnc2nc(N)nc(N)c2[nH+]1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 6
C[NH+](Cc1cnc2nc(N)[nH+]c(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 7
C[NH+](Cc1c[nH+]c2nc(N)[nH+]c(N)c2[nH+]1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 8
C[NH+](Cc1c[nH+]c2nc(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 9
C[NH+](Cc1cnc2[nH+]c(N)nc(N)c2n1)c1ccc(C(=O)[N-][C@@H](CCC(=O)[O-])C(=O)[O-])cc1 10
