
PubChem fingerprint implementation contains a bad algorithm for features #116 to #255 and a bad implementation of feature #60 (#23)

@ZlatomirTodorov


Notes on PubChemFingerprints.py module

Issue in SMARTS implementation:
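Feature #60 is a SMARTS pattern for detecting the element cobalt (Co).
The entry implemented in PyBioMed's smartsPatts dict is: 60:('[CO]', 0),
The implementation should be: 60:('[Co]', 0),
SMARTS element symbols are case-sensitive, so '[CO]' does not denote cobalt: it is read as an aliphatic carbon AND an aliphatic oxygen on the same atom, which can never match.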
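A toy sketch of the case-sensitive reading: element symbols in a SMARTS bracket atom are an uppercase letter optionally followed by one lowercase letter, so "CO" splits into two symbols while "Co" stays one. The helper `bracket_elements` is hypothetical and is not a real SMARTS parser (it ignores aromatic lowercase atoms, charges, and all other primitives).

```python
import re

# Toy rule: an element symbol is one uppercase letter plus an optional lowercase letter.
ELEMENT = re.compile(r"[A-Z][a-z]?")

def bracket_elements(smarts_atom: str) -> list[str]:
    """Split the inside of a bracket atom such as '[Co]' into element symbols (toy only)."""
    inner = smarts_atom.strip("[]")
    return ELEMENT.findall(inner)

print(bracket_elements("[CO]"))  # ['C', 'O']: carbon AND oxygen on one atom, never matches
print(bracket_elements("[Co]"))  # ['Co']: cobalt
```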

Issues in the ring-detection features:
Application case 1:
When applied to "C=CCC1CCCC1", the features hit are:
144 >= 1 any ring size 5
145 >= 1 saturated or aromatic carbon-only ring size 5
146 >= 1 saturated or aromatic nitrogen-containing ring size 5
147 >= 1 saturated or aromatic heteroatom-containing ring size 5
Discussion of the fingerprint obtained in Application case 1:
The hit on 144 is correct.
The hit on 145 is correct, but only by chance: func_2 accepts any saturated ring regardless of its atoms, so its algorithm is wrong (see Proposed fixes).
The hit on 146 is wrong: the ring contains no nitrogen, and the algorithm in func_3 is wrong (see Proposed fixes).
The hit on 147 is wrong: the ring contains no heteroatom, and the algorithm in func_4 is wrong (see Proposed fixes).
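To see why 146 and 147 fire, here is a minimal pure-Python sketch of the "(saturated) OR (aromatic AND X)" test applied to the single ring of C=CCC1CCCC1. It is not PyBioMed's code: the ring representation (a list of atom symbols plus a list of bond-type names mimicking RDKit's "SINGLE"/"AROMATIC") and the function `buggy_tests` are assumptions for illustration.

```python
def buggy_tests(atoms: list[str], bond_types: list[str]) -> dict[str, bool]:
    """Per-ring tests with the (saturated) OR (aromatic AND X) logic (hypothetical sketch)."""
    saturated = all(b == "SINGLE" for b in bond_types)
    aromatic = all(b == "AROMATIC" for b in bond_types)
    # The saturated branch never inspects the ring's atoms.
    return {
        "func_2": saturated or (aromatic and all(a == "C" for a in atoms)),
        "func_3": saturated or (aromatic and "N" in atoms),
        "func_4": saturated or (aromatic and any(a != "C" for a in atoms)),
    }

# The only ring in C=CCC1CCCC1 is cyclopentane: five carbons, five single bonds.
print(buggy_tests(["C"] * 5, ["SINGLE"] * 5))
# {'func_2': True, 'func_3': True, 'func_4': True}
```

All three tests pass for a plain carbocycle, which matches the hits on 145, 146 and 147 reported above.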

Proposed fixes:
func_2(mol,bits):
The feature description: saturated or aromatic carbon-only ring
The test implemented in PyBioMed: (saturated) OR (aromatic AND carbon-only)
The test should be: (saturated AND carbon-only) OR (aromatic AND carbon-only), or an equivalent expression such as (saturated OR aromatic) AND carbon-only

func_3(mol,bits):
The feature description: saturated or aromatic nitrogen-containing ring
The test implemented in PyBioMed: (saturated) OR (aromatic AND nitrogen-containing)
The test should be: (saturated AND nitrogen-containing) OR (aromatic AND nitrogen-containing), or an equivalent expression such as (saturated OR aromatic) AND nitrogen-containing

func_4(mol,bits):
The feature description: saturated or aromatic heteroatom-containing ring
The test implemented in PyBioMed: (saturated) OR (aromatic AND heteroatom-containing)
The test should be: (saturated AND heteroatom-containing) OR (aromatic AND heteroatom-containing), or an equivalent expression such as (saturated OR aromatic) AND heteroatom-containing
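A minimal sketch of the corrected per-ring tests, written as (saturated OR aromatic) AND composition. It uses a hypothetical ring representation (atom symbols plus bond-type names mimicking RDKit's "SINGLE"/"AROMATIC") rather than an RDKit molecule, treats any non-carbon ring atom as a heteroatom (an assumption), and only returns booleans for one ring; mapping them onto ring-size/count bits is left out.

```python
def fixed_tests(atoms: list[str], bond_types: list[str]) -> dict[str, bool]:
    """Corrected per-ring tests for func_2/func_3/func_4 (hypothetical sketch)."""
    saturated = all(b == "SINGLE" for b in bond_types)
    aromatic = all(b == "AROMATIC" for b in bond_types)
    ring_ok = saturated or aromatic
    # The composition condition now applies to both the saturated and aromatic branches.
    return {
        "func_2": ring_ok and all(a == "C" for a in atoms),   # carbon-only
        "func_3": ring_ok and "N" in atoms,                   # nitrogen-containing
        "func_4": ring_ok and any(a != "C" for a in atoms),   # heteroatom-containing (assumed: non-carbon)
    }

rings = {
    "cyclopentane": (["C"] * 5, ["SINGLE"] * 5),
    "pyrrolidine": (["N"] + ["C"] * 4, ["SINGLE"] * 5),
    "pyrrole": (["N"] + ["C"] * 4, ["AROMATIC"] * 5),
    "cyclopentene": (["C"] * 5, ["DOUBLE"] + ["SINGLE"] * 4),
}
for name, (atoms, bonds) in rings.items():
    print(name, fixed_tests(atoms, bonds))
```

With this logic the cyclopentane ring of C=CCC1CCCC1 sets only the carbon-only test, so 146 and 147 would no longer be hit.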

PS: If anyone is interested in a corrected implementation of these functions, send me your contact information and I'll be happy to share my code.
