Mismatching pattern due to RDKit aromaticity model #17

@DrrDom

I started experimenting with different filters and found that some of them reject many compounds, so I looked into individual cases. One example is the Filter82_pyridinium rule ([c,n]1[c,n][c,n][c,n][c,n]n(C)1) from the Inpharmatica set.
RDKit aromatizes some compounds, such as the one in the example below, even with the AROMATICITY_SIMPLE model. As a result the SMARTS pattern matches, which I consider a false positive.
Was this pattern intended to remove all such compounds, or should it apply only to compounds with a charged nitrogen ([c,n]1[c,n][c,n][c,n][c,n][n+](C)1)?
Is there another workaround, or is this rather an issue with the RDKit aromaticity model?
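To make the question concrete, here is a minimal sketch comparing the original pattern with the charge-restricted variant proposed above. It assumes the same sanitization steps as the example below; the helper `load_simple_aromatic`, the constant names, and the N-methylpyridinium test molecule are illustrative additions, not part of the Inpharmatica rule set.

```python
from rdkit import Chem


def load_simple_aromatic(smiles):
    """Parse a SMILES and perceive aromaticity with AROMATICITY_SIMPLE (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    # Run every sanitization step except the default aromaticity perception.
    Chem.SanitizeMol(mol, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
    Chem.SetAromaticity(mol, Chem.AROMATICITY_SIMPLE)
    return mol


# Original rule: any aromatic N-methyl nitrogen in a six-membered ring, charged or not.
NEUTRAL_OR_CHARGED = Chem.MolFromSmarts('[c,n]1[c,n][c,n][c,n][c,n]n(C)1')
# Proposed variant: the ring nitrogen must carry a formal +1 charge.
CHARGED_ONLY = Chem.MolFromSmarts('[c,n]1[c,n][c,n][c,n][c,n][n+](C)1')

# Compound from this report (neutral N-methyl lactam).
quinolinone = load_simple_aromatic('COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1')
# N-methylpyridinium, chosen here as an illustrative true positive.
pyridinium = load_simple_aromatic('C[N+]1=CC=CC=C1')

for name, mol in (('quinolinone', quinolinone), ('pyridinium', pyridinium)):
    print(name,
          'original:', mol.HasSubstructMatch(NEUTRAL_OR_CHARGED),
          'charged only:', mol.HasSubstructMatch(CHARGED_ONLY))
```

If the rule is meant to flag only true pyridinium cations, the charged variant would be the narrower choice; whether that matches the intent of the original filter is exactly the open question.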

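from rdkit import Chem

smi = 'COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1'
# Parse without sanitization so the aromaticity model can be chosen explicitly.
m = Chem.MolFromSmiles(smi, sanitize=False)
# Run every sanitization step except the default aromaticity perception.
Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
Chem.SetAromaticity(m, Chem.AROMATICITY_SIMPLE)
print(Chem.MolToSmiles(m))

# Filter82_pyridinium pattern ([n] is equivalent to n in SMARTS)
sma = '[c,n]1[c,n][c,n][c,n][c,n][n](C)1'
pat = Chem.MolFromSmarts(sma)

# Atom indices of the first match, or an empty tuple if there is none
print(m.GetSubstructMatch(pat))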

Output:

COc1cccc2c3c(c(=O)n(C)c12)C=CC(C)(C)O3
(3, 16, 9, 8, 6, 4, 5)
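To see why the match counts as a false positive, the matched atoms can be inspected directly. The sketch below repeats the steps above and lists, for each matched atom, its element, aromatic flag, and formal charge; the variable names `rows` and `ring_nitrogen` are illustrative. It relies on `GetSubstructMatch` returning atom indices in the order of the query atoms, so position 5 of the tuple corresponds to the `[n]` atom of the pattern.

```python
from rdkit import Chem

smi = 'COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1'
m = Chem.MolFromSmiles(smi, sanitize=False)
Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
Chem.SetAromaticity(m, Chem.AROMATICITY_SIMPLE)

pat = Chem.MolFromSmarts('[c,n]1[c,n][c,n][c,n][c,n][n](C)1')
match = m.GetSubstructMatch(pat)

# One row per matched atom:
# (query atom index, molecule atom index, element, aromatic?, formal charge)
rows = []
for query_idx, atom_idx in enumerate(match):
    atom = m.GetAtomWithIdx(atom_idx)
    rows.append((query_idx, atom_idx, atom.GetSymbol(),
                 atom.GetIsAromatic(), atom.GetFormalCharge()))

for row in rows:
    print(row)

# Query atom 5 is the [n] of the pattern, i.e. the N-methyl ring nitrogen.
ring_nitrogen = m.GetAtomWithIdx(match[5])
print('ring N formal charge:', ring_nitrogen.GetFormalCharge())
```

The ring nitrogen is flagged aromatic but carries no formal charge, which is the distinction the pattern as written does not make.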
