Laboratoire-de-Chemoinformatique/SCOPE-DEL

SCOPE-DEL

Similarity & COverage Prioritization and Evaluation for DNA-Encoded Libraries (SCOPE-DEL): Notebooks, data, and results for GTM-driven chemography to prioritize DNA-Encoded Libraries (DELs) by balancing similarity and chemical-space coverage.

License: MIT | Package manager: PDM


Repository structure

SCOPE-DEL/
├── notebooks/
│   └── Analysis_results.ipynb            # Metrics, comparisons, figure export
├── data/
│   ├── raw/                              # Input sources (unmodified)
│   └── processed/                        # Cleaned data ready for modeling
├── results/
│   └── dels_100/                         # Metrics
├── docs/
│   └── assets/
│       └── SCOPE-del-logo.png            
├── pyproject.toml                        # PDM project config
├── pdm.lock                              # Locked dependency graph (generated)
├── LICENSE                               # MIT
└── README.md

Quick start (PDM)

Requires Python 3.10+ and PDM 2. If you don’t have PDM:

pipx install pdm   # recommended
# or: pip install -U pdm
  1. Create and activate the PDM virtualenv
# inside the repo root
pdm venv create 3.10              # or your preferred interpreter version
pdm use -f .venv/bin/python        # on Windows: .venv\Scripts\python.exe
  2. Install dependencies
pdm install

Notes:

  • RDKit wheels are platform-dependent. The project uses rdkit/rdkit-pypi. If installation fails on your OS, see RDKit’s wheel/conda guidance or switch the fingerprinting backend in the notebooks.
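To confirm that the install step left RDKit importable before opening the notebooks, a small check along these lines may help. It is a sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the packages probed (`rdkit`, `ipykernel`) are an assumption based on the steps in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "rdkit" is the import name of the RDKit wheel; "ipykernel" is needed
    # to register a Jupyter kernel. Both names are assumptions from the README.
    missing = missing_packages(["rdkit", "ipykernel"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

Saved under a hypothetical name such as `check_env.py`, it can be run with `pdm run python check_env.py` so that the check uses the project's virtualenv rather than the system interpreter.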
  3. Run the notebooks
# (optional) register a Jupyter kernel named “SCOPE-del”
pdm run python -m ipykernel install --user --name SCOPE-del

# start Jupyter
pdm run jupyter lab     # or: pdm run jupyter notebook
  4. Recreate exports
  • Run notebooks/DataPreprocessing.ipynb to populate data/processed/.
  • Run notebooks/GTM_GMM_optimization_benchmark.ipynb for GTM map selection.
  • Run notebooks/Analysis_results.ipynb to produce figures/tables in results/.
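The three notebooks above can also be executed without opening Jupyter. The following is a minimal sketch, assuming `nbconvert` is available in the PDM environment (it is normally installed alongside Jupyter) and that the notebooks run unattended; the helper names `build_command` and `run_all` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Notebook order taken from the steps listed in this README.
NOTEBOOKS = [
    "notebooks/DataPreprocessing.ipynb",
    "notebooks/GTM_GMM_optimization_benchmark.ipynb",
    "notebooks/Analysis_results.ipynb",
]


def build_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "pdm", "run", "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(repo_root=".", dry_run=True):
    """Print (and optionally run) the commands in order; stop at the first failure."""
    commands = [build_command(nb) for nb in NOTEBOOKS]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a notebook fails to execute.
            subprocess.run(command, cwd=Path(repo_root), check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)  # switch to dry_run=False to actually execute
```

The dry run is the default because `--inplace` overwrites the stored notebook outputs; review the printed commands before enabling execution.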

Handy commands

# freeze dependencies for external runners/CI
pdm export -f requirements -o requirements.txt --without-hashes
# run quality tools if configured
pdm run black . && pdm run isort .

Data

  • data/raw/ contains references (or instructions) for obtaining large sources not committed to the repo.
  • data/processed/ is generated by DataPreprocessing.ipynb (fingerprints, deduped sets, splits).
  • For reproducibility, the preprocessing notebook documents exact retrieval, cleaning, and descriptor parameters.
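As an illustration of the "deduped sets, splits" mentioned above, the sketch below shows one generic way to deduplicate records and produce a reproducible split. It is not the procedure used in DataPreprocessing.ipynb: the function names, the 20% test fraction, and the seed are all assumptions, and exact string matching is only a stand-in for proper structure canonicalization (which would normally be done with RDKit).

```python
import random


def dedupe_preserving_order(records):
    """Drop repeated entries, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def seeded_split(records, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and split, so reruns give identical sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG; global state is untouched
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    smiles = ["CCO", "CCO", "c1ccccc1", "CC(=O)O", "CCN"]  # toy input
    unique = dedupe_preserving_order(smiles)
    train, test = seeded_split(unique, test_fraction=0.25, seed=42)
    print(len(unique), len(train), len(test))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the split independent of any other random calls made earlier in a notebook.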

Reproducing figures & tables

Analysis_results.ipynb exports:

  • Correlation heatmaps comparing GTM-space vs fingerprint-space metrics
  • Coverage-vs-similarity scatter plots
  • Density/class landscapes
  • Summary CSVs with overlaps, EF@k, and other selection metrics

Artifacts are written to results/figures/ and results/tables/ with filenames matching manuscript labels.
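EF@k in the summary tables is read here as the enrichment factor at rank k. This README does not define the metric, so that reading, the formula, and the function name below are assumptions rather than the notebook's code. Under this common definition, EF@k is the hit rate among the top k ranked items divided by the hit rate in the whole set.

```python
def enrichment_factor_at_k(ranked_labels, k):
    """Enrichment factor among the top-k items of a ranked list.

    ranked_labels: sequence of 0/1 flags (1 = hit), best-ranked first.
    """
    n = len(ranked_labels)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of items")
    total_hits = sum(ranked_labels)
    if total_hits == 0:
        # With no hits at all the ratio is undefined; 0.0 is a convention chosen here.
        return 0.0
    hits_in_top_k = sum(ranked_labels[:k])
    return (hits_in_top_k / k) / (total_hits / n)


if __name__ == "__main__":
    ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # toy ranking with 3 hits in 10 items
    print(round(enrichment_factor_at_k(ranked, 2), 2))
```

For the toy ranking, the top 2 items are both hits, so the value is (2/2) / (3/10), roughly 3.33. A value of 1.0 corresponds to no enrichment over random selection.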


Results

Key takeaways reproduced by the notebooks:

  • GTM-derived, centroid-oriented metrics serve as practical proxies for pairwise fingerprint measures in DEL selection.
  • Visual GTM landscapes help balance similarity to a reference with broader chemical-space coverage.
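To make the contrast between centroid-oriented and pairwise measures concrete, the sketch below compares two toy libraries both ways. It is an illustration only and does not reproduce the metrics in the notebooks: fingerprints are modeled as sets of on-bit indices, the centroid is a plain per-bit frequency vector (in a GTM setting the vector would be defined over map nodes instead), and every function name is hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_tanimoto(lib_a, lib_b):
    """Average Tanimoto over all cross-library pairs: len(lib_a) * len(lib_b) comparisons."""
    total = sum(tanimoto(x, y) for x in lib_a for y in lib_b)
    return total / (len(lib_a) * len(lib_b))


def centroid(library, n_bits):
    """Per-bit frequency vector of a library, built in one pass over its members."""
    counts = [0] * n_bits
    for fingerprint in library:
        for bit in fingerprint:
            counts[bit] += 1
    return [count / len(library) for count in counts]


def continuous_tanimoto(u, v):
    """Tanimoto coefficient generalized to real-valued vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    denom = sum(x * x for x in u) + sum(y * y for y in v) - dot
    return dot / denom if denom else 0.0


if __name__ == "__main__":
    lib_a = [{0, 1}, {1, 2}]  # toy fingerprints over 4 bits
    lib_b = [{0, 1}, {2, 3}]
    pairwise = mean_pairwise_tanimoto(lib_a, lib_b)
    by_centroid = continuous_tanimoto(centroid(lib_a, 4), centroid(lib_b, 4))
    print(round(pairwise, 3), round(by_centroid, 3))
```

The pairwise measure scales with the product of the library sizes, whereas each centroid is computed once per library and then compared in a single vector operation, which is what makes centroid-style summaries attractive for large DELs.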

Contributing

Issues and PRs are welcome. Please:

  • Keep notebook outputs deterministic where possible.
  • Note OS/Python/PDM versions when reporting issues.
  • Discuss large additions (new DEL panels, alternative metrics, new GTM maps) in an issue first.

License

This project is licensed under the MIT License. See LICENSE for details.


Contributors

  • Alexey Orlov (aorlov@unistra.fr): contributed to the development of ChemographyKit, performed machine learning, interpreted the data, and contributed to manuscript writing.
  • Dragos Horvath (dhorvath@unistra.fr): provided overarching guidance, conceived and planned the research, and supervised the overall project.
  • Alexandre Varnek (varnek@unistra.fr): provided overarching guidance, conceived and planned the research, and supervised the overall project.
  • Louis Plyer (louis.plyer@unistra.fr): contributed to the development of ChemographyKit, performed machine learning, interpreted the data, and contributed to manuscript writing.
  • Tagir Akhmetshin (tagirshin@gmail.com): contributed to the development of ChemographyKit.
  • Erik Yeghyan (varnek@unistra.fr): prepared the ChEMBL datasets used in this study.
  • Fanny Bonachera (varnek@unistra.fr): prepared the ChEMBL datasets used in this study.
