ChemSpace Copilot

Multi-agent system for chemical space analysis
Documentation · Preprint (ChemRxiv)

Warning: This repository is under active development. APIs, agent behavior, and project structure may change without notice.

Overview

ChemSpace Copilot is a multi-agent system powered by the Agno framework. The default runtime team coordinates seven specialized AI agents for ChEMBL bioactivity download, unified GTM workflows, downstream chemoinformatics, report generation, small-molecule generation, peptide generation, and retrosynthetic planning. A separate robustness evaluation agent is available for analyzing prompt-robustness test outputs. The GTM engine is provided by chemographykit.

┌─────────────────────────────────────────┐
│  UI Layer (Chainlit)                    │  Real-time chat interface
├─────────────────────────────────────────┤
│  Agent Orchestration (teams.py)         │  Multi-agent coordination
├─────────────────────────────────────────┤
│  Specialized Agents (factories.py)      │  7 runtime agents + 1 evaluation agent
├─────────────────────────────────────────┤
│  Tools + Storage (toolkits + S3)        │  Domain logic & persistence
└─────────────────────────────────────────┘

Features

7 Runtime Agents + 1 Evaluation Agent — ChEMBL data download, unified GTM operations, chemoinformatics analysis, report generation, small-molecule generation, peptide WAE workflows, retrosynthetic planning, and robustness evaluation
Generative Topographic Mapping — Dimensionality reduction and visualization of chemical space via chemographykit
Molecular and Peptide Generation — LSTM autoencoder-based small-molecule generation plus peptide WAE generation, interpolation, and GTM-guided targeting
S3/MinIO Integration — Session-scoped cloud storage with local filesystem fallback
Chainlit Interface — WebSocket-based real-time chat with password authentication, file upload, and inline molecule rendering
Agentic Memory — SQLite-backed agentic state and recent session history shared across agent workflows
Robustness Testing — Framework for validating prompt variation handling with semantic similarity scoring
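
To illustrate the idea behind similarity-based robustness scoring, here is a minimal sketch that compares the outputs produced for several prompt variants against a reference output. It uses a token-count cosine similarity as a simple lexical stand-in; the project's actual scoring method is not described in this README, and the function names `cosine_similarity` and `robustness_scores` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _token_counts(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors (lexical stand-in only)."""
    ca, cb = _token_counts(a), _token_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def robustness_scores(reference: str, variant_outputs: List[str]) -> List[float]:
    """Score each variant output against the reference output."""
    return [cosine_similarity(reference, out) for out in variant_outputs]
```

Scores near 1 would suggest that a reworded prompt produced a similar answer; a real setup would likely swap the token counts for sentence embeddings.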

Quick Start

Option 1: Docker (Recommended)

# Build containers
docker compose build chainlit-app

# Run (prompts for DEEPSEEK_API_KEY only when using the DeepSeek provider)
./docker-start.sh

Access the application at http://localhost:8000

See the Docker guide for full deployment details.

Option 2: Local Installation

Environment setup

For file-based configuration, copy .env.example to .env in the project root and set the values you need:

# Required only for the default DeepSeek provider
DEEPSEEK_API_KEY=your-api-key-here

# Optional model overrides (otherwise .modelconf is used)
# MODEL_PROVIDER=deepseek
# MODEL_ID=deepseek-chat
# OLLAMA_HOST=http://localhost:11434

# Optional — S3/MinIO storage (set USE_S3=true only when you want remote storage)
USE_S3=false
# When enabled:
S3_ENDPOINT_URL=http://localhost:9000
MINIO_ACCESS_KEY=cs_copilot
MINIO_SECRET_KEY=chempwd123
ASSETS_BUCKET=chatbot-assets

# Optional — ChEMBL local MySQL (faster queries, offline use)
# Download dump: https://chembl.gitbook.io/chembl-interface-documentation/downloads
# CHEMBL_MYSQL_HOST=localhost
# CHEMBL_MYSQL_PORT=3306
# CHEMBL_MYSQL_USER=chembl
# CHEMBL_MYSQL_PASSWORD=
# CHEMBL_MYSQL_DATABASE=chembl_36

The repository also includes a tracked .modelconf file. Edit it if you want to switch from the default DeepSeek backend to a local Ollama model.
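
As a rough illustration of how the storage variables above fit together, the following sketch reads them from the environment and only requires the S3 fields when USE_S3 is enabled. The variable names come from the listing above; the `StorageSettings` class and `load_storage_settings` function are hypothetical and are not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical container for the storage variables in .env."""
    use_s3: bool
    endpoint_url: Optional[str]
    access_key: Optional[str]
    secret_key: Optional[str]
    bucket: Optional[str]


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read storage settings; S3 fields are mandatory only when USE_S3 is true."""
    use_s3 = env.get("USE_S3", "false").strip().lower() in {"1", "true", "yes"}
    values = {
        "S3_ENDPOINT_URL": env.get("S3_ENDPOINT_URL"),
        "MINIO_ACCESS_KEY": env.get("MINIO_ACCESS_KEY"),
        "MINIO_SECRET_KEY": env.get("MINIO_SECRET_KEY"),
        "ASSETS_BUCKET": env.get("ASSETS_BUCKET"),
    }
    if use_s3:
        missing = [name for name, value in values.items() if not value]
        if missing:
            raise ValueError(f"USE_S3=true but missing: {', '.join(missing)}")
    return StorageSettings(
        use_s3=use_s3,
        endpoint_url=values["S3_ENDPOINT_URL"],
        access_key=values["MINIO_ACCESS_KEY"],
        secret_key=values["MINIO_SECRET_KEY"],
        bucket=values["ASSETS_BUCKET"],
    )
```

Failing early on a missing variable keeps a half-configured remote backend from surfacing later as an opaque connection error.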

Install dependencies

uv sync

S3/MinIO setup (optional)

# Run the interactive setup script
python scripts/setup_s3.py

# Or start MinIO manually
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -v /mnt/data:/data \
  -e MINIO_ROOT_USER=cs_copilot \
  -e MINIO_ROOT_PASSWORD=chempwd123 \
  minio/minio server /data --console-address ":9001"

If the container already exists: docker start minio

Chainlit persistence (optional)

Chainlit persistence is disabled by default in chainlit.toml. Only set up PostgreSQL if you plan to enable Chainlit persistence manually.

docker run --name chainlit-pg -p 5432:5432 -d \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_DB=chainlit \
  postgres:16

export DATABASE_URL="postgresql://postgres:postgres@localhost:5432/chainlit"

If the container already exists: docker start chainlit-pg
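
Before enabling persistence, it can help to confirm that DATABASE_URL has the shape shown above. The sketch below parses the URL with the Python standard library only; `describe_database_url` is a hypothetical helper, not part of the project or of Chainlit.

```python
from urllib.parse import urlparse


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into its parts (password omitted)."""
    parts = urlparse(url)
    if parts.scheme not in {"postgresql", "postgres"}:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }
```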

Usage

Chainlit App

uv run chainlit run chainlit_app.py -w

Notes:

The bundled chainlit.toml currently has [persistence] enabled = false.
The app sets a per-thread title from your first message; you can rename it in the UI.

Jupyter Notebook

An example workflow is available in notebooks/cs_copilot.ipynb.

Architecture

The system uses a Factory Pattern + Registry for agent creation. The default team orchestrator coordinates seven runtime agents, and an eighth agent is available separately for robustness analysis:

Runtime Team

Agent	Role
ChEMBL Downloader	Downloads and filters bioactivity data from ChEMBL (REST API by default; optional local MySQL backend)
GTM Agent	Unified GTM operations: build, load, density analysis, activity landscapes, projection, and GTM sampling support
Chemoinformatician	Downstream chemoinformatics analysis including scaffold, similarity, clustering, and SAR workflows
Report Generator	Formats analysis results into reports and visual outputs
Autoencoder	Small-molecule generation via LSTM autoencoders, including standalone and GTM-guided modes
Peptide WAE	Peptide sequence generation, latent-space GTM workflows, and DBAASP-backed peptide activity landscapes
SynPlanner	Retrosynthetic planning and route visualization for target molecules

Separate Evaluation Agent

Agent	Role
Robustness Evaluation	Analyzes robustness test runs, score distributions, failures, and trends
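
The factory-plus-registry idea mentioned above can be sketched in a few lines. This is an illustrative stand-in rather than the project's implementation: the `Agent` dataclass, the `register` decorator, the registry keys, and `build_team` are all hypothetical names, and the real agents are Agno objects rather than plain dataclasses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Stand-in for a real agent; only a name and a role are modeled."""
    name: str
    role: str


AgentFactory = Callable[[], Agent]
_REGISTRY: Dict[str, AgentFactory] = {}


def register(key: str) -> Callable[[AgentFactory], AgentFactory]:
    """Decorator that records a factory function under a registry key."""
    def decorator(factory: AgentFactory) -> AgentFactory:
        if key in _REGISTRY:
            raise KeyError(f"agent '{key}' is already registered")
        _REGISTRY[key] = factory
        return factory
    return decorator


@register("chembl_downloader")
def make_chembl_downloader() -> Agent:
    return Agent("ChEMBL Downloader", "Downloads and filters bioactivity data")


@register("gtm_agent")
def make_gtm_agent() -> Agent:
    return Agent("GTM Agent", "Unified GTM operations")


def create_agent(key: str) -> Agent:
    """Look up the factory for a key and build a fresh agent instance."""
    if key not in _REGISTRY:
        raise KeyError(f"unknown agent '{key}'; known: {sorted(_REGISTRY)}")
    return _REGISTRY[key]()


def build_team(keys: List[str]) -> List[Agent]:
    """Create one agent per key, in the order given."""
    return [create_agent(key) for key in keys]
```

With this layout, adding an agent means writing one decorated factory function; the team builder never needs to know about concrete agent classes.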

Agents share state via session_state and persist memory in SQLite. All file I/O goes through a unified S3/local storage abstraction.
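
One way to picture a unified S3/local storage abstraction is a small interface with two interchangeable backends. The sketch below is an assumption-laden illustration, not the project's code: the `Storage` protocol, both classes, the `sessions/<id>/` key layout, and `demo_roundtrip` are hypothetical. The S3 backend expects a boto3-style client supplied by the caller and relies only on its `put_object` and `get_object` calls.

```python
import tempfile
from pathlib import Path
from typing import Any, Protocol


class Storage(Protocol):
    """Minimal interface that both backends satisfy."""
    def write_bytes(self, key: str, data: bytes) -> None: ...
    def read_bytes(self, key: str) -> bytes: ...


class LocalStorage:
    """Stores objects as files under a root directory."""
    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def write_bytes(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read_bytes(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class S3Storage:
    """Stores objects in a bucket through a boto3-style client."""
    def __init__(self, client: Any, bucket: str) -> None:
        self.client = client
        self.bucket = bucket

    def write_bytes(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=key, Body=data)

    def read_bytes(self, key: str) -> bytes:
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()


def session_key(session_id: str, filename: str) -> str:
    """Assumed key layout that scopes each file to a session."""
    return f"sessions/{session_id}/{filename}"


def demo_roundtrip() -> bytes:
    """Write and read back one file using the local backend."""
    with tempfile.TemporaryDirectory() as tmp:
        storage: Storage = LocalStorage(Path(tmp))
        key = session_key("abc123", "dataset.csv")
        storage.write_bytes(key, b"smiles,activity\n")
        return storage.read_bytes(key)
```

Because calling code depends only on `write_bytes` and `read_bytes`, switching between MinIO and the local filesystem reduces to choosing which backend to construct at startup.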

For full architectural details, see the documentation.

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Ensure code passes pre-commit run --all-files
Submit a pull request

See the Contributing Guide for code style conventions and detailed guidelines.

