Multi-agent system for chemical space analysis
Documentation · Preprint (ChemRxiv)
Warning: This repository is under active development. APIs, agent behavior, and project structure may change without notice.
ChemSpace Copilot is a multi-agent system powered by the Agno framework. The default runtime team coordinates seven specialized AI agents for ChEMBL bioactivity download, unified GTM workflows, downstream chemoinformatics, report generation, small-molecule generation, peptide generation, and retrosynthetic planning. A separate robustness evaluation agent is available for analyzing prompt-robustness test outputs. The GTM engine is provided by chemographykit.
┌─────────────────────────────────────────┐
│ UI Layer (Chainlit) │ Real-time chat interface
├─────────────────────────────────────────┤
│ Agent Orchestration (teams.py) │ Multi-agent coordination
├─────────────────────────────────────────┤
│ Specialized Agents (factories.py) │ 7 runtime agents + 1 evaluation agent
├─────────────────────────────────────────┤
│ Tools + Storage (toolkits + S3) │ Domain logic & persistence
└─────────────────────────────────────────┘
- 7 Runtime Agents + 1 Evaluation Agent — ChEMBL data download, unified GTM operations, chemoinformatics analysis, report generation, small-molecule generation, peptide WAE workflows, retrosynthetic planning, and robustness evaluation
- Generative Topographic Mapping — Dimensionality reduction and visualization of chemical space via chemographykit
- Molecular and Peptide Generation — LSTM autoencoder-based small-molecule generation plus peptide WAE generation, interpolation, and GTM-guided targeting
- S3/MinIO Integration — Session-scoped cloud storage with local filesystem fallback
- Chainlit Interface — WebSocket-based real-time chat with password authentication, file upload, and inline molecule rendering
- Agentic Memory — SQLite-backed agentic state and recent session history shared across agent workflows
- Robustness Testing — Framework for validating prompt variation handling with semantic similarity scoring
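The semantic similarity scoring mentioned under Robustness Testing can be illustrated with a minimal sketch. It assumes that responses to paraphrased prompts are compared against a baseline response using cosine similarity. The bag-of-words vectors below are a stand-in for whatever embedding model the project actually uses, and `similarity` and `robustness_score` are hypothetical names, not part of the repository's API.

```python
import math
import re
from collections import Counter
from typing import List


def _bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts, clamped to [0, 1]."""
    va, vb = _bow(a), _bow(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # Empty texts have zero norm; treat them as sharing nothing.
    return min(1.0, dot / norm) if norm else 0.0


def robustness_score(baseline: str, variants: List[str]) -> float:
    """Mean similarity of each variant response to the baseline response."""
    if not variants:
        return 0.0
    return sum(similarity(baseline, v) for v in variants) / len(variants)
```

A score near 1 would suggest that rephrasing the prompt did not change the answer much; where to put a pass/fail threshold is left to the test configuration.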
# Build containers
docker compose build chainlit-app
# Run (prompts for DEEPSEEK_API_KEY only when using the DeepSeek provider)
./docker-start.sh
Access the application at http://localhost:8000
See the Docker guide for full deployment instructions.
Environment setup
For file-based configuration, copy .env.example to .env in the project root:
# Required only for the default DeepSeek provider
DEEPSEEK_API_KEY=your-api-key-here
# Optional model overrides (otherwise .modelconf is used)
# MODEL_PROVIDER=deepseek
# MODEL_ID=deepseek-chat
# OLLAMA_HOST=http://localhost:11434
# Optional — S3/MinIO storage (set USE_S3=true only when you want remote storage)
USE_S3=false
# When enabled:
S3_ENDPOINT_URL=http://localhost:9000
MINIO_ACCESS_KEY=cs_copilot
MINIO_SECRET_KEY=chempwd123
ASSETS_BUCKET=chatbot-assets
# Optional — ChEMBL local MySQL (faster queries, offline use)
# Download dump: https://chembl.gitbook.io/chembl-interface-documentation/downloads
# CHEMBL_MYSQL_HOST=localhost
# CHEMBL_MYSQL_PORT=3306
# CHEMBL_MYSQL_USER=chembl
# CHEMBL_MYSQL_PASSWORD=
# CHEMBL_MYSQL_DATABASE=chembl_36
The repository also includes a tracked .modelconf file. Edit it if you want to switch from the default DeepSeek backend to a local Ollama model.
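The precedence described above (environment overrides first, .modelconf otherwise) can be sketched as follows. This is an illustration under stated assumptions, not the repository's loader: `resolve_model_config` is a hypothetical helper, the .modelconf contents are passed in as an already-parsed mapping because its file format is not shown here, and the fallback values are taken from the commented examples in the .env template.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_model_config(
    file_defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Pick provider settings: environment variables win over file defaults."""
    env = os.environ if env is None else env

    # Assumed fallbacks, mirroring the commented values in .env.example.
    provider = env.get("MODEL_PROVIDER") or file_defaults.get("MODEL_PROVIDER", "deepseek")
    model_id = env.get("MODEL_ID") or file_defaults.get("MODEL_ID", "deepseek-chat")
    config = {"provider": provider, "model_id": model_id}

    if provider == "deepseek":
        # The API key is required only for the DeepSeek provider.
        api_key = env.get("DEEPSEEK_API_KEY")
        if not api_key:
            raise RuntimeError("DEEPSEEK_API_KEY is required for the DeepSeek provider")
        config["api_key"] = api_key
    elif provider == "ollama":
        config["host"] = env.get("OLLAMA_HOST", "http://localhost:11434")
    return config
```

Passing `env` explicitly keeps the function easy to test; calling it with only `file_defaults` reads from the real process environment.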
Install dependencies
uv sync
S3/MinIO setup (optional)
# Run the interactive setup script
python scripts/setup_s3.py
# Or start MinIO manually
docker run -d --name minio \
-p 9000:9000 -p 9001:9001 \
-v /mnt/data:/data \
-e MINIO_ROOT_USER=cs_copilot \
-e MINIO_ROOT_PASSWORD=chempwd123 \
minio/minio server /data --console-address ":9001"
If the container already exists: docker start minio
Optional Chainlit Persistence
Chainlit persistence is disabled by default in chainlit.toml. Only set up PostgreSQL if you plan to enable Chainlit persistence manually.
docker run --name chainlit-pg -p 5432:5432 -d \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_USER=postgres \
-e POSTGRES_DB=chainlit \
postgres:16
export DATABASE_URL="postgresql://postgres:postgres@localhost:5432/chainlit"
If the container already exists: docker start chainlit-pg
uv run chainlit run chainlit_app.py -w
Notes:
- The bundled chainlit.toml currently has [persistence] enabled = false.
- The app sets a per-thread title from your first message; you can rename it in the UI.
An example workflow is available in notebooks/cs_copilot.ipynb.
The system uses a Factory Pattern + Registry for agent creation. The default team orchestrator coordinates seven runtime agents, and an eighth agent is available separately for robustness analysis:
| Agent | Role |
|---|---|
| ChEMBL Downloader | Downloads and filters bioactivity data from ChEMBL (REST API by default; optional local MySQL backend) |
| GTM Agent | Unified GTM operations: build, load, density analysis, activity landscapes, projection, and GTM sampling support |
| Chemoinformatician | Downstream chemoinformatics analysis including scaffold, similarity, clustering, and SAR workflows |
| Report Generator | Formats analysis results into reports and visual outputs |
| Autoencoder | Small-molecule generation via LSTM autoencoders, including standalone and GTM-guided modes |
| Peptide WAE | Peptide sequence generation, latent-space GTM workflows, and DBAASP-backed peptide activity landscapes |
| SynPlanner | Retrosynthetic planning and route visualization for target molecules |
| Agent | Role |
|---|---|
| Robustness Evaluation | Analyzes robustness test runs, score distributions, failures, and trends |
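The Factory Pattern + Registry approach can be sketched in a few lines. This is a generic illustration of the pattern, not the contents of factories.py: the `Agent` dataclass stands in for real Agno agent objects, and `register`, `create_agent`, `build_team`, and the registry keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Placeholder for a real agent object (assumption for this sketch)."""
    name: str
    role: str


# Maps a short key to a zero-argument factory that builds the agent.
AGENT_REGISTRY: Dict[str, Callable[[], Agent]] = {}


def register(key: str):
    """Decorator that adds a factory function to the registry under `key`."""
    def decorator(factory: Callable[[], Agent]) -> Callable[[], Agent]:
        if key in AGENT_REGISTRY:
            raise ValueError(f"Agent key already registered: {key!r}")
        AGENT_REGISTRY[key] = factory
        return factory
    return decorator


@register("chembl_downloader")
def make_chembl_downloader() -> Agent:
    return Agent("ChEMBL Downloader", "Downloads and filters bioactivity data")


@register("gtm")
def make_gtm_agent() -> Agent:
    return Agent("GTM Agent", "Unified GTM operations")


def create_agent(key: str) -> Agent:
    """Look up a factory by key and build a fresh agent."""
    try:
        factory = AGENT_REGISTRY[key]
    except KeyError:
        raise KeyError(f"Unknown agent: {key!r}") from None
    return factory()


def build_team(keys: List[str]) -> List[Agent]:
    """Build the agents a team orchestrator would coordinate."""
    return [create_agent(k) for k in keys]
```

With this layout, a separately available agent such as the robustness evaluator would be one more registered factory that the default team simply does not request.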
Agents share state via session_state and persist memory in SQLite. All file I/O goes through a unified S3/local storage abstraction.
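A unified storage abstraction with a local fallback might look roughly like the sketch below. The class and function names (`Storage`, `LocalStorage`, `s3_enabled`, `get_storage`) and the session-prefixed directory layout are assumptions made for illustration. Only the `USE_S3` variable comes from the configuration above, and the S3 backend is left as an injected factory rather than implemented.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional, Protocol


class Storage(Protocol):
    """Minimal interface that every backend is assumed to implement."""
    def write_bytes(self, key: str, data: bytes) -> None: ...
    def read_bytes(self, key: str) -> bytes: ...


class LocalStorage:
    """Filesystem backend that keeps each session under its own directory."""

    def __init__(self, root: str, session_id: str) -> None:
        self.base = (Path(root) / session_id).resolve()

    def _path(self, key: str) -> Path:
        path = (self.base / key).resolve()
        # Reject keys such as "../x" that would escape the session directory.
        if self.base not in path.parents:
            raise ValueError(f"Key escapes session directory: {key!r}")
        return path

    def write_bytes(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read_bytes(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def s3_enabled(env: Mapping[str, str]) -> bool:
    """Interpret USE_S3 as a boolean flag (accepted spellings are assumed)."""
    return env.get("USE_S3", "false").strip().lower() in {"1", "true", "yes"}


def get_storage(
    session_id: str,
    local_root: str,
    env: Optional[Mapping[str, str]] = None,
    s3_factory: Optional[Callable[[str], Storage]] = None,
) -> Storage:
    """Return the S3 backend when enabled and available, else local storage."""
    env = os.environ if env is None else env
    if s3_enabled(env) and s3_factory is not None:
        return s3_factory(session_id)
    return LocalStorage(local_root, session_id)
```

Because callers only see `write_bytes` and `read_bytes`, agent code would not need to know which backend is active.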
For full architectural details, see the documentation.
This project is licensed under the MIT License.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Ensure code passes pre-commit run --all-files
- Submit a pull request
See the Contributing Guide for code style conventions and detailed guidelines.