Feature Request: Integrate ragbio for Knowledge-Aware Cluster Summarization #2

@man4ish

Description

We maintain two separate projects: scatlas-builder, the core single-cell analysis platform, and rag-gene-discovery-assistant (ragbio), the RAG-LLM–based biological knowledge extraction tool.
The objective is to integrate ragbio into scatlas-builder to generate biologically grounded, citation-rich summaries for cell clusters.

This integration is a key step toward evolving scatlas-builder into a full scientific discovery platform that combines computational analysis with literature-backed biological interpretation.


Acceptance Criteria

1. New API Endpoint

Create a new FastAPI endpoint to provide LLM-augmented summaries for cluster marker genes.

  • Path: /summarize/cluster/{dataset_id}

  • Method: POST

  • Input:
    JSON containing the dataset_id and a dictionary of cluster markers, e.g.:

    {
      "Cluster 0": ["GeneA", "GeneB", "GeneC"],
      "Cluster 1": ["GeneD", "GeneE"]
    }

  • Output:
    JSON with biological interpretations and PubMed-derived citations for each cluster.


2. Service Layer Integration

The new endpoint must call a dedicated service function, for example llm_service.run_rag_summary(), that invokes the ragbio RAG pipeline and returns structured summaries.
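
One way the service function might be structured is sketched below. ragbio's actual Python interface is not described in this issue, so the pipeline is injected as a plain callable; `RagAnswer`, `RagPipeline`, `build_query`, and `stub_pipeline` are all hypothetical names used for illustration, and the output keys (`interpretation`, `citations`) are an assumed schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RagAnswer:
    # Hypothetical return type for one RAG query: answer text plus PubMed IDs.
    text: str
    pmids: List[str] = field(default_factory=list)


# Hypothetical adapter signature: takes a question, returns a RagAnswer.
RagPipeline = Callable[[str], RagAnswer]


def build_query(genes: List[str]) -> str:
    """Turn a cluster's marker genes into a natural-language question."""
    return (
        "Which cell type or biological process is suggested by the "
        f"marker genes {', '.join(genes)}?"
    )


def run_rag_summary(
    markers: Dict[str, List[str]], pipeline: RagPipeline
) -> Dict[str, dict]:
    """Query the injected pipeline once per cluster and collect the results."""
    summaries: Dict[str, dict] = {}
    for cluster, genes in markers.items():
        if not genes:
            # Skip the pipeline call for clusters without marker genes.
            summaries[cluster] = {"interpretation": "", "citations": []}
            continue
        answer = pipeline(build_query(genes))
        summaries[cluster] = {
            "interpretation": answer.text,
            "citations": list(answer.pmids),
        }
    return summaries


def stub_pipeline(query: str) -> RagAnswer:
    """Stand-in for ragbio during development; performs no retrieval."""
    return RagAnswer(text=f"stub answer for: {query}")
```

Passing the pipeline as an argument lets the route be tested with `stub_pipeline` before ragbio is wired in, and keeps the embedded versus external deployment decision out of the service logic.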


3. Deployment Updates

Modify:

  • docker/Dockerfile
  • requirements.txt

to ensure that ragbio and its dependencies (Ollama runtime, FAISS index, BioBERT embeddings, etc.) are available within the scatlas-builder deployment environment.
Alternatively, configure scatlas-builder to communicate with an external Ollama/RAG service.
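
The choice between a bundled runtime and an external service could be driven by configuration, as in the sketch below. The environment variable names `RAG_MODE` and `OLLAMA_BASE_URL` and the `RagSettings` class are assumptions made for illustration, not existing scatlas-builder settings; `http://localhost:11434` is Ollama's default local address.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
VALID_MODES = {"embedded", "external"}


@dataclass(frozen=True)
class RagSettings:
    # "embedded": ragbio and Ollama run inside the scatlas-builder image.
    # "external": scatlas-builder calls a separately deployed service.
    mode: str
    ollama_base_url: str


def load_rag_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    """Read the (hypothetical) RAG_MODE and OLLAMA_BASE_URL variables."""
    env = os.environ if env is None else env
    mode = env.get("RAG_MODE", "embedded").lower()
    if mode not in VALID_MODES:
        raise ValueError(f"RAG_MODE must be one of {sorted(VALID_MODES)}")
    if mode == "external" and "OLLAMA_BASE_URL" not in env:
        # An external deployment has no sensible default address.
        raise ValueError("RAG_MODE=external requires OLLAMA_BASE_URL")
    url = env.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_URL).rstrip("/")
    return RagSettings(mode=mode, ollama_base_url=url)
```

With this arrangement the Dockerfile only needs to bundle the Ollama runtime and FAISS index when the embedded mode is used; the external mode reduces the image change to setting two variables.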


4. Diagram + Architectural Explanation

Provide a clear architectural explanation and a System Diagram showing how:

  • client requests flow through the new FastAPI route,
  • marker genes reach the analysis layer (Scanpy),
  • results pass into the LLM orchestration layer,
  • ragbio performs retrieval and LLM reasoning,
  • summaries return to the client.

System Integration Diagram (Reference Implementation)

This diagram illustrates the expected data and control flow needed to integrate the ragbio knowledge layer into the scatlas-builder analysis layer.

[Figure: System Architecture]
