Composer Agents for Bioinformatics

Overview

This repository aims to build effective "composer" agents for bioinformatics, particularly in the domains of single-cell and spatial single-cell analysis within neurobiology and cancer research.

The concept of a "composer" agent is to assemble pieces (code snippets, domain knowledge, policies) from a catalogue to generate an integrated bioinformatics agent capable of:

- Loading and preprocessing data (e.g., scRNA-seq, spatial transcriptomics)
- Performing standard QC and filtering
- Running downstream analyses (clustering, dimensionality reduction, cell type annotation)
- Executing specialized tasks (e.g., spatial deconvolution, cell-cell interaction analysis, RNA velocity, batch correction)

By customizing different catalogue entries for specific tasks, species, or data types, the composer can produce a final agent with all the necessary instructions and domain knowledge to tackle a specific user request or workflow.

The Problem

Standardizing bioinformatics workflows for single-cell and spatial single-cell data analysis can be time-consuming and error-prone, especially when integrating multiple datasets, applying new tools, or dealing with specialized biological questions (neurobiology, oncology, etc.).

Typical challenges include:

- Selecting consistent code snippets or libraries for each analysis step.
- Incorporating relevant domain knowledge to interpret results (e.g., neuron-specific markers, cancer pathways).
- Enforcing agent policies (e.g., code-only responses, confidentiality, HPC usage).
- Handling batch effects, cell type annotations, or multi-sample integration with reproducible pipelines.

Solution: Our composer agent dynamically stitches together the appropriate catalogue items (policies, domain knowledge, code templates, etc.) to generate a complete, specialized agent. This gives a flexible, modular way to construct analysis workflows.

Inputs

Agent Catalogue:
- A JSON-based repository of code snippets, domain knowledge, policies, specialized functions for single-cell/spatial analysis, etc.
- Each catalogue item contains a title (hinting at its purpose) and text (the actual content injected into the final system prompt).
User/Planner Instructions:
- A higher-level plan or user instructions specifying which pieces to assemble, in what order, and any specialized parameters (e.g., choosing cancer vs. neuro domain, using scRNA-seq vs. scATAC-seq modules).
Data Context (optional):
- File paths or data references (e.g., .h5ad, .mtx, coordinates) to inform the composer how to load and process the dataset.
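
As an illustration of the catalogue input, here is a minimal sketch of loading catalogue items and indexing them by title. It assumes the catalogue is a JSON array of objects with `title` and `text` keys; the actual layout of catalogue.json may differ, and `load_catalogue` and the placeholder texts are hypothetical.

```python
import json

# Hypothetical catalogue content; the real catalogue.json may be laid out differently.
CATALOGUE_JSON = """
[
  {"title": "Core Global Agent Policy", "text": "<policy text goes here>"},
  {"title": "Spatial Data Loading", "text": "<loading instructions go here>"}
]
"""


def load_catalogue(raw: str) -> dict[str, str]:
    """Parse catalogue JSON and index each item's text by its title."""
    items = json.loads(raw)
    return {item["title"]: item["text"] for item in items}


catalogue = load_catalogue(CATALOGUE_JSON)
print(sorted(catalogue))
```

Indexing by title keeps lookups simple when a plan refers to items by name, at the cost of requiring titles to be unique.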

Outputs

Agent Prompt (System Prompt for the Final Agent):
- A consolidated prompt that merges:
  - The relevant policies (professional conduct, code rules, HPC usage, etc.).
  - Domain-specific knowledge (markers, workflows, interpretive context for neuro or cancer).
  - Step-by-step function definitions or code snippets needed to complete the analysis.
- This final prompt is automatically generated by the composer, ensuring internal consistency and adherence to the provided plan.
Executable Bioinformatics Agent:
- Once the final prompt is deployed into a code-capable environment (e.g., a Jupyter kernel), the resulting agent executes each step: loading data, performing QC, clustering, and more.
- The agent also returns interpretations of each step’s outputs (e.g., images, stats, differential expression results).

How to Use

Review the Catalogue: Examine the available pieces. Decide which are needed for your specific single-cell or spatial data task (e.g., Load Data, QC, Spatial Deconvolution, etc.).
Assemble: Combine the chosen pieces into a final agent prompt. This is done via the composer logic, which merges the text from each relevant catalogue entry.
Execute: Provide the final system prompt to a code execution environment. The resulting agent is now specialized to run your single-cell or spatial workflow.
Iterate: If the user needs additional analyses or domain expansions, insert new or updated pieces into the catalogue, then rebuild the agent.
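
The Assemble step above could be sketched roughly as follows. This is not the repository's composer logic, only one plausible way to merge catalogue text in plan order; `compose_prompt`, the `## ` section headers, and the placeholder texts are assumptions made for illustration.

```python
def compose_prompt(catalogue: dict[str, str], plan: list[str]) -> str:
    """Merge the text of each planned catalogue item, in plan order."""
    # Fail early if the plan names an item the catalogue does not contain.
    missing = [title for title in plan if title not in catalogue]
    if missing:
        raise KeyError(f"Not in catalogue: {missing}")
    # Assumed layout: one headed section per catalogue item.
    sections = [f"## {title}\n{catalogue[title]}" for title in plan]
    return "\n\n".join(sections)


# Placeholder catalogue; real entries would carry full instructions.
catalogue = {
    "Load Data": "<how to load the dataset>",
    "QC": "<quality-control rules>",
    "Spatial Deconvolution": "<deconvolution steps>",
}
prompt = compose_prompt(catalogue, ["Load Data", "QC"])
print(prompt)
```

To iterate, add or update an entry in `catalogue` and call `compose_prompt` again with the revised plan.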

Example Scenario

A user has Slide-seq data from brain tissue and wants to integrate scRNA-seq data to deconvolve the cell types at each spatial location.
The composer selects:
- Core Global Agent Policy
- Spatial Data Loading
- Neurobiology Domain Knowledge
- Tangram Deconvolution
- Agent Policy: Final Reporting
The composer merges these items and injects the final text into a system prompt.
The agent proceeds with data loading, QC, clustering, and Tangram-based mapping, returning interpretative text for each step.
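
The scenario above might be represented as a plan plus an optional data context, as in this sketch. The `Plan` class, the `render` function, the bracketed section labels, and the file names are all hypothetical; only the five catalogue titles come from the scenario itself.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Ordered catalogue titles plus optional data references (hypothetical shape)."""
    titles: list[str]
    data_context: dict[str, str] = field(default_factory=dict)


def render(plan: Plan, catalogue: dict[str, str]) -> str:
    """Build the system prompt: catalogue text in plan order, then data context."""
    parts = [f"[{title}]\n{catalogue[title]}" for title in plan.titles]
    if plan.data_context:
        lines = [f"{name}: {path}" for name, path in plan.data_context.items()]
        parts.append("[Data Context]\n" + "\n".join(lines))
    return "\n\n".join(parts)


titles = [
    "Core Global Agent Policy",
    "Spatial Data Loading",
    "Neurobiology Domain Knowledge",
    "Tangram Deconvolution",
    "Agent Policy: Final Reporting",
]
# Placeholder text stands in for the real catalogue entries.
catalogue = {title: f"<text of {title}>" for title in titles}

plan = Plan(
    titles=titles,
    # Hypothetical file names, for illustration only.
    data_context={"spatial": "slideseq_brain.h5ad", "reference": "scrna_reference.h5ad"},
)
prompt = render(plan, catalogue)
print(prompt)
```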

Contributing

Add new catalogue items (policies, domain knowledge, code templates) by following the existing JSON format.
Edit or refine existing items for more precise coverage (e.g., new QC methods, domain expansions for immunotherapy, neurodegenerative diseases, etc.).
Test your changes by assembling a new agent and running an end-to-end analysis pipeline.
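
Before assembling a new agent, a quick structural check of new or edited items can catch simple mistakes. The sketch below assumes each item is an object with non-empty `title` and `text` strings and that titles should be unique; `validate_catalogue` is a hypothetical helper, not part of this repository.

```python
def validate_catalogue(items: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen = set()
    for i, item in enumerate(items):
        title = item.get("title")
        text = item.get("text")
        if not isinstance(title, str) or not title.strip():
            problems.append(f"item {i}: missing or empty title")
        elif title in seen:
            problems.append(f"item {i}: duplicate title {title!r}")
        else:
            seen.add(title)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {i}: missing or empty text")
    return problems


# Illustrative entries only; not taken from the real catalogue.
items = [
    {"title": "QC", "text": "Filter low-quality cells."},
    {"title": "QC", "text": "Duplicate entry."},
    {"title": "Batch Correction", "text": ""},
]
problems = validate_catalogue(items)
print(problems)
```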

License

This project is distributed under the terms of [License Name Here]. Refer to LICENSE for details.

rgopalan01/BioAgentBuilder
