TonyinBio/bio-chat

Bio Chat

An LLM using retrieval augmented generation over a biomedical knowledge graph to answer questions. Served as a website.

Set Neo4j password:

export NEO4J_PASSWORD=...
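A minimal sketch of how a script could pick up that variable and fail early when it is missing. The helper name `get_neo4j_password` is hypothetical; how `ingest.py` and `starter.py` actually read the password is not shown here.

```python
import os


def get_neo4j_password(env=os.environ):
    """Return the Neo4j password from the environment, or raise a clear error.

    Hypothetical helper: the repo's own scripts may read the variable differently.
    """
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError(
            "NEO4J_PASSWORD is not set; export it before running the app."
        )
    return password
```

Passing a plain dict as `env` makes the helper easy to check without touching the real environment.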

Run the app:

docker compose up -d

Enter the Docker container:

docker exec -it bio-chat-devcontainer-1 bash

Set up the environment without Docker:

conda create -n bio_chat python=3.11 -y
conda activate bio_chat
pip install -r requirements.txt

Also download and install Ollama.

Process the text and upload it to Neo4j:

python ingest.py

Run a prompt:

python starter.py

TODO: Citations

  • ingest should add a new edge property
  • retrieve should return edge property
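One way the citation TODO could look, as a sketch only. The `Entity` label, the `name` node property, the `source` edge property, and both names below are assumptions for illustration, not the schema `ingest.py` uses.

```python
def build_cited_edge_query(predicate):
    """Build a Cypher statement that writes one triple with a citation on the edge.

    Assumed schema: nodes labeled Entity with a `name` property, and a
    `source` property on the relationship holding the citation.
    """
    # The relationship type is interpolated into the query text (it is not
    # passed as a parameter), so it is validated first.
    if not predicate.isidentifier():
        raise ValueError(f"unsafe relationship type: {predicate!r}")
    rel_type = predicate.upper()
    return (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[r:{rel_type}]->(o) "
        "SET r.source = $source"
    )


# Retrieval side: return the edge property alongside the triple.
RETRIEVE_WITH_SOURCE = (
    "MATCH (s:Entity)-[r]->(o:Entity) "
    "WHERE s.name = $subject "
    "RETURN s.name AS subject, type(r) AS predicate, "
    "o.name AS object, r.source AS source"
)
```

The query strings would be run with `$subject`, `$object`, and `$source` supplied as parameters, so only the relationship type ever reaches the query text directly.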

TODO: Improve prompt?

TODO: Create NER evaluation dataset

TODO: Mess with CLIP/soft prompting?

TODO: Decide on predicates

  • chemical
  • source
  • disposition
  • exposure_route
  • food
  • health_effect
  • organoleptic_effect
  • process
  • role
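Until the predicates are decided, the candidates above could be kept in one place and checked at ingest time. This is a sketch; `CANDIDATE_PREDICATES` and `normalize_predicate` are hypothetical names, and the normalization rules are an assumption.

```python
# Candidate predicates copied from the list above; the final set is undecided.
CANDIDATE_PREDICATES = frozenset({
    "chemical",
    "source",
    "disposition",
    "exposure_route",
    "food",
    "health_effect",
    "organoleptic_effect",
    "process",
    "role",
})


def normalize_predicate(raw):
    """Map a raw predicate string to its snake_case form, rejecting unknown ones."""
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in CANDIDATE_PREDICATES:
        raise ValueError(f"unknown predicate: {raw!r}")
    return key
```

Rejecting unknown predicates rather than passing them through keeps extracted triplets from quietly introducing new edge types into the graph.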

Note to self: plan:

  • VectorIndex (VectorIndexRetriever) -> interfaces with embeddings in neo4j
  • GraphIndex (KGRAGRetriever) -> interfaces with triplets and subgraph in neo4j
  • GRetriever -> combines the two
  • will use both Neo4j stores (vector and graph) but have them point to the same db
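The plan above could be sketched roughly as follows. This is plain Python with stand-in types, not the retriever library's API: `Hit`, `Retriever`, `StaticRetriever`, and `CombinedRetriever` are hypothetical names, and merging by highest score assumes the two retrievers produce comparable scores, which may not hold in practice.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    """One retrieved snippet with a relevance score (stand-in type)."""
    text: str
    score: float


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[Hit]: ...


class StaticRetriever:
    """Stub that returns fixed hits; stands in for a vector or graph retriever."""

    def __init__(self, hits):
        self._hits = list(hits)

    def retrieve(self, query):
        return list(self._hits)


class CombinedRetriever:
    """Query both retrievers, drop duplicate texts, and sort by score."""

    def __init__(self, vector_retriever, graph_retriever):
        self.vector_retriever = vector_retriever
        self.graph_retriever = graph_retriever

    def retrieve(self, query):
        best = {}
        for retriever in (self.vector_retriever, self.graph_retriever):
            for hit in retriever.retrieve(query):
                previous = best.get(hit.text)
                # Keep the higher-scoring copy when both retrievers return a text.
                if previous is None or hit.score > previous.score:
                    best[hit.text] = hit
        return sorted(best.values(), key=lambda h: h.score, reverse=True)
```

With the stubs swapped for real vector and graph retrievers backed by the same database, the combining step itself would not need to change.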
