This project provides a Streamlit web application to analyze biological pathways from a list of genes. It uses a combination of public biological databases and a local Large Language Model (LLM) via Ollama to build, reconcile, and visualize a knowledge graph of genes and their associated pathways.
- Easy Gene Input: Paste a list of genes separated by newlines or commas.
- 1-Click Analysis: Automatically fetches data, builds a knowledge graph, and runs analysis.
- LLM-Powered Insights: Generates hypotheses and biological insights from the network structure.
- Interactive Visualization: Displays the resulting gene-pathway graph.
- Data Fetching: Gathers pathway and interaction data from KEGG, Reactome, UniProt, and STRING-DB.
- LLM Reconciliation: Uses a local LLM (via Ollama) to clean, merge, and reconcile the data from the different sources into a coherent set of pathways.
- Graph Analysis: Builds a network graph and analyzes it to find central nodes and communities.
- Insight Generation: Feeds the analysis results back into the LLM to generate plain-language hypotheses and summaries.
- Streamlit UI: Provides a simple web interface for users to input genes and see the results.
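The pipeline above can be sketched in miniature. Everything here is illustrative, not the project's actual code: the function names, the toy gene→pathway edges, and the use of plain degree counting as a stand-in for the real graph analysis are all assumptions.

```python
from collections import Counter

def parse_genes(raw: str) -> list[str]:
    """Split a pasted gene list on newlines and commas,
    dropping blanks and duplicates (first occurrence wins)."""
    tokens = [t.strip().upper()
              for chunk in raw.splitlines()
              for t in chunk.split(",")]
    seen: list[str] = []
    for t in tokens:
        if t and t not in seen:
            seen.append(t)
    return seen

def central_genes(edges: list[tuple[str, str]], top: int = 3) -> list[str]:
    """Rank genes by degree (how many pathways mention them) --
    a simplified stand-in for the centrality step of the analysis."""
    degree = Counter(gene for gene, _pathway in edges)
    return [g for g, _ in degree.most_common(top)]

genes = parse_genes("TP53, BRCA1\nEGFR,TP53")
# Hypothetical gene -> pathway edges, shaped like what the fetch +
# reconciliation steps might produce:
edges = [("TP53", "p53 signaling"),
         ("TP53", "Apoptosis"),
         ("BRCA1", "DNA repair")]
print(central_genes(edges, top=1))  # the most connected gene in the toy data
```

In the real application, the edge list comes from KEGG/Reactome/UniProt/STRING-DB after LLM reconciliation, and the ranking feeds the insight-generation prompt.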
These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.
Before you begin, ensure you have the following installed on your system:
- Git: To clone the repository.
- Python 3.9+: To run the application.
- Ollama: To run the local Large Language Model.
- Download and install Ollama for your operating system (macOS, Linux, or Windows).
- After installing, you must pull at least one model. Open your terminal and run:
ollama pull gemma3:1b
- Default Model Note: `gemma3:1b` is a relatively small and manageable model, suitable for most modern laptops.
- Optional Powerful Model: For more detailed and nuanced insights, you can also pull a larger model:
ollama pull gpt-oss:120b-cloud
`gpt-oss:120b-cloud` is a very large model and requires significant computational resources (e.g., a powerful CPU and ample RAM, or a GPU) to run effectively. If you choose to use this model, you will need to update the `src/adapters/ollama_adapter.py` file to specify `gpt-oss:120b-cloud` as the model name.
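The exact contents of `src/adapters/ollama_adapter.py` depend on the repository, but an adapter around Ollama's local REST API (`POST /api/generate` on port 11434) typically centres on a single model-name constant. A minimal sketch — the constant and function names are illustrative assumptions, not the project's actual code:

```python
import json
import urllib.request

# Change this to the model you pulled (e.g. "gpt-oss:120b-cloud");
# the name must match the output of `ollama list` exactly.
MODEL_NAME = "gemma3:1b"
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = MODEL_NAME) -> dict:
    """Build the JSON body that Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = MODEL_NAME) -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize the role of TP53")` requires the Ollama server to be running and the named model to be pulled; otherwise the request will fail with a connection or model-not-found error.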
Follow these steps to set up the project environment.
1. Clone the Repository
Open your terminal or command prompt and clone the repository to your local machine:
git clone https://github.com/Hami0095/bio-pathway-mapper.git
cd bio-pathway-mapper/biological_kg
2. Create and Activate a Python Virtual Environment
It's highly recommended to use a virtual environment to manage project dependencies.
- On macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
- On Windows:
python -m venv .venv
.venv\Scripts\activate
3. Install Dependencies
With your virtual environment activated, install the required Python packages:
pip install -r requirements.txt
Once the setup is complete, you can run the Streamlit application.
1. Ensure Ollama is Serving the Model
Make sure the Ollama application is running on your machine. You can check this by looking for the Ollama icon in your system's menu bar or taskbar.
2. Launch the Streamlit App
In your terminal (with the virtual environment still activated), run the following command:
streamlit run streamlit_app.py
This will open the application in a new tab in your default web browser. You can now start using the tool!
- Abdur Rehman - LinkedIn
We plan to create an "Automated Setup Agent" that will download Ollama, pull the model, and then launch both the Ollama server and the Streamlit app in the user's browser automatically. You can track the progress of this feature in Issue #1 (this is a placeholder link).