Created by supermax01
A specialized medical question-answering system that leverages large language models (LLMs) and retrieval-augmented generation (RAG) to provide accurate medical information based on trusted medical literature. This system runs entirely on your local machine, giving you full control over your data and privacy.
This project creates an end-to-end local RAG medical chatbot that:
- Extracts knowledge from medical PDFs and documents
- Processes and chunks the text into manageable segments
- Generates embeddings using sentence transformers
- Stores vectors in Pinecone for efficient semantic search
- Retrieves relevant context when a medical question is asked
- Generates accurate answers using Ollama's LLMs with the retrieved context
The system is designed to provide factual, context-based responses to medical queries while citing the sources of information, making it suitable for educational purposes and preliminary medical information lookup.
RAG combines the power of large language models with information retrieval systems to generate more accurate, factual, and contextually relevant responses:
- Retrieval: When a question is asked, the system searches through a knowledge base (in this case, your medical PDFs) to find the most relevant information.
- Augmentation: The retrieved information is added to the prompt sent to the language model.
- Generation: The language model generates a response based on both its pre-trained knowledge and the specific information retrieved from your documents.
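The three steps above can be sketched in a few lines. This is a toy illustration only (every name is hypothetical, and word overlap stands in for the real embedding-based search):

```python
import re

def words(text):
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, knowledge_base, k=2):
    """Rank chunks by word overlap with the question -- a crude stand-in
    for the embedding-based semantic search the real system performs."""
    q = words(question)
    ranked = sorted(knowledge_base, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:k]

def augment(question, context_chunks):
    """Build the prompt that combines the question with retrieved context."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context above."

chunks = [
    "Type 2 diabetes symptoms include increased thirst and frequent urination.",
    "Pneumonia is an infection that inflames the air sacs in the lungs.",
]
prompt = augment("What are the symptoms of diabetes?",
                 retrieve("What are the symptoms of diabetes?", chunks, k=1))
# The prompt is then handed to the LLM for the generation step.
```

Only the retrieved chunk ends up in the prompt, which is what keeps answers grounded in your documents.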
- Accuracy: Responses are grounded in specific documents you provide, reducing hallucinations
- Privacy: Your medical documents and queries never leave your computer
- Customization: You control exactly what knowledge the system has access to
- Transparency: The system shows you the sources it used to generate each answer
- Cost-effective: No need for expensive API calls to cloud-based LLMs
- Document Ingestion: The system reads and processes any PDF files placed in the `data/` directory. These can be medical textbooks, research papers, or any text-based medical information.
- Knowledge Base Creation: The content is split into chunks and converted into vector embeddings, which are stored in Pinecone.
- Question Processing: When you ask a question, the system:
  - Converts your question into an embedding
  - Searches Pinecone for the most relevant text chunks
  - Retrieves these chunks to use as context
- Answer Generation: The system uses Ollama to run a local LLM (like llama3.2 or phi4-mini) that:
  - Receives your question and the retrieved context
  - Generates an answer based only on the provided context
  - Cites the sources used to create the response
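The chunking in the Knowledge Base Creation step works roughly like this fixed-size splitter with overlap (a simplified stand-in for the LangChain text splitters the project actually uses):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks so that sentences cut at a
    chunk boundary still appear intact in the neighbouring chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # each chunk re-reads the last `overlap` chars
    return chunks

pages = "A" * 1200  # stand-in for text extracted from a PDF
chunks = chunk_text(pages, chunk_size=500, overlap=50)
```

Each chunk is then embedded separately, so the overlap prevents information straddling a boundary from being lost to retrieval.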
- Fully Local Processing: All components run on your machine, with no data sent to external APIs
- Document Processing: Automatically extracts and processes text from medical PDFs
- Vector Search: Uses semantic search to find the most relevant information for each query
- Context-Aware Responses: Generates answers based only on the retrieved medical literature
- Source Attribution: Provides the sources of information used to generate each answer
- Modular Architecture: Easily extensible with new data sources or models
- Interactive Web Interface: User-friendly Streamlit interface for asking questions
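The vector search feature boils down to nearest-neighbour lookup by similarity between embeddings. A minimal cosine-similarity sketch (Pinecone does this efficiently at scale; the vectors and chunk names here are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=1):
    """index: list of (chunk_id, vector) pairs; return ids of the k nearest."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

index = [("diabetes-chunk", [0.9, 0.1]), ("pneumonia-chunk", [0.1, 0.9])]
best = top_k([0.8, 0.2], index, k=1)  # query embedding closest to the first chunk
```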
- LLM: Ollama (with models like llama3.2, phi4-mini) - runs locally on your machine
- Embeddings: HuggingFace Sentence Transformers - processed locally
- Vector Database: Pinecone - for efficient similarity search
- Document Processing: LangChain document loaders and text splitters
- Web Interface: Streamlit - runs locally in your browser
- Language: Python 3.9+
```
End-to-End-Medical-Chatbot/
├── data/                 # Directory for medical PDF files
│   └── README.md         # Instructions for adding medical PDFs
├── src/                  # Source code
│   ├── embeddings/       # Embeddings generation module
│   ├── llm/              # LLM integration with Ollama
│   ├── retrieval/        # Pinecone vector search module
│   ├── utils/            # Document processing and QA utilities
│   ├── app.py            # Streamlit web application
│   └── check_setup.py    # Setup verification script
├── .env                  # Environment variables (API keys)
├── .gitignore            # Git ignore file
├── requirements.txt      # Dependencies
├── LICENSE               # License file
└── README.md             # This file
```
Follow these steps to set up and run the medical chatbot on your device:
```bash
git clone https://github.com/supermax01/End-to-End-Medical-Chatbot.git
cd End-to-End-Medical-Chatbot
```

It's recommended to use a virtual environment:
```bash
# For conda (recommended)
conda create -n mchatbot python=3.9
conda activate mchatbot

# OR for venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Then install the dependencies:

```bash
pip install -r requirements.txt
```

- Create a free account at Pinecone
- Create a new project and get your API key
- Create a `.env` file in the project root with your API key:

```
PINECONE_API_KEY=your_pinecone_api_key
```
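The app reads this file at startup (typically via a loader such as python-dotenv). The effect is roughly equivalent to this stdlib sketch, shown only to illustrate the mechanism:

```python
import os

def load_env(path=".env"):
    """Minimal .env reader: KEY=value lines become environment variables.
    (Real loaders like python-dotenv also handle quoting and edge cases.)"""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# After loading, code can read the key without hard-coding secrets, e.g.:
# api_key = os.environ["PINECONE_API_KEY"]
```

Keeping the key in `.env` (which is git-ignored) means it never ends up in version control.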
- Download and install Ollama from ollama.ai for your operating system
- Start the Ollama service:
- On macOS: Ollama will start automatically after installation
- On Linux: Run `ollama serve` in a terminal
- On Windows: Ollama will start automatically after installation
- Pull the models you want to use:
```bash
ollama pull llama3.2   # Recommended model
# OR
ollama pull phi4-mini  # Alternative model
```

- Verify Ollama is running and models are available:

```bash
ollama list
```

- Place your medical PDF files in the `data/` directory
- The app will create this directory automatically if it doesn't exist
- You can use medical textbooks, research papers, or any PDF with medical information
- Make sure you have the appropriate rights to use these documents
Run the setup check script to verify that everything is properly configured:
```bash
python src/check_setup.py
```

This script will:
- Check if Python version is compatible
- Verify all required dependencies are installed
- Check if environment variables are set correctly
- Verify Ollama is installed and running
- Check if the data directory exists and contains PDF files
If any issues are found, the script will provide guidance on how to fix them.
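Checks of this kind are straightforward to write. A simplified sketch (not the actual contents of `check_setup.py`; function and directory names are illustrative):

```python
import os
import sys

def basic_checks(data_dir="data"):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")
    if not os.environ.get("PINECONE_API_KEY"):
        problems.append("PINECONE_API_KEY is not set (check your .env file)")
    if not os.path.isdir(data_dir):
        problems.append(f"Missing {data_dir}/ directory")
    elif not any(name.lower().endswith(".pdf") for name in os.listdir(data_dir)):
        problems.append(f"No PDF files found in {data_dir}/")
    return problems
```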
```bash
streamlit run src/app.py
```

The application will:
- Automatically open in your default web browser at http://localhost:8501
- Process all PDF files in your `data/` directory
- Generate and store embeddings in Pinecone
- Start the chat interface where you can ask medical questions
If you want to access the app from other devices on your network:
```bash
streamlit run src/app.py --server.address 0.0.0.0
```

Then access it from other devices using your computer's IP address: http://YOUR_IP_ADDRESS:8501
If you encounter any issues:
- Ollama not running: Make sure Ollama is installed and running
- Pinecone API key error: Check that your `.env` file has the correct API key
- No PDF files found: Add PDF files to the `data/` directory
- Import errors: Make sure all dependencies are installed correctly
- Memory issues: Try using smaller PDF files or fewer files
You can modify which Ollama model is used by editing `src/llm/ollama_llm.py`:

```python
# Change the default model and parameters
def get_ollama_llm(
    model_name="llama3.2",   # Change to any model you've pulled in Ollama
    temperature=0.5,         # Adjust for more/less creative responses
    num_predict=256,         # Maximum tokens to generate
    top_k=40,                # Sampling parameter
    top_p=0.9,               # Sampling parameter
    repeat_penalty=1.18      # Penalty for repetition
):
```

You can modify how questions are formatted for the LLM by editing `src/utils/qa_chain.py`:
```python
def get_qa_prompt():
    template = """
    Use the following pieces of retrieved context to answer the question.
    If you don't know the answer, just say that you don't know.
    Question: {question}
    Context: {context}
    Only answer in the context of the provided information. Be concise and accurate.
    """
    # You can customize this prompt template to change how the model responds
```

- "What are the symptoms of diabetes?"
- "How is pneumonia diagnosed?"
- "What treatments are available for migraines?"
- "What are the side effects of ibuprofen?"
- The chatbot can only answer based on the information in the PDF files you provide
- It is designed for informational purposes only and should not replace professional medical advice
- Response quality depends on the quality and coverage of the source documents
- The system requires Ollama to be installed and running on your machine
- While the LLM and processing run locally, Pinecone is a cloud service that stores your vector embeddings
- Integration with medical knowledge graphs
- Support for multi-modal inputs (images, lab results)
- User feedback loop for answer quality improvement
- Expanded medical document corpus
- Option for fully local vector database (like Chroma or FAISS) instead of Pinecone


