## Local Setup

### Prerequisites

- Python 3.8 or higher
- Git
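To confirm the prerequisites are met, you can check the installed versions:

```bash
python --version  # expect 3.8 or higher
git --version
```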
### Installation

- Clone the repository:

  ```bash
  git clone https://github.com/SCAI-BIO/primekg-rag.git
  cd primekg-rag
  ```

- Create and activate a Python virtual environment:
  ```bash
  # Create virtual environment
  python -m venv venv

  # Activate virtual environment
  # On Windows (Git Bash):
  source venv/Scripts/activate
  # On Windows (Command Prompt):
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
  ```
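  As a quick optional check that activation worked, the interpreter should resolve inside `venv/`:

  ```bash
  # Prints a path inside venv/ (venv/Scripts/python on Windows)
  which python
  ```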
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up the environment file:
  ```bash
  # Create .env file with correct paths
  echo "OPENAI_API_KEY=your_key_here" > .env
  ```

  Replace `your_key_here` with your actual OpenAI API key.
- Download and set up the databases:

  ```bash
  cd primekg-rag
  python setup_databases.py
  ```

  This will download ~450MB of data, including CSV files and database collections.
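  As an optional sanity check, you can confirm the downloads landed (file names are taken from the troubleshooting section below; adjust the paths if your layout differs):

  ```bash
  ls pubmed_db nodes.csv best_question_matches.csv
  ```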
- Run the application:

  ```bash
  # Make sure you're in the primekg-rag directory
  streamlit run app.py
  ```

  The application will be available at http://localhost:8501.
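  If port 8501 is already in use, Streamlit accepts an alternative port:

  ```bash
  streamlit run app.py --server.port 8502
  ```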
"streamlit: command not found"
- Make sure the virtual environment is activated:
source venv/Scripts/activate
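- If the command is still not found, confirm Streamlit is installed in this environment and invoke it as a module instead:

  ```bash
  pip show streamlit
  python -m streamlit run app.py
  ```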
"PUBMED_DB_PATH not set in .env file"
- Ensure the .env file is in the root directory (not in primekg-rag/)
- Check that PUBMED_DB_PATH=pubmed_db (not primekg-rag/pubmed_db)
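- A quick check from the repository root (where `.env` lives) confirms the entry:

  ```bash
  # Should print exactly: PUBMED_DB_PATH=pubmed_db
  grep PUBMED_DB_PATH .env
  ```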
"Collection [pubmed_abstracts] does not exist"
- Run the database setup again:
cd primekg-rag && python setup_databases.py - Verify the .env file has the correct PUBMED_DB_PATH
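- After re-running setup, the database directory should exist and be non-empty (checked from the repository root, given `PUBMED_DB_PATH=pubmed_db`):

  ```bash
  ls -la pubmed_db
  ```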
**Missing CSV file warnings**

- The setup script should download `nodes.csv` and `best_question_matches.csv` automatically
- If they are missing, re-run:

  ```bash
  cd primekg-rag && python setup_databases.py
  ```
## Docker Setup

### Prerequisites

- Docker and Docker Compose installed
- Git
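To confirm both tools are available:

```bash
docker --version
docker-compose --version
```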
### Installation

- Clone the repository:

  ```bash
  git clone https://github.com/SCAI-BIO/primekg-rag.git
  cd primekg-rag
  ```

- Set up the environment file:
echo "OPENAI_API_KEY=your_key_here" > .envReplace your_key_here with your actual OpenAI API key
- Build and run with Docker Compose:

  ```bash
  docker-compose up --build
  ```

  The application will be available at http://localhost:8501.
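  If you prefer to keep your terminal free, the stack can also run detached, with logs followed separately:

  ```bash
  docker-compose up --build -d
  docker-compose logs -f
  ```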
### Notes

- Make sure to replace `your_key_here` with your actual OpenAI API key in the `.env` file
- The first run may take longer, as it downloads and sets up the required databases
- For development, you may want to use the local setup instead of Docker for faster iteration
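When you are done, stop and remove the containers with the standard Compose teardown:

```bash
docker-compose down
```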