PDF RAG Tool

A very simple local Retrieval-Augmented Generation (RAG) system for querying PDF documents using LLMs. This tool uses Ollama for running local language models and ChromaDB for vector storage.

Installation

  1. Install Python dependencies:

    pip install -r requirements.txt

  2. Install Ollama (see https://ollama.com) and pull the models configured in src.utils.Models (e.g. with ollama pull <model>).

Usage

1. Ingest PDF(s) into the database

Place your PDF files in the data/ directory. Then run:

python src/db_ingest.py

This will process all PDFs in data/ and store their embeddings in ChromaDB. Use --clean to reset the database before ingestion.
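The ingestion step follows the usual RAG pattern: extract text from each PDF, split it into chunks, embed each chunk, and write the embeddings to a persistent ChromaDB collection. The sketch below illustrates that flow; the pypdf reader, the fixed-size chunking, the nomic-embed-text embedding model, and the collection name are assumptions for illustration, not the repository's actual choices (see src/db_ingest.py and src.utils.Models for those).

    # Ingestion sketch: read PDFs from data/, chunk the text, embed via Ollama,
    # and store everything in a persistent ChromaDB collection under chroma/.
    from pathlib import Path

    import chromadb
    import ollama
    from pypdf import PdfReader

    client = chromadb.PersistentClient(path="chroma")
    collection = client.get_or_create_collection(name="pdf_rag")  # assumed collection name

    for pdf_path in Path("data").glob("*.pdf"):
        reader = PdfReader(str(pdf_path))
        for page_num, page in enumerate(reader.pages):
            text = page.extract_text() or ""
            # Naive fixed-size chunking; the real script may use a smarter splitter.
            chunks = [text[i:i + 800] for i in range(0, len(text), 800)]
            for chunk_num, chunk in enumerate(chunks):
                if not chunk.strip():
                    continue
                embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
                collection.add(
                    ids=[f"{pdf_path.name}:{page_num}:{chunk_num}"],
                    documents=[chunk],
                    embeddings=[embedding],
                    metadatas=[{"source": pdf_path.name, "page": page_num}],
                )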

2. Query

2.1 CLI

Run the main script and follow the prompts:

python main.py

Enter your question when prompted. The system will retrieve relevant document chunks and generate an answer using the local LLM via Ollama.
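Conceptually, answering a question is a retrieve-then-generate loop: embed the question, pull the most similar chunks from ChromaDB, and hand them to the local chat model as context. The sketch below shows that loop; the nomic-embed-text and llama3 model names, the prompt wording, and the collection name are placeholders, since the actual models are defined in src.utils.Models.

    # Query sketch: embed the question, retrieve top-k chunks, generate an answer.
    import chromadb
    import ollama

    client = chromadb.PersistentClient(path="chroma")
    collection = client.get_or_create_collection(name="pdf_rag")  # assumed collection name

    question = "What is the main conclusion of the report?"
    query_embedding = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

    # n_results plays the role of the --top_k flag mentioned in the Notes below.
    results = collection.query(query_embeddings=[query_embedding], n_results=5)
    context = "\n\n".join(results["documents"][0])

    response = ollama.chat(
        model="llama3",  # placeholder; the real chat model comes from src.utils.Models
        messages=[
            {"role": "system", "content": "Answer the question using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    print(response["message"]["content"])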

2.2 Streamlit App

Run the app:

streamlit run app.py
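The Streamlit app is a thin web UI over the same retrieve-then-generate flow. A minimal sketch of such an app follows; the query_rag helper and its import path are hypothetical stand-ins for the project's actual query code.

    # Minimal Streamlit front-end sketch; the real app.py may be structured differently.
    import streamlit as st

    from main import query_rag  # hypothetical helper wrapping retrieval + generation

    st.title("PDF RAG Tool")
    question = st.text_input("Ask a question about the ingested PDFs")

    if question:
        with st.spinner("Retrieving context and generating an answer..."):
            answer = query_rag(question)
        st.write(answer)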

Notes

  • Ensure Ollama is running before querying.
  • The database is stored in the chroma/ directory.
  • Models are configured in src.utils.Models (a hypothetical sketch follows this list).
  • For advanced usage, see the comments in the source files in src/.
    • e.g. python main.py --top_k 5 --similarity_threshold 0.25
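As a rough idea of what the Models configuration holds, it typically just pins the Ollama model names used for embedding and generation. The sketch below is hypothetical; check src/utils.py for the actual definition and model names.

    # Hypothetical sketch of the Models configuration; the real one lives in src/utils.py.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Models:
        embedding_model: str = "nomic-embed-text"  # placeholder embedding model name
        chat_model: str = "llama3"                 # placeholder chat/generation model name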

Ideas

  • Interface: add a file browser and a way to run db_ingest from the app
  • Testing & evaluation
  • Include metadata

Special thanks to pixegami for the informative tutorial that inspired this project.
