A very simple local Retrieval-Augmented Generation (RAG) system for querying PDF documents using LLMs. This tool uses Ollama for running local language models and ChromaDB for vector storage.

Install Python dependencies:

```bash
pip install -r requirements.txt
```

Install Ollama:

- Download and install Ollama from https://ollama.com/download
- Start Ollama:

  ```bash
  ollama serve
  ```

- Pull the required models (see `src/utils.py`):

  ```bash
  ollama pull llama3
  ```
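
As an optional sanity check before ingesting or querying, you can confirm from Python that Ollama is reachable and that `llama3` responds. This is a minimal sketch, not part of the project's scripts, and it assumes the `ollama` Python package is installed (`pip install ollama`):

```python
# Optional sanity check (not part of the project): confirm Ollama is reachable
# and the llama3 model responds. Assumes the `ollama` Python package is installed.
import ollama

try:
    reply = ollama.generate(model="llama3", prompt="Reply with the single word: OK")
    print(reply["response"])
except Exception as exc:
    print(f"Could not reach Ollama -- is `ollama serve` running and llama3 pulled? ({exc})")
```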

Place your PDF files in the `data/` directory, then run:

```bash
python src/db_ingest.py
```

This processes all PDFs in `data/` and stores their embeddings in ChromaDB. Pass `--clean` to clear the existing database before ingestion.
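
For orientation, here is a rough sketch of what a PDF-to-ChromaDB ingestion step typically looks like. It is not the code in `src/db_ingest.py`: the embedding model (`nomic-embed-text`), the chunking scheme, the collection name (`pdf_chunks`), and the `pypdf` dependency are illustrative assumptions; check `src/utils.py` and `src/db_ingest.py` for the project's actual choices.

```python
# Illustrative sketch of PDF ingestion, not the project's actual db_ingest.py.
# Assumes pypdf, chromadb, and ollama are installed; model/collection names are guesses.
from pathlib import Path

import chromadb
import ollama
from pypdf import PdfReader

EMBED_MODEL = "nomic-embed-text"  # assumption; the real models are defined in src/utils.py
CHUNK_SIZE = 800                  # characters per chunk; arbitrary illustration value

client = chromadb.PersistentClient(path="chroma")           # matches the chroma/ directory
collection = client.get_or_create_collection("pdf_chunks")  # hypothetical collection name

for pdf_path in Path("data").glob("*.pdf"):
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    for idx, chunk in enumerate(chunks):
        embedding = ollama.embeddings(model=EMBED_MODEL, prompt=chunk)["embedding"]
        collection.add(
            ids=[f"{pdf_path.name}:{idx}"],
            documents=[chunk],
            embeddings=[embedding],
            metadatas=[{"source": pdf_path.name, "chunk": idx}],
        )
```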

Run the main script and follow the prompts:

```bash
python main.py
```

Enter your question when prompted. The system retrieves the most relevant document chunks and generates an answer using the local LLM via Ollama.
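
The query step broadly follows a retrieve-then-generate pattern; `main.py` is the authoritative implementation. The sketch below is illustrative only: the model names, prompt wording, collection name, and the way the similarity threshold is applied are assumptions, though `top_k` and `similarity_threshold` mirror the command-line flags mentioned in the notes below.

```python
# Illustrative sketch of the query flow, not the project's actual main.py.
import chromadb
import ollama

EMBED_MODEL = "nomic-embed-text"  # assumption; the real models are defined in src/utils.py
LLM_MODEL = "llama3"


def answer(question: str, top_k: int = 5, similarity_threshold: float = 0.25) -> str:
    collection = chromadb.PersistentClient(path="chroma").get_or_create_collection("pdf_chunks")

    # Embed the question and retrieve the top_k closest chunks from ChromaDB.
    query_embedding = ollama.embeddings(model=EMBED_MODEL, prompt=question)["embedding"]
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)

    # Chroma returns distances (lower = closer). Treating "similarity" as 1 - distance
    # is an assumption here, not necessarily the project's definition of the threshold.
    context = "\n\n".join(
        doc
        for doc, dist in zip(results["documents"][0], results["distances"][0])
        if (1.0 - dist) >= similarity_threshold
    )

    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ollama.generate(model=LLM_MODEL, prompt=prompt)["response"]


if __name__ == "__main__":
    print(answer(input("Question: ")))
```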

Run the app:

```bash
streamlit run app.py
```

Notes:

- Ensure Ollama is running before querying.
- The database is stored in the `chroma/` directory.
- Models are configured in `src.utils.Models` (a hypothetical sketch of such a configuration follows this list).
- For advanced usage, see the comments in the source files under `src/`. For example:

  ```bash
  python main.py --top_k 5 --similarity_threshold 0.25
  ```
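
The note above points at the project's own `src.utils.Models`; the following is only a hypothetical sketch of what such a central model configuration could look like. Every name and default in it is an assumption, not taken from the source.

```python
# Hypothetical sketch of a central model configuration like src.utils.Models.
# All names and defaults below are assumptions; check src/utils.py for the real values.
from dataclasses import dataclass


@dataclass(frozen=True)
class Models:
    llm: str = "llama3"                  # chat/completion model served by Ollama
    embedding: str = "nomic-embed-text"  # embedding model used for ChromaDB vectors


MODELS = Models()
```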

Planned improvements:

- Interface: add a file browser and `db_ingest`
- Testing & evaluation
- Include metadata

Special thanks to pixegami for the informative tutorial that inspired this project.