luphone04/SourceTrace-rag

---
title: SourceTrace - Advanced RAG System
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

🔍 SourceTrace

Your Documents. Your Questions. Zero BS.

Tired of Ctrl+F-ing through 47 PDFs at 2 AM? We've all been there.

You know the answer is somewhere in those documents. But where? Page 73? Page 142? That other PDF you downloaded last week?

Stop the madness. Just ask SourceTrace.

Upload your docs. Ask questions like a normal human. Get answers with receipts (aka citations, because we're not making stuff up).

🚀 Try It Now


🛠️ The Tech

Look, we could've slapped together a basic keyword search and called it a day. But you deserve better than that.

Here's what's under the hood:

🧠 The Brain: GPT-4 Turbo

Because sometimes you need the smartest AI in the room. We're using OpenAI's GPT-4 Turbo to actually understand your questions and generate coherent answers, not just spit out random sentences.

🔍 The Search Engine: Hybrid Search (Vector + BM25)

Ever searched for "machine learning" but the document only says "ML"? Traditional keyword search chokes. Vector search gets it, because it matches on meaning rather than exact words.

But wait—what if you're looking for the exact phrase "Contract #4829"? Vector search might get creative. That's where BM25 (keyword search) saves the day.

We use both. At the same time. The two result lists get merged with Reciprocal Rank Fusion (RRF). It's like having a bloodhound AND a detective on your case.
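Curious how two rankings become one? Here's a minimal sketch of Reciprocal Rank Fusion, the fusion method named in the tech stack. It is not the project's actual code: the function name, the chunk ids, and the constant `k=60` (a commonly used default) are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents ranked well by several retrievers float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["chunk_ml_intro", "chunk_contract", "chunk_faq"]
bm25_hits = ["chunk_contract", "chunk_pricing", "chunk_ml_intro"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

The chunk that both retrievers rank near the top wins, even though neither retriever's scores are directly comparable with the other's. That is the appeal of fusing on rank instead of raw score.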

📚 The Memory: ChromaDB

Your documents get chopped into digestible chunks, turned into mathematical vectors, and stored in ChromaDB. Think of it as a really smart filing cabinet that actually remembers where you put stuff.
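What does "chopped into digestible chunks" look like? Roughly like the sketch below: a sliding window over words, with some overlap so a sentence that straddles a boundary still lands whole in at least one chunk. The function name and the sizes are hypothetical; the real pipeline may well use a LangChain text splitter with different settings.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny demo with made-up words so the overlap is easy to see.
demo_text = " ".join(f"w{i}" for i in range(12))
demo_chunks = chunk_text(demo_text, chunk_size=5, overlap=2)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk would then be embedded and stored alongside its metadata (file name, page) so it can be traced back later.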

🎯 The Proof: Citation Tracking

Every answer comes with receipts. We don't just tell you the answer—we show you exactly which part of which document we got it from. Because "trust me bro" isn't a valid source.
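To make "receipts" concrete, here's one way an answer and its sources could be bundled together. This is a sketch under assumptions, not SourceTrace's real data model: the `Citation` fields, the `format_answer` helper, and the sample file name are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Where a piece of the answer came from (hypothetical fields)."""
    source: str   # file name of the uploaded document
    page: int     # page the supporting chunk was found on
    snippet: str  # the supporting text itself


def format_answer(answer, citations):
    """Render an answer followed by a numbered list of its sources."""
    lines = [answer, "", "Sources:"]
    for number, cite in enumerate(citations, start=1):
        lines.append(f'[{number}] {cite.source}, p. {cite.page}: "{cite.snippet}"')
    return "\n".join(lines)


# Made-up example data.
receipts = [Citation("lease.pdf", 73, "Rent is due on the first of each month.")]
rendered = format_answer("Rent is due on the 1st [1].", receipts)
print(rendered)
```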

📊 The Report Card: RAGAS Evaluation

How do we know this thing actually works? RAGAS metrics. It's like a report card for AI:

  • Faithfulness: Is the answer actually based on your documents? (Not hallucinating?)
  • Answer Relevancy: Does it answer what you asked? (Or just ramble?)
  • Context Precision & Recall: Did we retrieve the right chunks? All of them?
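For intuition on the precision and recall bullet, here's the plain set-based version of the two numbers. Fair warning: this is a simplification. RAGAS computes its own LLM-assisted variants of these metrics, and the function name and chunk ids below are made up.

```python
def precision_recall(retrieved, relevant):
    """Set-based retrieval precision and recall.

    precision = share of retrieved chunks that are actually relevant
    recall    = share of relevant chunks that were actually retrieved
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical chunk ids: 4 retrieved, 3 truly relevant, 2 in common.
precision, recall = precision_recall(
    retrieved=["c1", "c2", "c3", "c4"],
    relevant=["c2", "c4", "c7"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall means the results are clean but incomplete; the reverse means the answer is in there somewhere, buried under noise.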

Tech Stack:

  • 🤖 LLM: OpenAI GPT-4 Turbo
  • 🔢 Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • 💾 Vector DB: ChromaDB
  • 🔎 Search: Hybrid (RRF fusion of Vector + BM25)
  • 🧩 Framework: LangChain 0.3
  • ⚙️ Backend: FastAPI
  • 🎨 Frontend: Streamlit
  • 📈 Evaluation: RAGAS 0.2
  • ☁️ Storage: Supabase
