| title | SourceTrace - Advanced RAG System |
|---|---|
| emoji | 🔍 |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| pinned | false |
| app_port | 7860 |
Tired of Ctrl+F-ing through 47 PDFs at 2 AM? We've all been there.
You know the answer is somewhere in those documents. But where? Page 73? Page 142? That other PDF you downloaded last week?
Stop the madness. Just ask SourceTrace.
Upload your docs. Ask questions like a normal human. Get answers with receipts (aka citations, because we're not making stuff up).
Look, we could've slapped together a basic keyword search and called it a day. But you deserve better than that.
Here's what's under the hood:
Because sometimes you need the smartest AI in the room. We're using OpenAI's GPT-4 Turbo to actually understand your questions and generate coherent answers, not just spit out random sentences.
Ever searched for "machine learning" but the document only says "ML"? Traditional search chokes. Vector search gets it.
But wait—what if you're looking for the exact phrase "Contract #4829"? Vector search might get creative. That's where BM25 (keyword search) saves the day.
We use both. At the same time. It's like having a bloodhound AND a detective on your case.
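
If you're curious how two ranked lists become one, here's a minimal sketch of Reciprocal Rank Fusion (RRF), the fusion method listed in the tech stack. It is not SourceTrace's actual code: the function name, the chunk IDs, and the constant `k=60` (a common default for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers, best match first.
vector_hits = ["chunk_ml_intro", "chunk_contract", "chunk_glossary"]
bm25_hits = ["chunk_contract", "chunk_invoice", "chunk_ml_intro"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Chunks that both retrievers like float to the top, which is why `chunk_contract` beats `chunk_ml_intro` here even though the vector search ranked it second.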
Your documents get chopped into digestible chunks, turned into mathematical vectors, and stored in ChromaDB. Think of it as a really smart filing cabinet that actually remembers where you put stuff.
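
The "chopping" step could look roughly like the sketch below: fixed-size character windows with some overlap, so a sentence split at a boundary still shows up whole in one of the chunks. The function name and the sizes are assumptions, not SourceTrace's real settings, and the embedding and ChromaDB storage steps are left out to keep it dependency-free.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative values, not the project's config.
    Returns a list of dicts holding each chunk's start offset and text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start": start, "text": text[start:start + chunk_size]})
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
pieces = chunk_text(sample, chunk_size=4, overlap=1)
print([p["text"] for p in pieces])
```

Keeping the start offset with each chunk is what makes it possible to point back at the exact spot in the original document later.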
Every answer comes with receipts. We don't just tell you the answer—we show you exactly which part of which document we got it from. Because "trust me bro" isn't a valid source.
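
To make "receipts" concrete, here's a small sketch of how an answer might be paired with its sources. The `SourceChunk` fields, the `format_answer` helper, and the output layout are all hypothetical; the real app may structure citations differently.

```python
from dataclasses import dataclass


@dataclass
class SourceChunk:
    """A retrieved passage plus where it came from (hypothetical shape)."""
    document: str
    page: int
    text: str


def format_answer(answer, sources, max_snippet=80):
    """Append a numbered source list to an answer string."""
    lines = [answer, "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        snippet = src.text
        # Trim long passages so the citation list stays readable.
        if len(snippet) > max_snippet:
            snippet = snippet[:max_snippet - 3] + "..."
        lines.append(f'[{i}] {src.document}, p. {src.page}: "{snippet}"')
    return "\n".join(lines)


sources = [SourceChunk("lease.pdf", 73, "The term is 12 months.")]
print(format_answer("The lease runs for 12 months [1].", sources))
```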
How do we know this thing actually works? RAGAS metrics. It's like a report card for AI:
- Faithfulness: Is the answer actually based on your documents? (Not hallucinating?)
- Answer Relevancy: Does it answer what you asked? (Or just ramble?)
- Context Precision & Recall: Did retrieval find the right stuff? All of it?
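
For intuition on the last bullet, here's a simplified, set-based take on precision and recall for retrieval. This is not the RAGAS API and not how RAGAS computes its scores (its metrics rely on an LLM judge); the function name and chunk IDs are made up for illustration.

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Simplified set-based precision and recall for one question.

    precision = share of retrieved chunks that are actually relevant
    recall    = share of relevant chunks that were actually retrieved
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 chunks retrieved, 3 chunks truly relevant.
precision, recall = retrieval_precision_recall(
    retrieved_ids=["a", "b", "c", "d"],
    relevant_ids=["a", "c", "e"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall means we found good stuff but missed some; the reverse means we found everything, plus a pile of noise.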
Tech Stack:
- 🤖 LLM: OpenAI GPT-4 Turbo
- 🔢 Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- 💾 Vector DB: ChromaDB
- 🔎 Search: Hybrid (RRF fusion of Vector + BM25)
- 🧩 Framework: LangChain 0.3
- ⚙️ Backend: FastAPI
- 🎨 Frontend: Streamlit
- 📈 Evaluation: RAGAS 0.2
- ☁️ Storage: Supabase