🔍 SourceTrace

title: SourceTrace - Advanced RAG System
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860

Your Documents. Your Questions. Zero BS.

Tired of Ctrl+F-ing through 47 PDFs at 2 AM? We've all been there.

You know the answer is somewhere in those documents. But where? Page 73? Page 142? That other PDF you downloaded last week?

Stop the madness. Just ask SourceTrace.

Upload your docs. Ask questions like a normal human. Get answers with receipts (aka citations, because we're not making stuff up).

🛠️ The Tech

Look, we could've slapped together a basic keyword search and called it a day. But you deserve better than that.

Here's what's under the hood:

🧠 The Brain: GPT-4 Turbo

Because sometimes you need the smartest AI in the room. We're using OpenAI's GPT-4 Turbo to actually understand your questions and generate coherent answers, not just spit out random sentences.

🔍 The Search Engine: Hybrid Search (Vector + BM25)

Ever searched for "machine learning" but the document only says "ML"? Traditional search chokes. Vector search gets it.

But wait—what if you're looking for the exact phrase "Contract #4829"? Vector search might get creative. That's where BM25 (keyword search) saves the day.

We use both. At the same time. It's like having a bloodhound AND a detective on your case.
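Curious how two ranked lists become one? Here's a minimal sketch of Reciprocal Rank Fusion (RRF), the fusion method named in the tech stack. The function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not SourceTrace's actual code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    so chunks ranked well by both retrievers float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep output stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers for one query.
vector_hits = ["chunk_ml", "chunk_intro", "chunk_contract"]
bm25_hits = ["chunk_contract", "chunk_ml", "chunk_pricing"]

fused = rrf_fuse([vector_hits, bm25_hits])
```

RRF only looks at rank positions, so you never have to reconcile cosine similarities with BM25 scores, which live on completely different scales.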

📚 The Memory: ChromaDB

Your documents get chopped into digestible chunks, turned into mathematical vectors, and stored in ChromaDB. Think of it as a really smart filing cabinet that actually remembers where you put stuff.
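What does "chopped into digestible chunks" look like? Here's a rough sketch of fixed-size chunking with overlap. Splitting by character count, the `chunk_text` name, and the default sizes are all assumptions; the real pipeline may split on sentences or tokens instead. Embedding the chunks and writing them to ChromaDB is left out here.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap their neighbours.

    The overlap keeps a sentence that straddles a boundary visible
    in both chunks, so retrieval does not lose it.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical 450-character document.
sample_chunks = chunk_text("a" * 450, chunk_size=200, overlap=50)
```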

🎯 The Proof: Citation Tracking

Every answer comes with receipts. We don't just tell you the answer—we show you exactly which part of which document we got it from. Because "trust me bro" isn't a valid source.
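One way to keep those receipts is to number each retrieved chunk in the prompt and then map the `[n]` markers in the answer back to their sources. The sketch below assumes that approach; `RetrievedChunk`, its fields, and both helper functions are hypothetical names, not the project's actual API.

```python
import re
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    text: str
    source: str  # file the chunk came from (assumed metadata field)
    page: int    # page number within that file (assumed metadata field)


def build_context(chunks):
    """Number each chunk so the model can cite it as [1], [2], ..."""
    return "\n".join(
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    )


def resolve_citations(answer, chunks):
    """Return the chunks referenced by [n] markers, in order of first use."""
    cited = []
    for marker in re.findall(r"\[(\d+)\]", answer):
        index = int(marker) - 1
        # Ignore markers that point outside the retrieved set.
        if 0 <= index < len(chunks) and chunks[index] not in cited:
            cited.append(chunks[index])
    return cited


# Hypothetical retrieved chunks and model answer.
chunks = [
    RetrievedChunk("Payment is due within 30 days.", "contract.pdf", 4),
    RetrievedChunk("Late fees accrue at 2% per month.", "contract.pdf", 7),
]
context = build_context(chunks)
sources = resolve_citations("Payment is due within 30 days [1].", chunks)
```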

📊 The Report Card: RAGAS Evaluation

How do we know this thing actually works? RAGAS metrics. It's like a report card for AI:

Faithfulness: Is the answer actually based on your documents? (Not hallucinating?)
Answer Relevancy: Does it answer what you asked? (Or just ramble?)
Context Precision & Recall: Did we find the right stuff? All of it?
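To make "the right stuff, all of it" concrete, here's a simplified, set-based take on precision and recall for retrieval. This is an illustration only: RAGAS computes its metrics with LLM-assisted judgments, not exact ID matching, and the function name and chunk IDs below are made up.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Precision: share of retrieved chunks that are relevant.
    Recall: share of relevant chunks that were retrieved.
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical run: 4 chunks retrieved, 3 chunks actually hold the answer.
precision, recall = retrieval_precision_recall(
    retrieved=["c1", "c2", "c3", "c4"],
    relevant=["c1", "c2", "c5"],
)
```

High precision with low recall means the retriever is picky but misses things; the reverse means it grabs everything, noise included.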

Tech Stack:

🤖 LLM: OpenAI GPT-4 Turbo
🔢 Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
💾 Vector DB: ChromaDB
🔎 Search: Hybrid (RRF fusion of Vector + BM25)
🧩 Framework: LangChain 0.3
⚙️ Backend: FastAPI
🎨 Frontend: Streamlit
📈 Evaluation: RAGAS 0.2
☁️ Storage: Supabase
