A production-ready, microservices-based Retrieval-Augmented Generation (RAG) system that transforms static Obsidian markdown vaults into an interactive, context-aware API powered by Google Gemini 2.5 and serverless vector databases.
[Demo video](https://github.com/Shreyansh15624/obs-rag/raw/main/video/obs-rag.mp4)
I built this system to supercharge my personal knowledge management workflow while demonstrating a robust, cloud-native backend architecture. As a Python backend developer, transitioning this from a simple local script to a resilient, deployable service utilizing FastAPI, asynchronous concurrency, and stateless compute patterns was a primary goal. It bridges the gap between simple static file storage and intelligent, scalable retrieval.
1. Local Environment Setup
This project uses `uv` for fast dependency resolution, eliminating environment mismatches.
```bash
git clone https://github.com/Shreyansh15624/obs-rag
cd obs-rag
uv sync
```

2. Environment Variables
Create a `.env` file at the project root. Do not wrap values in quotation marks; this keeps them compatible with Docker across platforms:
```env
GOOGLE_API_KEY=your_gemini_key
VAULT_PATH=/path/to/your/obsidian/vault
API_GATEWAY_KEY=your_custom_security_password
QDRANT_API_KEY=your_custom_qdrant_instance_api
QDRANT_END_POINT_URL=https://your_qdrant_endpoint_url_here
```

3. Data Ingestion & Execution
- Embed the local markdown vault into the Qdrant Cloud vector database instance:

```bash
# Make sure the Qdrant credentials are set in '.env'
# Ingest local markdown into the Qdrant vector database instance
uv run seed_qdrant.py
```
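Under the hood, an ingestion script like `seed_qdrant.py` typically splits each note into overlapping chunks before embedding and upserting them. The actual script is not reproduced here; this is a minimal sketch of the chunking step only, with chunk sizes chosen arbitrarily for illustration:

```python
def chunk_markdown(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a note into overlapping character windows so context that
    spans a chunk boundary is not lost. Sizes here are illustrative,
    not the values used by seed_qdrant.py."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

note = "# Project Log\n" + "Some running notes about obs-rag. " * 40
pieces = chunk_markdown(note)
print(len(pieces), "chunks")  # → 4 chunks, each ≤ 500 chars, overlapping by 100
```

Each chunk would then be embedded (e.g. with a Gemini embedding model) and upserted into the Qdrant collection.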
Note: the local markdown ingestion pipeline is currently being renovated.

- Start the backend server, accessible at `localhost:8080/docs`:

```bash
# Spin up the asynchronous FastAPI backend server
uv run server.py
```
- Open a new terminal window.
- Change into the `ui` directory inside the project:

```bash
cd ui
```

- Start the frontend UI process, accessible at `localhost:3000/chat`:
```bash
uv run reflex run
```

The backend cleanly isolates the API on port 8080 (or 8000 locally). All incoming requests pass through a middleware authentication layer and must satisfy strict `X-API-Key` validation before the LangChain agent is initialized.
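The core of that `X-API-Key` validation can be sketched as a constant-time string comparison. This is a hypothetical helper, not the project's actual middleware, which lives inside the FastAPI app and may differ:

```python
import hmac

def is_authorized(provided_key: str, expected_key: str) -> bool:
    """Compare the X-API-Key header value against the configured
    API_GATEWAY_KEY using a constant-time check (avoids timing leaks)."""
    if not provided_key or not expected_key:
        return False
    return hmac.compare_digest(provided_key, expected_key)

print(is_authorized("your_custom_security_password",
                    "your_custom_security_password"))  # → True
print(is_authorized("wrong-key",
                    "your_custom_security_password"))  # → False
```

Using `hmac.compare_digest` instead of `==` is a common hardening choice for secret comparison.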
1. Querying the Vault (POST /chat)
This is the core generative endpoint. It expects a JSON payload containing your question and an optional `top_k` parameter.
```bash
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_custom_security_password" \
  -d '{"question": "What is my latest project?", "top_k": 3}'
```

Trick for deep dives: you can maintain conversation context across multiple turns by passing an optional `history` list (containing message objects) in the JSON request body!
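A multi-turn payload might look like the following. The exact shape of each message object (`role`/`content` keys) is an assumption, since the README does not pin it down; adjust to match the actual API schema:

```python
import json

# Hypothetical message shape -- the role/content keys are assumed,
# not documented by the project.
payload = {
    "question": "How does that project handle authentication?",
    "top_k": 3,
    "history": [
        {"role": "user", "content": "What is my latest project?"},
        {"role": "assistant", "content": "Your latest project is obs-rag."},
    ],
}
print(json.dumps(payload, indent=2))
```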
2. Saving Context (POST /api/notes/save)
You can seamlessly write AI interactions or generated summaries back into your local storage by hitting the save endpoint with a JSON payload containing the filename, content, and target folder.
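For example, a save request could be built like this with only the standard library. The field names follow the description above (filename, content, target folder), but `folder` as the exact JSON key is an assumption:

```python
import json
import urllib.request

# Field names follow the README's description; the exact "folder"
# key for the target folder is an assumption.
payload = {
    "filename": "ai-summary.md",
    "content": "# Summary\nGenerated by the RAG agent.",
    "folder": "AI Notes",
}

req = urllib.request.Request(
    "http://localhost:8000/api/notes/save",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "your_custom_security_password",
    },
    method="POST",
)
print(req.full_url)
# urllib.request.urlopen(req)  # uncomment with the server running
```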
Contributions are highly welcome, especially as this transitions further into a production cloud environment!
Where help is needed most:
- Reflex UI Framework: The local client UI is built with Reflex, but UI development is not the primary focus of this project. Any optimizations, structural improvements, or enhancements to the Reflex codebase are greatly appreciated.
- Render Deployment Connections: While the backend deployment pipeline via Docker and `uv export` is functional, connecting the deployed Reflex frontend client to the FastAPI backend on Render is still pending. PRs addressing this service connection are a top priority.
How to contribute:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/Optimization`).
- Commit your changes (`git commit -m 'Add Optimization'`).
- Push to the branch (`git push origin feature/Optimization`).
- Open a Pull Request.