This project is a persistent memory layer for AI agents. It uses the Model Context Protocol (MCP) and LangChain to allow an LLM to search and recall information from personal notes (Obsidian/Notion).
The system is built for speed and privacy, using a local ChromaDB vector store and a custom hashing pipeline to ensure only new or modified notes are processed, significantly reducing API costs.
To prevent redundant embedding calls to OpenAI, I implemented a hashing-based ingestion pipeline.
- Every file is assigned an MD5 fingerprint of its contents.
- A local `file_hashes.json` tracks which files have already been embedded.
- Only "dirty" (new or changed) files are processed, saving ~90% on token costs for large vaults.
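The core of that pipeline can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names (`md5_fingerprint`, `find_dirty_files`) are hypothetical, and the `.md`-only glob assumes an Obsidian-style vault.

```python
import hashlib
import json
from pathlib import Path

HASH_FILE = Path("file_hashes.json")  # the index file named in the README

def md5_fingerprint(path: Path) -> str:
    """MD5 hash of a file's bytes, used as its change fingerprint."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def find_dirty_files(vault: Path) -> list[Path]:
    """Return only new or modified notes, and update the hash index."""
    known = json.loads(HASH_FILE.read_text()) if HASH_FILE.exists() else {}
    dirty = []
    for note in sorted(vault.rglob("*.md")):
        digest = md5_fingerprint(note)
        if known.get(str(note)) != digest:  # new file or changed contents
            dirty.append(note)
            known[str(note)] = digest
    HASH_FILE.write_text(json.dumps(known, indent=2))
    return dirty
```

Only the paths returned by `find_dirty_files` are sent to the embedding API; an unchanged vault produces an empty list and zero API calls.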
I utilized the Model Context Protocol (MCP) to decouple the data source from the AI logic. This ensures that the agent can be plugged into any LLM environment (Cursor, Claude, or custom apps) without rewriting the core data-fetching logic.
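The decoupling idea can be illustrated with a simplified tool registry. This is not the MCP SDK, just a sketch of the contract it enforces: the server registers named tools, and any client (Cursor, Claude, a custom app) invokes them by name without knowing how the data is fetched. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A named capability the server exposes, MCP-style."""
    name: str
    description: str
    handler: Callable[[str], list[str]]

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def call_tool(name: str, query: str) -> list[str]:
    """What a client does: invoke a tool by name, ignorant of its backend."""
    return REGISTRY[name].handler(query)

# The agent-facing contract stays the same even if the backend swaps
# from Obsidian to Notion -- only the handler changes:
register(Tool(
    name="search_notes",
    description="Semantic search over personal notes",
    handler=lambda q: [f"stub result for: {q}"],  # real version queries ChromaDB
))
```

Swapping the data source means re-registering `search_notes` with a different handler; no client-side code changes.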
Using LangChain's `ObsidianLoader`, I created a RAG (Retrieval-Augmented Generation) pipeline that:
- Chunks notes semantically to preserve context.
- Stores high-dimensional vectors in a local ChromaDB.
- Allows the agent to find information based on "meaning" rather than just keywords.
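To show what "chunking to preserve context" means in practice, here is a simplified, dependency-free approximation of the idea: split notes at Markdown headings so each chunk stays on one topic, falling back to paragraph boundaries for oversized sections. The real pipeline uses LangChain's splitters; `chunk_note` is a hypothetical name.

```python
import re

def chunk_note(text: str, max_chars: int = 500) -> list[str]:
    """Split a note at Markdown headings, then at paragraphs if a section is too long."""
    # Zero-width lookahead keeps each heading attached to its own section.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs into chunks under the limit.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```

Each chunk is then embedded and stored in ChromaDB, so a query like "how do I set up a project" retrieves the setup section even if it never contains those exact words.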
Typical use cases:

- Decision Recall: "Why did we choose this architecture last month?"
- Technical Troubleshooting: Search through personal logs of past bug fixes.
- SOP Retrieval: "What is my process for setting up a new Python project?"
Tech stack:

- Package Management: `uv` (Rust-based Python package manager)
- Orchestration: LangChain
- Protocol: MCP (Model Context Protocol)
- Database: ChromaDB (Local Vector Store)
- LLM: GPT-4o / Claude 3.5 Sonnet
To get started:

- Ensure `uv` is installed.
- Clone the repo and run `uv sync`.
- Add your `OBSIDIAN_PATH` to a `.env` file.
- Run the ingestion: `uv run ingest.py`
- Chat with your agent: `uv run main.py`