peterzdhuang/SecondBrainMCP

Project: Agentic Second Brain

Local-First RAG with LangChain, MCP, and ChromaDB

Overview

This project implements a persistent memory layer for AI agents. It uses the Model Context Protocol (MCP) and LangChain to let an LLM search and recall information from personal notes (Obsidian/Notion).

The system is built for speed and privacy: a local ChromaDB vector store keeps data on-device, and a custom hashing pipeline processes only new or modified notes, significantly reducing embedding API costs.


Technical Process

1. Cost-Optimized Ingestion (MD5 Hashing)

To prevent redundant embedding calls to OpenAI, I implemented a hashing-based ingestion pipeline.

  • Each note's content is fingerprinted with an MD5 hash.
  • A local file_hashes.json records the hash of every file that has already been embedded.
  • Only "dirty" (new or changed) files are re-embedded, saving ~90% on token costs for large vaults.
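The ingestion loop above can be sketched in a few lines of stdlib-only Python. This is an illustration, not the repo's actual ingest.py; the function name and signature are assumptions.

```python
import hashlib
import json
from pathlib import Path

def file_md5(path: Path) -> str:
    """Fingerprint a note by the MD5 hash of its raw bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def find_dirty_files(vault: Path, hash_file: Path) -> list[Path]:
    """Return only new or modified .md notes, updating the hash record.

    hash_file plays the role of file_hashes.json: a map from file path
    to the MD5 digest the file had when it was last embedded.
    """
    known = json.loads(hash_file.read_text()) if hash_file.exists() else {}
    dirty = []
    for note in sorted(vault.rglob("*.md")):
        digest = file_md5(note)
        if known.get(str(note)) != digest:  # new file, or content changed
            dirty.append(note)
            known[str(note)] = digest
    hash_file.write_text(json.dumps(known, indent=2))
    return dirty
```

Only the files returned by `find_dirty_files` are sent to the embedding API; an unchanged vault costs zero tokens on re-ingestion.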

2. Standardized Data Layer (MCP)

I used the Model Context Protocol (MCP) to decouple the data source from the AI logic. This ensures that the agent can be plugged into any LLM environment (Cursor, Claude, or custom apps) without rewriting the core data-fetching logic.
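The decoupling idea can be shown with a stdlib-only sketch. This is not the MCP SDK; the registry and names below are hypothetical stand-ins for MCP tool registration.

```python
from typing import Callable

# Hypothetical registry standing in for MCP tool registration: the data
# layer publishes named tools, and any host (Cursor, Claude, a custom
# app) invokes them by name without touching the storage code.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_notes(query: str, k: int = 3) -> str:
    # A real server would query the ChromaDB store here; stubbed out.
    return f"top {k} chunks for: {query}"

def call_tool(name: str, **kwargs) -> str:
    """Roughly what an MCP host does when the LLM requests a tool call."""
    return TOOLS[name](**kwargs)
```

Swapping the host (say, Cursor for Claude Desktop) changes only who issues the tool call; the data layer itself never changes.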

3. Vector Memory (ChromaDB)

Using LangChain's ObsidianLoader, I created a RAG (Retrieval-Augmented Generation) pipeline that:

  • Chunks notes semantically to preserve context.
  • Stores high-dimensional vectors in a local ChromaDB.
  • Allows the agent to find information based on "meaning" rather than just keywords.
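The "meaning rather than keywords" point can be demonstrated with a toy vector search. The hand-written 3-dimensional vectors below stand in for the real OpenAI embeddings stored in ChromaDB.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: values near 1.0 mean similar 'meaning'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings"; in the real pipeline these come from an
# embedding model and live in a local ChromaDB collection.
STORE = {
    "we chose Postgres for consistency":  [0.9, 0.1, 0.1],
    "fixed a timezone bug in the parser": [0.1, 0.9, 0.1],
    "python project setup checklist":     [0.1, 0.1, 0.9],
}

def recall(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k stored chunks closest in meaning to the query."""
    ranked = sorted(STORE, key=lambda doc: cosine(STORE[doc], query_vec),
                    reverse=True)
    return ranked[:k]
```

A query vector pointing in roughly the same direction as a note retrieves it even when they share no keywords, which is what makes "Why did we choose this architecture?" retrievable from a note that never uses the word "architecture".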

Core Use Cases

  • Decision Recall: "Why did we choose this architecture last month?"
  • Technical Troubleshooting: Search through personal logs of past bug fixes.
  • SOP Retrieval: "What is my process for setting up a new Python project?"

Tech Stack

  • Package management: uv (Rust-based Python package manager)
  • Orchestration: LangChain
  • Protocol: MCP (Model Context Protocol)
  • Database: ChromaDB (Local Vector Store)
  • LLM: GPT-4o / Claude 3.5 Sonnet

How to Run

  1. Ensure uv is installed.
  2. Clone the repo and run: uv sync
  3. Add your OBSIDIAN_PATH to a .env file.
  4. Run the ingestion: uv run ingest.py
  5. Chat with your agent: uv run main.py
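A minimal .env might look like the following. OBSIDIAN_PATH is the only variable named in the steps above; OPENAI_API_KEY is an assumption, since the ingestion pipeline generates embeddings via OpenAI.

```
OBSIDIAN_PATH=/path/to/your/vault
# Assumed, since embeddings are generated via the OpenAI API:
OPENAI_API_KEY=sk-...
```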
