This project is a persistent memory layer for AI agents. It uses the Model Context Protocol (MCP) and LangChain to allow an LLM to search and recall information from personal notes (Obsidian/Notion).
The system is built for speed and privacy, using a local ChromaDB vector store and a custom hashing pipeline to ensure only new or modified notes are processed, significantly reducing API costs.
To prevent redundant embedding calls to OpenAI, I implemented a hashing-based ingestion pipeline.
- Every file is assigned an MD5 fingerprint of its contents.
- A local `file_hashes.json` tracks which files have already been embedded.
- Only "dirty" (new or changed) files are processed, saving ~90% on token costs for large vaults.
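The core of that pipeline can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names (`md5_fingerprint`, `find_dirty_files`) are hypothetical, and the `.md`-only glob assumes an Obsidian-style vault.

```python
import hashlib
import json
from pathlib import Path

HASH_FILE = Path("file_hashes.json")  # the index file named in the README

def md5_fingerprint(path: Path) -> str:
    """MD5 hash of a file's bytes, used as its change fingerprint."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def find_dirty_files(vault: Path) -> list[Path]:
    """Return only new or modified notes, and update the hash index."""
    known = json.loads(HASH_FILE.read_text()) if HASH_FILE.exists() else {}
    dirty = []
    for note in sorted(vault.rglob("*.md")):
        digest = md5_fingerprint(note)
        if known.get(str(note)) != digest:  # new file or changed contents
            dirty.append(note)
            known[str(note)] = digest
    HASH_FILE.write_text(json.dumps(known, indent=2))
    return dirty
```

Only the paths returned by `find_dirty_files` are sent to the embedding API; an unchanged vault produces an empty list and zero API calls.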
I utilized the Model Context Protocol (MCP) to decouple the data source from the AI logic. This ensures that the agent can be plugged into any LLM environment (Cursor, Claude, or custom apps) without rewriting the core data-fetching logic.
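The decoupling idea can be illustrated with a simplified tool registry. This is not the MCP SDK, just a sketch of the contract it enforces: the server registers named tools, and any client (Cursor, Claude, a custom app) invokes them by name without knowing how the data is fetched. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A named capability the server exposes, MCP-style."""
    name: str
    description: str
    handler: Callable[[str], list[str]]

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def call_tool(name: str, query: str) -> list[str]:
    """What a client does: invoke a tool by name, ignorant of its backend."""
    return REGISTRY[name].handler(query)

# The agent-facing contract stays the same even if the backend swaps
# from Obsidian to Notion -- only the handler changes:
register(Tool(
    name="search_notes",
    description="Semantic search over personal notes",
    handler=lambda q: [f"stub result for: {q}"],  # real version queries ChromaDB
))
```

Swapping the data source means re-registering `search_notes` with a different handler; no client-side code changes.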
Using LangChain's `ObsidianLoader`, I created a RAG (Retrieval-Augmented Generation) pipeline that:
- Chunks notes semantically to preserve context.
- Stores high-dimensional vectors in a local ChromaDB.
- Allows the agent to find information based on "meaning" rather than just keywords.
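To show what "chunking to preserve context" means in practice, here is a simplified, dependency-free approximation of the idea: split notes at Markdown headings so each chunk stays on one topic, falling back to paragraph boundaries for oversized sections. The real pipeline uses LangChain's splitters; `chunk_note` is a hypothetical name.

```python
import re

def chunk_note(text: str, max_chars: int = 500) -> list[str]:
    """Split a note at Markdown headings, then at paragraphs if a section is too long."""
    # Zero-width lookahead keeps each heading attached to its own section.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs into chunks under the limit.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```

Each chunk is then embedded and stored in ChromaDB, so a query like "how do I set up a project" retrieves the setup section even if it never contains those exact words.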
Typical use cases:

- Decision Recall: "Why did we choose this architecture last month?"
- Technical Troubleshooting: Search through personal logs of past bug fixes.
- SOP Retrieval: "What is my process for setting up a new Python project?"
Tech stack:

- Package Management: `uv` (Rust-based Python package manager)
- Orchestration: LangChain
- Protocol: MCP (Model Context Protocol)
- Database: ChromaDB (Local Vector Store)
- LLM: GPT-4o / Claude 3.5 Sonnet
To get started:

- Ensure `uv` is installed.
- Clone the repo and run `uv sync`.
- Add your `OBSIDIAN_PATH` to a `.env` file.
- Run the ingestion: `uv run ingest.py`
- Chat with your agent: `uv run main.py`