A portable, local-first RAG memory layer for Claude Code projects. Gives Claude semantic search over your project's documentation, decisions, and history — using local models, with no external API dependencies.
Think of it as long-term memory that persists across Claude Code sessions, queryable via MCP.
I use Claude Code for more than writing code. I run specialized agents that maintain context across infrastructure, documentation, content pipelines, and operational workflows — sometimes six or more projects simultaneously. Each project accumulates hundreds of markdown files: architecture decisions, task logs, lessons learned, archived roadmaps.
Grepping through all of that wastes tokens and misses semantic matches. "What did we decide about the auth flow?" doesn't match "JWT was chosen over session tokens because..." — not with grep, anyway.
So I built institutional memory for AI agents. pmem indexes your project's documentation into a local vector store, and Claude queries it by meaning instead of by keyword. No data leaves your machine. Setup takes two minutes.
Read more about the methodology behind this: Cognitive Offloading and The Governance Documents.
Most Claude Code memory tools — claude-mem, claude-brain, supermemory — solve session continuity: what did Claude do last time? They capture Claude's actions, compress conversation history, and replay it into future sessions.
pmem solves a different problem: what does the project know?
Your project has architecture decisions, task logs, lessons learned, archived roadmaps, and governance documents accumulated over months. That institutional knowledge exists in files, not in session transcripts. When you ask "why did we choose this auth approach?" the answer isn't in what Claude did yesterday — it's in an ADR you wrote three months ago.
| | Session memory tools | pmem |
|---|---|---|
| Remembers | What Claude did | What the project documented |
| Source data | Session transcripts, tool usage | Markdown, text, code files in your repo |
| Search method | Keyword / hybrid over sessions | Semantic (vector) search over project docs |
| Requires | Cloud API or session capture hooks | Local only — Ollama + ChromaDB, no API keys |
| Use case | "Continue where we left off" | "What did we decide about X six months ago?" |
pmem doesn't replace session memory. It fills the gap that session memory can't: retrieving decisions, context, and rationale from your project's documentation by meaning, not by keyword.
Same query — "identify governance-related blog posts" — run against a project with 500+ markdown files:
| | pmem (index-based) | Fresh search (Explore agent) |
|---|---|---|
| Results found | 18 posts | 11 posts |
| Time | ~20 seconds | ~90 seconds |
| Token cost | ~5,500 | ~20,000–24,000 |
| Missed | — | 7 posts (governance as supporting theme) |
The fresh search cost roughly 4× the tokens and found 7 fewer results. The posts it missed were the ones where governance was woven into the argument without being the headline topic — exactly the kind of semantic connection that keyword search can't make.
For the full breakdown — architecture decisions, the prompt that built it, and token cost analysis — read the build story on the blog.
```
Claude Code → MCP tool call → pmem server
                                  ↓
                      embed query (Ollama)
                                  ↓
                   search ChromaDB (local)
                                  ↓
    (optional) synthesize answer via local LLM
                                  ↓
         return answer + sources to Claude
```
pmem indexes your project's markdown and text files into a local vector database (ChromaDB). When Claude needs context, it queries the memory via MCP tools — no copy-pasting, no manual file pointing.
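The retrieval flow above can be sketched in a few lines. The `/api/embed` request shape matches Ollama's batch embedding endpoint; the `top_k` ranking below is a toy stand-in for ChromaDB, which handles vector storage and nearest-neighbor search in the actual tool:

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # Ollama's default endpoint

def embed(texts):
    """Batch-embed texts via Ollama's /api/embed endpoint."""
    body = json.dumps({"model": "nomic-embed-text", "input": texts}).encode()
    req = urllib.request.Request(f"{OLLAMA}/api/embed", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]

def top_k(query_vec, chunk_vecs, k=8):
    """Rank chunk vectors by cosine similarity to the query vector
    (ChromaDB does this internally; shown here for illustration)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cos(query_vec, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]

# Toy vectors stand in for real embeddings:
print(top_k([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]], k=2))  # [2, 1]
```

This is why "auth flow" can match "JWT was chosen over session tokens": nearby vectors, not shared keywords, determine the ranking.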
- Python 3.11+
- Ollama installed and running
Pull the embedding model (~274MB, one-time):
```
ollama pull nomic-embed-text
```

PyPI package coming soon. Once published, installation will be just `pip install pmem`. For now, install from source:

```
git clone https://github.com/avanrossum/pmem-project-memory-tool-for-claude.git
cd pmem-project-memory-tool-for-claude
pip install -e .
pmem install-skills
```

Add to `~/.claude.json` (global, all projects) or `.mcp.json` (per-project):
```json
{
  "mcpServers": {
    "project-memory": {
      "command": "/full/path/to/pmem",
      "args": ["serve"]
    }
  }
}
```

Important: Use the full path to `pmem`, not just `"pmem"`. Claude Code spawns MCP servers as subprocesses without your shell profile, so pyenv shims and other version managers won't work. Run `which pmem` to get the path. `pmem init` prints the correct snippet automatically.

Note: MCP servers go in `~/.claude.json` or `.mcp.json`, NOT in `~/.claude/settings.json` (which is for permissions and hooks only).
```
cd ~/your-project
pmem init
pmem index
```

That's it. Claude Code can now query your project's memory.
```
pmem init                    Create .memory/config.json with sensible defaults
pmem index                   Incremental index (only changed files)
pmem index --force           Full reindex (re-embed everything)
pmem index --dry-run         Show what would be indexed
pmem query "your question"   Query memory from the terminal
pmem query "..." --no-llm    Return raw chunks (no LLM synthesis)
pmem status                  Show index state, stale files, config
pmem exclude "snapshots/**"  Add a pattern to the exclude list
pmem include "**/*.py"       Add a pattern to the include list
pmem serve                   Start the MCP server (used by Claude Code)
pmem config                  Print current config
pmem config --edit           Open config in $EDITOR
pmem config --global         Show global config
pmem config --init-global    Create global config at ~/.config/pmem/config.json
pmem watch                   Poll for changes and reindex automatically (every 5s)
pmem install-skills          Install /welcome, /sleep, /reindex to Claude Code
pmem install-skills --link   Symlink instead of copy (macOS/Linux)
```
Note: Don't run `pmem index` from the terminal while Claude Code is active on the same project — use the `memory_reindex` MCP tool (or the `/reindex` skill) instead. `pmem watch` uses polling (not filesystem events) so it works reliably on all platforms.
Once registered, Claude Code has access to four tools:
| Tool | Description |
|---|---|
| `memory_query` | Ask a natural language question — retrieves relevant chunks and optionally synthesizes an answer via a local LLM |
| `memory_search` | Search for matching chunks with source locations (no synthesis) |
| `memory_status` | Check index state: file count, chunk count, stale files, config |
| `memory_reindex` | Trigger a reindex from within Claude Code |
`pmem init` creates `.memory/config.json` in your project root:
```json
{
  "project_name": "my-project",
  "embedding": {
    "endpoint": "http://localhost:11434",
    "model": "nomic-embed-text",
    "provider": "ollama"
  },
  "llm": {
    "endpoint": "http://localhost:1234/v1",
    "model": "local-model",
    "provider": "openai_compatible",
    "enabled": false
  },
  "indexing": {
    "include": ["**/*.md", "**/*.txt"],
    "exclude": [".memory/**", "**/.git/**", "**/node_modules/**", "*.lock"],
    "chunk_size": 400,
    "chunk_overlap": 80,
    "split_on_headers": true
  },
  "query": {
    "top_k": 8,
    "auto_reindex_on_query": false
  },
  "update_channel": "stable"
}
```

| Provider | Config | Notes |
|---|---|---|
| `ollama` (default) | `endpoint: "http://localhost:11434"` | Uses `/api/embed` (batch). Free, local. |
| `openai_compatible` | Any OpenAI-compatible endpoint | Uses `/v1/embeddings`. Works with LMStudio, vLLM, etc. |
When used via MCP with Claude Code, LLM synthesis is unnecessary — Claude interprets the raw chunks directly. Synthesis is disabled by default.
For standalone terminal use (pmem query), you can enable synthesis by setting llm.enabled: true and pointing at any OpenAI-compatible endpoint (LMStudio, Ollama's OpenAI mode, vLLM, etc.). This sends retrieved chunks to a local LLM for a summarized answer.
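Synthesis amounts to stuffing the retrieved chunks into an OpenAI-compatible chat completion request. A hedged sketch of what that payload might look like — the function name, prompt wording, and chunk dict shape are illustrative, not pmem's actual implementation:

```python
def build_synthesis_request(question, chunks, model="local-model"):
    """Assemble an OpenAI-compatible /v1/chat/completions payload that
    asks a local LLM to answer from the retrieved chunks only.
    (Illustrative sketch; pmem's real prompt may differ.)"""
    context = "\n\n---\n\n".join(
        f"[{c['source']}]\n{c['text']}" for c in chunks)
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided project context. "
                        "Cite the source file of each claim."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        "temperature": 0.2,
    }

payload = build_synthesis_request(
    "Why JWT over sessions?",
    [{"source": "docs/adr-007.md", "text": "JWT was chosen over session tokens..."}])
print(payload["messages"][1]["content"])
```

POSTing this to `llm.endpoint` + `/chat/completions` works against LMStudio, Ollama's OpenAI mode, or vLLM, since all three speak the same API shape.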
- `include` — glob patterns for files to index (default: `**/*.md`, `**/*.txt`)
- `exclude` — glob patterns to skip (default: `.memory/**`, `.git/**`, `node_modules/**`, `*.lock`)
- `chunk_size` — target chunk size in words (default: 400)
- `chunk_overlap` — overlap between chunks in words (default: 80)
- `split_on_headers` — split markdown at H1/H2/H3 boundaries before splitting by size (default: true). When a section is too large for a single chunk, it's split by size — but each sub-chunk retains the heading path from its parent section, so query results always show where in the document a chunk came from.
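Header-aware chunking with word-count overlap can be sketched roughly like this — the regex and heading handling are illustrative, not pmem's actual implementation:

```python
import re

def chunk_words(words, size=400, overlap=80):
    """Split a word list into overlapping fixed-size chunks."""
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def chunk_markdown(text, size=400, overlap=80):
    """Split on H1-H3 headings first, then by size; each sub-chunk keeps
    its heading so results show where in the doc it came from."""
    chunks = []
    heading = ""
    for block in re.split(r"^(#{1,3} .*)$", text, flags=re.M):
        if re.match(r"^#{1,3} ", block):
            heading = block.strip()   # remember the current section
            continue
        words = block.split()
        if not words:
            continue
        for piece in chunk_words(words, size, overlap):
            chunks.append({"heading": heading, "text": piece})
    return chunks

doc = "# Auth\nJWT was chosen over sessions.\n## Rotation\nKeys rotate weekly."
for c in chunk_markdown(doc):
    print(c["heading"], "->", c["text"])
```

The overlap means each chunk repeats the last ~80 words of its predecessor, so a decision that straddles a chunk boundary is still retrievable from either side.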
pmem indexes markdown and text files by default, but you can add any file type:
```
pmem include "**/*.py"
pmem include "**/*.js"
pmem include "**/*.apex"
```

This writes to your project's `.memory/config.json` — it only affects the current project, not other projects using pmem.
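Under the hood this is plausibly just a read-modify-write of the config file. A sketch under that assumption (the real command may validate the pattern or do more):

```python
import json
import tempfile
from pathlib import Path

def add_include_pattern(project_root, pattern):
    """Append a glob to the include list in .memory/config.json.
    (Sketch of what `pmem include` plausibly does; an assumption,
    not pmem's actual code.)"""
    cfg_path = Path(project_root) / ".memory" / "config.json"
    cfg = json.loads(cfg_path.read_text())
    patterns = cfg["indexing"]["include"]
    if pattern not in patterns:          # idempotent: no duplicate entries
        patterns.append(pattern)
        cfg_path.write_text(json.dumps(cfg, indent=2))
    return patterns

# Demo against a throwaway project directory:
root = Path(tempfile.mkdtemp())
(root / ".memory").mkdir()
(root / ".memory" / "config.json").write_text(
    json.dumps({"indexing": {"include": ["**/*.md", "**/*.txt"]}}))
print(add_include_pattern(root, "**/*.py"))  # ['**/*.md', '**/*.txt', '**/*.py']
```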
Non-markdown files are chunked by size (word count with overlap), since there are no header boundaries to split on. This works well for most code and documentation formats. Language-aware chunking (splitting on function/class boundaries) is on the roadmap but not yet implemented — size-based splitting is good enough for semantic retrieval in practice.
After adding new patterns, reindex to pick up the new files:
```
pmem index
```

- `top_k` — number of chunks to retrieve per query (default: 8)
- `auto_reindex_on_query` — check for stale files before every query and re-embed if needed (default: false — `/welcome`, `/sleep`, and `pmem watch` handle freshness)
pmem checks GitHub for new releases once per day and shows a notice when an update is available — both in pmem status output and in MCP tool responses (so Claude will tell you).
By default, only stable releases trigger notifications. To opt into beta (pre-release) notifications:
```json
{
  "update_channel": "beta"
}
```

Set this in `.memory/config.json` (per-project) or `~/.config/pmem/config.json` (global).
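The channel gating boils down to filtering pre-release tags before comparing versions. A sketch of that logic, assuming semver-style tags with a `-beta.N` suffix for pre-releases (pmem's real check queries GitHub and may order versions differently):

```python
def parse_version(tag):
    """'v1.4.0-beta.2' -> ((1, 4, 0), 'beta.2'); pre part empty for stable."""
    core, _, pre = tag.lstrip("v").partition("-")
    return tuple(int(p) for p in core.split(".")), pre

def update_available(current, release_tags, channel="stable"):
    """Return the newest release tag ahead of `current`, honoring the
    channel: 'stable' ignores pre-releases, 'beta' accepts them.
    (Illustrative; ignores pre-release ordering within one version.)"""
    cur_core, _ = parse_version(current)
    candidates = []
    for tag in release_tags:
        core, pre = parse_version(tag)
        if pre and channel != "beta":    # skip pre-releases on stable
            continue
        if core > cur_core:
            candidates.append((core, tag))
    return max(candidates)[1] if candidates else None

tags = ["v1.2.0", "v1.3.0-beta.1", "v1.2.5"]
print(update_available("v1.2.0", tags))                  # v1.2.5
print(update_available("v1.2.0", tags, channel="beta"))  # v1.3.0-beta.1
```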
Warning: Beta releases may contain breaking changes, incomplete features, or bugs. Use at your own risk. If something breaks, pin back to the last stable version with `git checkout v<version> && pip install -e .`.
```
your-project/
└── .memory/
    ├── config.json         ← commit this (your project's memory config)
    ├── chroma/             ← gitignore (generated vector store)
    └── index_state.json    ← gitignore (file hash registry)
```
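The file hash registry is what makes `pmem index` incremental: only files whose content hash has changed get re-embedded. A sketch of that change detection, assuming `index_state.json` maps paths to content hashes (an assumption — the real format may differ):

```python
import hashlib
import tempfile
from pathlib import Path

def changed_files(root, state, patterns=("*.md", "*.txt")):
    """Compare current file hashes to the stored registry and return
    the paths that need re-embedding, updating the registry in place."""
    changed = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rel = str(path.relative_to(root))
            if state.get(rel) != digest:   # new or modified file
                changed.append(rel)
                state[rel] = digest
    return changed

# Demo: first pass indexes everything, second pass finds nothing stale.
root = Path(tempfile.mkdtemp())
(root / "ADR.md").write_text("JWT over sessions")
state = {}
print(changed_files(root, state))   # ['ADR.md']
print(changed_files(root, state))   # []
```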
Add to your .gitignore (done automatically by pmem init):
```
.memory/
```
Note: Older versions of pmem added individual entries (`.memory/chroma/`, `.memory/index_state.json`). The single `.memory/` entry is preferred — it catches transient files like lock files that the specific entries miss.
pmem ships with three Claude Code slash command skills:
- `/welcome` — Run at the start of each session. Reads governance files, runs incremental reindex, confirms readiness.
- `/sleep` — Run at the end of each session. Full governance pass: updates tasks, docs, changelog, memory, and reindexes.
- `/reindex` — Quick trigger to refresh the memory index mid-session.
```
# Recommended: use the built-in installer
pmem install-skills

# Or with symlinks (stays in sync with repo, macOS/Linux only)
pmem install-skills --link
```

Or manually:

```
# Copy
cp skills/welcome.md ~/.claude/commands/welcome.md
cp skills/sleep.md ~/.claude/commands/sleep.md
cp skills/reindex.md ~/.claude/commands/reindex.md

# Or symlink (macOS/Linux only)
ln -sf "$(pwd)/skills/welcome.md" ~/.claude/commands/welcome.md
ln -sf "$(pwd)/skills/sleep.md" ~/.claude/commands/sleep.md
ln -sf "$(pwd)/skills/reindex.md" ~/.claude/commands/reindex.md
```

Add this to any project using pmem so Claude knows it's available:
```markdown
## Project Memory

This project has a local RAG memory index via `pmem`. Use the `memory_query` MCP tool when:
- Looking for past decisions, context, or rationale ("why did we do X?")
- Searching for historical task context or outcomes
- Finding documented gotchas or lessons learned

Do NOT use memory_query for: reading specific known files, checking current code
state, or anything derivable from `git log`. The index updates at session start
(`/welcome`) and session end (`/sleep`), so it may be slightly behind mid-session.
If results seem stale, run `memory_reindex` to refresh.
```

| Setup | Embedding | LLM synthesis |
|---|---|---|
| Any Mac (even 8GB) | Runs locally — nomic-embed-text is tiny | Point at a remote machine or disable |
| 32GB+ Mac | Runs locally | Run 8B–32B model locally via Ollama/LMStudio |
| Dedicated server (Mac Studio, etc.) | Runs locally | Run 70B+ model, expose via Cloudflare tunnel |
- Local-first — no data leaves your machine. No API keys required.
- Portable — install once globally, `pmem init` in any project.
- Low friction — setup takes under 2 minutes. Querying is automatic via MCP.
- Minimal dependencies — no LangChain, no LlamaIndex. Just ChromaDB, httpx, click, pathspec, and the MCP SDK.
- Cognitive Offloading — The methodology behind deliberate memory externalization
- The Governance Documents — ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md — the files pmem was built to index
- What Is Pass@1? — The development methodology where governance documents are thorough enough that AI generates correct implementations on the first attempt
Built by Alex van Rossum — systems architect, fractional CTO, and the kind of person who builds tools when the existing ones waste too many tokens.
MIT