
# pmem — Project Memory Tool

A portable, local-first RAG memory layer for Claude Code projects. Gives Claude semantic search over your project's documentation, decisions, and history — using local models, with no external API dependencies.

Think of it as long-term memory that persists across Claude Code sessions, queryable via MCP.

## Why this exists

I use Claude Code for more than writing code. I run specialized agents that maintain context across infrastructure, documentation, content pipelines, and operational workflows — sometimes six or more projects simultaneously. Each project accumulates hundreds of markdown files: architecture decisions, task logs, lessons learned, archived roadmaps.

Grepping through all of that wastes tokens and misses semantic matches. "What did we decide about the auth flow?" doesn't match "JWT was chosen over session tokens because..." — not with grep, anyway.

So I built institutional memory for AI agents. pmem indexes your project's documentation into a local vector store, and Claude queries it by meaning instead of by keyword. No data leaves your machine. Setup takes two minutes.

Read more about the methodology behind this: Cognitive Offloading and The Governance Documents.

## How pmem differs from session memory tools

Most Claude Code memory tools — claude-mem, claude-brain, supermemory — solve session continuity: what did Claude do last time? They capture Claude's actions, compress conversation history, and replay it into future sessions.

pmem solves a different problem: what does the project know?

Your project has architecture decisions, task logs, lessons learned, archived roadmaps, and governance documents accumulated over months. That institutional knowledge exists in files, not in session transcripts. When you ask "why did we choose this auth approach?" the answer isn't in what Claude did yesterday — it's in an ADR you wrote three months ago.

|  | Session memory tools | pmem |
|---|---|---|
| Remembers | What Claude did | What the project documented |
| Source data | Session transcripts, tool usage | Markdown, text, code files in your repo |
| Search method | Keyword / hybrid over sessions | Semantic (vector) search over project docs |
| Requires | Cloud API or session capture hooks | Local only — Ollama + ChromaDB, no API keys |
| Use case | "Continue where we left off" | "What did we decide about X six months ago?" |

pmem doesn't replace session memory. It fills the gap that session memory can't: retrieving decisions, context, and rationale from your project's documentation by meaning, not by keyword.

## Real-world comparison

Same query — "identify governance-related blog posts" — run against a project with 500+ markdown files:

|  | pmem (index-based) | Fresh search (Explore agent) |
|---|---|---|
| Results found | 18 posts | 11 posts |
| Time | ~20 seconds | ~90 seconds |
| Token cost | ~5,500 | ~20,000–24,000 |
| Missed |  | 7 posts (governance as supporting theme) |

The fresh search cost roughly 4× the tokens and found 7 fewer results. The posts it missed were the ones where governance was woven into the argument without being the headline topic — exactly the kind of semantic connection that keyword search can't make.

For the full breakdown — architecture decisions, the prompt that built it, and token cost analysis — read the build story on the blog.

## How it works

```
Claude Code → MCP tool call → pmem server
                                  ↓
                        embed query (Ollama)
                                  ↓
                        search ChromaDB (local)
                                  ↓
                        (optional) synthesize answer via local LLM
                                  ↓
                        return answer + sources to Claude
```
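To make the retrieval step concrete, here is a sketch of what "embed the query, then rank chunks by similarity" means. The function names are illustrative (not pmem's actual API), and a brute-force cosine-similarity scan stands in for ChromaDB's index; in real pmem the query vector comes from Ollama's `/api/embed` endpoint.

```python
# Illustrative sketch of semantic retrieval: rank stored chunks by cosine
# similarity to the query embedding. Stands in for ChromaDB; not pmem's code.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search_memory(query_vec, chunks, top_k=8):
    """Return the top_k chunk texts most similar to the query embedding."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy 2-D vectors; real embeddings from nomic-embed-text have 768 dimensions.
chunks = [("auth decision: JWT over sessions", [0.9, 0.1]),
          ("deploy runbook", [0.1, 0.9])]
print(search_memory([1.0, 0.0], chunks, top_k=1))  # → ['auth decision: JWT over sessions']
```

This is why "what did we decide about the auth flow?" can surface a chunk that never contains the word "flow": ranking happens in embedding space, not over keywords.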

pmem indexes your project's markdown and text files into a local vector database (ChromaDB). When Claude needs context, it queries the memory via MCP tools — no copy-pasting, no manual file pointing.

## Quick start

### 1. Prerequisites

- Python 3.11+
- Ollama installed and running

Pull the embedding model (~274MB, one-time):

```bash
ollama pull nomic-embed-text
```

### 2. Install pmem

PyPI package coming soon. Once published, installation will be just `pip install pmem`. For now, install from source:

```bash
git clone https://github.com/avanrossum/pmem-project-memory-tool-for-claude.git
cd pmem-project-memory-tool-for-claude
pip install -e .
pmem install-skills
```

### 3. Register the MCP server

Add to `~/.claude.json` (global, all projects) or `.mcp.json` (per-project):

```json
{
  "mcpServers": {
    "project-memory": {
      "command": "/full/path/to/pmem",
      "args": ["serve"]
    }
  }
}
```

**Important:** Use the full path to `pmem`, not just `"pmem"`. Claude Code spawns MCP servers as subprocesses without your shell profile, so pyenv shims and other version managers won't work. Run `which pmem` to get the path. `pmem init` prints the correct snippet automatically.

**Note:** MCP servers go in `~/.claude.json` or `.mcp.json`, NOT in `~/.claude/settings.json` (which is for permissions and hooks only).

### 4. Initialize in your project

```bash
cd ~/your-project
pmem init
pmem index
```

That's it. Claude Code can now query your project's memory.

## CLI reference

```
pmem init                       Create .memory/config.json with sensible defaults
pmem index                      Incremental index (only changed files)
pmem index --force              Full reindex (re-embed everything)
pmem index --dry-run            Show what would be indexed
pmem query "your question"      Query memory from the terminal
pmem query "..." --no-llm       Return raw chunks (no LLM synthesis)
pmem status                     Show index state, stale files, config
pmem exclude "snapshots/**"     Add a pattern to the exclude list
pmem include "**/*.py"          Add a pattern to the include list
pmem serve                      Start the MCP server (used by Claude Code)
pmem config                     Print current config
pmem config --edit              Open config in $EDITOR
pmem config --global            Show global config
pmem config --init-global       Create global config at ~/.config/pmem/config.json
pmem watch                      Poll for changes and reindex automatically (every 5s)
pmem install-skills             Install /welcome, /sleep, /reindex to Claude Code
pmem install-skills --link      Symlink instead of copy (macOS/Linux)
```

**Note:** Don't run `pmem index` from the terminal while Claude Code is active on the same project — use the `memory_reindex` MCP tool (or the `/reindex` skill) instead. `pmem watch` uses polling (not filesystem events), so it works reliably on all platforms.

## MCP tools

Once registered, Claude Code has access to four tools:

| Tool | Description |
|---|---|
| `memory_query` | Ask a natural language question — retrieves relevant chunks and optionally synthesizes an answer via a local LLM |
| `memory_search` | Search for matching chunks with source locations (no synthesis) |
| `memory_status` | Check index state: file count, chunk count, stale files, config |
| `memory_reindex` | Trigger a reindex from within Claude Code |

## Configuration

`pmem init` creates `.memory/config.json` in your project root:

```json
{
  "project_name": "my-project",
  "embedding": {
    "endpoint": "http://localhost:11434",
    "model": "nomic-embed-text",
    "provider": "ollama"
  },
  "llm": {
    "endpoint": "http://localhost:1234/v1",
    "model": "local-model",
    "provider": "openai_compatible",
    "enabled": false
  },
  "indexing": {
    "include": ["**/*.md", "**/*.txt"],
    "exclude": [".memory/**", "**/.git/**", "**/node_modules/**", "*.lock"],
    "chunk_size": 400,
    "chunk_overlap": 80,
    "split_on_headers": true
  },
  "query": {
    "top_k": 8,
    "auto_reindex_on_query": false
  },
  "update_channel": "stable"
}
```

### Embedding providers

| Provider | Config | Notes |
|---|---|---|
| `ollama` (default) | `endpoint: "http://localhost:11434"` | Uses `/api/embed` (batch). Free, local. |
| `openai_compatible` | Any OpenAI-compatible endpoint | Uses `/v1/embeddings`. Works with LMStudio, vLLM, etc. |

### LLM synthesis (optional, disabled by default)

When used via MCP with Claude Code, LLM synthesis is unnecessary — Claude interprets the raw chunks directly, which is why synthesis is disabled by default.

For standalone terminal use (`pmem query`), enable synthesis by setting `llm.enabled: true` and pointing at any OpenAI-compatible endpoint (LMStudio, Ollama's OpenAI mode, vLLM, etc.). This sends retrieved chunks to a local LLM for a summarized answer.
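For example, enabling synthesis against LMStudio's default local endpoint looks like this in the `llm` block of `.memory/config.json` (the model name is a placeholder; use whatever your local server exposes):

```json
"llm": {
  "endpoint": "http://localhost:1234/v1",
  "model": "local-model",
  "provider": "openai_compatible",
  "enabled": true
}
```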

### Indexing options

- `include` — glob patterns for files to index (default: `**/*.md`, `**/*.txt`)
- `exclude` — glob patterns to skip (default: `.memory/**`, `.git/**`, `node_modules/**`, `*.lock`)
- `chunk_size` — target chunk size in words (default: 400)
- `chunk_overlap` — overlap between chunks in words (default: 80)
- `split_on_headers` — split markdown at H1/H2/H3 boundaries before splitting by size (default: `true`). When a section is too large for a single chunk, it's split by size — but each sub-chunk retains the heading path from its parent section, so query results always show where in the document a chunk came from.
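The heading-path idea can be sketched like this. This is an illustration of the behavior, not pmem's actual implementation: walk the file line by line, track the current H1–H3 trail, and tag every section with it.

```python
# Sketch of split_on_headers: split a markdown document at H1-H3 boundaries
# and tag each section with its heading path (illustrative, not pmem's code).
import re

def split_on_headers(text):
    path = []      # current heading trail, e.g. ["Setup", "Install"]
    chunks = []    # (heading_path, section_text) pairs
    section = []
    for line in text.splitlines():
        m = re.match(r"^(#{1,3})\s+(.*)", line)
        if m:
            if section:  # flush the section that just ended
                chunks.append((" > ".join(path), "\n".join(section).strip()))
                section = []
            level = len(m.group(1))
            # An H2 replaces everything at depth 2 and below, keeping its H1.
            path = path[: level - 1] + [m.group(2).strip()]
        else:
            section.append(line)
    if section:
        chunks.append((" > ".join(path), "\n".join(section).strip()))
    return [c for c in chunks if c[1]]  # drop empty sections

doc = "# Setup\n## Install\npip install pmem\n## Configure\nedit config"
for heading, body in split_on_headers(doc):
    print(heading, "|", body)
# → Setup > Install | pip install pmem
# → Setup > Configure | edit config
```

Oversized sections would then be handed to the size-based splitter, with each sub-chunk inheriting the `heading_path` of its parent section.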

### Indexing non-markdown files

pmem indexes markdown and text files by default, but you can add any file type:

```bash
pmem include "**/*.py"
pmem include "**/*.js"
pmem include "**/*.apex"
```

This writes to your project's `.memory/config.json` — it only affects the current project, not other projects using pmem.

Non-markdown files are chunked by size (word count with overlap), since there are no header boundaries to split on. This works well for most code and documentation formats. Language-aware chunking (splitting on function/class boundaries) is on the roadmap but not yet implemented — size-based splitting is good enough for semantic retrieval in practice.
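Size-based chunking with overlap can be sketched as follows (an illustration only; pmem's real chunker may differ). Each chunk starts `chunk_size - chunk_overlap` words after the previous one, so consecutive chunks share context:

```python
# Sketch of word-count chunking with overlap, using pmem's documented
# defaults (chunk_size=400, chunk_overlap=80). Not pmem's actual code.
def chunk_by_size(text, chunk_size=400, chunk_overlap=80):
    assert chunk_overlap < chunk_size, "overlap must be smaller than chunk size"
    words = text.split()
    step = chunk_size - chunk_overlap  # advance 320 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the file
    return chunks

# Toy numbers: 10 words, chunks of 4 with overlap 2 → chunks start at 0, 2, 4, 6
text = " ".join(str(i) for i in range(10))
print(chunk_by_size(text, chunk_size=4, chunk_overlap=2))
# → ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one of the two chunks.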

After adding new patterns, reindex to pick up the new files:

```bash
pmem index
```

### Query options

- `top_k` — number of chunks to retrieve per query (default: 8)
- `auto_reindex_on_query` — check for stale files before every query and re-embed if needed (default: `false` — `/welcome`, `/sleep`, and `pmem watch` handle freshness)

## Update notifications

pmem checks GitHub for new releases once per day and shows a notice when an update is available — both in `pmem status` output and in MCP tool responses (so Claude will tell you).

By default, only stable releases trigger notifications. To opt into beta (pre-release) notifications:

```json
{
  "update_channel": "beta"
}
```

Set this in `.memory/config.json` (per-project) or `~/.config/pmem/config.json` (global).

**Warning:** Beta releases may contain breaking changes, incomplete features, or bugs. Use at your own risk. If something breaks, pin back to the last stable version with `git checkout v<version> && pip install -e .`.

## What gets created in your project

```
your-project/
└── .memory/
    ├── config.json        ← commit this (your project's memory config)
    ├── chroma/            ← gitignore (generated vector store)
    └── index_state.json   ← gitignore (file hash registry)
```

Add to your `.gitignore` (done automatically by `pmem init`):

```
.memory/
```

**Note:** Older versions of pmem added individual entries (`.memory/chroma/`, `.memory/index_state.json`). The single `.memory/` entry is preferred — it catches transient files like lock files that the specific entries miss.
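Conceptually, a file-hash registry like `index_state.json` is what makes incremental indexing possible: only files whose content hash changed since the last run need re-embedding. A sketch under that assumption (the registry layout here is illustrative, not pmem's actual schema):

```python
# Sketch of hash-based staleness detection for incremental indexing.
# Illustrative only; pmem's index_state.json schema may differ.
import hashlib

def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def stale_files(current: dict, registry: dict) -> list:
    """Return paths whose content hash differs from the registry,
    plus paths the registry has never seen."""
    return [path for path, content in current.items()
            if registry.get(path) != file_hash(content)]

registry = {"README.md": file_hash(b"old readme")}        # state after last index
current = {"README.md": b"new readme", "ADR-001.md": b"use JWT"}
print(stale_files(current, registry))  # → ['README.md', 'ADR-001.md']
```

Everything not in that list is skipped on the next `pmem index`, which is why incremental runs are cheap.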

## Skills (optional)

pmem ships with three Claude Code slash command skills:

- `/welcome` — Run at the start of each session. Reads governance files, runs an incremental reindex, confirms readiness.
- `/sleep` — Run at the end of each session. Full governance pass: updates tasks, docs, changelog, memory, and reindexes.
- `/reindex` — Quick trigger to refresh the memory index mid-session.

### Install skills

```bash
# Recommended: use the built-in installer
pmem install-skills

# Or with symlinks (stays in sync with the repo, macOS/Linux only)
pmem install-skills --link
```

Or manually:

```bash
# Copy
cp skills/welcome.md ~/.claude/commands/welcome.md
cp skills/sleep.md ~/.claude/commands/sleep.md
cp skills/reindex.md ~/.claude/commands/reindex.md

# Or symlink (macOS/Linux only)
ln -sf "$(pwd)/skills/welcome.md" ~/.claude/commands/welcome.md
ln -sf "$(pwd)/skills/sleep.md" ~/.claude/commands/sleep.md
ln -sf "$(pwd)/skills/reindex.md" ~/.claude/commands/reindex.md
```

## Recommended CLAUDE.md snippet

Add this to any project using pmem so Claude knows it's available:

```markdown
## Project Memory

This project has a local RAG memory index via `pmem`. Use the `memory_query` MCP tool when:
- Looking for past decisions, context, or rationale ("why did we do X?")
- Searching for historical task context or outcomes
- Finding documented gotchas or lessons learned

Do NOT use memory_query for: reading specific known files, checking current code
state, or anything derivable from `git log`. The index updates at session start
(`/welcome`) and session end (`/sleep`), so it may be slightly behind mid-session.

If results seem stale, run `memory_reindex` to refresh.
```

## Hardware notes

| Setup | Embedding | LLM synthesis |
|---|---|---|
| Any Mac (even 8GB) | Runs locally — `nomic-embed-text` is tiny | Point at a remote machine or disable |
| 32GB+ Mac | Runs locally | Run an 8B–32B model locally via Ollama/LMStudio |
| Dedicated server (Mac Studio, etc.) | Runs locally | Run a 70B+ model, expose via Cloudflare tunnel |

## Design principles

- **Local-first** — no data leaves your machine. No API keys required.
- **Portable** — install once globally, `pmem init` in any project.
- **Low friction** — setup takes under 2 minutes. Querying is automatic via MCP.
- **Minimal dependencies** — no LangChain, no LlamaIndex. Just ChromaDB, httpx, click, pathspec, and the MCP SDK.

## Related reading

- **Cognitive Offloading** — the methodology behind deliberate memory externalization
- **The Governance Documents** — ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md — the files pmem was built to index
- **What Is Pass@1?** — the development methodology where governance documents are thorough enough that AI generates correct implementations on the first attempt

## Author

Built by Alex van Rossum — systems architect, fractional CTO, and the kind of person who builds tools when the existing ones waste too many tokens.

## License

MIT
