Claude Code MCP server that gives your AI agent a local vector database for RAG knowledge retrieval and long-term memory — powered by zvec.
- What it does
- Installation
- Registering with Claude Code
- Embedding backends
- Configuration reference
- Tools reference
- Usage examples
- Architecture
- Data storage
- Troubleshooting
- License
zvec-mcp runs as a Model Context Protocol server over stdio. Once connected, Claude Code gets 11 new tools for storing, searching, and managing a local vector database — completely offline, no data leaves your machine.
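Under the hood this is plain JSON-RPC 2.0 written line-by-line over stdin/stdout. As a rough sketch (field values here are illustrative, not taken from the actual implementation), a client's tool invocation looks like:

```python
import json

# Illustrative JSON-RPC 2.0 request an MCP client writes to the server's
# stdin to invoke a tool (argument values are hypothetical).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "knowledge_search",
        "arguments": {"query": "authentication flow", "topk": 5},
    },
}

# Each message travels as a single line of JSON on the pipe.
wire = json.dumps(request)
print(wire)
```

The server replies with a matching `"id"` and a result payload; FastMCP handles all of this framing for you.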
- Python 3.10 or later
- macOS, Linux, or Windows
- An embedding source (one of):
- A local server like LM Studio, Ollama, or vLLM (recommended)
- sentence-transformers installed locally
- An OpenAI API key
The fastest way to get started. Download the pre-built `.mcpb` file and drag it into Claude Code.

- Download `zvec-mcp-0.1.0.mcpb` from the latest release
- Open Claude Code
- Go to the Extensions tab
- Drag the `.mcpb` file into the drop zone
- Configure your embedding backend in the settings that appear
Note: The `.mcpb` bundle defaults to the HTTP embedding backend and does not include sentence-transformers. To use local embeddings, install via pip instead (Option B).
```shell
pip install zvec-mcp
```

This installs the `zvec-mcp` command-line entry point. To also get local offline embeddings:

```shell
pip install zvec-mcp sentence-transformers
```

For OpenAI embeddings:

```shell
pip install "zvec-mcp[openai]"
```

```shell
git clone https://github.com/cluster2600/zvec-mcp.git
cd zvec-mcp

# Create a virtual environment and install
uv venv --python 3.10
uv pip install -e .

# Optional: local embeddings
uv pip install sentence-transformers

# Optional: OpenAI embeddings
uv pip install openai
```

After installing, you need to tell Claude Code where to find the server.
```shell
claude mcp add-json zvec-mcp \
  '{"type":"stdio","command":"zvec-mcp","args":[],"env":{"ZVEC_MCP_DATA_DIR":"'"$HOME"'/.zvec-mcp","ZVEC_MCP_EMBEDDING":"http"}}' \
  --scope user
```

If you installed from source, use the full path to the venv binary:

```shell
claude mcp add-json zvec-mcp \
  '{"type":"stdio","command":"'"$(pwd)"'/.venv/bin/zvec-mcp","args":[],"env":{"ZVEC_MCP_DATA_DIR":"'"$HOME"'/.zvec-mcp","ZVEC_MCP_EMBEDDING":"http"}}' \
  --scope user
```

Create or edit `.mcp.json` in your project root:
```json
{
  "mcpServers": {
    "zvec-mcp": {
      "command": "zvec-mcp",
      "args": [],
      "env": {
        "ZVEC_MCP_DATA_DIR": "${HOME}/.zvec-mcp",
        "ZVEC_MCP_EMBEDDING": "http",
        "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:1234/v1/embeddings",
        "ZVEC_MCP_HTTP_MODEL": "text-embedding-nomic-embed-text-v1.5@f16",
        "ZVEC_MCP_HTTP_DIM": "768"
      }
    }
  }
}
```

Replace `"zvec-mcp"` with the full path if the command is not on your `PATH` (e.g. `/path/to/zvec-mcp/.venv/bin/zvec-mcp`).
```shell
claude mcp list
# zvec-mcp: zvec-mcp - ✓ Connected
```

You can also check from inside Claude Code by asking:

"What's the status of zvec-mcp?"

Claude will call the `zvec_status` tool and display the backend, dimensions, and collection stats.
zvec-mcp supports three embedding backends. You choose one via the `ZVEC_MCP_EMBEDDING` environment variable.
Calls any OpenAI-compatible /v1/embeddings endpoint. This is the recommended setup — it keeps embeddings local, runs on GPU if available, and requires no heavy Python dependencies (uses Python's built-in urllib).
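As a sketch of that request/response shape (standard library only; the URL and model name are just this README's defaults, and the real `embeddings.py` may differ):

```python
import json
import urllib.request

def build_payload(texts, model):
    """Request body for an OpenAI-compatible /v1/embeddings endpoint."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")

def embed(texts,
          url="http://127.0.0.1:1234/v1/embeddings",
          model="text-embedding-nomic-embed-text-v1.5@f16",
          api_key=None):
    """POST texts and return one embedding vector per input text."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(url, data=build_payload(texts, model),
                                 headers=headers)
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Response shape: {"data": [{"index": 0, "embedding": [...]}, ...]}
    return [item["embedding"]
            for item in sorted(data["data"], key=lambda d: d["index"])]
```

Any server that speaks this contract (LM Studio, Ollama, vLLM) works unchanged.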
Setup with LM Studio:
- Install LM Studio
- Download an embedding model (e.g. `nomic-embed-text-v1.5`)
- Start the local server (LM Studio → Developer → Start Server)
- Note the port (default `1234`) and API key if you enabled authentication
```json
{
  "mcpServers": {
    "zvec-mcp": {
      "command": "zvec-mcp",
      "args": [],
      "env": {
        "ZVEC_MCP_EMBEDDING": "http",
        "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:1234/v1/embeddings",
        "ZVEC_MCP_HTTP_MODEL": "text-embedding-nomic-embed-text-v1.5@f16",
        "ZVEC_MCP_HTTP_API_KEY": "your-api-key",
        "ZVEC_MCP_HTTP_DIM": "768"
      }
    }
  }
}
```

Setup with Ollama:

- Install Ollama
- Pull an embedding model: `ollama pull nomic-embed-text`
- Ollama serves on port `11434` by default
```json
{
  "env": {
    "ZVEC_MCP_EMBEDDING": "http",
    "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:11434/v1/embeddings",
    "ZVEC_MCP_HTTP_MODEL": "nomic-embed-text",
    "ZVEC_MCP_HTTP_DIM": "768"
  }
}
```

Important: The `ZVEC_MCP_HTTP_DIM` value must match the output dimension of your chosen model. Common values: `768` for nomic-embed-text, `384` for all-MiniLM-L6-v2, `1024` for BGE-large.
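If you're unsure what dimension your model emits, one way to check (a sketch; assumes your server is running on the README's default LM Studio port) is to embed a short probe string and count the vector's length:

```python
import json
import urllib.request

def dim_from_response(data):
    """Length of the first embedding vector in an /v1/embeddings response."""
    return len(data["data"][0]["embedding"])

def probe_dim(model, url="http://127.0.0.1:1234/v1/embeddings"):
    """Ask the endpoint to embed one word and report the vector length."""
    body = json.dumps({"model": model, "input": "probe"}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return dim_from_response(json.load(resp))

# Set ZVEC_MCP_HTTP_DIM to whatever probe_dim("your-model") reports.
```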
Runs all-MiniLM-L6-v2 entirely offline on your machine. It produces 384-dimensional vectors and uses MPS acceleration on Apple Silicon. No API key needed.

Requires: `pip install sentence-transformers` (pulls in PyTorch, ~500 MB)
```json
{
  "env": {
    "ZVEC_MCP_EMBEDDING": "local"
  }
}
```

The model is downloaded automatically on first use (~80 MB, cached in `~/.cache/torch/`).
Uses the OpenAI Embeddings API. Requires a valid API key.
```json
{
  "env": {
    "ZVEC_MCP_EMBEDDING": "openai",
    "OPENAI_API_KEY": "sk-..."
  }
}
```

You can customize the model and dimension:
```json
{
  "env": {
    "ZVEC_MCP_EMBEDDING": "openai",
    "OPENAI_API_KEY": "sk-...",
    "ZVEC_MCP_OPENAI_MODEL": "text-embedding-3-large",
    "ZVEC_MCP_OPENAI_DIM": "3072"
  }
}
```

All settings are driven by environment variables, passed through the `env` block in your MCP config.
| Variable | Default | Description |
|---|---|---|
| `ZVEC_MCP_DATA_DIR` | `~/.zvec-mcp` | Root directory for vector database collections |
| `ZVEC_MCP_EMBEDDING` | `local` | Embedding backend: `local`, `openai`, or `http` |
| `ZVEC_MCP_CHUNK_SIZE` | `512` | Characters per chunk for knowledge ingestion |
| `ZVEC_MCP_CHUNK_OVERLAP` | `64` | Overlap between consecutive chunks |
| **HTTP backend** | | |
| `ZVEC_MCP_HTTP_URL` | `http://127.0.0.1:1234/v1/embeddings` | OpenAI-compatible embedding endpoint |
| `ZVEC_MCP_HTTP_MODEL` | `text-embedding-nomic-embed-text-v1.5@f16` | Model name sent to the endpoint |
| `ZVEC_MCP_HTTP_API_KEY` | — | Bearer token (optional, depends on your server) |
| `ZVEC_MCP_HTTP_DIM` | `768` | Embedding dimension (must match your model) |
| **OpenAI backend** | | |
| `OPENAI_API_KEY` | — | Required when using the OpenAI backend |
| `ZVEC_MCP_OPENAI_MODEL` | `text-embedding-3-small` | OpenAI model name |
| `ZVEC_MCP_OPENAI_DIM` | `1536` | Embedding dimension |
| Tool | Description |
|---|---|
| `knowledge_ingest` | Chunk, embed, and store text for later retrieval. Accepts `text` and an optional `source` label. |
| `knowledge_ingest_file` | Read a file from disk and ingest its contents. Accepts a file `path`. |
| `knowledge_search` | Semantic search over ingested documents. Accepts a `query` and optional `topk` (default 5). |
| `knowledge_delete_source` | Remove all chunks from a given source. |
| `knowledge_stats` | Return collection statistics (doc count, embedding dimension, path). |
| Tool | Description |
|---|---|
| `memory_remember` | Store a fact, preference, or observation. Accepts `text` and optional `category` (default `"general"`). |
| `memory_recall` | Semantic recall by query. Accepts `query`, optional `topk` (default 5), optional `category` filter. |
| `memory_forget` | Delete a specific memory by its `memory_id`. |
| `memory_forget_category` | Delete all memories in a category. |
| `memory_stats` | Return memory store statistics. |
| Tool | Description |
|---|---|
| `zvec_status` | Return server config, embedding backend, and stats for both collections. |
Once registered, you can use the tools naturally in conversation with Claude Code:
Ingest documentation for RAG:
"Ingest the contents of `docs/api-reference.md` into the knowledge base."
Claude will call `knowledge_ingest_file` with that path. You can then ask questions like:
"How does the authentication flow work?"
Claude will call `knowledge_search` to retrieve relevant chunks and use them in its answer.
Store project context as memory:
"Remember that this project uses PostgreSQL 16 and the primary database is called `appdb`."
Claude will call `memory_remember` with category `"project"`. Later, in any conversation:
"What database does this project use?"
Claude will call `memory_recall` and retrieve the stored fact.
Organize memories by category:
Categories help you organize different types of information:
- `"preference"` — coding style, tool choices, naming conventions
- `"project"` — architecture decisions, database schemas, API contracts
- `"person"` — team member roles, contact info
- `"decision"` — past decisions with rationale
"Remember that we decided to use JWT for auth instead of sessions — the main reason was stateless scaling."
Claude stores this with the `"decision"` category.
Clean up:
"Delete all knowledge from source `docs/old-api.md`."

"Forget all memories in the `project` category."
```
┌─────────────┐   stdio/JSON-RPC    ┌──────────────────────────┐
│ Claude Code │◄───────────────────►│   zvec-mcp (FastMCP)     │
│  (client)   │                     │                          │
└─────────────┘                     │  ┌──────────┐            │
                                    │  │knowledge │  chunk →   │
                                    │  │ manager  │  embed →   │
                                    │  │          │  store     │
                                    │  └─────┬────┘            │
                                    │        │                 │
                                    │  ┌─────▼────┐            │
                                    │  │   zvec   │  vector    │
                                    │  │collection│  search    │
                                    │  └─────┬────┘            │
                                    │        │                 │
                                    │  ┌─────▼────┐            │
                                    │  │  memory  │  remember  │
                                    │  │ manager  │  recall    │
                                    │  └──────────┘            │
                                    │                          │
                                    │  ┌──────────┐            │
                                    │  │embeddings│  local /   │
                                    │  │  (lazy)  │  openai /  │
                                    │  │          │  http      │
                                    │  └──────────┘            │
                                    └──────────────────────────┘
```
```
src/zvec_mcp/
├── server.py      # FastMCP server — 11 tools registered via @mcp.tool()
├── config.py      # Env-driven dataclass, reads all ZVEC_MCP_* variables
├── embeddings.py  # Lazy-loaded embedding singleton (local, OpenAI, or HTTP)
├── knowledge.py   # RAG pipeline: chunk → embed → upsert → query
└── memory.py      # Semantic memory: remember → recall → forget
```
- Text is split into overlapping chunks at sentence boundaries (configurable size and overlap)
- Each chunk is embedded via the configured backend
- Chunks are stored in a zvec `Collection` with fields: `source`, `chunk_idx`, `text`, `created_at`
- Chunk IDs are content-addressed (`sha256(source:idx)`) — re-ingesting the same source updates in place
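The pipeline above can be sketched roughly like this (sizes are this README's defaults; the real `knowledge.py` implementation may split differently):

```python
import hashlib

def chunk_text(text, size=512, overlap=64):
    """Split text into overlapping character chunks, preferring to cut
    at the last sentence boundary inside each window."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        # Back up to the last ". " inside the window, if one exists.
        cut = text.rfind(". ", start, end)
        if cut > start and end < len(text):
            end = cut + 1
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        # Next window starts `overlap` characters before this one ended.
        start = max(end - overlap, start + 1)
    return chunks

def chunk_id(source, idx):
    """Content-addressed ID: the same source/index pair always hashes to
    the same ID, so re-ingesting a file overwrites its chunks in place."""
    return hashlib.sha256(f"{source}:{idx}".encode("utf-8")).hexdigest()
```

Because IDs are deterministic, ingestion is an upsert rather than an append; stale chunks from a shrunken file are what `knowledge_delete_source` cleans up.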
- Each fact/observation is embedded and stored with a category tag
- Memory IDs are content-addressed (`sha256(text)`) — storing the same text twice is a no-op (deduplication)
- Recall uses cosine similarity with optional category filtering
- Categories let you organize memories (e.g. `preference`, `project`, `person`, `decision`)
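The deduplication and ranking steps are simple to illustrate (a sketch of the idea, not the actual `memory.py` code):

```python
import hashlib
import math

def memory_id(text):
    """Content-addressed ID: identical text always hashes to the same ID,
    so storing the same fact twice is an in-place upsert (a no-op)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def cosine(a, b):
    """Cosine similarity between two embedding vectors, used for recall."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Recall embeds the query, scores it against stored vectors with `cosine`, filters by category if one was given, and returns the top matches.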
All data lives under `ZVEC_MCP_DATA_DIR` (default `~/.zvec-mcp/`):

```
~/.zvec-mcp/
├── knowledge/   # zvec collection for RAG chunks
└── memory/      # zvec collection for memories
```
Collections are created automatically on first use. To reset everything, delete the directory:

```shell
rm -rf ~/.zvec-mcp
```

```shell
# Check MCP registration
claude mcp list

# Test the server directly (should print JSON-RPC on stdout)
zvec-mcp

# Or from source:
/path/to/zvec-mcp/.venv/bin/zvec-mcp
```

If the command is not found, make sure the installation directory is on your `PATH`, or use the full path in your MCP config.
- Connection refused: Make sure your LM Studio / Ollama / vLLM server is running
- 401 Unauthorized: Set `ZVEC_MCP_HTTP_API_KEY` to your server's API key
- Dimension mismatch: If you get errors when searching after changing models, delete your data directory (`rm -rf ~/.zvec-mcp`) and re-ingest — you cannot mix embeddings of different dimensions in the same collection
The all-MiniLM-L6-v2 model (~80 MB) is downloaded on first use. Subsequent runs load from cache (`~/.cache/torch/`).
The local backend requires sentence-transformers. Install it:

```shell
pip install sentence-transformers
```

Or switch to the `http` backend if you have LM Studio or Ollama running.
To start fresh, delete the data directory:

```shell
rm -rf ~/.zvec-mcp
```

Collections are recreated automatically on next use.
To build the desktop extension from source:
```shell
# Install mcpb CLI
npm install -g @anthropic-ai/mcpb

# Install bundled dependencies (core only, no torch/sentence-transformers)
# IMPORTANT: let pip resolve versions — do NOT use --no-deps
pip install --target bundle/server/lib "mcp[cli]>=1.0.0" zvec numpy

# Copy zvec_mcp source into the bundle
cp -r src/zvec_mcp bundle/server/lib/zvec_mcp

# Strip caches (keep .dist-info — needed for importlib.metadata)
cd bundle/server/lib
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find . -name "*.pyc" -delete 2>/dev/null
cd ../../..

# Validate manifest and pack
mcpb validate bundle/manifest.json
mcpb pack bundle/ zvec-mcp-0.1.0.mcpb
```

Do not strip `.dist-info` directories — the MCP SDK uses `importlib.metadata.version("mcp")` at import time and will fail without them. Do not use `--no-deps` — pydantic and pydantic-core have strict version coupling and must be resolved together.
| Package | Role |
|---|---|
| zvec | Vector database engine |
| mcp | Model Context Protocol SDK (FastMCP) |
| sentence-transformers | Local embeddings (optional) |
| openai | OpenAI embeddings (optional) |
Apache-2.0