Comprehensive documentation for installation, configuration, deployment, and operations.
For a quick overview, see the README.
- Data Ingestion
- Agent Names
- Building Shared Knowledge
- Configuration
- Using Local Embeddings
- Production Deployment
- Database Schema
- Project Structure
- Troubleshooting
- Testing
Claw Recall indexes .jsonl session files from two agent platforms:
- OpenClaw —
~/.openclaw/agents/(active) and~/.openclaw/agents-archive/(completed) - Claude Code —
~/.claude/projects/(auto-detected by path and JSON structure)
Real-time indexing (recommended):
python3 -m claw_recall.indexing.watcher # Uses inotify — indexes on every file changeCron-based indexing (alternative):
*/15 * * * * cd /path/to/claw-recall && python3 -m claw_recall.indexing.indexer --source ~/.openclaw/agents-archive/ --incremental --embeddingsRemote machine indexing — for agents on a different machine, the watcher script monitors local session files and pushes them to the Claw Recall server via HTTP:
pip3 install watchdog requests
python3 scripts/cc_session_watcher.pyConfigure the watcher with environment variables:
| Variable | Default | Description |
|---|---|---|
RECALL_SSH_LOCAL_PORT |
18765 |
Local port for SSH tunnel |
RECALL_SSH_REMOTE_HOST |
127.0.0.1 |
Remote bind address |
RECALL_SSH_REMOTE_PORT |
8765 |
Remote Claw Recall port |
RECALL_SSH_HOST |
your-server |
SSH host for tunnel |
python3 -m claw_recall.capture.sources gmail # Poll Gmail
python3 -m claw_recall.capture.sources drive # Poll Google Drive
python3 -m claw_recall.capture.sources slack # Poll Slack
python3 -m claw_recall.capture.sources all # Everything
python3 -m claw_recall.capture.sources status # Show capture statistics
python3 -m claw_recall.capture.sources gmail --backfill --days 90 # Historical importAlready have agent conversations from before Claw Recall? Import them:
# Index all archived sessions (with embeddings for semantic search)
python3 -m claw_recall.indexing.indexer --source ~/.openclaw/agents-archive/ --embeddings
# Incremental re-index (safe to run repeatedly — skips already-indexed files)
python3 -m claw_recall.indexing.indexer --source ~/.openclaw/agents-archive/ --incremental --embeddings
# Backfill embeddings for messages that were indexed without them
python3 scripts/backfill_embeddings.py --limit 2000To skip noisy or unwanted session files during indexing, create an exclude.conf file:
cp exclude.conf.example exclude.conf
# Edit exclude.conf — one glob pattern per lineTo remove already-indexed sessions that match your exclusion patterns:
python3 scripts/cleanup_excluded.py --dry-run # Preview what would be removed
python3 scripts/cleanup_excluded.py # Actually remove themClaw Recall detects agents from session file paths:
| Path Pattern | Agent |
|---|---|
~/.claude/projects/ |
Claude Code → "CC" |
~/.openclaw/agents/<slot>/sessions/ |
OpenClaw → slot name |
~/.openclaw/agents-archive/<slot>-*.jsonl |
OpenClaw → slot name |
Customize display names in agents.json:
cp agents.json.example agents.json{
"agent_names": {
"main": "Butler",
"assistant": "Helper",
"claude-code": "CC"
}
}Both slot IDs and display names work in search queries:
./recall "deployment" --agent main # Resolves to display name
./recall "deployment" --agent Butler # Direct matchAgents should proactively capture insights whenever they discover something useful. This builds a shared knowledge base that every agent can search:
# Agent discovers a database gotcha
./recall capture "SQLite PRAGMA journal_mode=WAL must be set before any concurrent reads"
# Agent finds an API limitation
./recall capture "Rate limit on /api/search is 60 req/min — batch requests for bulk data"
# Via MCP
mcp__claw-recall__capture_thought content="pytest session-scoped fixtures share state — use function scope for isolation" agent="my-agent"Capture: Reusable insights, working solutions, gotchas, API discoveries, tool limitations. Skip: Session-specific minutiae, temporary state, things already in documentation.
All settings are configured via environment variables. Store them in .env or a systemd EnvironmentFile.
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY |
— | Enables semantic search (~$0.02 per 30K messages) |
CLAW_RECALL_DB |
./convo_memory.db |
SQLite database path |
CLAW_RECALL_AGENT_DIRS |
— | Colon-separated agent workspace dirs for file search |
CLAW_RECALL_REMOTE_HOME |
— | Remote machine home dir (for agent detection in HTTP-pushed sessions) |
| Variable | Default | Description |
|---|---|---|
CLAW_RECALL_EMBEDDING_MODEL |
text-embedding-3-small |
Embedding model name |
CLAW_RECALL_EMBEDDING_DIM |
1536 |
Embedding dimensions |
CLAW_RECALL_EMBEDDING_BATCH |
20 |
Batch size for embedding API calls |
CLAW_RECALL_MIN_CONTENT_LENGTH |
20 |
Minimum message length to embed |
| Variable | Default | Description |
|---|---|---|
CLAW_RECALL_WEB_HOST |
127.0.0.1 |
Web API bind address |
CLAW_RECALL_WEB_PORT |
8765 |
Web API port |
MCP_SSE_HOST |
0.0.0.0 |
MCP SSE bind address |
MCP_SSE_PORT |
8766 |
MCP SSE port |
MCP_SSE_ALLOWED_HOSTS |
— | Comma-separated additional allowed origins for SSE |
The health check script (scripts/health-check.sh) is configured via environment variables passed in the cron job:
| Variable | Description |
|---|---|
CLAW_RECALL_SSE_URL |
URL to test MCP SSE endpoint |
CLAW_RECALL_WEB_URL |
URL to test Web API endpoint |
CLAW_RECALL_DB |
Database path for index freshness check |
CLAW_RECALL_ALERT_SCRIPT |
Path to alert script (receives title + message args) |
Any OpenAI-compatible embedding endpoint works — Ollama, vLLM, or text-embeddings-inference:
export OPENAI_BASE_URL="http://localhost:11434/v1" # Ollama
export OPENAI_API_KEY="not-needed" # Required by SDK but unused
export CLAW_RECALL_EMBEDDING_MODEL="nomic-embed-text"
export CLAW_RECALL_EMBEDDING_DIM="768"Common models:
| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI (default) |
| nomic-embed-text | 768 | Ollama |
| mxbai-embed-large | 1024 | Ollama |
| all-MiniLM-L6-v2 | 384 | HuggingFace / TEI |
Note: If you change the embedding model after indexing, run
python3 scripts/backfill_embeddings.pyto regenerate embeddings with the new model. Existing embeddings from a different model will produce poor semantic search results.
For always-on operation, run Claw Recall as systemd services that auto-start on boot. Three services cover the full stack:
| Service | What It Runs | Port |
|---|---|---|
claw-recall-watcher |
Real-time file indexing via inotify | — |
claw-recall-web |
REST API + web UI | 8765 |
claw-recall-mcp |
MCP SSE server for remote agents | 8766 |
1. Create your environment file with your settings:
# Copy the example and edit it
cp /path/to/claw-recall/.env.example /path/to/claw-recall/.env
# Edit .env — at minimum, set OPENAI_API_KEY if you want semantic search2. Create the service files. In each file below, replace YOUR_USERNAME with your Linux username and /path/to/claw-recall with the actual path to the cloned repo.
/etc/systemd/system/claw-recall-web.service:
[Unit]
Description=Claw Recall Web API
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/claw-recall
EnvironmentFile=/path/to/claw-recall/.env
ExecStart=/usr/bin/python3 -m claw_recall.api.web --host 127.0.0.1 --port 8765
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target/etc/systemd/system/claw-recall-mcp.service:
[Unit]
Description=Claw Recall MCP SSE Server
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/claw-recall
EnvironmentFile=/path/to/claw-recall/.env
ExecStart=/usr/bin/python3 -m claw_recall.api.mcp_sse
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target/etc/systemd/system/claw-recall-watcher.service:
[Unit]
Description=Claw Recall Session File Watcher
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/claw-recall
EnvironmentFile=/path/to/claw-recall/.env
ExecStart=/usr/bin/python3 -m claw_recall.indexing.watcher
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target3. Enable and start the services:
sudo systemctl daemon-reload
sudo systemctl enable --now claw-recall-web claw-recall-mcp claw-recall-watcher4. Verify they're running:
sudo systemctl status claw-recall-web # Should show "active (running)"
sudo systemctl status claw-recall-mcp # Should show "active (running)"
sudo systemctl status claw-recall-watcher # Should show "active (running)"
# Check logs if something went wrong:
sudo journalctl -u claw-recall-web -n 20These services will now auto-start on boot, restart if they crash, and log to the system journal.
# Check service health (MCP SSE, Web API, watcher, indexing pipeline)
bash scripts/health-check.sh
# Run via cron (every 15 minutes)
*/15 * * * * CLAW_RECALL_SSE_URL=http://localhost:8766/sse CLAW_RECALL_WEB_URL=http://localhost:8765/status /bin/bash /path/to/claw-recall/scripts/health-check.sh# External source polling
*/15 * * * * cd /path/to/claw-recall && python3 -m claw_recall.capture.sources gmail --quiet
*/30 * * * * cd /path/to/claw-recall && python3 -m claw_recall.capture.sources slack --quiet
0 */2 * * * cd /path/to/claw-recall && python3 -m claw_recall.capture.sources drive --quiet
# Backfill any messages missing embeddings
*/30 * * * * cd /path/to/claw-recall && python3 scripts/backfill_embeddings.py --limit 2000 --quiet
# Health check
*/15 * * * * /bin/bash /path/to/claw-recall/scripts/health-check.shSQLite with WAL mode. Created automatically on first use.
| Table | Purpose |
|---|---|
sessions |
Conversation metadata (agent, timestamps, source file) |
messages |
Individual messages with FTS5 full-text index |
embeddings |
Semantic vectors (1536-dim default, float32) |
thoughts |
Captured notes, emails, documents with FTS5 index |
thought_embeddings |
Thought semantic vectors |
capture_log |
External source tracking (prevents re-ingestion) |
index_log |
Session file indexing tracking (prevents re-indexing) |
claw-recall/
recall # Bash CLI wrapper
requirements.txt # Python dependencies
agents.json.example # Agent name mapping template
exclude.conf.example # Session exclusion template
claw_recall/ # Python package (all source code)
config.py # Settings: DB path, embedding config, server ports
database.py # Connection manager, schema initialization
cli.py # CLI entry point (search, recent, capture)
search/
engine.py # Keyword (FTS5) + semantic (cosine) search
files.py # Markdown file search across agent workspaces
capture/
thoughts.py # Thought capture with embeddings
sources.py # Gmail, Google Drive, Slack polling
indexing/
indexer.py # Session file indexer (.jsonl -> DB)
watcher.py # Real-time watchdog daemon (inotify)
api/
web.py # Flask HTTP API + web UI (port 8765)
mcp_stdio.py # MCP server — stdio transport (local agents)
mcp_sse.py # MCP server — SSE/HTTP transport (remote agents)
scripts/
cc_session_watcher.py # Remote machine watcher (push via HTTP)
backfill_embeddings.py # Batch embed messages missing embeddings
cleanup_excluded.py # Remove excluded sessions from DB
health-check.sh # Service health monitoring
quick-index.sh # Manual re-index script
hooks/
quick-index.sh # Hook-triggered incremental index
full-index.sh # Full re-index of all archives
tests/
test_claw_recall.py # 123 unit tests
templates/ # Web UI Jinja templates
docs/ # Documentation and screenshots
All components are invoked as Python modules, not script files:
| Component | Command |
|---|---|
| CLI | python3 -m claw_recall.cli "query" (or ./recall "query") |
| Web API | python3 -m claw_recall.api.web --host 127.0.0.1 --port 8765 |
| MCP stdio | python3 -m claw_recall.api.mcp_stdio |
| MCP SSE | python3 -m claw_recall.api.mcp_sse |
| Indexer | python3 -m claw_recall.indexing.indexer --source /path --incremental --embeddings |
| Watcher | python3 -m claw_recall.indexing.watcher |
| Source capture | python3 -m claw_recall.capture.sources gmail |
The database is created automatically when you first run any command. If you see a "database not found" error, check the CLAW_RECALL_DB environment variable — it may point to a non-existent path.
- Verify the config is in
~/.claude.json(not~/.claude/settings.json) - Check the SSE server is running:
curl -s --max-time 3 http://your-server:8766/sse - Restart Claude Code after adding/changing MCP configs
- Check for project-level overrides in
~/.claude.jsonunderprojects.<path>.mcpServers
- Check the database has data:
curl http://localhost:8765/status - Try keyword mode explicitly:
./recall "query" --keyword - For agent-filtered searches, use display names (from
agents.json), not internal slot IDs
- Check the service is running:
sudo systemctl status claw-recall-watcher - Check logs:
sudo journalctl -u claw-recall-watcher -n 30 - Manual test:
python3 -m claw_recall.indexing.indexer --source ~/.openclaw/agents-archive/ --incremental --embeddings
- Check the process:
ps aux | grep cc_session_watcher - Check the SSH tunnel: the watcher manages its own tunnel
- Test the VPS endpoint:
curl http://your-server:8765/index-sessionshould return 400 "No file provided"
- Check
OPENAI_API_KEYis set (orOPENAI_BASE_URLfor local models) - Check embedding count:
curl http://localhost:8765/status—db_embeddingsshould be > 0 - If embeddings are missing, run:
python3 scripts/backfill_embeddings.py --limit 2000
cd /path/to/claw-recall
python3 -m pytest tests/test_claw_recall.py -v # All 123 tests
python3 -m pytest tests/test_claw_recall.py -v -k browse # Browse recent tests
python3 -m pytest tests/test_claw_recall.py -v -k capture # Capture tests
python3 -m pytest tests/test_claw_recall.py -v -k search # Search tests
python3 -m pytest tests/test_claw_recall.py -v -k mcp # MCP tests
python3 -m pytest tests/test_claw_recall.py -v -k source # Source capture tests
python3 -m pytest tests/test_claw_recall.py -v -k watcher # Watcher helper testsNo external services needed — tests use an isolated in-memory database.