Releases: rodbland2021/claw-recall
v2.4.0 — DB Cleanup, Cross-Session Dedup, Ingest Filtering
Database Cleanup & Data Quality Pipeline
This release adds a complete cleanup system for detecting and removing duplicate, noise, and junk data — plus ingest-time prevention to stop bloat before it starts.
Highlights
- Cleanup Web UI (`/cleanup`) — scan, review, and delete duplicates, noise, junk, orphaned embeddings, and cross-session copies
- Cross-session duplicate detection with similarity scoring (Exact/High/Medium) and expandable visual comparison
- Ingest-time prevention — noise messages are filtered at indexing, and cross-session dedup prevents the same session from being indexed twice
- Quick-action delete buttons with chunked progress for all categories
- Snapshot cache for instant page loads (0.3s cached, 1.5s fresh)
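The Exact/High/Medium tiers can be pictured as buckets over a text-similarity ratio. The sketch below is only an illustration: the function name, the use of `difflib`, and both thresholds are assumptions, since the release notes name the tiers but not how they are computed.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical cutoffs; the release notes do not state the real thresholds.
HIGH_THRESHOLD = 0.90
MEDIUM_THRESHOLD = 0.70


def similarity_tier(a: str, b: str) -> Optional[str]:
    """Bucket two message texts into Exact/High/Medium, or None if dissimilar."""
    if a == b:
        return "Exact"
    # ratio() is 2*M/T: M = matched characters, T = combined length.
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio >= HIGH_THRESHOLD:
        return "High"
    if ratio >= MEDIUM_THRESHOLD:
        return "Medium"
    return None
```

A reviewer-facing UI would likely auto-select only the "Exact" tier for deletion and leave the fuzzier tiers for visual comparison.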
Data Quality Pipeline (3 layers)
- Ingest filtering — noise content + cross-session dedup at indexing time
- File exclusions — configurable patterns via `exclude.conf`
- Cleanup UI — on-demand detection and removal with visual review
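As a rough picture of the first layer, the sketch below drops noise messages and then skips a session whose remaining content has already been indexed. Everything here is hypothetical: the noise markers, the function names, and the SHA-256 fingerprint are assumptions for illustration, not Claw Recall's actual rules.

```python
import hashlib

# Hypothetical noise markers; the real filter rules are not listed in these notes.
NOISE_MARKERS = ("heartbeat", "no_reply")


def is_noise(text: str) -> bool:
    """Treat empty messages and known marker strings as noise."""
    stripped = text.strip().lower()
    return not stripped or stripped in NOISE_MARKERS


def session_fingerprint(messages: list) -> str:
    """Hash the ordered message texts so identical sessions collide."""
    digest = hashlib.sha256()
    for message in messages:
        digest.update(message.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] != ["a", "bc"]
    return digest.hexdigest()


def ingest(messages: list, seen_fingerprints: set) -> list:
    """Return the messages to index, or [] if this session is a known copy."""
    kept = [m for m in messages if not is_noise(m)]
    fingerprint = session_fingerprint(kept)
    if fingerprint in seen_fingerprints:
        return []
    seen_fingerprints.add(fingerprint)
    return kept
```

Calling `ingest` twice with the same session content returns the filtered messages the first time and an empty list the second time.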
See the full changelog for details.
v2.3.0 — context_chars param, 12x faster startup, README features
Added
- `context_chars` parameter for `search_memory` — control result context length (default 500, max 2000)
- Disk cache for embedding matrix — 12x faster startup (80s → 6.5s)
- Key Features section in README
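The default and maximum for `context_chars` suggest a simple clamp. The sketch below assumes out-of-range values are clamped rather than rejected, which the release notes do not confirm; the helper names are hypothetical.

```python
DEFAULT_CONTEXT_CHARS = 500   # default stated in the release notes
MAX_CONTEXT_CHARS = 2000      # maximum stated in the release notes


def resolve_context_chars(requested=None) -> int:
    """Return the effective context length (assumption: clamp, don't reject)."""
    if requested is None:
        return DEFAULT_CONTEXT_CHARS
    return max(1, min(int(requested), MAX_CONTEXT_CHARS))


def trim_context(text: str, context_chars=None) -> str:
    """Cut a result's context down to the effective length."""
    return text[:resolve_context_chars(context_chars)]
```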
Fixed
- Health check no longer breaks active MCP sessions
- Stateless HTTP mode for MCP server — eliminates session tracking errors
- Search reliability: MCP preloads cache on startup, health check validates results
Changed
- Discord server management scripts moved to separate repo
- PA review fixes: atomic disk writes, count accuracy, health check scope
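For readers unfamiliar with the term, an atomic disk write usually means writing to a temporary file and renaming it over the target, so a crash never leaves a half-written cache. The sketch below shows that general pattern with the standard library; it is not taken from the Claw Recall source, and the function name is hypothetical.

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```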
Full changelog: CHANGELOG.md
v2.2.1 — Bug fixes & improvements
Bug fixes & improvements
Enhanced secret redaction reporting with per-type counting, improved redact_historical.py output, updated Discord invite link.
v2.2.0 — Secret Redaction
Secret Redaction
Claw Recall now automatically strips sensitive data (API keys, OAuth tokens, passwords, SSH keys, etc.) from all content before it enters the database. This covers every ingestion path:
- Session indexing — messages from OpenClaw and Claude Code sessions
- Thought capture — CLI, HTTP, MCP captures
- External sources — Gmail, Google Drive, Slack polling
Built-in patterns (18 categories)
Google OAuth (client ID + secret), Tailscale keys, AWS keys, generic API keys/tokens, Bearer tokens, passwords, cookie secrets, SSH private keys, Slack tokens, GitHub tokens, OpenAI keys, Anthropic keys, Stripe keys, Sendgrid keys, connection strings with embedded passwords, and custom header tokens.
Custom patterns
Add your own regex patterns to redact_patterns.conf (one per line). They're auto-loaded at startup.
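A minimal sketch of how one-pattern-per-line loading could work is shown below. The `[REDACTED]` placeholder, the function names, and the blank-line handling are assumptions; the release notes only say that patterns are regexes, one per line, loaded at startup.

```python
import re
from pathlib import Path

REPLACEMENT = "[REDACTED]"  # hypothetical placeholder text


def load_patterns(conf_path) -> list:
    """Compile one regex per non-blank line of the config file."""
    patterns = []
    for line in Path(conf_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            patterns.append(re.compile(line))
    return patterns


def redact(text: str, patterns: list) -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = pattern.sub(REPLACEMENT, text)
    return text
```

For example, a config line such as `tok_[A-Za-z0-9]{8}` would turn `key=tok_abcd1234` into `key=[REDACTED]` under this sketch.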
Historical cleanup
Run the migration script to scan and redact existing records:
python3 -m scripts.redact_historical # Dry run
python3 -m scripts.redact_historical --apply # Apply changes
v2.1.1 — Fix Web UI Template Path
Fixed
- Web UI was completely broken after the v2.1.0 package refactor — `_REPO_DIR` resolved one level too shallow, causing `TemplateNotFound: index.html` on every request
v2.1.0 — Package Refactor
[2.1.0] — 2026-03-08
Package refactor: all code consolidated into claw_recall/ Python package with proper subpackages.
Changed
Package Structure
- All source code moved from root-level `.py` files into `claw_recall/` package with 4 subpackages: `search/`, `capture/`, `indexing/`, `api/`
- All components now invoked via `python3 -m claw_recall.xxx` instead of `python3 filename.py`
- Config centralized in `claw_recall/config.py` — single source of truth for DB_PATH, embedding settings, agent name mappings
- Database connection management in `claw_recall/database.py` with `get_db()` context manager
- Systemd service files updated to use module execution
- CLI wrapper (`recall`) updated to call `python3 -m claw_recall.cli`
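A `get_db()` context manager typically guarantees that a connection is committed on success, rolled back on error, and always closed. The sketch below shows that shape only; it assumes a SQLite backend and an in-memory placeholder path, neither of which is confirmed by these notes, and it is not the code in `claw_recall/database.py`.

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = ":memory:"  # placeholder; the real value lives in the project's config


@contextmanager
def get_db(path: str = DB_PATH):
    """Yield a connection; commit on success, roll back on error, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Callers would then write `with get_db() as conn:` and never handle cleanup themselves.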
Documentation
- README rewritten for beginners — numbered Quick Start steps, verification at each stage, exact MCP config file paths for Claude Code and OpenClaw, "Keep It Running" section (systemd/screen/cron), Quick Troubleshooting table
- Prerequisites section moved before Quick Start with platform notes (WSL/Linux/macOS)
- MCP section explains what MCP is, what stdio vs SSE means, where config files go
- Comprehensive installation/operations guide split into `docs/guide.md`
- Guide Production Deployment section rewritten with step-by-step systemd setup
- CONTRIBUTING.md updated with correct test commands
- Internal reference doc (`claw-recall-reference`) updated with package layout
Root Cleanup
- 14 root-level Python files removed (replaced by package modules)
- Scripts moved to `scripts/` directory
- Tests moved to `tests/` directory
Fixed
- mcporter MCP stdio config updated to reference new package module path
- All 123 tests updated for new import paths and passing
v2.0.0 — MCP Integration, External Sources, Production Hardening
Major release: MCP integration, external source capture, SSE transport, health monitoring, and production hardening.
Highlights
- MCP Integration — 8 tools via stdio and SSE transport for local and remote agent access
- External Source Capture — Gmail, Google Drive, and Slack indexing with backfill support
- Real-Time Indexing — inotify-based watcher + remote HTTP push for cross-machine sync
- Thought Capture — Persistent insights that survive context compaction
- Health Monitoring — Service health checks with embedding gap detection
- Production Ready — systemd services, CSP headers, security hardening
What's New
MCP Tools
search_memory · search_thoughts · capture_thought · browse_recent · browse_activity · memory_stats · poll_sources · capture_source_status
External Sources
- Gmail with full body extraction and PDF attachment parsing
- Google Drive document indexing with noise filtering
- Slack message capture
- Historical backfill (`--backfill --days 90`)
Infrastructure
- inotify file watcher with 5s debounce
- Remote machine watcher via HTTP push
- Incremental indexing (only new messages)
- Production systemd service files
- `/health` endpoint for monitoring
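The 5s debounce on the file watcher means a session file is indexed only once it has been quiet for five seconds, so a burst of writes triggers one indexing pass. The sketch below illustrates the idea without inotify; the class and method names are hypothetical, and the injectable clock exists only to make the behaviour easy to check.

```python
import time


class Debouncer:
    """Collect file events and release a path once it has been quiet long enough."""

    def __init__(self, delay: float = 5.0, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self._last_event = {}  # path -> time of most recent event

    def touch(self, path: str) -> None:
        """Record an event; repeated events push the release time back."""
        self._last_event[path] = self.clock()

    def ready(self) -> list:
        """Return and forget paths with no events for at least `delay` seconds."""
        now = self.clock()
        due = [p for p, t in self._last_event.items() if now - t >= self.delay]
        for path in due:
            del self._last_event[path]
        return due
```

A watcher loop would call `touch()` from its event callback and poll `ready()` periodically to decide which files to index.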
Bug Fixes
- Shell injection vulnerability in `.env` loading
- Memory leak in embedding cache (5.5GB → 123MB)
- WSL agent misattribution
- Resource leaks in watcher and session pusher
- Improved error handling and logging across codebase
See CHANGELOG.md for full details.