diff --git a/CHANGELOG.md b/CHANGELOG.md index 0f5fc55..c2f6db9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,16 +8,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.5.2] - 2026-02-16 ### Changed + - **Claude Code 5.1+ minimum**: Documented Claude Code 5.1+ as a prerequisite. The async hook support added in v0.5.1 requires it. - **Removed `prepublishOnly` script**: Build and test are handled by CI, not by npm lifecycle hooks. ## [0.5.1] - 2026-02-16 ### Changed + - **Async hooks**: `PreCompact` and both `SessionEnd` hooks now run with `async: true` (fire-and-forget), so they no longer block Claude Code sessions. `SessionStart` remains synchronous since its stdout delivers memory context. - **Hook dedup logic**: `causantic init` now compares full hook objects (not just command strings), so existing installs pick up the async flag on re-init. ### Fixed + - **N+1 cluster query in SessionStart**: Replaced per-cluster loop (`getClusterChunkIds` + `getChunksByIds` × N clusters) with a single batch SQL query (`getClusterProjectRelevance`). 50 clusters now costs 1 query instead of 100+. - **SessionStart loads all chunks to slice last N**: New `getRecentChunksBySessionSlug()` uses SQL `ORDER BY ... DESC LIMIT` instead of loading every chunk then slicing in JS. - **SessionStart loads all clusters then filters**: New `getClustersWithDescriptions()` uses SQL `WHERE description IS NOT NULL` instead of loading all clusters then filtering in JS. @@ -25,35 +28,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.5.0] - 2026-02-16 ### Fixed + - **MCP config written to wrong file**: `causantic init` was writing MCP server configuration to `~/.claude/settings.json`, but Claude Code reads MCP servers from `~/.claude.json`. MCP config now writes to `~/.claude.json`; hooks remain in `settings.json`. Includes automatic migration of existing entries from `settings.json` to `~/.claude.json` on re-init. - **Uninstall cleanup**: `causantic uninstall` now removes MCP entries from both `~/.claude.json` (current) and `~/.claude/settings.json` (legacy). ## [0.4.3] - 2026-02-15 ### Fixed + - **MCP server notification handling**: Server now silently ignores JSON-RPC notifications (e.g. `notifications/initialized`) instead of returning METHOD_NOT_FOUND errors, which caused Claude Code to fail loading the MCP server ### Changed + - **Reference docs**: Synced `docs/reference/skills.md` descriptions with README and updated version in `docs/reference/mcp-tools.md` ## [0.4.2] - 2026-02-15 ### Changed + - **README**: Clarified that Anthropic API key is optional (only used for cluster topic labeling via Haiku); all core retrieval works without it - **Skill descriptions**: Sharpened all skill descriptions in README and CLAUDE.md block to clearly differentiate each tool — recall (backward chain walk), predict (forward chain walk), reconstruct (replay), summary (recap), retro (patterns) ## [0.4.1] - 2026-02-15 ### Added + - **CLI commands reference in CLAUDE.md block**: Claude Code now knows all 16 CLI commands without needing to run `causantic --help`. Eliminates repeated help lookups during sessions. ### Fixed + - README Key Differentiators numbering (duplicate "5." corrected to "4." and "5.") - SECURITY.md supported versions updated to v0.4.x only ## [0.4.0] - 2026-02-15 ### Changed + - **Episodic Retrieval Pipeline**: Redesigned recall/predict from graph traversal to chain walking. 
Seeds found by semantic search; the causal graph unfolds them into ordered narrative chains; chains ranked by aggregate semantic relevance per token. - **Sequential edge structure**: Replaced m×n all-pairs edges with sequential linked-list (intra-turn C1→C2→C3, inter-turn last→first, cross-session last→first). All edges stored as single `forward` rows with uniform weight. - **MCP tools**: Replaced `explain` with `search` (semantic discovery). `recall` and `predict` now return episodic chain narratives with search-style fallback. Added `hook-status`, `stats`, and `forget` tools. MCP server now exposes 9 tools. @@ -64,6 +74,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **SessionStart error context**: Fallback message now includes a classified error hint (database busy, database not found, embedder unavailable, internal error) instead of a generic static string. ### Added + - **`/causantic-forget` skill**: Guided memory deletion by topic, time range, or session with dry-run preview and confirmation workflow. - **Skills reference documentation** (`docs/reference/skills.md`): Reference page for all 14 skills with parameters, usage examples, and decision guide. - **Semantic deletion for `forget` tool**: Added `query` and `threshold` parameters for topic-based deletion (e.g., "forget everything about authentication"). Uses vector-only search for precision. Dry-run shows top matches with similarity scores and score distribution. Combinable with time/session filters via AND logic. @@ -86,6 +97,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **Removed skill cleanup**: `causantic init` now deletes directories for removed skills (e.g., `causantic-context`) on re-init. ### Removed + - **`explain` MCP tool**: Subsumed by `recall` (both walk backward). - **`/causantic-context` skill**: Merged into `/causantic-explain`, which now handles both "why" questions and area briefings. - **Sum-product traverser**: Replaced by chain walker. Deleted `src/retrieval/traverser.ts`. @@ -99,13 +111,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.3.6] - 2026-02-15 ### Fixed + - **MCP error messages**: Tool failure responses now include the actual error message instead of generic "Tool execution failed", making transient errors diagnosable without opt-in stderr logging. ### Changed + - **CI formatting enforcement**: Added `format:check` step to CI workflow so formatting drift is caught before merge. - **Circular dependencies resolved**: Extracted shared types into `src/maintenance/types.ts` and `src/dashboard/client/src/lib/constants.ts` to break 5 circular dependency cycles. ### Housekeeping + - Fixed 5 ESLint warnings (consistent-type-imports, unused imports). - Bumped typedoc 0.28.16 → 0.28.17 (fixes moderate ReDoS in markdown-it). - Synced package-lock.json. @@ -115,14 +130,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.3.0] - 2026-02-12 ### Changed + - **Time-based edge decay**: Replaced vector-clock hop counting with intrinsic time-based edge decay. Each edge's weight decays based on its age (milliseconds since creation), not logical hops. Backward edges use delayed-linear (60-minute hold), forward edges use exponential (10-minute half-life). - **Broadened vector TTL**: `cleanupExpired()` now applies to ALL vectors, not just orphaned ones. Vectors older than the TTL (default 90 days) are cleaned up regardless of edge status. 
- **Simplified traversal**: `traverse()` and `traverseMultiple()` use time-based decay configs directly. Sum-product rules unchanged. ### Added + - **FIFO vector cap**: New `vectors.maxCount` config option evicts oldest vectors when the collection exceeds the limit. Default: 0 (unlimited). ### Removed + - **Vector clocks**: Clock store, clock compactor, and vector-clock module deleted. Vector clock columns dropped from SQLite schema (v7 migration). - **Graph pruner**: `prune-graph` maintenance task removed. Edge cleanup happens via FK CASCADE when chunks are deleted by TTL/FIFO. Maintenance tasks reduced from 5 to 4. - **Orphan lifecycle**: Chunks no longer transition through an "orphaned" state. They go directly from active to expired when TTL elapses. @@ -130,6 +148,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.2.1] - 2026-02-11 ### Added + - **SessionEnd hook**: Triggers session ingestion on `/clear`, logout, and exit — closes the gap where chunks were lost between compaction events - Shared `ingestCurrentSession()` helper extracted from PreCompact, used by both PreCompact and SessionEnd hooks - Dynamic hook name logging in `causantic init` @@ -137,6 +156,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.2.0] - 2026-02-11 ### Added + - **Schema v6: Session Reconstruction**: Pure chronological SQLite queries for "what did I work on?" — composite index on `(session_slug, start_time)`, MCP tools `list-sessions` and `reconstruct` - **Project-Filtered Retrieval**: Federated approach with `projectFilter` on retrieval requests, cross-project graph traversal preserved - **Collection Benchmark Suite**: Self-service benchmarks for health, retrieval quality, graph value, and latency with scoring, tuning recommendations, and history tracking @@ -159,20 +179,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Per-chunk encryption (ChaCha20-Poly1305, key stored in system keychain) ### Changed + - **HDBSCAN rewrite**: Pure TypeScript implementation replacing hdbscan-ts — 130× speedup (65 min → 30 sec for 6,000 points) - **Direction-specific decay**: Backward and forward edges use different decay curves (empirically tuned) - **MCP tools**: Expanded from 3 to 6 tools (added list-projects, list-sessions, reconstruct) - **Clustering threshold**: Tuned default from 0.09 → 0.10 ### Fixed + - README config example: corrected stale `clustering.threshold` default value ### Infrastructure + - Utility deduplication and standardized logging - ESLint no-console rule for consistent log handling - Test coverage: 1,684 tests passing in vitest ### Research Findings + - Topic continuity detection: 0.998 AUC - Clustering threshold optimization: F1=0.940 at 0.09 - Graph traversal experiment: 4.65× context augmentation (v0.2 sum-product; replaced by chain walking in v0.4.0) @@ -182,6 +206,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.1.0] - 2026-02-08 ### Added + - Initial release - Session parsing and chunking - Embedding generation with jina-small diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 362bf38..2bea68e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -39,6 +39,7 @@ npm test ## Pull Request Process 1. **Create a branch** from `main` with a descriptive name: + ```bash git checkout -b feature/add-new-tool git checkout -b fix/clustering-performance @@ -49,11 +50,13 @@ npm test 3. **Write tests** for new functionality 4. 
**Run the test suite**: + ```bash npm test ``` 5. **Build and verify**: + ```bash npm run build ``` diff --git a/README.md b/README.md index eb406ef..4f4260e 100644 --- a/README.md +++ b/README.md @@ -38,29 +38,29 @@ Developers using Claude Code who want their AI assistant to **remember across se ## Why Causantic? -Most AI memory systems use vector embeddings for similarity search. Causantic does too — but adds a **causal graph** that tracks *relationships* between memory chunks, **BM25 keyword search** for exact matches, and **HDBSCAN clustering** for topic expansion. The result: - -| | Vector Search Only | Causantic | -|---|---|---| -| **Finds similar content** | Yes | Yes | -| **Finds lexically relevant content** | No | Yes (BM25 keyword search) | -| **Finds related context** | No | Yes (causal edges) | -| **Finds topically related context** | No | Yes (cluster expansion) | -| **Temporal awareness** | Wall-clock decay | Episodic chain walking | -| **Context augmentation** | 1× | **2.46×** (chain walking adds episodic narrative) | -| **Handles project switches** | Breaks continuity | Preserves causality | -| **Bidirectional queries** | Forward only | Backward + Forward | +Most AI memory systems use vector embeddings for similarity search. Causantic does too — but adds a **causal graph** that tracks _relationships_ between memory chunks, **BM25 keyword search** for exact matches, and **HDBSCAN clustering** for topic expansion. The result: + +| | Vector Search Only | Causantic | +| ------------------------------------ | ------------------ | ------------------------------------------------- | +| **Finds similar content** | Yes | Yes | +| **Finds lexically relevant content** | No | Yes (BM25 keyword search) | +| **Finds related context** | No | Yes (causal edges) | +| **Finds topically related context** | No | Yes (cluster expansion) | +| **Temporal awareness** | Wall-clock decay | Episodic chain walking | +| **Context augmentation** | 1× | **2.46×** (chain walking adds episodic narrative) | +| **Handles project switches** | Breaks continuity | Preserves causality | +| **Bidirectional queries** | Forward only | Backward + Forward | ### How It Compares -| System | Local-First | Temporal Decay | Graph Structure | Self-Benchmarking | -|--------|:-----------:|:--------------:|:--------------:|:-----------------:| -| **Causantic** | **Yes** | **Chain walking** | **Causal graph** | **Yes** | -| Mem0 | No (Cloud) | None | Paid add-on | No | -| Cognee | Self-hostable | None | Triplet extraction | No | -| Letta/MemGPT | Self-hostable | Summarization | None | No | -| Zep | Enterprise | Bi-temporal | Temporal KG | No | -| GraphRAG | Self-hostable | Static corpus | Hierarchical | No | +| System | Local-First | Temporal Decay | Graph Structure | Self-Benchmarking | +| ------------- | :-----------: | :---------------: | :----------------: | :---------------: | +| **Causantic** | **Yes** | **Chain walking** | **Causal graph** | **Yes** | +| Mem0 | No (Cloud) | None | Paid add-on | No | +| Cognee | Self-hostable | None | Triplet extraction | No | +| Letta/MemGPT | Self-hostable | Summarization | None | No | +| Zep | Enterprise | Bi-temporal | Temporal KG | No | +| GraphRAG | Self-hostable | Static corpus | Hierarchical | No | See [Landscape Analysis](docs/research/approach/landscape-analysis.md) for detailed per-system analysis. @@ -70,7 +70,7 @@ See [Landscape Analysis](docs/research/approach/landscape-analysis.md) for detai All data stays on your machine. 
Optional per-chunk encryption (ChaCha20-Poly1305) with keys stored in your system keychain. No cloud dependency. **2. Hybrid BM25 + Vector Search** -Vector search finds chunks that *look similar*. BM25 keyword search finds chunks with *exact lexical matches* — function names, error codes, CLI flags. Both run in parallel and fuse via Reciprocal Rank Fusion (RRF). +Vector search finds chunks that _look similar_. BM25 keyword search finds chunks with _exact lexical matches_ — function names, error codes, CLI flags. Both run in parallel and fuse via Reciprocal Rank Fusion (RRF). **3. Sequential Causal Graph with Episodic Chain Walking** Chunks are connected in a sequential linked list — intra-turn chunks chained sequentially, inter-turn edges linking last→first, cross-session edges bridging sessions. The `recall` tool walks this graph backward to reconstruct episodic narratives; `predict` walks forward. Chains are scored by cosine similarity per token, producing ordered narratives where each chunk adds new information. @@ -158,17 +158,17 @@ Measure how well your memory system is working with built-in benchmarks. Health, The MCP server exposes nine tools: -| Tool | Description | -|------|-------------| -| `search` | Semantic discovery — "what do I know about X?" Vector + keyword + RRF + cluster expansion. | -| `recall` | Episodic memory — "how did we solve X?" Seeds → backward chain walk → ordered narrative. Includes chain walk diagnostics on fallback. | -| `predict` | Forward episodic — "what's likely next?" Seeds → forward chain walk → ordered narrative. Includes chain walk diagnostics on fallback. | -| `list-projects` | Discover available projects with chunk counts and date ranges. | -| `list-sessions` | Browse sessions for a project with time filtering. | -| `reconstruct` | Rebuild session context chronologically — "what did I work on yesterday?" | -| `hook-status` | Check when hooks last ran and whether they succeeded. | -| `stats` | Memory statistics — version, chunk/edge/cluster counts, per-project breakdowns. | -| `forget` | Delete chunks by project, time range, session, or semantic query. Defaults to dry-run preview. | +| Tool | Description | +| --------------- | ------------------------------------------------------------------------------------------------------------------------------------- | +| `search` | Semantic discovery — "what do I know about X?" Vector + keyword + RRF + cluster expansion. | +| `recall` | Episodic memory — "how did we solve X?" Seeds → backward chain walk → ordered narrative. Includes chain walk diagnostics on fallback. | +| `predict` | Forward episodic — "what's likely next?" Seeds → forward chain walk → ordered narrative. Includes chain walk diagnostics on fallback. | +| `list-projects` | Discover available projects with chunk counts and date ranges. | +| `list-sessions` | Browse sessions for a project with time filtering. | +| `reconstruct` | Rebuild session context chronologically — "what did I work on yesterday?" | +| `hook-status` | Check when hooks last ran and whether they succeeded. | +| `stats` | Memory statistics — version, chunk/edge/cluster counts, per-project breakdowns. | +| `forget` | Delete chunks by project, time range, session, or semantic query. Defaults to dry-run preview. | ### Claude Code Integration @@ -191,22 +191,22 @@ Or run `npx causantic init` to configure automatically. 
Causantic installs 14 Claude Code slash commands (via `npx causantic init`) for natural-language interaction with memory: -| Skill | Description | -|-------|-------------| -| `/causantic-recall [query]` | Reconstruct how something happened — walks backward through causal chains (how did we solve X?) | -| `/causantic-search [query]` | Broad discovery — find everything memory knows about a topic (what do I know about X?) | -| `/causantic-predict ` | Surface what came after similar past situations — walks forward through causal chains (what's likely relevant next?) | -| `/causantic-explain [question]` | Answer "why" questions using memory + codebase (why does X work this way?) | -| `/causantic-debug [error]` | Search for prior encounters with an error (auto-extracts from conversation if no argument) | -| `/causantic-resume` | Resume interrupted work — start-of-session briefing | -| `/causantic-reconstruct [time]` | Replay a past session chronologically by time range | -| `/causantic-summary [time]` | Factual recap of what was done across recent sessions | -| `/causantic-list-projects` | Discover available projects in memory | -| `/causantic-status` | Check system health and memory statistics | -| `/causantic-crossref [pattern]` | Search across all projects for reusable patterns | -| `/causantic-retro [scope]` | Surface recurring patterns, problems, and decisions across sessions | -| `/causantic-cleanup` | Memory-informed codebase review and cleanup plan | -| `/causantic-forget [query]` | Delete memory by topic, time range, or session (always previews first) | +| Skill | Description | +| ------------------------------- | -------------------------------------------------------------------------------------------------------------------- | +| `/causantic-recall [query]` | Reconstruct how something happened — walks backward through causal chains (how did we solve X?) | +| `/causantic-search [query]` | Broad discovery — find everything memory knows about a topic (what do I know about X?) | +| `/causantic-predict ` | Surface what came after similar past situations — walks forward through causal chains (what's likely relevant next?) | +| `/causantic-explain [question]` | Answer "why" questions using memory + codebase (why does X work this way?) | +| `/causantic-debug [error]` | Search for prior encounters with an error (auto-extracts from conversation if no argument) | +| `/causantic-resume` | Resume interrupted work — start-of-session briefing | +| `/causantic-reconstruct [time]` | Replay a past session chronologically by time range | +| `/causantic-summary [time]` | Factual recap of what was done across recent sessions | +| `/causantic-list-projects` | Discover available projects in memory | +| `/causantic-status` | Check system health and memory statistics | +| `/causantic-crossref [pattern]` | Search across all projects for reusable patterns | +| `/causantic-retro [scope]` | Surface recurring patterns, problems, and decisions across sessions | +| `/causantic-cleanup` | Memory-informed codebase review and cleanup plan | +| `/causantic-forget [query]` | Delete memory by topic, time range, or session (always previews first) | Skills are installed to `~/.claude/skills/causantic-*/` and work as slash commands in Claude Code. They orchestrate the MCP tools above with structured prompts tailored to each use case. 
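For readers curious what the Reciprocal Rank Fusion step mentioned under "Hybrid BM25 + Vector Search" actually does, here is a minimal sketch. It is illustrative only — the constant `k = 60` and the plain chunk-ID lists are assumptions, not Causantic's actual implementation:

```typescript
// Minimal RRF sketch: merge two ranked lists of chunk IDs into one ranking.
// Assumptions: k = 60 (a common RRF constant) and plain chunk-ID arrays;
// Causantic's real pipeline may weight or structure this differently.
function reciprocalRankFusion(
  vectorRanking: string[],
  bm25Ranking: string[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of [vectorRanking, bm25Ranking]) {
    ranking.forEach((chunkId, index) => {
      // Each list contributes 1 / (k + rank); items high in either list win.
      const contribution = 1 / (k + index + 1);
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([chunkId]) => chunkId);
}
```

A chunk that ranks highly in either list ends up near the top of the fused list, which is why exact lexical hits (error codes, function names) survive alongside semantically similar chunks.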
@@ -249,7 +249,7 @@ Create `causantic.config.json` in your project root: { "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json", "clustering": { - "threshold": 0.10, + "threshold": 0.1, "minClusterSize": 4 }, "vectors": { @@ -283,13 +283,13 @@ See [Security Guide](docs/guides/security.md). Built on rigorous experimentation across 75 sessions and 297+ queries: -| Experiment | Result | Notes | -|------------|--------|-------| -| Chain Walking (v0.3) | **2.46×** context | vs vector-only, 297 queries, 15 projects | -| Topic Detection | 0.998 AUC | near-perfect accuracy | -| Clustering | F1=0.940 | 100% precision | -| Thinking Block Removal | +0.063 AUC | embedding quality improvement | -| Collection Benchmark | **64/100** | health, retrieval, chain quality, latency | +| Experiment | Result | Notes | +| ---------------------- | ----------------- | ----------------------------------------- | +| Chain Walking (v0.3) | **2.46×** context | vs vector-only, 297 queries, 15 projects | +| Topic Detection | 0.998 AUC | near-perfect accuracy | +| Clustering | F1=0.940 | 100% precision | +| Thinking Block Removal | +0.063 AUC | embedding quality improvement | +| Collection Benchmark | **64/100** | health, retrieval, chain quality, latency | > **Note**: An earlier version (v0.2) reported 4.65× augmentation using sum-product graph traversal with m×n all-pairs edges (492 queries, 25 projects). That architecture was replaced in v0.3 after collection benchmarks showed graph traversal contributing only ~2% of results. See [lessons learned](docs/research/experiments/lessons-learned.md) for the full story. diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index 5303522..400c6d0 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -40,6 +40,7 @@ npx causantic init ``` The interactive setup wizard will: + 1. Create the `~/.causantic/` directory structure 2. Offer to enable database encryption (recommended) 3. Initialize the database @@ -73,6 +74,7 @@ During setup, Causantic will detect existing Claude Code sessions in `~/.claude/ For large session histories, the initial import may take a few minutes. After importing sessions, Causantic automatically: + - **Builds clusters**: Groups related chunks by topic using HDBSCAN ### Cluster Labeling (Optional) @@ -82,6 +84,7 @@ Causantic can use Claude Haiku to generate human-readable descriptions for topic During setup, you'll be prompted to add your API key. The key is stored securely in your system keychain (macOS Keychain / Linux libsecret). 
You can add or update the API key later: + ```bash npx causantic config set-key anthropic-api-key npx causantic maintenance run refresh-labels diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md index 2923baa..130c1e0 100644 --- a/docs/getting-started/quick-start.md +++ b/docs/getting-started/quick-start.md @@ -19,6 +19,7 @@ npx causantic batch-ingest ~/.claude/projects ``` This creates: + - A SQLite database at `~/.causantic/memory.db` - A vector store at `~/.causantic/vectors/` diff --git a/docs/guides/backup-restore.md b/docs/guides/backup-restore.md index 5aecf1f..0fde77b 100644 --- a/docs/guides/backup-restore.md +++ b/docs/guides/backup-restore.md @@ -83,21 +83,21 @@ CAUSANTIC_EXPORT_PASSWORD="your-secure-password" npx causantic import backup.cau ## What Gets Exported -| Data | Description | -|------|-------------| -| Chunks | Conversation segments with semantic content | -| Edges | Causal relationships (forward/backward links) with identity and link counts | +| Data | Description | +| -------- | ------------------------------------------------------------------------------ | +| Chunks | Conversation segments with semantic content | +| Edges | Causal relationships (forward/backward links) with identity and link counts | | Clusters | Topic groupings with centroids, exemplar IDs, distances, and membership hashes | -| Vectors | Embedding vectors for semantic search (skip with `--no-vectors`) | +| Vectors | Embedding vectors for semantic search (skip with `--no-vectors`) | ## Archive Format ### Version History -| Version | Changes | -|---------|---------| -| 1.1 | Added vector embeddings, full cluster data (centroid, distances, exemplars), gzip compression, edge identity | -| 1.0 | Initial format (chunks, edges, basic clusters) | +| Version | Changes | +| ------- | ------------------------------------------------------------------------------------------------------------ | +| 1.1 | Added vector embeddings, full cluster data (centroid, distances, exemplars), gzip compression, edge identity | +| 1.0 | Initial format (chunks, edges, basic clusters) | Archives are backward-compatible: v1.1 can import v1.0 archives (with a warning that vectors are missing). @@ -119,6 +119,7 @@ The archive format uses magic bytes (`CST\0`) to identify encrypted files. ### File Structure **Encrypted + compressed:** + ``` [Magic: 4 bytes "CST\0"] [Salt: 16 bytes] @@ -128,11 +129,13 @@ The archive format uses magic bytes (`CST\0`) to identify encrypted files. ``` **Unencrypted compressed (default):** + ``` [gzip(JSON)] ``` **Plain JSON (v1.0 backward compat):** + ```json { "format": "causantic-archive", @@ -151,6 +154,7 @@ The archive format uses magic bytes (`CST\0`) to identify encrypted files. ### Moving to a New Machine 1. Export on old machine: + ```bash npx causantic export --output ~/backup.causantic ``` @@ -158,6 +162,7 @@ The archive format uses magic bytes (`CST\0`) to identify encrypted files. 2. Transfer `backup.causantic` to new machine 3. Initialize Causantic on new machine: + ```bash npx causantic init ``` @@ -184,12 +189,14 @@ npx causantic import shared.causantic --merge ### "Archive is encrypted. Please provide a password." The file was encrypted but no password was provided. Either: + - Run interactively and enter the password when prompted - Set `CAUSANTIC_EXPORT_PASSWORD` environment variable ### "Invalid archive format" The file is not a valid Causantic archive. 
Check that: + - The file wasn't corrupted during transfer - It's the correct file (not a random JSON file) @@ -200,5 +207,6 @@ Wrong password. Re-enter the password carefully. ### "Archive version 1.0: no vector embeddings" The archive was created with v1.0 (before vector support). After import: + - Semantic search (`recall`, `search`, `predict`) won't work until vectors are regenerated - Run `npx causantic maintenance run scan-projects` to re-ingest and generate embeddings diff --git a/docs/guides/benchmarking.md b/docs/guides/benchmarking.md index a1c30a2..1f5f355 100644 --- a/docs/guides/benchmarking.md +++ b/docs/guides/benchmarking.md @@ -46,12 +46,12 @@ npx causantic benchmark-collection --full The overall score (0-100) is a weighted composite: -| Category | Weight | What it means | -|----------|--------|---------------| -| Health | 25% | Collection structure and organization | -| Retrieval | 35% | Can the system find the right context? | -| Chain Quality | 25% | Does episodic chain walking produce useful narratives? | -| Latency | 15% | Is query performance acceptable? | +| Category | Weight | What it means | +| ------------- | ------ | ------------------------------------------------------ | +| Health | 25% | Collection structure and organization | +| Retrieval | 35% | Can the system find the right context? | +| Chain Quality | 25% | Does episodic chain walking produce useful narratives? | +| Latency | 15% | Is query performance acceptable? | Only scored categories contribute; weights renormalize. A `--quick` run scores health only. @@ -127,16 +127,16 @@ The same seed produces identical query samples, making before/after comparisons ## Options Reference -| Flag | Description | Default | -|------|-------------|---------| -| `--quick` | Health only | - | -| `--standard` | Health + retrieval | (default) | -| `--full` | All categories | - | -| `--categories` | Comma-separated list | (from profile) | -| `--sample-size` | Queries to sample | 50 | -| `--seed` | Random seed | (random) | -| `--project` | Limit to one project | (all) | -| `--output` | Report directory | `./causantic-benchmark/` | -| `--json` | JSON output only | false | -| `--no-tuning` | Skip recommendations | false | -| `--history` | Show past trends | false | +| Flag | Description | Default | +| --------------- | -------------------- | ------------------------ | +| `--quick` | Health only | - | +| `--standard` | Health + retrieval | (default) | +| `--full` | All categories | - | +| `--categories` | Comma-separated list | (from profile) | +| `--sample-size` | Queries to sample | 50 | +| `--seed` | Random seed | (random) | +| `--project` | Limit to one project | (all) | +| `--output` | Report directory | `./causantic-benchmark/` | +| `--json` | JSON output only | false | +| `--no-tuning` | Skip recommendations | false | +| `--history` | Show past trends | false | diff --git a/docs/guides/dashboard.md b/docs/guides/dashboard.md index 6938f2b..5a679e2 100644 --- a/docs/guides/dashboard.md +++ b/docs/guides/dashboard.md @@ -65,18 +65,18 @@ Per-project views: The dashboard exposes a REST API that powers the UI. 
These routes can also be used programmatically: -| Route | Description | -|-------|-------------| -| `GET /api/stats` | Collection statistics (chunks, edges, clusters) | -| `GET /api/chunks` | List chunks with pagination | -| `GET /api/edges` | List edges with filtering | -| `GET /api/clusters` | List clusters with member counts | -| `GET /api/projects` | List projects with chunk counts | -| `GET /api/graph` | Graph data for visualization (nodes + edges) | -| `GET /api/search?q=` | Search memory with retrieval pipeline | -| `GET /api/sessions?project=` | List sessions for a project | -| `GET /api/benchmark-collection` | Run benchmark and return results | -| `GET /api/benchmark-collection/history` | Historical benchmark results | +| Route | Description | +| --------------------------------------- | ----------------------------------------------- | +| `GET /api/stats` | Collection statistics (chunks, edges, clusters) | +| `GET /api/chunks` | List chunks with pagination | +| `GET /api/edges` | List edges with filtering | +| `GET /api/clusters` | List clusters with member counts | +| `GET /api/projects` | List projects with chunk counts | +| `GET /api/graph` | Graph data for visualization (nodes + edges) | +| `GET /api/search?q=` | Search memory with retrieval pipeline | +| `GET /api/sessions?project=` | List sessions for a project | +| `GET /api/benchmark-collection` | Run benchmark and return results | +| `GET /api/benchmark-collection/history` | Historical benchmark results | ## Architecture diff --git a/docs/guides/integration.md b/docs/guides/integration.md index 095b357..58a63cf 100644 --- a/docs/guides/integration.md +++ b/docs/guides/integration.md @@ -11,11 +11,13 @@ Causantic uses Claude Code hooks to capture context at key moments: Fires when a new Claude Code session begins. **Actions:** + 1. Query memory for relevant context 2. Generate a memory summary 3. Update CLAUDE.md with relevant memories **Configuration:** + ```json { "hooks": { @@ -31,12 +33,14 @@ Fires when a new Claude Code session begins. Fires before Claude Code compacts the conversation history. **Actions:** + 1. Ingest current session content 2. Create chunks and edges 3. Generate embeddings 4. Preserve context that would be lost **Configuration:** + ```json { "hooks": { @@ -77,11 +81,11 @@ Add to your Claude Code MCP config: ### Available Tools -| Tool | Purpose | -|------|---------| -| `recall` | Semantic search with graph traversal | -| `explain` | Long-range historical context | -| `predict` | Proactive suggestions | +| Tool | Purpose | +| --------- | ------------------------------------ | +| `recall` | Semantic search with graph traversal | +| `explain` | Long-range historical context | +| `predict` | Proactive suggestions | See [MCP Tools Reference](../reference/mcp-tools.md) for details. 
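The dashboard guide above notes that the REST routes powering the UI can also be used programmatically. A minimal sketch of such a call, assuming the dashboard is running on its documented default port (3333) and making no assumption about the response schema beyond it being JSON:

```typescript
// Query the local dashboard API for collection stats and a memory search.
// Assumes `npx causantic dashboard` is running on the default port 3333;
// response shapes are not documented here, so results are logged as-is.
const BASE_URL = "http://localhost:3333";

async function getJson(path: string): Promise<unknown> {
  const response = await fetch(`${BASE_URL}${path}`);
  if (!response.ok) {
    throw new Error(`${path} failed with HTTP ${response.status}`);
  }
  return response.json();
}

const stats = await getJson("/api/stats");
const results = await getJson(
  `/api/search?q=${encodeURIComponent("authentication flow")}`,
);
console.log(stats, results);
```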
@@ -95,6 +99,7 @@ Causantic can automatically update your project's CLAUDE.md with a memory sectio Recent topics: authentication flow, error handling, user settings Related sessions: + - Fixed login timeout issue (2 days ago) - Implemented OAuth integration (1 week ago) ``` diff --git a/docs/guides/maintenance.md b/docs/guides/maintenance.md index 1ff0cf5..a2edada 100644 --- a/docs/guides/maintenance.md +++ b/docs/guides/maintenance.md @@ -24,6 +24,7 @@ npx causantic maintenance run scan-projects **Frequency**: Hourly (or on-demand) **What it does**: + - Scans `~/.claude/projects/` for new sessions - Ingests new content into the memory store - Updates edge relationships @@ -39,6 +40,7 @@ npx causantic maintenance run update-clusters **Frequency**: Daily (configurable via `maintenance.clusterHour`) **What it does**: + - Full rebuild of cluster assignments using HDBSCAN - Identifies new topic groups and updates centroids - Refreshes cluster labels via Haiku (if Anthropic API key is configured) @@ -56,6 +58,7 @@ npx causantic maintenance run cleanup-vectors **Frequency**: Daily (1 hour after `update-clusters`) **What it does**: + - Finds vectors not accessed within the TTL period (default 90 days) - Deletes expired chunks (FK CASCADE removes edges and cluster assignments) - Deletes expired vectors @@ -73,6 +76,7 @@ npx causantic maintenance run vacuum **Frequency**: Weekly (Sundays at 5am) **What it does**: + - Runs SQLite VACUUM to reclaim disk space - Rebuilds internal data structures for better query performance @@ -105,6 +109,7 @@ npx causantic maintenance daemon ``` Uses cron-style scheduling (assuming default `clusterHour` of 2): + - `scan-projects`: Every hour - `update-clusters`: Daily at 2am - `cleanup-vectors`: Daily at 3am @@ -186,6 +191,7 @@ npx causantic health ``` Checks: + - Database connectivity - Vector store status - Cluster count @@ -198,6 +204,7 @@ npx causantic stats ``` Shows: + - Total chunks - Total edges (by type) - Cluster count diff --git a/docs/guides/security.md b/docs/guides/security.md index c64263b..7fb8748 100644 --- a/docs/guides/security.md +++ b/docs/guides/security.md @@ -6,13 +6,13 @@ Causantic stores sensitive data about your work patterns and conversation histor ### What Causantic Stores -| Data Type | Location | Risk if Exposed | -|-----------|----------|-----------------| -| Conversation text | `chunks` table | Direct content exposure | -| Embedding vectors | `vectors` table | Semantic reconstruction, topic inference | -| Causal relationships | `edges` table | Work patterns, debugging history | -| Topic clusters | `clusters` table | Project/feature groupings | -| Temporal ordering | `edges.created_at` timestamps | Activity timeline | +| Data Type | Location | Risk if Exposed | +| -------------------- | ----------------------------- | ---------------------------------------- | +| Conversation text | `chunks` table | Direct content exposure | +| Embedding vectors | `vectors` table | Semantic reconstruction, topic inference | +| Causal relationships | `edges` table | Work patterns, debugging history | +| Topic clusters | `clusters` table | Project/feature groupings | +| Temporal ordering | `edges.created_at` timestamps | Activity timeline | ### Why Encrypt Vectors? @@ -39,12 +39,14 @@ Causantic supports full database encryption using SQLCipher-compatible ciphers. ### Enable Encryption During initial setup: + ```bash npx causantic init # "Enable database encryption? 
[Y/n]" → y ``` For existing installations: + ```bash npx causantic encryption setup ``` @@ -78,13 +80,14 @@ hexdump -C ~/.causantic/memory.db | head -5 Causantic can retrieve the encryption key from multiple sources: -| Source | Best For | Configuration | -|--------|----------|---------------| -| `keychain` | Desktop use | Default on macOS/Linux with secret-tool | -| `env` | CI/CD, containers | Set `CAUSANTIC_DB_KEY` environment variable | -| `prompt` | Manual operations | CLI prompts for password | +| Source | Best For | Configuration | +| ---------- | ----------------- | ------------------------------------------- | +| `keychain` | Desktop use | Default on macOS/Linux with secret-tool | +| `env` | CI/CD, containers | Set `CAUSANTIC_DB_KEY` environment variable | +| `prompt` | Manual operations | CLI prompts for password | Configure in `causantic.config.json`: + ```json { "encryption": { @@ -125,12 +128,13 @@ This re-encrypts the entire database with a new key. Causantic supports two ciphers: -| Cipher | Speed | Best For | -|--------|-------|----------| -| `chacha20` | 2-3x faster on ARM | Apple Silicon, Raspberry Pi | -| `sqlcipher` | Standard | Intel/AMD, compatibility | +| Cipher | Speed | Best For | +| ----------- | ------------------ | --------------------------- | +| `chacha20` | 2-3x faster on ARM | Apple Silicon, Raspberry Pi | +| `sqlcipher` | Standard | Intel/AMD, compatibility | Configure in `causantic.config.json`: + ```json { "encryption": { @@ -152,6 +156,7 @@ Enable audit logging to track database access: ``` View audit log: + ```bash npx causantic encryption audit # 2024-01-15T10:30:00Z open Database opened successfully @@ -166,6 +171,7 @@ Audit logs are stored at `~/.causantic/audit.log`. ### Encrypted Exports Always use encrypted exports for backups: + ```bash npx causantic export --output backup.causantic # Prompts for password @@ -176,19 +182,21 @@ See [Backup & Restore](./backup-restore.md) for details. ### Transport Security When transferring backups: + - Use encrypted export files (`.causantic`) - Transfer over secure channels (SSH, HTTPS) - Delete temporary copies after import ## Environment Variables -| Variable | Purpose | -|----------|---------| -| `CAUSANTIC_DB_KEY` | Database encryption key (when keySource=env) | -| `CAUSANTIC_EXPORT_PASSWORD` | Export/import encryption password | -| `CAUSANTIC_SECRET_PASSWORD` | Fallback encrypted file store password | +| Variable | Purpose | +| --------------------------- | -------------------------------------------- | +| `CAUSANTIC_DB_KEY` | Database encryption key (when keySource=env) | +| `CAUSANTIC_EXPORT_PASSWORD` | Export/import encryption password | +| `CAUSANTIC_SECRET_PASSWORD` | Fallback encrypted file store password | For production/CI environments, use secrets management: + ```bash # GitHub Actions CAUSANTIC_DB_KEY="${{ secrets.CAUSANTIC_DB_KEY }}" npx causantic serve @@ -200,16 +208,19 @@ docker run -e CAUSANTIC_DB_KEY="$CAUSANTIC_DB_KEY" causantic-server ## Security Checklist ### Initial Setup + - [ ] Run `causantic init` with encryption enabled - [ ] Verify encryption with `causantic encryption status` - [ ] Back up encryption key with `causantic encryption backup-key` ### Ongoing + - [ ] Use encrypted exports for backups - [ ] Don't commit `.causantic/` directory to version control - [ ] Add `~/.causantic/` to backup encryption (Time Machine, etc.) 
### Sharing/Migration + - [ ] Use `--redact-paths --redact-code` for shared exports - [ ] Transfer files over encrypted channels - [ ] Delete temporary decrypted copies @@ -227,6 +238,7 @@ docker run -e CAUSANTIC_DB_KEY="$CAUSANTIC_DB_KEY" causantic-server **Lost encryption key = lost data**. There is no recovery without the key. Mitigations: + - Back up key with `causantic encryption backup-key` - Store backup password in password manager - Keep unencrypted backup in secure location (optional) diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md index 33f49a2..6255f9c 100644 --- a/docs/guides/troubleshooting.md +++ b/docs/guides/troubleshooting.md @@ -9,6 +9,7 @@ Common issues and solutions for Causantic. Causantic requires Node.js 20+. **Solution**: + ```bash # Using nvm nvm install 20 @@ -23,17 +24,20 @@ node --version ### "MCP server not responding" **Check 1**: Verify the server starts: + ```bash npx causantic serve # Should output: "MCP server started on stdio" ``` **Check 2**: Test health endpoint: + ```bash npx causantic health ``` **Check 3**: Check Claude Code config: + ```json { "mcpServers": { @@ -50,6 +54,7 @@ npx causantic health **Cause**: No data ingested yet. **Solution**: + ```bash # Ingest existing sessions npx causantic batch-ingest ~/.claude/projects @@ -63,6 +68,7 @@ npx causantic stats ### "Queries are slow" **Cause 1**: Large database needs optimization. + ```bash npx causantic maintenance run vacuum ``` @@ -70,16 +76,19 @@ npx causantic maintenance run vacuum ### "Expected context not recalled" **Check 1**: Verify the session was ingested: + ```bash npx causantic list-sessions ``` **Check 2**: Search for specific content: + ```bash npx causantic search "your expected content" ``` **Check 3**: Check clustering: + ```bash npx causantic clusters list ``` @@ -93,6 +102,7 @@ If content is in a different cluster than expected, adjust `clustering.threshold **Cause**: Multiple processes accessing the database. **Solution**: Ensure only one Causantic process runs at a time: + ```bash # Kill any running Causantic processes pkill -f "causantic serve" @@ -104,6 +114,7 @@ npx causantic serve ### "Disk space running low" **Check storage**: + ```bash du -sh ~/.causantic/ du -sh ~/.causantic/memory.db @@ -111,6 +122,7 @@ du -sh ~/.causantic/vectors/ ``` **Solutions**: + 1. Run vacuum: `npx causantic maintenance run vacuum` 2. Configure `vectors.maxCount` to cap collection size 3. Lower `vectors.ttlDays` to expire old vectors sooner @@ -120,12 +132,14 @@ du -sh ~/.causantic/vectors/ ### "No API key found" **macOS**: + ```bash # Set key in Keychain npx causantic config set-key anthropic ``` **Linux**: + ```bash # Using secret-tool (GNOME) sudo apt install libsecret-tools @@ -144,11 +158,13 @@ export CAUSANTIC_ANTHROPIC_KEY="sk-ant-..." 
### "Hook not firing" **Check Claude Code configuration**: + ```bash cat ~/.claude/settings.json ``` Verify hooks are configured: + ```json { "hooks": { @@ -162,6 +178,7 @@ Verify hooks are configured: ### "Hook fails silently" **Enable debug logging**: + ```bash export CAUSANTIC_DEBUG=1 npx causantic hook session-start diff --git a/docs/reference/cli-commands.md b/docs/reference/cli-commands.md index 58cf26d..558d953 100644 --- a/docs/reference/cli-commands.md +++ b/docs/reference/cli-commands.md @@ -20,11 +20,11 @@ npx causantic init [options] **Options**: -| Option | Description | -|--------|-------------| -| `--skip-mcp` | Skip MCP configuration (settings.json, project .mcp.json, skills, CLAUDE.md) | -| `--skip-encryption` | Skip the database encryption prompt | -| `--skip-ingest` | Skip the session import step | +| Option | Description | +| ------------------- | ---------------------------------------------------------------------------- | +| `--skip-mcp` | Skip MCP configuration (settings.json, project .mcp.json, skills, CLAUDE.md) | +| `--skip-encryption` | Skip the database encryption prompt | +| `--skip-ingest` | Skip the session import step | The wizard performs the following steps: @@ -42,6 +42,7 @@ The wizard performs the following steps: 12. Offers Anthropic API key setup for cluster labeling **Example**: + ```bash # Full interactive setup npx causantic init @@ -60,12 +61,13 @@ npx causantic serve [options] **Options**: -| Option | Description | -|--------|-------------| -| `--port ` | HTTP port (default: stdio) | +| Option | Description | +| ---------------- | ---------------------------- | +| `--port ` | HTTP port (default: stdio) | | `--health-check` | Enable health check endpoint | **Example**: + ```bash npx causantic serve npx causantic serve --health-check @@ -81,18 +83,19 @@ npx causantic ingest [options] **Arguments**: -| Argument | Description | -|----------|-------------| +| Argument | Description | +| -------------- | ----------------------------------------------- | | `session-path` | Path to session JSONL file or project directory | **Options**: -| Option | Description | -|--------|-------------| -| `--force` | Re-ingest even if already processed | -| `--dry-run` | Show what would be ingested | +| Option | Description | +| ----------- | ----------------------------------- | +| `--force` | Re-ingest even if already processed | +| `--dry-run` | Show what would be ingested | **Example**: + ```bash npx causantic ingest ~/.claude/projects/my-project/session-123.jsonl npx causantic ingest ~/.claude/projects/my-project/ @@ -108,12 +111,13 @@ npx causantic batch-ingest [options] **Options**: -| Option | Description | -|--------|-------------| +| Option | Description | +| ---------------- | --------------------------------------- | | `--parallel ` | Number of parallel workers (default: 4) | -| `--force` | Re-ingest all sessions | +| `--force` | Re-ingest all sessions | **Example**: + ```bash npx causantic batch-ingest ~/.claude/projects npx causantic batch-ingest ~/.claude/projects --parallel 8 @@ -129,12 +133,13 @@ npx causantic recall [options] **Options**: -| Option | Description | -|--------|-------------| +| Option | Description | +| ------------- | ----------------------------- | | `--limit ` | Maximum results (default: 10) | -| `--json` | Output as JSON | +| `--json` | Output as JSON | **Example**: + ```bash npx causantic recall "authentication flow" npx causantic recall "error handling" --limit 5 --json @@ -150,23 +155,24 @@ npx causantic maintenance [options] 
**Subcommands**: -| Subcommand | Description | -|------------|-------------| -| `run ` | Run a specific task | -| `run all` | Run all tasks | -| `status` | Show task status | -| `daemon` | Run as background daemon | +| Subcommand | Description | +| ------------ | ------------------------ | +| `run ` | Run a specific task | +| `run all` | Run all tasks | +| `status` | Show task status | +| `daemon` | Run as background daemon | **Tasks**: -| Task | Description | -|------|-------------| -| `scan-projects` | Discover and ingest new sessions | -| `update-clusters` | Re-run HDBSCAN clustering and refresh labels | +| Task | Description | +| ----------------- | -------------------------------------------------- | +| `scan-projects` | Discover and ingest new sessions | +| `update-clusters` | Re-run HDBSCAN clustering and refresh labels | | `cleanup-vectors` | Remove expired vectors and chunks (TTL + FIFO cap) | -| `vacuum` | Optimize database | +| `vacuum` | Optimize database | **Example**: + ```bash npx causantic maintenance run cleanup-vectors npx causantic maintenance run all @@ -184,14 +190,15 @@ npx causantic config [options] **Subcommands**: -| Subcommand | Description | -|------------|-------------| -| `show` | Display current configuration | -| `validate` | Validate configuration files | -| `set-key ` | Store an API key | -| `get-key ` | Retrieve an API key | +| Subcommand | Description | +| ---------------- | ----------------------------- | +| `show` | Display current configuration | +| `validate` | Validate configuration files | +| `set-key ` | Store an API key | +| `get-key ` | Retrieve an API key | **Example**: + ```bash npx causantic config show npx causantic config validate @@ -209,16 +216,17 @@ npx causantic encryption [options] **Subcommands**: -| Subcommand | Description | -|------------|-------------| -| `setup` | Enable encryption and generate a key | -| `status` | Show encryption status | -| `rotate-key` | Rotate the encryption key | -| `backup-key [path]` | Back up the encryption key to a password-protected file | -| `restore-key ` | Restore an encryption key from a backup file | -| `audit [limit]` | Show recent audit log entries | +| Subcommand | Description | +| -------------------- | ------------------------------------------------------- | +| `setup` | Enable encryption and generate a key | +| `status` | Show encryption status | +| `rotate-key` | Rotate the encryption key | +| `backup-key [path]` | Back up the encryption key to a password-protected file | +| `restore-key ` | Restore an encryption key from a backup file | +| `audit [limit]` | Show recent audit log entries | **Example**: + ```bash # Enable encryption npx causantic encryption setup @@ -249,16 +257,17 @@ npx causantic export [options] **Options**: -| Option | Description | -|--------|-------------| -| `--output ` | Output file path (default: `causantic-backup.causantic`) | -| `--no-encrypt` | Skip encryption | -| `--projects ` | Comma-separated project slugs to export | -| `--redact-paths` | Redact file paths in content | -| `--redact-code` | Redact code blocks in content | -| `--no-vectors` | Skip vector embeddings (smaller file, but semantic search won't work after import) | +| Option | Description | +| -------------------- | ---------------------------------------------------------------------------------- | +| `--output ` | Output file path (default: `causantic-backup.causantic`) | +| `--no-encrypt` | Skip encryption | +| `--projects ` | Comma-separated project slugs to export | +| `--redact-paths` | Redact 
file paths in content | +| `--redact-code` | Redact code blocks in content | +| `--no-vectors` | Skip vector embeddings (smaller file, but semantic search won't work after import) | **Example**: + ```bash npx causantic export --output backup.causantic npx causantic export --output backup.json --no-encrypt @@ -277,12 +286,13 @@ npx causantic import [options] **Options**: -| Option | Description | -|--------|-------------| -| `--merge` | Merge with existing data (default: replace) | -| `--dry-run` | Validate and report without importing | +| Option | Description | +| ----------- | ------------------------------------------- | +| `--merge` | Merge with existing data (default: replace) | +| `--dry-run` | Validate and report without importing | **Example**: + ```bash npx causantic import backup.causantic npx causantic import backup.causantic --merge @@ -299,11 +309,12 @@ npx causantic stats [options] **Options**: -| Option | Description | -|--------|-------------| +| Option | Description | +| -------- | -------------- | | `--json` | Output as JSON | **Example**: + ```bash npx causantic stats npx causantic stats --json @@ -319,11 +330,12 @@ npx causantic health [options] **Options**: -| Option | Description | -|--------|-------------| +| Option | Description | +| ----------- | -------------------- | | `--verbose` | Show detailed status | **Example**: + ```bash npx causantic health npx causantic health --verbose @@ -339,14 +351,15 @@ npx causantic hook [options] **Hooks**: -| Hook | Description | -|------|-------------| -| `session-start` | Session start hook — retrieves memory context | -| `session-end` | Session end hook — ingests the current session | -| `pre-compact` | Pre-compaction hook — ingests the current session before compaction | -| `claudemd-generator` | Update CLAUDE.md with memory context | +| Hook | Description | +| -------------------- | ------------------------------------------------------------------- | +| `session-start` | Session start hook — retrieves memory context | +| `session-end` | Session end hook — ingests the current session | +| `pre-compact` | Pre-compaction hook — ingests the current session before compaction | +| `claudemd-generator` | Update CLAUDE.md with memory context | **Example**: + ```bash npx causantic hook session-start npx causantic hook claudemd-generator @@ -362,11 +375,12 @@ npx causantic dashboard [options] **Options**: -| Option | Description | -|--------|-------------| +| Option | Description | +| --------------- | ------------------------- | | `--port ` | HTTP port (default: 3333) | **Example**: + ```bash # Launch on default port npx causantic dashboard @@ -389,21 +403,22 @@ npx causantic benchmark-collection [options] **Options**: -| Option | Description | -|--------|-------------| -| `--quick` | Health only (~1 second) | -| `--standard` | Health + retrieval (~30 seconds, default) | -| `--full` | All categories (~2-5 minutes) | -| `--categories ` | Comma-separated: health,retrieval,graph,latency | -| `--sample-size ` | Number of sample queries (default: 50) | -| `--seed ` | Random seed for reproducibility | -| `--project ` | Limit to one project | -| `--output ` | Output directory (default: ./causantic-benchmark/) | -| `--json` | Output JSON only (no markdown) | -| `--no-tuning` | Skip tuning recommendations | -| `--history` | Show trend from past runs | +| Option | Description | +| --------------------- | -------------------------------------------------- | +| `--quick` | Health only (~1 second) | +| `--standard` | Health + retrieval (~30 
seconds, default) | +| `--full` | All categories (~2-5 minutes) | +| `--categories ` | Comma-separated: health,retrieval,graph,latency | +| `--sample-size ` | Number of sample queries (default: 50) | +| `--seed ` | Random seed for reproducibility | +| `--project ` | Limit to one project | +| `--output ` | Output directory (default: ./causantic-benchmark/) | +| `--json` | Output JSON only (no markdown) | +| `--no-tuning` | Skip tuning recommendations | +| `--history` | Show trend from past runs | **Example**: + ```bash # Quick health check npx causantic benchmark-collection --quick @@ -430,13 +445,14 @@ npx causantic uninstall [options] **Options**: -| Option | Description | -|--------|-------------| -| `--force` | Skip confirmation prompt and export offer | +| Option | Description | +| ------------- | ----------------------------------------------------- | +| `--force` | Skip confirmation prompt and export offer | | `--keep-data` | Remove integrations but preserve `~/.causantic/` data | -| `--dry-run` | Show what would be removed without making changes | +| `--dry-run` | Show what would be removed without making changes | Removes the following artifacts: + - CLAUDE.md Causantic memory block - `~/.claude.json` MCP server entry - `~/.claude/settings.json` legacy MCP server entry (pre-0.5.0) @@ -446,6 +462,7 @@ Removes the following artifacts: - `~/.causantic/` data directory (unless `--keep-data`) **Example**: + ```bash # Preview what would be removed npx causantic uninstall --dry-run @@ -464,20 +481,20 @@ npx causantic uninstall --force These options work with all commands: -| Option | Description | -|--------|-------------| -| `--config ` | Use specific config file | -| `--debug` | Enable debug logging | -| `--quiet` | Suppress non-error output | -| `--version` | Show version | -| `--help` | Show help | +| Option | Description | +| ----------------- | ------------------------- | +| `--config ` | Use specific config file | +| `--debug` | Enable debug logging | +| `--quiet` | Suppress non-error output | +| `--version` | Show version | +| `--help` | Show help | ## Exit Codes -| Code | Description | -|------|-------------| -| 0 | Success | -| 1 | General error | -| 2 | Invalid arguments | -| 3 | Configuration error | -| 4 | Database error | +| Code | Description | +| ---- | ------------------- | +| 0 | Success | +| 1 | General error | +| 2 | Invalid arguments | +| 3 | Configuration error | +| 4 | Database error | diff --git a/docs/reference/mcp-tools.md b/docs/reference/mcp-tools.md index b809840..773a060 100644 --- a/docs/reference/mcp-tools.md +++ b/docs/reference/mcp-tools.md @@ -20,14 +20,15 @@ Search memory semantically to discover relevant past context. Returns ranked res **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `query` | `string` | Yes | What to search for in memory. Be specific about what context you need. | -| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | +| Name | Type | Required | Description | +| --------- | -------- | -------- | ------------------------------------------------------------------------------------------------ | +| `query` | `string` | Yes | What to search for in memory. Be specific about what context you need. | +| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | **Response**: Plain text. 
Returns a header with chunk count and token count, followed by the assembled context text. Returns `"No relevant memory found."` if no matches. **Example**: + ``` Found 5 relevant memory chunks (1200 tokens): @@ -40,14 +41,15 @@ Recall episodic memory by walking backward through causal chains to reconstruct **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `query` | `string` | Yes | What to recall from memory. Be specific about what context you need. | -| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | +| Name | Type | Required | Description | +| --------- | -------- | -------- | ------------------------------------------------------------------------------------------------ | +| `query` | `string` | Yes | What to recall from memory. Be specific about what context you need. | +| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | **Response**: Plain text. Returns an ordered narrative (problem → solution). When the chain walker falls back to search, a diagnostic bracket is appended with details about what was attempted. **Example** (successful chain walk): + ``` Found 4 relevant memory chunks (900 tokens): @@ -55,6 +57,7 @@ Found 4 relevant memory chunks (900 tokens): ``` **Example** (fallback with diagnostics): + ``` Found 3 relevant memory chunks (650 tokens): @@ -69,10 +72,10 @@ Predict what context or topics might be relevant based on current discussion. Wa **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `context` | `string` | Yes | Current context or topic being discussed. | -| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | +| Name | Type | Required | Description | +| --------- | -------- | -------- | ------------------------------------------------------------------------------------------------ | +| `context` | `string` | Yes | Current context or topic being discussed. | +| `project` | `string` | No | Filter to a specific project. Omit to search all. Use `list-projects` to see available projects. | **Response**: Plain text. Returns `"Potentially relevant context (N items):"` followed by assembled text, or `"No predictions available based on current context."` if no matches. Uses half the token budget of recall/search. Includes chain walk diagnostics when falling back to search. @@ -85,6 +88,7 @@ List all projects in memory with chunk counts and date ranges. Use to discover a **Response**: Plain text list of projects with metadata. **Example**: + ``` Projects in memory: - my-app (142 chunks, Jan 2025 – Feb 2025) @@ -99,16 +103,17 @@ List sessions for a project with chunk counts, time ranges, and token totals. Us **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `project` | `string` | Yes | Project slug. Use `list-projects` to discover available projects. | -| `from` | `string` | No | Start date filter (ISO 8601). | -| `to` | `string` | No | End date filter (ISO 8601). | -| `days_back` | `number` | No | Look back N days from now. Alternative to `from`/`to`. | +| Name | Type | Required | Description | +| ----------- | -------- | -------- | ----------------------------------------------------------------- | +| `project` | `string` | Yes | Project slug. 
Use `list-projects` to discover available projects. | +| `from` | `string` | No | Start date filter (ISO 8601). | +| `to` | `string` | No | End date filter (ISO 8601). | +| `days_back` | `number` | No | Look back N days from now. Alternative to `from`/`to`. | **Response**: Plain text list of sessions with abbreviated IDs, timestamps, chunk counts, and token totals. **Example**: + ``` Sessions for "my-app" (3 total): - a1b2c3d4 (Feb 8, 2:30 PM – 4:15 PM, 12 chunks, 3400 tokens) @@ -124,16 +129,16 @@ Rebuild session context for a project by time range. Returns chronological chunk **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `project` | `string` | Yes | Project slug. Use `list-projects` to discover available projects. | -| `session_id` | `string` | No | Specific session ID to reconstruct. | -| `from` | `string` | No | Start date (ISO 8601). | -| `to` | `string` | No | End date (ISO 8601). | -| `days_back` | `number` | No | Look back N days from now. | -| `previous_session` | `boolean` | No | Get the session before the current one. | -| `current_session_id` | `string` | No | Current session ID (required when `previous_session` is true). | -| `keep_newest` | `boolean` | No | Keep newest chunks when truncating to fit token budget. Default: `true`. | +| Name | Type | Required | Description | +| -------------------- | --------- | -------- | ------------------------------------------------------------------------ | +| `project` | `string` | Yes | Project slug. Use `list-projects` to discover available projects. | +| `session_id` | `string` | No | Specific session ID to reconstruct. | +| `from` | `string` | No | Start date (ISO 8601). | +| `to` | `string` | No | End date (ISO 8601). | +| `days_back` | `number` | No | Look back N days from now. | +| `previous_session` | `boolean` | No | Get the session before the current one. | +| `current_session_id` | `string` | No | Current session ID (required when `previous_session` is true). | +| `keep_newest` | `boolean` | No | Keep newest chunks when truncating to fit token budget. Default: `true`. | **Response**: Plain text with chronological session context, including session boundary markers and chunk content. Token budget controlled by `tokens.mcpMaxResponse` config. @@ -154,6 +159,7 @@ Show memory statistics including version, chunk/edge/cluster counts, and per-pro **Response**: Formatted text with version, aggregate counts, and per-project details. **Example**: + ``` Causantic v0.4.2 @@ -173,24 +179,26 @@ Delete chunks from memory filtered by project, time range, session, or semantic **Parameters**: -| Name | Type | Required | Description | -|------|------|----------|-------------| -| `project` | `string` | Yes | Project slug. Use `list-projects` to see available projects. | -| `before` | `string` | No | Delete chunks before this ISO 8601 date. | -| `after` | `string` | No | Delete chunks on or after this ISO 8601 date. | -| `session_id` | `string` | No | Delete chunks from a specific session. | -| `query` | `string` | No | Semantic query for topic-based deletion (e.g., "authentication flow"). Finds similar chunks by embedding similarity. Can combine with `before`/`after`/`session_id` (AND logic). | -| `threshold` | `number` | No | Similarity threshold (0–1 or 0–100, default 0.6). Higher = more selective. Values >1 treated as percentages (e.g., `60` → `0.6`). Only used when `query` is provided. | -| `dry_run` | `boolean` | No | Preview without deleting (default: `true`). 
Set to `false` to actually delete. | +| Name | Type | Required | Description | +| ------------ | --------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `project` | `string` | Yes | Project slug. Use `list-projects` to see available projects. | +| `before` | `string` | No | Delete chunks before this ISO 8601 date. | +| `after` | `string` | No | Delete chunks on or after this ISO 8601 date. | +| `session_id` | `string` | No | Delete chunks from a specific session. | +| `query` | `string` | No | Semantic query for topic-based deletion (e.g., "authentication flow"). Finds similar chunks by embedding similarity. Can combine with `before`/`after`/`session_id` (AND logic). | +| `threshold` | `number` | No | Similarity threshold (0–1 or 0–100, default 0.6). Higher = more selective. Values >1 treated as percentages (e.g., `60` → `0.6`). Only used when `query` is provided. | +| `dry_run` | `boolean` | No | Preview without deleting (default: `true`). Set to `false` to actually delete. | **Response**: In dry-run mode without `query`, returns the count of chunks that would be deleted. With `query`, dry-run shows top matches with similarity scores, score distribution (min/max/median), and content previews. When `dry_run=false`, deletes the chunks along with their edges, cluster assignments, FTS entries (via CASCADE), and vector embeddings. **Example** (filter-based dry run): + ``` Dry run: 47 chunk(s) would be deleted from project "my-app". Set dry_run=false to proceed. ``` **Example** (semantic dry run): + ``` Dry run: 12 chunk(s) match query "authentication flow" (threshold: 60%, project: "my-app") Scores: 94% max, 63% min, 78% median @@ -206,6 +214,7 @@ Set dry_run=false to proceed with deletion. ``` **Example** (actual deletion): + ``` Deleted 47 chunk(s) from project "my-app" (vectors and related edges/clusters also removed). ``` @@ -214,30 +223,30 @@ Returns `"No chunks match the given filters."` if no chunks match (filter-based) ## Tool Selection Guidelines -| Scenario | Recommended Tool | -|----------|-----------------| -| Broad discovery — "what do I know about X?" | `search` | -| Episodic narrative — "how did we solve X?" | `recall` | -| Proactively surfacing relevant past context | `predict` | -| Discovering what projects exist in memory | `list-projects` | -| Browsing sessions before diving into one | `list-sessions` | -| "What did I work on yesterday/last session?" | `reconstruct` | -| Checking system health and memory usage | `stats` | -| Diagnosing hook issues | `hook-status` | +| Scenario | Recommended Tool | +| ----------------------------------------------- | -------------------------------------------------------------------- | +| Broad discovery — "what do I know about X?" | `search` | +| Episodic narrative — "how did we solve X?" | `recall` | +| Proactively surfacing relevant past context | `predict` | +| Discovering what projects exist in memory | `list-projects` | +| Browsing sessions before diving into one | `list-sessions` | +| "What did I work on yesterday/last session?" 
| `reconstruct` | +| Checking system health and memory usage | `stats` | +| Diagnosing hook issues | `hook-status` | | Deleting old or unwanted memory by time/session | `forget` (with `before`/`after`/`session_id`) or `/causantic-forget` | -| Deleting memory about a topic | `forget` (with `query`) or `/causantic-forget` | +| Deleting memory about a topic | `forget` (with `query`) or `/causantic-forget` | ## Chain Walk Diagnostics The `recall` and `predict` tools use episodic chain walking — following directed edges through the causal graph to build ordered narratives. When the chain walker cannot find a viable chain, it falls back to search results and appends a diagnostic bracket explaining why: -| Diagnostic Reason | Meaning | -|-------------------|---------| -| `No matching chunks in memory` | Search found 0 results — memory is empty or the query has no matches | -| `Search found chunks but none suitable as chain seeds` | Search returned results but none could seed a chain walk | -| `No edges found from seed chunks` | Seed chunks have no outgoing edges in the causal graph | -| `All chains had only 1 chunk (minimum 2 required)` | Edges exist but every chain was too short | -| `No chain met the qualifying threshold` | Chains were attempted but none scored well enough | +| Diagnostic Reason | Meaning | +| ------------------------------------------------------ | -------------------------------------------------------------------- | +| `No matching chunks in memory` | Search found 0 results — memory is empty or the query has no matches | +| `Search found chunks but none suitable as chain seeds` | Search returned results but none could seed a chain walk | +| `No edges found from seed chunks` | Seed chunks have no outgoing edges in the causal graph | +| `All chains had only 1 chunk (minimum 2 required)` | Edges exist but every chain was too short | +| `No chain met the qualifying threshold` | Chains were attempted but none scored well enough | These diagnostics help distinguish between "memory is empty" and "memory exists but lacks graph structure for episodic retrieval." diff --git a/docs/reference/skills.md b/docs/reference/skills.md index f292e5a..fb8748c 100644 --- a/docs/reference/skills.md +++ b/docs/reference/skills.md @@ -14,10 +14,10 @@ Skills are installed by `causantic init` to `~/.claude/skills/causantic-/S Reconstruct how something happened — walks backward through causal chains ("how did we solve X?") -| Parameter | Required | Description | -|-----------|----------|-------------| -| `query` | Yes | Natural language question about past work | -| `project` | No | Filter to a specific project slug | +| Parameter | Required | Description | +| --------- | -------- | ----------------------------------------- | +| `query` | Yes | Natural language question about past work | +| `project` | No | Filter to a specific project slug | **When to use**: User asks about past work, previous decisions, errors solved before, or context from prior sessions. Before saying "I don't have context from previous sessions" -- always try recall first. 
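**Examples**: The exact query wording is free-form; these are hypothetical invocations, following the same `/causantic-<skill> <args>` pattern as the `/causantic-forget` examples later on this page.

- `/causantic-recall how did we fix the flaky SessionEnd hook?` -- reconstruct a past debugging narrative
- `/causantic-recall what led to the switch to chain walking?` -- trace the decisions behind a design change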
@@ -29,10 +29,10 @@ Reconstruct how something happened — walks backward through causal chains ("ho Broad discovery — find everything memory knows about a topic ("what do I know about X?") -| Parameter | Required | Description | -|-----------|----------|-------------| -| `query` | Yes | What to search for in memory | -| `project` | No | Filter to a specific project slug | +| Parameter | Required | Description | +| --------- | -------- | --------------------------------- | +| `query` | Yes | What to search for in memory | +| `project` | No | Filter to a specific project slug | **When to use**: Broad discovery, finding past context on a topic, as a starting point before using `recall` for deeper narrative. @@ -44,10 +44,10 @@ Broad discovery — find everything memory knows about a topic ("what do I know Surface what came after similar past situations — walks forward through causal chains ("what's likely relevant next?") -| Parameter | Required | Description | -|-----------|----------|-------------| -| `context` | Yes | Concise summary of the current task or topic | -| `project` | No | Filter to a specific project slug | +| Parameter | Required | Description | +| --------- | -------- | -------------------------------------------- | +| `context` | Yes | Concise summary of the current task or topic | +| `project` | No | Filter to a specific project slug | **When to use**: At the start of complex tasks to check for relevant prior work, when encountering patterns that might have been solved before. @@ -61,12 +61,13 @@ Surface what came after similar past situations — walks forward through causal Answer "why" questions using memory + codebase ("why does X work this way?") -| Parameter | Required | Description | -|-----------|----------|-------------| -| `query` | Yes | A "why" question or area/module name | -| `project` | No | Filter to a specific project slug | +| Parameter | Required | Description | +| --------- | -------- | ------------------------------------ | +| `query` | Yes | A "why" question or area/module name | +| `project` | No | Filter to a specific project slug | **Modes**: + - **Focused decision**: "Why does X..." / "What led to..." -- returns decision narrative (context, alternatives, rationale, trade-offs) - **Area briefing**: "Tell me about X" / area name / file path -- returns comprehensive briefing (purpose, key decisions, evolution, constraints) @@ -78,9 +79,9 @@ Answer "why" questions using memory + codebase ("why does X work this way?") Search past sessions for prior encounters with the current error, bug pattern, or issue. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `error` | No | Error text. If omitted, auto-extracts from the current conversation | +| Parameter | Required | Description | +| --------- | -------- | ------------------------------------------------------------------- | +| `error` | No | Error text. If omitted, auto-extracts from the current conversation | **When to use**: When stuck on an error after 2 failed attempts, debugging a recurring problem, or encountering a familiar-looking issue. @@ -94,9 +95,9 @@ Search past sessions for prior encounters with the current error, bug pattern, o Resume interrupted work -- start-of-session briefing. 
-| Parameter | Required | Description | -|-----------|----------|-------------| -| `topic` | No | Topic to focus on, or time reference ("yesterday", "last week") | +| Parameter | Required | Description | +| --------- | -------- | --------------------------------------------------------------- | +| `topic` | No | Topic to focus on, or time reference ("yesterday", "last week") | **When to use**: Start of a session, user asks "where did I leave off?" @@ -108,9 +109,9 @@ Resume interrupted work -- start-of-session briefing. Replay a past session chronologically by time range. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `time range` | No | Natural language time reference ("yesterday", "past 3 days", "session abc123") | +| Parameter | Required | Description | +| ------------ | -------- | ------------------------------------------------------------------------------ | +| `time range` | No | Natural language time reference ("yesterday", "past 3 days", "session abc123") | **When to use**: "What did I work on yesterday?", "Show me the last session", rebuilding context from a specific time period. @@ -122,9 +123,9 @@ Replay a past session chronologically by time range. Factual recap of what was done across recent sessions. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `time range` | No | Natural language time reference. Defaults to past 3 days | +| Parameter | Required | Description | +| ------------ | -------- | -------------------------------------------------------- | +| `time range` | No | Natural language time reference. Defaults to past 3 days | **When to use**: Sprint reviews, daily standups, tracking accomplishments and in-progress work. @@ -162,9 +163,9 @@ Check system health and memory statistics. Search across all projects for reusable patterns and solutions. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `pattern` | Yes | Pattern or topic to search for across projects | +| Parameter | Required | Description | +| --------- | -------- | ---------------------------------------------- | +| `pattern` | Yes | Pattern or topic to search for across projects | **When to use**: Looking for how something was solved in other projects, cross-project knowledge transfer, finding reusable patterns. @@ -176,9 +177,9 @@ Search across all projects for reusable patterns and solutions. Surface recurring patterns, problems, and decisions across sessions. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `scope` | No | Time range or topic. Defaults to past 30 days | +| Parameter | Required | Description | +| --------- | -------- | --------------------------------------------- | +| `scope` | No | Time range or topic. Defaults to past 30 days | **When to use**: Sprint retrospectives, identifying recurring themes, reviewing work patterns. @@ -206,20 +207,21 @@ Memory-informed codebase review and cleanup plan. Delete memory by topic, time range, or session. Always previews before deleting. -| Parameter | Required | Description | -|-----------|----------|-------------| -| `query` | No | Semantic query for topic-based deletion | -| `threshold` | No | Similarity threshold (0--1, default 0.6). 
Higher = more selective | -| `before` | No | Delete chunks before this ISO 8601 date | -| `after` | No | Delete chunks on or after this ISO 8601 date | -| `session_id` | No | Delete chunks from a specific session | -| `project` | Yes | Project slug (derived from cwd or asked) | +| Parameter | Required | Description | +| ------------ | -------- | ----------------------------------------------------------------- | +| `query` | No | Semantic query for topic-based deletion | +| `threshold` | No | Similarity threshold (0--1, default 0.6). Higher = more selective | +| `before` | No | Delete chunks before this ISO 8601 date | +| `after` | No | Delete chunks on or after this ISO 8601 date | +| `session_id` | No | Delete chunks from a specific session | +| `project` | Yes | Project slug (derived from cwd or asked) | **Workflow**: Always previews first (dry-run), shows what would be deleted, waits for explicit user confirmation before deleting. **When to use**: User asks to forget, remove, or clean up specific memory. Memory contains incorrect or outdated information. **Examples**: + - `/causantic-forget authentication flow` -- delete memory about authentication - `/causantic-forget everything before January` -- time-based deletion - `/causantic-forget session abc12345` -- delete a specific session @@ -228,19 +230,19 @@ Delete memory by topic, time range, or session. Always previews before deleting. ## Quick Decision Guide -| User intent | Skill | -|-------------|-------| -| "What do I know about X?" | `search` | -| "How did we solve X?" | `recall` | -| "Why does X work this way?" | `explain` | -| "What might be relevant?" | `predict` | -| "Where did I leave off?" | `resume` | -| "What did I work on yesterday?" | `reconstruct` | -| "Summarize this week" | `summary` | -| "How did other projects handle X?" | `crossref` | -| "What patterns do I see?" | `retro` | -| "Review and clean up this codebase" | `cleanup` | -| "Forget/delete memory about X" | `forget` | +| User intent | Skill | +| ----------------------------------- | ------------- | +| "What do I know about X?" | `search` | +| "How did we solve X?" | `recall` | +| "Why does X work this way?" | `explain` | +| "What might be relevant?" | `predict` | +| "Where did I leave off?" | `resume` | +| "What did I work on yesterday?" | `reconstruct` | +| "Summarize this week" | `summary` | +| "How did other projects handle X?" | `crossref` | +| "What patterns do I see?" | `retro` | +| "Review and clean up this codebase" | `cleanup` | +| "Forget/delete memory about X" | `forget` | ## Skill vs MCP Tool diff --git a/docs/reference/storage-api.md b/docs/reference/storage-api.md index 5d30a94..ba557dd 100644 --- a/docs/reference/storage-api.md +++ b/docs/reference/storage-api.md @@ -6,13 +6,13 @@ Reference documentation for Causantic's storage layer APIs. The storage layer provides persistence for the Causantic memory system. 
It consists of several stores: -| Store | Purpose | Module | -|-------|---------|--------| -| Chunk Store | Conversation segments | `chunk-store.ts` | -| Edge Store | Sequential causal connections | `edge-store.ts` | -| Vector Store | Embedding vectors for similarity search | `vector-store.ts` | +| Store | Purpose | Module | +| ------------- | --------------------------------------- | ------------------ | +| Chunk Store | Conversation segments | `chunk-store.ts` | +| Edge Store | Sequential causal connections | `edge-store.ts` | +| Vector Store | Embedding vectors for similarity search | `vector-store.ts` | | Keyword Store | FTS5 full-text search with BM25 ranking | `keyword-store.ts` | -| Cluster Store | Topic groupings | `cluster-store.ts` | +| Cluster Store | Topic groupings | `cluster-store.ts` | All stores use SQLite for persistence via `better-sqlite3-multiple-ciphers`. @@ -24,19 +24,19 @@ Chunks are the fundamental unit of storage, representing segments of conversatio ```typescript interface StoredChunk { - id: string; // UUID - sessionId: string; // Claude session ID - sessionSlug: string; // Project folder name - turnIndices: number[]; // Turn indices included (0-based) - startTime: string; // ISO timestamp of first message - endTime: string; // ISO timestamp of last message - content: string; // Rendered text content - codeBlockCount: number; // Number of code blocks - toolUseCount: number; // Number of tool uses - approxTokens: number; // Approximate token count - createdAt: string; // ISO timestamp when stored - agentId: string | null; // 'ui' for main, agent ID for sub-agents - spawnDepth: number; // 0=main, 1=sub-agent, 2=nested + id: string; // UUID + sessionId: string; // Claude session ID + sessionSlug: string; // Project folder name + turnIndices: number[]; // Turn indices included (0-based) + startTime: string; // ISO timestamp of first message + endTime: string; // ISO timestamp of last message + content: string; // Rendered text content + codeBlockCount: number; // Number of code blocks + toolUseCount: number; // Number of tool uses + approxTokens: number; // Approximate token count + createdAt: string; // ISO timestamp when stored + agentId: string | null; // 'ui' for main, agent ID for sub-agents + spawnDepth: number; // 0=main, 1=sub-agent, 2=nested } ``` @@ -51,9 +51,9 @@ interface StoredEdge { targetChunkId: string; edgeType: 'backward' | 'forward'; referenceType: ReferenceType | null; - initialWeight: number; // 0-1, before decay + initialWeight: number; // 0-1, before decay createdAt: string; - linkCount: number; // Boost count for duplicates + linkCount: number; // Boost count for duplicates } ``` @@ -61,12 +61,12 @@ interface StoredEdge { Edge reference types are purely structural roles that determine initial weight: -| Type | Weight | Description | -|------|--------|-------------| -| `within-chain` | 1.0 | D-T-D causal edge within one thinking entity (m×n all-pairs at turn boundaries) | -| `brief` | 0.9 | Parent agent spawning a sub-agent (m×n all-pairs, with 0.9^depth penalty) | -| `debrief` | 0.9 | Sub-agent returning results to parent (m×n all-pairs, with 0.9^depth penalty) | -| `cross-session` | 0.7 | Session continuation (previous final chunks ↔ new first chunks, m×n) | +| Type | Weight | Description | +| --------------- | ------ | ------------------------------------------------------------------------------- | +| `within-chain` | 1.0 | D-T-D causal edge within one thinking entity (m×n all-pairs at turn boundaries) | +| `brief` | 0.9 | Parent agent 
spawning a sub-agent (m×n all-pairs, with 0.9^depth penalty) | +| `debrief` | 0.9 | Sub-agent returning results to parent (m×n all-pairs, with 0.9^depth penalty) | +| `cross-session` | 0.7 | Session continuation (previous final chunks ↔ new first chunks, m×n) | ### Weighted Edges @@ -74,7 +74,7 @@ During traversal, edges include computed weight after decay: ```typescript interface WeightedEdge extends StoredEdge { - weight: number; // Computed: initialWeight × hopDecay(depth) × linkBoost + weight: number; // Computed: initialWeight × hopDecay(depth) × linkBoost } ``` @@ -278,11 +278,11 @@ Close the database connection. Storage operations throw standard JavaScript errors. Common error scenarios: -| Scenario | Error Type | -|----------|------------| -| Database not initialized | `Error: Database not initialized` | -| Duplicate ID | `Error: SQLITE_CONSTRAINT` | -| Invalid foreign key | `Error: SQLITE_CONSTRAINT` | +| Scenario | Error Type | +| ------------------------- | ----------------------------------------- | +| Database not initialized | `Error: Database not initialized` | +| Duplicate ID | `Error: SQLITE_CONSTRAINT` | +| Invalid foreign key | `Error: SQLITE_CONSTRAINT` | | Vector dimension mismatch | Detected at search time via NaN distances | ## Transaction Support @@ -298,15 +298,15 @@ vectorStore.insertBatch([...]); ## Performance Notes -| Operation | Complexity | Notes | -|-----------|------------|-------| -| Chunk lookup by ID | O(1) | Primary key index | -| Chunks by session | O(k) | Indexed by session_id | -| Edge lookup | O(1) | Primary key index | -| Outgoing edges | O(k) | Indexed by source_chunk_id | -| Vector search | O(n) | Brute-force, optimize if >100k vectors | -| Keyword search | O(log n) | FTS5 inverted index with BM25 ranking | -| Batch insert | O(n) | Single transaction | +| Operation | Complexity | Notes | +| ------------------ | ---------- | -------------------------------------- | +| Chunk lookup by ID | O(1) | Primary key index | +| Chunks by session | O(k) | Indexed by session_id | +| Edge lookup | O(1) | Primary key index | +| Outgoing edges | O(k) | Indexed by source_chunk_id | +| Vector search | O(n) | Brute-force, optimize if >100k vectors | +| Keyword search | O(log n) | FTS5 inverted index with BM25 ranking | +| Batch insert | O(n) | Single transaction | ## Related diff --git a/docs/reference/traversal-algorithm.md b/docs/reference/traversal-algorithm.md index c01244c..bed8d14 100644 --- a/docs/reference/traversal-algorithm.md +++ b/docs/reference/traversal-algorithm.md @@ -6,18 +6,15 @@ Causantic uses a chain-walking algorithm to reconstruct episodic narratives from The causal graph is a **sequential linked list** with branch points at sub-agent forks. The chain walker follows directed edges to build ordered narrative chains from seed chunks. -| Direction | Edge following | Use case | -|-----------|---------------|----------| +| Direction | Edge following | Use case | +| ------------ | ----------------------------------------- | ----------------------------------- | | **Backward** | Follow edges where target = current chunk | `recall` — "how did we solve this?" | -| **Forward** | Follow edges where source = current chunk | `predict` — "what comes next?" | +| **Forward** | Follow edges where source = current chunk | `predict` — "what comes next?" 
| ## Core Algorithm ```typescript -function walkChains( - seedIds: string[], - options: ChainWalkerOptions -): Chain[] +function walkChains(seedIds: string[], options: ChainWalkerOptions): Chain[]; ``` ### Pseudocode @@ -95,15 +92,16 @@ Query Edges are stored as single `forward` rows: -| Field | Value | -|-------|-------| -| `edge_type` | Always `'forward'` | -| `reference_type` | `'within-chain'`, `'cross-session'`, `'brief'`, or `'debrief'` | -| `source_chunk_id` | Earlier chunk | -| `target_chunk_id` | Later chunk | -| `initial_weight` | Always `1.0` | +| Field | Value | +| ----------------- | -------------------------------------------------------------- | +| `edge_type` | Always `'forward'` | +| `reference_type` | `'within-chain'`, `'cross-session'`, `'brief'`, or `'debrief'` | +| `source_chunk_id` | Earlier chunk | +| `target_chunk_id` | Later chunk | +| `initial_weight` | Always `1.0` | Direction is inferred at query time: + - **Forward edges**: `source_chunk_id = chunkId AND edge_type = 'forward'` - **Backward edges**: `target_chunk_id = chunkId AND edge_type = 'forward'` @@ -131,9 +129,9 @@ Median is robust to bridge nodes (semantic novelty) in short chains. A 3-node ch ```typescript interface ChainWalkerOptions { direction: 'forward' | 'backward'; - tokenBudget: number; // Max tokens across all chains + tokenBudget: number; // Max tokens across all chains queryEmbedding: number[]; // For per-node scoring - maxDepth?: number; // Safety cap (default: from config, typically 50) + maxDepth?: number; // Safety cap (default: from config, typically 50) } ``` @@ -151,12 +149,12 @@ interface ChainWalkerOptions { ## Performance Characteristics -| Aspect | Behavior | -|--------|----------| -| Time complexity | O(S × L) where S = seeds (5), L = max chain length | -| Space complexity | O(V) where V = unique chunks visited | -| Edge lookups | O(1) per hop via indexed queries | -| Scoring | O(1) per node (in-memory vector Map lookup + dot product) | +| Aspect | Behavior | +| ---------------- | --------------------------------------------------------- | +| Time complexity | O(S × L) where S = seeds (5), L = max chain length | +| Space complexity | O(V) where V = unique chunks visited | +| Edge lookups | O(1) per hop via indexed queries | +| Scoring | O(1) per node (in-memory vector Map lookup + dot product) | ### Optimizations diff --git a/docs/research/README.md b/docs/research/README.md index 95f27c4..82f3002 100644 --- a/docs/research/README.md +++ b/docs/research/README.md @@ -59,7 +59,7 @@ The research findings above shaped v0.2's architecture. v0.3.0 made significant - **Vector clocks removed**: Hop-based decay (itself removed with traversal) replaced vector-clock hop counting. - **`search` tool replaces `explain`**: Honest about what it does — pure semantic discovery with optional chain context. -The core insights remain valid: causal structure matters more than wall-clock time, lexical features detect topic shifts, and HDBSCAN clustering provides topic organization. What changed is *how* the causal graph is used — for structural ordering, not semantic ranking. +The core insights remain valid: causal structure matters more than wall-clock time, lexical features detect topic shifts, and HDBSCAN clustering provides topic organization. What changed is _how_ the causal graph is used — for structural ordering, not semantic ranking. See [experiments/lessons-learned.md](experiments/lessons-learned.md) for detailed post-mortems. 
@@ -75,7 +75,7 @@ See [approach/role-of-entropy.md](approach/role-of-entropy.md). ### Why Causal Graphs? -Unlike simple vector databases, Causantic tracks *relationships* between memory chunks: +Unlike simple vector databases, Causantic tracks _relationships_ between memory chunks: - **Causality**: What led to what - **Temporal ordering**: Edge age tracks recency diff --git a/docs/research/approach/dual-integration.md b/docs/research/approach/dual-integration.md index 022f78f..bfb6fa7 100644 --- a/docs/research/approach/dual-integration.md +++ b/docs/research/approach/dual-integration.md @@ -18,6 +18,7 @@ Hooks fire automatically at key moments: **Trigger**: New Claude Code session begins **Actions**: + - Query recent relevant context - Generate memory summary - Update CLAUDE.md @@ -29,6 +30,7 @@ Hooks fire automatically at key moments: **Trigger**: Before conversation history is compressed **Actions**: + - Ingest current session content - Create chunks and edges - Generate embeddings @@ -59,14 +61,14 @@ Claude identifies relevant historical context without being asked. ## Why Both? -| Capability | Hooks | MCP | -|------------|-------|-----| -| Automatic capture | Yes | No | -| On-demand queries | No | Yes | -| Background operation | Yes | No | -| Interactive | No | Yes | -| Context priming | Yes | Partial | -| User-initiated recall | No | Yes | +| Capability | Hooks | MCP | +| --------------------- | ----- | ------- | +| Automatic capture | Yes | No | +| On-demand queries | No | Yes | +| Background operation | Yes | No | +| Interactive | No | Yes | +| Context priming | Yes | Partial | +| User-initiated recall | No | Yes | ### Hooks Excel At diff --git a/docs/research/approach/hdbscan-performance.md b/docs/research/approach/hdbscan-performance.md index 0488a63..8381211 100644 --- a/docs/research/approach/hdbscan-performance.md +++ b/docs/research/approach/hdbscan-performance.md @@ -12,9 +12,12 @@ The `hdbscan-ts` package has O(n² × k) complexity due to `Array.includes()` ca ```javascript // Problematic pattern in hdbscan-ts -for (const point of points) { // O(n) - for (const cluster of clusters) { // O(k) - if (cluster.includes(point)) { // O(n) - Array.includes is linear! +for (const point of points) { + // O(n) + for (const cluster of clusters) { + // O(k) + if (cluster.includes(point)) { + // O(n) - Array.includes is linear! // ... 
} } @@ -29,13 +32,13 @@ Causantic now uses a native TypeScript HDBSCAN implementation with proper data s ### Key Optimizations -| Issue | hdbscan-ts | Native Implementation | -|-------|------------|----------------------| -| Point lookup | `Array.includes()` O(n) | `Set.has()` O(1) | -| Memory | JS arrays | Float64Array | -| Union-Find | None | Path compression + rank | -| k-th nearest | Full sort O(n log n) | Quickselect O(n) | -| Core distances | Single-threaded | Parallel (worker_threads) | +| Issue | hdbscan-ts | Native Implementation | +| -------------- | ----------------------- | ------------------------- | +| Point lookup | `Array.includes()` O(n) | `Set.has()` O(1) | +| Memory | JS arrays | Float64Array | +| Union-Find | None | Path compression + rank | +| k-th nearest | Full sort O(n log n) | Quickselect O(n) | +| Core distances | Single-threaded | Parallel (worker_threads) | ### Algorithm Steps @@ -48,11 +51,11 @@ Causantic now uses a native TypeScript HDBSCAN implementation with proper data s ### Performance | Dataset Size | hdbscan-ts (old) | Native (new) | -|--------------|------------------|--------------| -| 500 | ~5s | ~0.3s | -| 1,000 | ~30s | ~1s | -| 2,000 | ~3 min | ~4s | -| 6,000 | 65+ min | ~30s | +| ------------ | ---------------- | ------------ | +| 500 | ~5s | ~0.3s | +| 1,000 | ~30s | ~1s | +| 2,000 | ~3 min | ~4s | +| 6,000 | 65+ min | ~30s | ### Features diff --git a/docs/research/approach/landscape-analysis.md b/docs/research/approach/landscape-analysis.md index 7ccd443..65317de 100644 --- a/docs/research/approach/landscape-analysis.md +++ b/docs/research/approach/landscape-analysis.md @@ -4,16 +4,16 @@ How Causantic compares to existing AI memory systems, and why it takes a differe ## Competitor Feature Matrix -| System | Local-First | Temporal Decay | Graph Structure | Self-Benchmarking | Hop-Based Distance | -|--------|:-----------:|:--------------:|:--------------:|:-----------------:|:------------------:| -| **Causantic** | **Yes** | **Hop-based** | **Causal graph** | **Yes** | **Yes** | -| Mem0 | No (Cloud) | None | Paid add-on | No | No | -| Cognee | Self-hostable | None | Triplet extraction | No | No | -| Letta/MemGPT | Self-hostable | Summarization | None | No | No | -| Zep | Enterprise | Bi-temporal | Temporal KG | No | No | -| Supermemory | Cloudflare | Dual timestamps | Secondary | No | No | -| A-MEM | Research only | None | Zettelkasten | No | No | -| GraphRAG | Self-hostable | Static corpus | Hierarchical | No | No | +| System | Local-First | Temporal Decay | Graph Structure | Self-Benchmarking | Hop-Based Distance | +| ------------- | :-----------: | :-------------: | :----------------: | :---------------: | :----------------: | +| **Causantic** | **Yes** | **Hop-based** | **Causal graph** | **Yes** | **Yes** | +| Mem0 | No (Cloud) | None | Paid add-on | No | No | +| Cognee | Self-hostable | None | Triplet extraction | No | No | +| Letta/MemGPT | Self-hostable | Summarization | None | No | No | +| Zep | Enterprise | Bi-temporal | Temporal KG | No | No | +| Supermemory | Cloudflare | Dual timestamps | Secondary | No | No | +| A-MEM | Research only | None | Zettelkasten | No | No | +| GraphRAG | Self-hostable | Static corpus | Hierarchical | No | No | ## System Summaries @@ -39,16 +39,16 @@ Zettelkasten-inspired agentic memory with bidirectional linking. 
Only system wit ## Gap Analysis -| Gap | Current Landscape | Causantic's Approach | -|-----|-------------------|---------------------| -| **Local-first + sophisticated** | Cloud systems are sophisticated; local systems are simplistic | Full causal graph + clustering + hybrid search, all on your machine | -| **Hop-based decay** | Wall-clock time or none | Logical D-T-D hops preserve cross-session continuity | -| **Direction-specific retrieval** | Symmetric or none | Backward (dies@10 hops) vs forward (delayed, dies@20 hops) | -| **Self-benchmarking** | No system measures its own retrieval quality | Built-in benchmark suite with tuning recommendations | -| **Claude Code native** | General-purpose or platform-agnostic | Purpose-built hooks, MCP tools, and CLAUDE.md generation | +| Gap | Current Landscape | Causantic's Approach | +| -------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------- | +| **Local-first + sophisticated** | Cloud systems are sophisticated; local systems are simplistic | Full causal graph + clustering + hybrid search, all on your machine | +| **Hop-based decay** | Wall-clock time or none | Logical D-T-D hops preserve cross-session continuity | +| **Direction-specific retrieval** | Symmetric or none | Backward (dies@10 hops) vs forward (delayed, dies@20 hops) | +| **Self-benchmarking** | No system measures its own retrieval quality | Built-in benchmark suite with tuning recommendations | +| **Claude Code native** | General-purpose or platform-agnostic | Purpose-built hooks, MCP tools, and CLAUDE.md generation | ## Key Differentiator -Most memory systems optimize for *storing* memories. Causantic optimizes for *retrieving the right context at the right time* — using hybrid BM25+vector search with causal chain walking for episodic narrative context. (The 4.65× augmentation figure was a v0.2 research result using sum-product traversal, since replaced by chain walking — see [experiments/graph-traversal.md](../experiments/graph-traversal.md).) +Most memory systems optimize for _storing_ memories. Causantic optimizes for _retrieving the right context at the right time_ — using hybrid BM25+vector search with causal chain walking for episodic narrative context. (The 4.65× augmentation figure was a v0.2 research result using sum-product traversal, since replaced by chain walking — see [experiments/graph-traversal.md](../experiments/graph-traversal.md).) -*Condensed from the [full feasibility study](../archive/feasibility-study.md). See the archive for detailed per-system analysis including architecture diagrams and benchmark methodology.* +_Condensed from the [full feasibility study](../archive/feasibility-study.md). See the archive for detailed per-system analysis including architecture diagrams and benchmark methodology._ diff --git a/docs/research/approach/role-of-entropy.md b/docs/research/approach/role-of-entropy.md index 1df470c..6291838 100644 --- a/docs/research/approach/role-of-entropy.md +++ b/docs/research/approach/role-of-entropy.md @@ -63,7 +63,7 @@ Causantic used a **sum-product** calculation for node weights, analogous to Feyn ### Product Along Paths -Edge weights multiply along a single path. For a path of length *n* with edge weights *w₁, w₂, ..., wₙ*: +Edge weights multiply along a single path. For a path of length _n_ with edge weights _w₁, w₂, ..., wₙ_: ``` path_weight = w₁ × w₂ × ... 
× wₙ @@ -117,13 +117,13 @@ Since edge weights are <1, each additional cycle multiplies by a factor <1. The This mirrors perturbation theory in quantum field theory: -| Perturbation Theory | Semantic Graph | -|---------------------|----------------| -| Coupling constant α < 1 | Edge weight ∈ (0,1] | +| Perturbation Theory | Semantic Graph | +| -------------------------------------- | --------------------------------------- | +| Coupling constant α < 1 | Edge weight ∈ (0,1] | | Higher-order diagrams suppressed by αⁿ | Longer paths suppressed by w₁×w₂×...×wₙ | -| Sum over all diagrams | Sum over all paths | -| Renormalization handles infinities | Normalisation keeps weights bounded | -| Loop diagrams finite | Cycles attenuate naturally | +| Sum over all diagrams | Sum over all paths | +| Renormalization handles infinities | Normalisation keeps weights bounded | +| Loop diagrams finite | Cycles attenuate naturally | Just as Feynman diagrams with more loops contribute less to physical amplitudes (suppressed by powers of α), graph cycles contribute diminishingly to node influence (suppressed by products of weights <1). @@ -132,11 +132,13 @@ Just as Feynman diagrams with more loops contribute less to physical amplitudes Entropy accumulates differently by traversal direction: **Backward edges** (historical context): "What caused this?" + - Linear decay, dies at 10 hops - Discrimination fades quickly into the past - Recent causes are sharply discriminated; old causes blur together **Forward edges** (predictive context): "What might follow?" + - Delayed linear, holds for 5 hops, dies at 20 hops - Immediate predictions stay discriminated longer - Anticipatory context retains information longer before entropy dominates @@ -145,16 +147,16 @@ Entropy accumulates differently by traversal direction: A key insight from the design: -> Edge accumulation encodes frequency of co-occurrence, decay encodes recency, and path products encode causal distance. **The graph *is* the clock.** +> Edge accumulation encodes frequency of co-occurrence, decay encodes recency, and path products encode causal distance. **The graph _is_ the clock.** Traditional systems use external timestamps and apply global decay. Causantic embeds temporal dynamics directly into the graph structure — entropy flows through the graph topology itself. 
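To make the attenuation concrete, here is a minimal TypeScript sketch of the path-product rule from "Product Along Paths" above. It is illustrative only (not the v0.2 implementation), and the `pathWeight` helper and the example weights are invented for the sketch.

```typescript
// Illustrative sketch of path-product attenuation (the historical v0.2 mechanism).
// Edge weights in (0, 1] multiply along a path, so every extra hop, and every
// loop around a cycle, shrinks that path's contribution toward zero.
function pathWeight(edgeWeights: number[]): number {
  return edgeWeights.reduce((product, w) => product * w, 1);
}

const direct = pathWeight([1.0, 0.9, 0.7]); // 0.63
const withCycle = pathWeight([1.0, 0.9, 0.7, 0.9, 0.9]); // ≈ 0.51: the cycle attenuates rather than amplifying
```

This is the graph analogue of the αⁿ suppression in the perturbation-theory table above: longer paths are suppressed by the product of their weights.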
-| Aspect | Traditional | Entropic | -|--------|-------------|----------| -| Time reference | Wall clock | Causal hops | -| Entropy source | Age threshold | Path attenuation | +| Aspect | Traditional | Entropic | +| -------------- | ---------------- | ---------------------------- | +| Time reference | Wall clock | Causal hops | +| Entropy source | Age threshold | Path attenuation | | Discrimination | Binary (old/new) | Continuous (weight products) | -| Compression | Arbitrary cutoff | Natural convergence to zero | +| Compression | Arbitrary cutoff | Natural convergence to zero | ## Maximum Entropy Edge Creation (Historical — v0.2) diff --git a/docs/research/approach/why-causal-graphs.md b/docs/research/approach/why-causal-graphs.md index b58fd04..31b127d 100644 --- a/docs/research/approach/why-causal-graphs.md +++ b/docs/research/approach/why-causal-graphs.md @@ -82,11 +82,11 @@ With chunks as nodes: The current design separates concerns: -| Concern | Mechanism | Unit | -|---------|-----------|------| -| Causal traversal | Edge weights + decay | Chunks (precise) | -| Topic discovery | HDBSCAN clustering | Clusters (semantic grouping) | -| Entry point search | Vector similarity | Embeddings (similarity) | +| Concern | Mechanism | Unit | +| ------------------ | -------------------- | ---------------------------- | +| Causal traversal | Edge weights + decay | Chunks (precise) | +| Topic discovery | HDBSCAN clustering | Clusters (semantic grouping) | +| Entry point search | Vector similarity | Embeddings (similarity) | Clusters serve as a **lens for browsing and labeling** rather than a **unit of causality**. This keeps the entropic decay well-behaved while still providing topic organization. @@ -94,22 +94,22 @@ Clusters serve as a **lens for browsing and labeling** rather than a **unit of c Causantic uses purely structural edge types — semantic association is handled by vector search and clustering. Edges encode causal structure only: -| Type | Weight | Description | -|------|--------|-------------| -| `within-chain` | 1.0 | D-T-D causal edge within one thinking entity. Created as m×n all-pairs at each consecutive turn boundary, with topic-shift gating. | -| `brief` | 0.9 | Parent agent spawning a sub-agent. m×n all-pairs between parent turn chunks and sub-agent first-turn chunks. 0.9^depth penalty for nested agents. | -| `debrief` | 0.9 | Sub-agent returning results to parent. m×n all-pairs between sub-agent final-turn chunks and parent turn chunks. 0.9^depth penalty. | -| `cross-session` | 0.7 | Session continuation. m×n between previous session's final-turn chunks and new session's first-turn chunks. | +| Type | Weight | Description | +| --------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------- | +| `within-chain` | 1.0 | D-T-D causal edge within one thinking entity. Created as m×n all-pairs at each consecutive turn boundary, with topic-shift gating. | +| `brief` | 0.9 | Parent agent spawning a sub-agent. m×n all-pairs between parent turn chunks and sub-agent first-turn chunks. 0.9^depth penalty for nested agents. | +| `debrief` | 0.9 | Sub-agent returning results to parent. m×n all-pairs between sub-agent final-turn chunks and parent turn chunks. 0.9^depth penalty. | +| `cross-session` | 0.7 | Session continuation. m×n between previous session's final-turn chunks and new session's first-turn chunks. 
| ### Design evolution: from max entropy to sequential edges -| Aspect | v0.2 (m×n all-pairs) | v0.3 (sequential) | -|--------|----------------------|-------------------| -| Edge topology | m×n at each turn boundary | 1-to-1 linked list | -| Edges per transition | O(m×n), e.g. 5×5 = 25 | O(max(m,n)), e.g. 5 | -| Retrieval mechanism | Sum-product traversal with decay | Chain walking with cosine scoring | -| Scoring | Multiplicative path products | Independent cosine similarity per hop | -| Edge types | 4 structural (within-chain, cross-session, brief, debrief) | Sequential + cross-session + brief/debrief | +| Aspect | v0.2 (m×n all-pairs) | v0.3 (sequential) | +| -------------------- | ---------------------------------------------------------- | ------------------------------------------ | +| Edge topology | m×n at each turn boundary | 1-to-1 linked list | +| Edges per transition | O(m×n), e.g. 5×5 = 25 | O(max(m,n)), e.g. 5 | +| Retrieval mechanism | Sum-product traversal with decay | Chain walking with cosine scoring | +| Scoring | Multiplicative path products | Independent cosine similarity per hop | +| Edge types | 4 structural (within-chain, cross-session, brief, debrief) | Sequential + cross-session + brief/debrief | > **Historical note**: v0.2 used m×n all-pairs edges with sum-product traversal. This was theoretically motivated by maximum entropy (don't impose false structure), but in practice the graph contributed only ~2% of retrieval results. v0.3 uses sequential 1-to-1 edges — simpler, fewer edges, and the graph's value is structural ordering (episodic narratives) rather than semantic ranking. > @@ -117,11 +117,11 @@ Causantic uses purely structural edge types — semantic association is handled ## Comparison -| Approach | Finds Similar | Finds Related | Temporal Aware | -|----------|--------------|---------------|----------------| -| Vector DB | Yes | No | No | -| Graph DB | No | Yes | Partial | -| Causantic | Yes | Yes | Yes | +| Approach | Finds Similar | Finds Related | Temporal Aware | +| --------- | ------------- | ------------- | -------------- | +| Vector DB | Yes | No | No | +| Graph DB | No | Yes | Partial | +| Causantic | Yes | Yes | Yes | ## Results diff --git a/docs/research/archive/README.md b/docs/research/archive/README.md index 41a70c4..7961f6f 100644 --- a/docs/research/archive/README.md +++ b/docs/research/archive/README.md @@ -6,17 +6,17 @@ For distilled, up-to-date documentation, see the parent [Research](../README.md) ## Documents -| Document | Description | -|----------|-------------| -| [feasibility-study.md](feasibility-study.md) | Initial feasibility analysis including competitor landscape, integration points, causal graph formalism, and architecture recommendation | -| [pre-implementation-plan.md](pre-implementation-plan.md) | Prioritized checklist of ~25 open questions organized into P0/P1/P2, with resolved answers from experiments | -| [edge-decay-model.md](edge-decay-model.md) | Comprehensive design document for temporal decay curves including multi-linear, delayed linear, and exponential models with full experiment results | -| [session-data-inventory.md](session-data-inventory.md) | Audit of 32 projects, 251 sessions, 3.5 GB of Claude Code session data used for experiments | -| [embedding-benchmark-results.md](embedding-benchmark-results.md) | Two-run embedding model benchmark (66 → 294 chunks) plus 5 follow-up experiments on jina-small | -| [topic-continuity-results.md](topic-continuity-results.md) | Topic boundary detection experiment across 75 
sessions, 2,817 transitions — lexical-only achieves 0.998 AUC | -| [vector-clocks.md](vector-clocks.md) | D-T-D vector clock model for logical distance — replaced by chain walking in v0.3 | -| [decay-models.md](decay-models.md) | Hop-based decay models (exponential backward, linear forward) — removed in v0.3 | -| [decay-curves.md](decay-curves.md) | Decay curve experiments: 9 models, 30 sessions, MRR analysis — superseded by chain walking | +| Document | Description | +| ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | +| [feasibility-study.md](feasibility-study.md) | Initial feasibility analysis including competitor landscape, integration points, causal graph formalism, and architecture recommendation | +| [pre-implementation-plan.md](pre-implementation-plan.md) | Prioritized checklist of ~25 open questions organized into P0/P1/P2, with resolved answers from experiments | +| [edge-decay-model.md](edge-decay-model.md) | Comprehensive design document for temporal decay curves including multi-linear, delayed linear, and exponential models with full experiment results | +| [session-data-inventory.md](session-data-inventory.md) | Audit of 32 projects, 251 sessions, 3.5 GB of Claude Code session data used for experiments | +| [embedding-benchmark-results.md](embedding-benchmark-results.md) | Two-run embedding model benchmark (66 → 294 chunks) plus 5 follow-up experiments on jina-small | +| [topic-continuity-results.md](topic-continuity-results.md) | Topic boundary detection experiment across 75 sessions, 2,817 transitions — lexical-only achieves 0.998 AUC | +| [vector-clocks.md](vector-clocks.md) | D-T-D vector clock model for logical distance — replaced by chain walking in v0.3 | +| [decay-models.md](decay-models.md) | Hop-based decay models (exponential backward, linear forward) — removed in v0.3 | +| [decay-curves.md](decay-curves.md) | Decay curve experiments: 9 models, 30 sessions, MRR analysis — superseded by chain walking | ## Relationship to Current Docs diff --git a/docs/research/archive/decay-curves.md b/docs/research/archive/decay-curves.md index 9ef54fb..2c97f18 100644 --- a/docs/research/archive/decay-curves.md +++ b/docs/research/archive/decay-curves.md @@ -52,15 +52,16 @@ Edge weights should decay based on logical hop distance (turn count difference), The ground truth from 30 sessions reveals two distinct regimes: | Hop Distance | Turn Pairs | Reference Rate | Normalized | -|-------------|-----------|----------------|------------| -| 1 hop | 1,512 | 1.150 | 100% | -| 2-3 hops | 2,934 | 0.136 | 12% | -| 4-6 hops | 4,176 | 0.126 | 11% | -| 7-10 hops | 5,148 | 0.102 | 9% | -| 11-20 hops | 10,815 | 0.093 | 8% | -| 21+ hops | 41,785 | 0.031 | 3% | +| ------------ | ---------- | -------------- | ---------- | +| 1 hop | 1,512 | 1.150 | 100% | +| 2-3 hops | 2,934 | 0.136 | 12% | +| 4-6 hops | 4,176 | 0.126 | 11% | +| 7-10 hops | 5,148 | 0.102 | 9% | +| 11-20 hops | 10,815 | 0.093 | 8% | +| 21+ hops | 41,785 | 0.031 | 3% | **Two regimes**: + 1. **Steep initial drop** (hop 1 → 2): 88% loss — adjacent turns are overwhelmingly the most referenced 2. 
**Long slow tail** (hop 2 → 21+): gradual decline from 12% to 3% — references exist at all distances @@ -70,42 +71,42 @@ The ground truth from 30 sessions reveals two distinct regimes: ### Overall MRR -| Model | MRR | Rank@1 | Rank@2-5 | Rank@6+ | -|-------|-----|--------|----------|---------| -| Simple Linear | 0.985 | 1,479 | 25 | 8 | -| Multi-Linear (Fast) | **0.985** | **1,479** | 25 | 8 | -| Multi-Linear (Default) | 0.985 | 1,479 | 25 | 8 | -| Exponential | 0.985 | 1,479 | 25 | 8 | -| Exponential (Slow) | 0.985 | 1,479 | 25 | 8 | -| Power Law (α=1) | 0.985 | 1,479 | 25 | 8 | -| Power Law (α=2) | 0.985 | 1,479 | 25 | 8 | -| Delayed Linear | 0.423 | 188 | 1,316 | 8 | -| Multi-Linear (Slow) | 0.423 | 188 | 1,316 | 8 | +| Model | MRR | Rank@1 | Rank@2-5 | Rank@6+ | +| ---------------------- | --------- | --------- | -------- | ------- | +| Simple Linear | 0.985 | 1,479 | 25 | 8 | +| Multi-Linear (Fast) | **0.985** | **1,479** | 25 | 8 | +| Multi-Linear (Default) | 0.985 | 1,479 | 25 | 8 | +| Exponential | 0.985 | 1,479 | 25 | 8 | +| Exponential (Slow) | 0.985 | 1,479 | 25 | 8 | +| Power Law (α=1) | 0.985 | 1,479 | 25 | 8 | +| Power Law (α=2) | 0.985 | 1,479 | 25 | 8 | +| Delayed Linear | 0.423 | 188 | 1,316 | 8 | +| Multi-Linear (Slow) | 0.423 | 188 | 1,316 | 8 | **Key finding**: All strictly monotonic models score identically (MRR=0.985). Models with hold periods (Delayed Linear, Multi-Linear Slow) score dramatically worse (0.423) because the plateau at short range creates ties that prevent ranking the nearest turn first. ### Hop-Distance Correlation (ρ) -| Model | Spearman ρ | -|-------|-----------| -| Simple Linear | **1.000** | -| Multi-Linear (Default) | **1.000** | -| Exponential | **1.000** | -| Power Law (α=1, α=2) | **1.000** | -| Delayed Linear | 0.943 | -| Multi-Linear (Slow) | 0.943 | +| Model | Spearman ρ | +| ---------------------- | ---------- | +| Simple Linear | **1.000** | +| Multi-Linear (Default) | **1.000** | +| Exponential | **1.000** | +| Power Law (α=1, α=2) | **1.000** | +| Delayed Linear | 0.943 | +| Multi-Linear (Slow) | 0.943 | All strictly monotonic models achieve perfect correlation with the empirical reference rate curve. Hold periods break monotonicity and reduce correlation. ### Stratified Analysis (Backward) -| Stratum | Refs | Best Model | MRR | Rank@1 | -|---------|------|------------|-----|--------| -| All references | 5,505 | Multi-Linear (Fast) | 0.985 | 98% | -| Non-adjacent (>1 hop) | 3,766 | Delayed Linear | 0.693 | 56% | -| Mid-range (>3 hops) | 3,366 | Multi-Linear (Fast) | 0.208 | 0% | -| Long-range (>5 hops, high conf) | 1,753 | Multi-Linear (Fast) | 0.142 | 0% | -| Very long-range (>10 hops) | 2,315 | Multi-Linear (Fast) | 0.088 | 0% | +| Stratum | Refs | Best Model | MRR | Rank@1 | +| ------------------------------- | ----- | ------------------- | ----- | ------ | +| All references | 5,505 | Multi-Linear (Fast) | 0.985 | 98% | +| Non-adjacent (>1 hop) | 3,766 | Delayed Linear | 0.693 | 56% | +| Mid-range (>3 hops) | 3,366 | Multi-Linear (Fast) | 0.208 | 0% | +| Long-range (>5 hops, high conf) | 1,753 | Multi-Linear (Fast) | 0.142 | 0% | +| Very long-range (>10 hops) | 2,315 | Multi-Linear (Fast) | 0.088 | 0% | **Critical insight**: At >3 hops, all models converge to ~0.2 MRR. At >10 hops, best is 0.088. **Decay curves alone cannot identify which specific distant turn is relevant.** Long-range retrieval requires content-based search (vector/keyword), not distance-based decay. 
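For readers skimming these tables, MRR is the standard mean reciprocal rank: for each reference query, take 1 divided by the rank at which the true target turn appears, then average over all queries. A minimal sketch of the metric itself (the standard definition, not the benchmark harness's code):

```typescript
// Standard mean reciprocal rank, shown for illustration only.
// ranks[i] is the 1-based position of the true target turn for query i.
function meanReciprocalRank(ranks: number[]): number {
  if (ranks.length === 0) return 0;
  const sum = ranks.reduce((acc, rank) => acc + 1 / rank, 0);
  return sum / ranks.length;
}

// With ~98% of queries ranked at position 1 and the rest near the top,
// MRR lands around 0.98-0.99, the regime reported in the tables above.
```

This also shows why hold periods hurt: ties at short range keep the true adjacent turn out of rank 1, so each such query's contribution drops from 1 to 1/2 or worse, which is how the plateau models fall to 0.423.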
@@ -113,14 +114,15 @@ All strictly monotonic models achieve perfect correlation with the empirical ref ### Stratified Forward MRR -| Stratum | Queries | MRR | Rank@1 | -|---------|---------|-----|--------| -| All (≥1 hop) | 1,496 | 0.992 | 99% | -| Non-adjacent (≥2 hops) | 891 | 0.372 | 18% | -| Mid-range (≥4 hops) | 830 | 0.376 | 19% | -| Long-range (≥6 hops) | 767 | 0.365 | 18% | +| Stratum | Queries | MRR | Rank@1 | +| ---------------------- | ------- | ----- | ------ | +| All (≥1 hop) | 1,496 | 0.992 | 99% | +| Non-adjacent (≥2 hops) | 891 | 0.372 | 18% | +| Mid-range (≥4 hops) | 830 | 0.376 | 19% | +| Long-range (≥6 hops) | 767 | 0.365 | 18% | **All 9 models produce identical forward MRR at every stratum.** This is because: + 1. Candidate future turns have unique integer hop distances 2. All models are monotonically decreasing 3. So all models produce the same ranking: closest candidate first @@ -130,13 +132,13 @@ All strictly monotonic models achieve perfect correlation with the empirical ref ### Why backward and forward need different treatment -| Property | Backward | Forward | -|----------|----------|---------| -| Shape matters? | Partially — no hold period | No — any monotonic function | -| Best model type | Exponential (steep initial, long tail) | Simple linear (minimal complexity) | -| Hold period? | Hurts discrimination (0.423 vs 0.985 MRR) | No effect (all models identical) | -| Effective range | ~30 hops (references exist at 21+) | ~30 hops (match backward) | -| Key insight | Steep-then-tail matches empirical reference rate | Only monotonicity matters | +| Property | Backward | Forward | +| --------------- | ------------------------------------------------ | ---------------------------------- | +| Shape matters? | Partially — no hold period | No — any monotonic function | +| Best model type | Exponential (steep initial, long tail) | Simple linear (minimal complexity) | +| Hold period? 
| Hurts discrimination (0.423 vs 0.985 MRR) | No effect (all models identical) | +| Effective range | ~30 hops (references exist at 21+) | ~30 hops (match backward) | +| Key insight | Steep-then-tail matches empirical reference rate | Only monotonicity matters | ## Production Configuration @@ -146,14 +148,14 @@ Based on these experiments: // Backward: Exponential (half-life ~5 hops, effective range ~30) export const BACKWARD_HOP_DECAY: HopDecayConfig = { type: 'exponential', - weightPerHop: 0.87, // half-life ~5 hops - minWeight: 0.01, // effective range ~30 hops + weightPerHop: 0.87, // half-life ~5 hops + minWeight: 0.01, // effective range ~30 hops }; // Forward: Simple linear (dies@30) export const FORWARD_HOP_DECAY: HopDecayConfig = { type: 'linear', - decayPerHop: 0.033, // dies at ~30 hops + decayPerHop: 0.033, // dies at ~30 hops minWeight: 0.01, }; ``` @@ -161,13 +163,13 @@ export const FORWARD_HOP_DECAY: HopDecayConfig = { ### Backward: Exponential rationale | Hops | Weight | Empirical Ref Rate | -|------|--------|--------------------| -| 1 | 0.87 | 1.150 (100%) | -| 3 | 0.66 | 0.136 (12%) | -| 5 | 0.50 | 0.126 (11%) | -| 10 | 0.25 | 0.102 (9%) | -| 20 | 0.06 | 0.093 (8%) | -| 30 | 0.015 | 0.031 (3%) | +| ---- | ------ | ------------------ | +| 1 | 0.87 | 1.150 (100%) | +| 3 | 0.66 | 0.136 (12%) | +| 5 | 0.50 | 0.126 (11%) | +| 10 | 0.25 | 0.102 (9%) | +| 20 | 0.06 | 0.093 (8%) | +| 30 | 0.015 | 0.031 (3%) | - Steep initial drop matches the 88% reference rate decline at hop 2 - Long asymptotic tail preserves signal at 20-30 hops where 3-9% of references occur diff --git a/docs/research/archive/decay-models.md b/docs/research/archive/decay-models.md index b2b6009..29290cb 100644 --- a/docs/research/archive/decay-models.md +++ b/docs/research/archive/decay-models.md @@ -15,11 +15,12 @@ The obvious approach is to decay edges based on elapsed time: function timeDecay(edge: Edge, now: Date): number { const ageMs = now.getTime() - edge.createdAt.getTime(); const ageDays = ageMs / (1000 * 60 * 60 * 24); - return Math.max(0, 1 - ageDays / 30); // Dies after 30 days + return Math.max(0, 1 - ageDays / 30); // Dies after 30 days } ``` **Problem**: This fails catastrophically for development work: + - Sessions may be days apart but semantically adjacent - Work pauses (weekends, vacations) shouldn't kill edges - A bug fix 2 weeks later is still relevant to the original code @@ -96,6 +97,7 @@ Weight ``` **What was validated**: + - No hold period — hold periods hurt backward MRR (0.423 vs 0.985) by creating ties at short range - Strictly monotonic — all strictly monotonic models score identically (ρ=1.0 with reference rate) - Long range needed — previous linear dies@10 killed real signal at 11-30 hops (3-9% reference rate) diff --git a/docs/research/archive/edge-decay-model.md b/docs/research/archive/edge-decay-model.md index 846943e..2b92cca 100644 --- a/docs/research/archive/edge-decay-model.md +++ b/docs/research/archive/edge-decay-model.md @@ -40,11 +40,13 @@ w(t) = w₀ * exp(-k * t) ``` **Pros**: + - Matches Ebbinghaus forgetting curve (cognitive science basis) - Smooth, continuous decay - Single parameter `k` controls decay rate **Cons**: + - Asymptotic approach to zero — never actually reaches zero - Requires arbitrary threshold for pruning (`w < ε → prune`) - Threshold is a tuning burden and feels ad-hoc @@ -57,12 +59,14 @@ w(t) = max(0, w₀ - k * t) // with floor ``` **Pros**: + - Deterministic zero-crossing: `t_death = w₀ / k` - Natural topology pruning — edge dies when `w ≤ 0` - 
Computationally trivial - Predictable memory pressure **Cons**: + - Doesn't model the "long tail" of memory - Single linear decay may be too simple - Decay rate is constant regardless of edge age @@ -76,11 +80,13 @@ w(t) = Σᵢ max(0, wᵢ - kᵢ * t) ``` Where each component represents a different "memory tier": + - **Short-term**: High initial weight, fast decay - **Medium-term**: Moderate weight, moderate decay - **Long-term**: Lower weight, slow decay **Pros**: + - Approximates smooth curves via superposition - Natural zero-crossing for topology management - Explicit control over multiple timescales @@ -88,6 +94,7 @@ Where each component represents a different "memory tier": - Computationally simple (sum of linear terms) **Cons**: + - More parameters to tune than single exponential - Need to determine tier weights and rates empirically @@ -99,11 +106,13 @@ w(t) = w₀ - k * (t - τ_hold) if t ≥ τ_hold ``` **Pros**: + - Models "working memory" — full relevance for holding period - Recent context is fully available, not just mostly available - Single additional parameter `τ_hold` **Cons**: + - Discontinuity in derivative at `t = τ_hold` - Still single decay rate after hold period @@ -114,11 +123,13 @@ w(t) = w₀ * (1 + t)^(-α) ``` **Pros**: + - Evidence from ACT-R cognitive architecture that memory retrieval follows power law - Long tail — decays faster initially, then slower - May better match empirical relevance patterns **Cons**: + - Asymptotic like exponential (needs threshold) - More complex computation than linear @@ -153,13 +164,14 @@ function totalEdgeWeight(tiers: DecayTier[], ageMs: number): number { ### Default Tier Configuration -| Tier | Initial Weight | Hold Period | Decay Rate | Death Time | -|------|---------------|-------------|------------|------------| -| Short-term | 1.0 | 5 min | 0.001/ms | ~22 min | -| Medium-term | 0.5 | 1 hour | 0.0001/ms | ~2.4 hours | -| Long-term | 0.3 | 24 hours | 0.00001/ms | ~32 hours | +| Tier | Initial Weight | Hold Period | Decay Rate | Death Time | +| ----------- | -------------- | ----------- | ---------- | ---------- | +| Short-term | 1.0 | 5 min | 0.001/ms | ~22 min | +| Medium-term | 0.5 | 1 hour | 0.0001/ms | ~2.4 hours | +| Long-term | 0.3 | 24 hours | 0.00001/ms | ~32 hours | **Combined characteristics**: + - Peak weight at creation: 1.8 - After 5 min: ~1.78 (short-term starts decaying) - After 1 hour: ~1.0 (medium-term starts decaying) @@ -179,18 +191,20 @@ function totalEdgeWeight(tiers: DecayTier[], ageMs: number): number { ## Edge Lifecycle ### Creation + When a causal edge is created (e.g., user turn references previous assistant output): ```typescript interface Edge { - sourceId: string; // Source chunk/turn - targetId: string; // Target chunk/turn - createdAt: number; // Unix timestamp ms - tiers: DecayTier[]; // Decay configuration + sourceId: string; // Source chunk/turn + targetId: string; // Target chunk/turn + createdAt: number; // Unix timestamp ms + tiers: DecayTier[]; // Decay configuration } ``` ### Query-Time Weight Calculation + At query time `t_query`: ```typescript @@ -201,6 +215,7 @@ function getEdgeWeight(edge: Edge, queryTime: number): number { ``` ### Pruning + Edge is pruned when total weight ≤ 0: ```typescript @@ -210,11 +225,13 @@ function shouldPrune(edge: Edge, queryTime: number): boolean { ``` Pruning can happen: + - **Lazily**: During traversal, skip edges with zero weight - **Eagerly**: Background process periodically removes dead edges - **In-situ**: At query time, remove edge if weight is zero ### Node 
Orphan Detection + When an edge is pruned, check if connected nodes are orphaned: ```typescript @@ -243,6 +260,7 @@ The current proposal does **not** include reinforcement on access. This is inten 3. **Unclear semantics**: What does "access" mean for an edge? If reinforcement is needed later, options include: + - Reset hold period on access (extend plateau) - Boost weight by fixed amount (accumulation) - Create new short-term tier contribution @@ -253,13 +271,13 @@ If reinforcement is needed later, options include: ### Multi-Linear Advantages -| Aspect | Multi-Linear | Exponential | -|--------|--------------|-------------| -| Zero-crossing | Deterministic | Asymptotic (needs threshold) | -| Topology management | Natural | Requires threshold tuning | -| Computation | Sum of linear | Requires exp() | -| Timescales | Explicit tiers | Single decay constant | -| Tunability | 2N params (N tiers) | 1-2 params | +| Aspect | Multi-Linear | Exponential | +| ------------------- | ------------------- | ---------------------------- | +| Zero-crossing | Deterministic | Asymptotic (needs threshold) | +| Topology management | Natural | Requires threshold tuning | +| Computation | Sum of linear | Requires exp() | +| Timescales | Explicit tiers | Single decay constant | +| Tunability | 2N params (N tiers) | 1-2 params | ### When Exponential Might Be Better @@ -272,26 +290,34 @@ If reinforcement is needed later, options include: ## Open Questions ### 1. Tier Configuration + Should tier parameters be: + - Global constants? - Per-edge-type (e.g., different for code vs. discussion)? - Learned from data? ### 2. Directionality ✓ Answered + Does relevance decay differ for: + - **Forward queries**: "What context led to this turn?" - **Backward queries**: "What turns were informed by this context?" **Answer**: Yes, significantly. Models with hold periods (Delayed Linear, Multi-Linear) show +0.27 to +0.64 MRR improvement on forward queries vs backward. Exponential decay is symmetric but suboptimal for long-range. Recommendation: Use separate decay profiles for retrieval (backward) vs prediction (forward) edges. ### 3. Edge Type Weighting + Should initial weights vary by edge type? + - Tool result → higher initial weight? - Topic continuation → moderate weight? - Explicit reference → highest weight? ### 4. Empirical Validation + How do we measure whether a decay curve "works"? + - Retrieval precision/recall at different time offsets - User satisfaction with retrieved context - Comparison of retrieved vs. 
actually-referenced content @@ -319,20 +345,21 @@ Determine which decay curve best predicts whether older context is actually rele From session data, identify when a turn explicitly or implicitly references earlier context: -| Signal | Detection Method | Relevance Label | -|--------|------------------|-----------------| -| **Explicit file reference** | User mentions same file as earlier assistant output | Strong relevance | -| **Tool result reference** | User refers to error/output from previous tool use | Strong relevance | -| **Continuation markers** | "yes", "the error", "that works" | Moderate relevance | -| **Topic continuity** | Same semantic topic (embedding similarity) | Moderate relevance | -| **Time adjacency** | Within same session, no topic shift | Weak relevance | -| **Topic shift** | Explicit "new question" or large time gap | No relevance | +| Signal | Detection Method | Relevance Label | +| --------------------------- | --------------------------------------------------- | ------------------ | +| **Explicit file reference** | User mentions same file as earlier assistant output | Strong relevance | +| **Tool result reference** | User refers to error/output from previous tool use | Strong relevance | +| **Continuation markers** | "yes", "the error", "that works" | Moderate relevance | +| **Topic continuity** | Same semantic topic (embedding similarity) | Moderate relevance | +| **Time adjacency** | Within same session, no topic shift | Weak relevance | +| **Topic shift** | Explicit "new question" or large time gap | No relevance | ### Experiment 1: Decay-Weighted Retrieval Ranking **Hypothesis**: A good decay model should rank actually-referenced turns higher than non-referenced turns at query time. **Method**: + 1. For each user turn at time `t_user`, identify which previous assistant turns it references 2. Compute decay weight for all candidate turns: `w(t_user - t_assistant)` 3. Rank candidates by decay weight @@ -349,6 +376,7 @@ MRR = (1/N) * Σ (1 / rank_of_first_relevant) **Hypothesis**: Decay weight should correlate with actual reference probability at different time offsets. **Method**: + 1. Bin turn pairs by time offset: 0-5min, 5-30min, 30min-1h, 1-4h, 4-24h, 1-7d 2. For each bin, compute: - Actual reference rate (% of pairs where later turn references earlier) @@ -365,6 +393,7 @@ MRR = (1/N) * Σ (1 / rank_of_first_relevant) **Backward query**: "What turns were informed by this context?" (looking forward in time) **Method**: + 1. For forward queries: Given turn T, which earlier turns influenced it? 2. For backward queries: Given turn T, which later turns reference it? 3. Evaluate decay models separately for each direction @@ -375,11 +404,13 @@ MRR = (1/N) * Σ (1 / rank_of_first_relevant) **Hypothesis**: Decay patterns may differ between coding sessions and non-coding sessions. **Method**: + 1. Segment sessions into types: coding (high tool use), discussion (low tool use), mixed 2. Run Experiments 1-3 separately for each type 3. Compare optimal decay parameters across types **Session types available**: + - Coding: Ultan, apolitical-assistant, cdx-core, etc. - Non-coding: pde-book, Personal-advice, analytic-methods-in-pde @@ -388,12 +419,14 @@ MRR = (1/N) * Σ (1 / rank_of_first_relevant) **Hypothesis**: We can find optimal tier parameters by maximizing retrieval quality. **Method**: + 1. Define parameter search space for multi-linear tiers 2. For each parameter configuration, run Experiment 1 3. Find configuration that maximizes MRR 4. 
Compare to preset configurations **Search space**: + ``` Short-term hold: [1, 5, 15, 30] minutes Short-term decay rate: [5, 15, 30, 60] minutes to death @@ -412,24 +445,24 @@ Long-term decay rate: [1, 3, 7, 14] days to death ### Expected Outcomes -| Outcome | Implication | -|---------|-------------| +| Outcome | Implication | +| --------------------------- | ------------------------------------------------------------------------- | | Multi-linear >> exponential | Multi-linear's explicit timescales better match actual relevance patterns | -| Multi-linear ≈ exponential | Simpler exponential may suffice | -| Forward ≠ backward | Need separate decay profiles for query direction | -| Coding ≠ non-coding | Need session-type-specific parameters | -| Optimal params ≠ presets | Current presets need tuning | +| Multi-linear ≈ exponential | Simpler exponential may suffice | +| Forward ≠ backward | Need separate decay profiles for query direction | +| Coding ≠ non-coding | Need session-type-specific parameters | +| Optimal params ≠ presets | Current presets need tuning | ### Simulation Results (Preliminary) From `npm run edge-decay-sim`: -| Model | Peak | @ 1h | @ 24h | @ 7d | Death | -|-------|------|------|-------|------|-------| -| Multi-Linear (Default) | 1.8 | 0.8 | 0.3 | 0 | 3d | -| Multi-Linear (Slow) | 1.5 | 1.3 | 0.5 | 0.14 | 17d | -| Exponential (1h half-life) | 1.8 | 0.9 | ~0 | ~0 | never | -| Power Law (α=1) | 1.8 | 0.9 | 0.07 | 0.01 | never | +| Model | Peak | @ 1h | @ 24h | @ 7d | Death | +| -------------------------- | ---- | ---- | ----- | ---- | ----- | +| Multi-Linear (Default) | 1.8 | 0.8 | 0.3 | 0 | 3d | +| Multi-Linear (Slow) | 1.5 | 1.3 | 0.5 | 0.14 | 17d | +| Exponential (1h half-life) | 1.8 | 0.9 | ~0 | ~0 | never | +| Power Law (α=1) | 1.8 | 0.9 | 0.07 | 0.01 | never | The Multi-Linear (Slow) model maintains the most "memory" over time (highest AUC), which may be appropriate for knowledge-building sessions but excessive for ephemeral task execution. 
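The "highest AUC" comparison refers to the area under each model's weight-versus-age curve. A rough sketch of how such a comparison could be computed; the tier rates here are chosen only so that death times roughly match the default tier table (~22 min, ~2.4 h, ~32 h), and the sampling grid and horizon are assumptions rather than the simulation's actual parameters:

```typescript
// Illustrative sketch: compare decay models by area under the weight-vs-age curve.
type DecayModel = (ageMs: number) => number;

const MS_PER_HOUR = 60 * 60 * 1000;

const exponential =
  (peak: number, halfLifeMs: number): DecayModel =>
  (age) => peak * Math.pow(0.5, age / halfLifeMs);

const multiLinearTier =
  (w0: number, holdMs: number, decayPerMs: number): DecayModel =>
  (age) => (age <= holdMs ? w0 : Math.max(0, w0 - decayPerMs * (age - holdMs)));

const sum =
  (models: DecayModel[]): DecayModel =>
  (age) => models.reduce((acc, m) => acc + m(age), 0);

/** Trapezoidal area under a decay curve over [0, horizonMs]. */
function decayAuc(model: DecayModel, horizonMs: number, steps = 1000): number {
  const dt = horizonMs / steps;
  let area = 0;
  for (let i = 0; i < steps; i++) {
    area += ((model(i * dt) + model((i + 1) * dt)) / 2) * dt;
  }
  return area;
}

// Hypothetical comparison over a 7-day horizon.
const horizon = 7 * 24 * MS_PER_HOUR;
const expAuc = decayAuc(exponential(1.8, MS_PER_HOUR), horizon);
const multiAuc = decayAuc(
  sum([
    multiLinearTier(1.0, 5 * 60 * 1000, 1.0 / (17 * 60 * 1000)), // dies ~22 min
    multiLinearTier(0.5, MS_PER_HOUR, 0.5 / (1.4 * MS_PER_HOUR)), // dies ~2.4 h
    multiLinearTier(0.3, 24 * MS_PER_HOUR, 0.3 / (8 * MS_PER_HOUR)), // dies ~32 h
  ]),
  horizon,
);
console.log({ expAuc, multiAuc }); // higher AUC = more retained "memory" over time
```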
@@ -445,23 +478,23 @@ The critical finding is that **optimal decay model depends on what context Claud ### Stratified Analysis by Context Distance -| Context Boundary | Best Model | MRR | Rank@1 | -|------------------|------------|-----|--------| -| All references (baseline) | Exponential | 0.961 | 95% | -| **Beyond immediate (>1 turn)** | **Delayed Linear** | **0.680** | 55% | -| **Beyond recent (>3 turns)** | **Delayed Linear** | **0.549** | 42% | -| **Beyond session (>30 min)** | **Multi-Linear (Default)** | **0.397** | 18% | -| **Long-range (>5 turns, high conf)** | **Delayed Linear** | **0.420** | 29% | +| Context Boundary | Best Model | MRR | Rank@1 | +| ------------------------------------ | -------------------------- | --------- | ------ | +| All references (baseline) | Exponential | 0.961 | 95% | +| **Beyond immediate (>1 turn)** | **Delayed Linear** | **0.680** | 55% | +| **Beyond recent (>3 turns)** | **Delayed Linear** | **0.549** | 42% | +| **Beyond session (>30 min)** | **Multi-Linear (Default)** | **0.397** | 18% | +| **Long-range (>5 turns, high conf)** | **Delayed Linear** | **0.420** | 29% | ### Model Comparison for Long-Range Retrieval (>3 turns) -| Model | MRR | vs Exponential | -|-------|-----|----------------| -| **Delayed Linear** | **0.549** | **+45%** | -| Multi-Linear (Slow) | 0.538 | +42% | -| Multi-Linear (Default) | 0.412 | +9% | -| Simple Linear | 0.382 | +1% | -| Exponential | 0.378 | baseline | +| Model | MRR | vs Exponential | +| ---------------------- | --------- | -------------- | +| **Delayed Linear** | **0.549** | **+45%** | +| Multi-Linear (Slow) | 0.538 | +42% | +| Multi-Linear (Default) | 0.412 | +9% | +| Simple Linear | 0.382 | +1% | +| Exponential | 0.378 | baseline | ### Why Hold Periods Matter @@ -480,7 +513,7 @@ For the memory system's primary use case (retrieving context beyond Claude's imm const MEMORY_RETRIEVAL_CONFIG: DecayModelConfig = { type: 'delayed-linear', initialWeight: 1.0, - holdPeriodMs: 30 * MS_PER_MINUTE, // Full weight for 30 min + holdPeriodMs: 30 * MS_PER_MINUTE, // Full weight for 30 min decayRate: 1.0 / (4 * MS_PER_HOUR), // Then decay over 4 hours }; @@ -488,21 +521,31 @@ const MEMORY_RETRIEVAL_CONFIG: DecayModelConfig = { const TOPOLOGY_CONFIG: DecayModelConfig = { type: 'multi-linear', tiers: [ - { name: 'session', initialWeight: 0.6, holdPeriodMs: 30 * MS_PER_MINUTE, decayRatePerMs: 0.6 / (2 * MS_PER_HOUR) }, - { name: 'project', initialWeight: 0.4, holdPeriodMs: 4 * MS_PER_HOUR, decayRatePerMs: 0.4 / (24 * MS_PER_HOUR) }, + { + name: 'session', + initialWeight: 0.6, + holdPeriodMs: 30 * MS_PER_MINUTE, + decayRatePerMs: 0.6 / (2 * MS_PER_HOUR), + }, + { + name: 'project', + initialWeight: 0.4, + holdPeriodMs: 4 * MS_PER_HOUR, + decayRatePerMs: 0.4 / (24 * MS_PER_HOUR), + }, ], }; ``` ### Reference Type Distribution -| Type | Count | % | -|------|-------|---| -| file-path | 4,172 | 44.6% | -| adjacent (weak) | 2,263 | 24.2% | -| code-entity | 2,118 | 22.6% | -| explicit-backref | 601 | 6.4% | -| error-fragment | 207 | 2.2% | +| Type | Count | % | +| ---------------- | ----- | ----- | +| file-path | 4,172 | 44.6% | +| adjacent (weak) | 2,263 | 24.2% | +| code-entity | 2,118 | 22.6% | +| explicit-backref | 601 | 6.4% | +| error-fragment | 207 | 2.2% | ### Directional Analysis: Backward vs Forward Queries @@ -510,12 +553,12 @@ A critical insight: Claude knows the past but cannot predict the future. 
This te #### Directional Comparison Results -| Model | Backward MRR | Forward MRR | Δ | Better For | -|-------|-------------|-------------|---|------------| -| Simple Linear | 0.952 | 0.960 | +0.008 | Similar | -| **Delayed Linear** | **0.329** | **0.969** | **+0.641** | **Forward** | -| Multi-Linear (Default) | 0.698 | 0.969 | +0.271 | Forward | -| Exponential | 0.955 | 0.960 | +0.005 | Similar | +| Model | Backward MRR | Forward MRR | Δ | Better For | +| ---------------------- | ------------ | ----------- | ---------- | ----------- | +| Simple Linear | 0.952 | 0.960 | +0.008 | Similar | +| **Delayed Linear** | **0.329** | **0.969** | **+0.641** | **Forward** | +| Multi-Linear (Default) | 0.698 | 0.969 | +0.271 | Forward | +| Exponential | 0.955 | 0.960 | +0.005 | Similar | #### Key Findings diff --git a/docs/research/archive/embedding-benchmark-results.md b/docs/research/archive/embedding-benchmark-results.md index cba2949..c52d042 100644 --- a/docs/research/archive/embedding-benchmark-results.md +++ b/docs/research/archive/embedding-benchmark-results.md @@ -58,15 +58,15 @@ The benchmark harness consists of: ### Metrics Definitions -| Metric | Description | -|--------|-------------| -| **ROC AUC** | Area under ROC curve. Related pairs scored by angular distance vs unrelated. Higher = better discrimination. | -| **Cluster count** | Number of clusters found by HDBSCAN (`minClusterSize=3`). | -| **Noise ratio** | Proportion of chunks HDBSCAN could not assign to any cluster. Lower = more structure found. | -| **Silhouette score** | Cluster cohesion vs separation, range [-1, 1]. Higher = tighter, better-separated clusters. | -| **Code-NL alignment** | Ratio of mean code-NL pair distance to mean random pair distance. Lower = better code/NL alignment. | -| **ms/chunk** | Mean inference time per chunk. | -| **Heap (MB)** | Heap memory delta after model load. | +| Metric | Description | +| --------------------- | ------------------------------------------------------------------------------------------------------------ | +| **ROC AUC** | Area under ROC curve. Related pairs scored by angular distance vs unrelated. Higher = better discrimination. | +| **Cluster count** | Number of clusters found by HDBSCAN (`minClusterSize=3`). | +| **Noise ratio** | Proportion of chunks HDBSCAN could not assign to any cluster. Lower = more structure found. | +| **Silhouette score** | Cluster cohesion vs separation, range [-1, 1]. Higher = tighter, better-separated clusters. | +| **Code-NL alignment** | Ratio of mean code-NL pair distance to mean random pair distance. Lower = better code/NL alignment. | +| **ms/chunk** | Mean inference time per chunk. | +| **Heap (MB)** | Heap memory delta after model load. 
| ### Distance Metric @@ -78,24 +78,25 @@ All distance calculations use **angular distance**: `arccos(cosine_similarity) / ### Run 2 Sessions (expanded) -| Session | Project | Messages | Turns | Chunks | Sampled | -|---------|---------|----------|-------|--------|---------| -| wild-churning-stream | speed-read | 3,041 | 118 | 69 | 30 | -| wild-churning-stream | speed-read | 2,493 | 108 | 55 | 30 | -| wild-churning-stream | speed-read | 1,099 | 22 | 23 | 23 | -| curried-wishing-star | semansiation | 518 | 33 | 23 | 23 | -| curried-wishing-star | semansiation | 229 | 8 | 4 | 4 | -| magical-marinating-wolf | Ultan | 2,002 | 46 | 49 | 30 | -| magical-marinating-wolf | Ultan | 800 | 24 | 24 | 24 | -| shiny-sniffing-forest | cdx-core | 2,098 | 39 | 45 | 30 | -| shiny-sniffing-forest | cdx-core | 543 | 16 | 11 | 11 | -| snuggly-wandering-porcupine | apolitical-assistant | 2,036 | 42 | 58 | 30 | -| tingly-brewing-lantern | apolitical-assistant | 2,514 | 73 | 60 | 30 | -| encapsulated-noodling-valley | apolitical-assistant | 1,153 | 70 | 29 | 29 | +| Session | Project | Messages | Turns | Chunks | Sampled | +| ---------------------------- | -------------------- | -------- | ----- | ------ | ------- | +| wild-churning-stream | speed-read | 3,041 | 118 | 69 | 30 | +| wild-churning-stream | speed-read | 2,493 | 108 | 55 | 30 | +| wild-churning-stream | speed-read | 1,099 | 22 | 23 | 23 | +| curried-wishing-star | semansiation | 518 | 33 | 23 | 23 | +| curried-wishing-star | semansiation | 229 | 8 | 4 | 4 | +| magical-marinating-wolf | Ultan | 2,002 | 46 | 49 | 30 | +| magical-marinating-wolf | Ultan | 800 | 24 | 24 | 24 | +| shiny-sniffing-forest | cdx-core | 2,098 | 39 | 45 | 30 | +| shiny-sniffing-forest | cdx-core | 543 | 16 | 11 | 11 | +| snuggly-wandering-porcupine | apolitical-assistant | 2,036 | 42 | 58 | 30 | +| tingly-brewing-lantern | apolitical-assistant | 2,514 | 73 | 60 | 30 | +| encapsulated-noodling-valley | apolitical-assistant | 1,153 | 70 | 29 | 29 | **Total**: 294 chunks from 12 sessions across 5 projects. **Project diversity**: + - **speed-read**: TypeScript, EPUB/PDF reader web component (code-heavy) - **semansiation**: Research/design for this project (conversational, NL-heavy) - **Ultan**: Swift, bibliography management app (code-heavy, different language) @@ -104,13 +105,13 @@ All distance calculations use **angular distance**: `arccos(cosine_similarity) / ### Labeled Pairs -| Category | Run 1 | Run 2 | -|----------|-------|-------| -| Same-session adjacent (related) | 30 | 60 | -| Same-project cross-session (related) | 20 | 40 | -| Cross-project random (unrelated) | 40 | 80 | -| Code-NL pairs | 3 | 17 | -| **Total** | **93** | **197** | +| Category | Run 1 | Run 2 | +| ------------------------------------ | ------ | ------- | +| Same-session adjacent (related) | 30 | 60 | +| Same-project cross-session (related) | 20 | 40 | +| Cross-project random (unrelated) | 40 | 80 | +| Code-NL pairs | 3 | 17 | +| **Total** | **93** | **197** | Run 2's 17 code-NL pairs (vs 3 in Run 1) gives much better reliability for the alignment metric. 
@@ -120,21 +121,21 @@ Run 2's 17 code-NL pairs (vs 3 in Run 1) gives much better reliability for the a ### Run 2 — Primary Results (294 chunks, 5 projects) -| Model | Dims | Context | ROC AUC | Clusters | Noise % | Silhouette | Code-NL | ms/chunk | Load (s) | Heap (MB) | -|-------|------|---------|---------|----------|---------|------------|---------|----------|----------|-----------| -| **jina-small** | 512 | 8,192 | **0.715** | 7 | 88.4% | **0.384** | 0.922 | 512 | 0.1 | ~0 | -| nomic-v1.5 | 768 | 8,192 | 0.683 | 2 | 95.9% | 0.310 | 0.974 | 2,083 | 0.3 | 19 | -| jina-code | 768 | 8,192 | 0.639 | **17** | **78.6%** | 0.327 | **0.863** | 2,356 | 0.4 | 30 | -| bge-small | 384 | 512 | 0.632 | 13 | 83.7% | 0.272 | 0.865 | **51** | 0.1 | ~0 | +| Model | Dims | Context | ROC AUC | Clusters | Noise % | Silhouette | Code-NL | ms/chunk | Load (s) | Heap (MB) | +| -------------- | ---- | ------- | --------- | -------- | --------- | ---------- | --------- | -------- | -------- | --------- | +| **jina-small** | 512 | 8,192 | **0.715** | 7 | 88.4% | **0.384** | 0.922 | 512 | 0.1 | ~0 | +| nomic-v1.5 | 768 | 8,192 | 0.683 | 2 | 95.9% | 0.310 | 0.974 | 2,083 | 0.3 | 19 | +| jina-code | 768 | 8,192 | 0.639 | **17** | **78.6%** | 0.327 | **0.863** | 2,356 | 0.4 | 30 | +| bge-small | 384 | 512 | 0.632 | 13 | 83.7% | 0.272 | 0.865 | **51** | 0.1 | ~0 | ### Run 1 — Initial Results (66 chunks, 2 projects) -| Model | Dims | Context | ROC AUC | Clusters | Noise % | Silhouette | Code-NL | ms/chunk | -|-------|------|---------|---------|----------|---------|------------|---------|----------| -| **jina-small** | 512 | 8,192 | **0.794** | 2 | 87.9% | **0.432** | 1.059 | 613 | -| bge-small | 384 | 512 | 0.730 | 4 | **75.8%** | 0.260 | **0.898** | **60** | -| jina-code | 768 | 8,192 | 0.721 | 4 | 80.3% | 0.383 | 0.954 | 2,670 | -| nomic-v1.5 | 768 | 8,192 | 0.605 | 3 | 84.8% | 0.381 | 1.088 | 2,261 | +| Model | Dims | Context | ROC AUC | Clusters | Noise % | Silhouette | Code-NL | ms/chunk | +| -------------- | ---- | ------- | --------- | -------- | --------- | ---------- | --------- | -------- | +| **jina-small** | 512 | 8,192 | **0.794** | 2 | 87.9% | **0.432** | 1.059 | 613 | +| bge-small | 384 | 512 | 0.730 | 4 | **75.8%** | 0.260 | **0.898** | **60** | +| jina-code | 768 | 8,192 | 0.721 | 4 | 80.3% | 0.383 | 0.954 | 2,670 | +| nomic-v1.5 | 768 | 8,192 | 0.605 | 3 | 84.8% | 0.381 | 1.088 | 2,261 | --- @@ -142,23 +143,23 @@ Run 2's 17 code-NL pairs (vs 3 in Run 1) gives much better reliability for the a The expanded corpus changed the picture materially: -| Model | ROC AUC (R1) | ROC AUC (R2) | Delta | Interpretation | -|-------|-------------|-------------|-------|----------------| -| jina-small | 0.794 | 0.715 | -0.079 | Moderate drop — still best. Expected regression with harder cross-project pairs. | -| bge-small | 0.730 | 0.632 | **-0.098** | Largest drop. Truncation hurts more with diverse projects. | -| jina-code | 0.721 | 0.639 | -0.082 | Similar drop to jina-small despite 8K context. | -| nomic-v1.5 | 0.605 | 0.683 | **+0.078** | Improved — more projects means more obviously-unrelated cross-project pairs, which nomic separates better than same-project related pairs. | +| Model | ROC AUC (R1) | ROC AUC (R2) | Delta | Interpretation | +| ---------- | ------------ | ------------ | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------ | +| jina-small | 0.794 | 0.715 | -0.079 | Moderate drop — still best. 
Expected regression with harder cross-project pairs. | +| bge-small | 0.730 | 0.632 | **-0.098** | Largest drop. Truncation hurts more with diverse projects. | +| jina-code | 0.721 | 0.639 | -0.082 | Similar drop to jina-small despite 8K context. | +| nomic-v1.5 | 0.605 | 0.683 | **+0.078** | Improved — more projects means more obviously-unrelated cross-project pairs, which nomic separates better than same-project related pairs. | **Key takeaway**: All models' AUCs dropped when moving from 2 to 5 projects, which is expected — cross-project "unrelated" pairs from 5 diverse projects are a harder test than 2 similar TypeScript projects. The important signal is **relative ordering held**: jina-small remained on top. bge-small's steeper decline suggests its 512-token truncation loses discriminative information that matters more with project diversity. ### Clustering Scale-Up -| Model | Clusters (R1) | Clusters (R2) | Noise (R1) | Noise (R2) | -|-------|--------------|--------------|-----------|-----------| -| jina-code | 4 | 17 | 80.3% | 78.6% | -| bge-small | 4 | 13 | 75.8% | 83.7% | -| jina-small | 2 | 7 | 87.9% | 88.4% | -| nomic-v1.5 | 3 | 2 | 84.8% | 95.9% | +| Model | Clusters (R1) | Clusters (R2) | Noise (R1) | Noise (R2) | +| ---------- | ------------- | ------------- | ---------- | ---------- | +| jina-code | 4 | 17 | 80.3% | 78.6% | +| bge-small | 4 | 13 | 75.8% | 83.7% | +| jina-small | 2 | 7 | 87.9% | 88.4% | +| nomic-v1.5 | 3 | 2 | 84.8% | 95.9% | jina-code found the most clusters (17) with the lowest noise (78.6%) — its code specialization may help it differentiate tool-heavy conversations at scale. bge-small went from lowest noise to second-highest, confirming that truncation becomes more problematic with diverse content. nomic-v1.5 collapsed to just 2 clusters at 96% noise — essentially failing to find structure. @@ -169,6 +170,7 @@ jina-code found the most clusters (17) with the lowest noise (78.6%) — its cod ### jina-small — Recommended **Strengths**: + - Highest ROC AUC in both runs (0.794 → 0.715) — best pair discrimination - Highest silhouette in both runs (0.432 → 0.384) — tightest clusters - 8,192-token context avoids truncation artifacts @@ -176,6 +178,7 @@ jina-code found the most clusters (17) with the lowest noise (78.6%) — its cod - 512ms/chunk is ~4x faster than the 768-dim models **Clustering at scale** (7 clusters, 34 chunks assigned): + - Cluster 0 (7 chunks): PR completions, CI passing, build summaries — cross-project "done" pattern - Cluster 5 (4 chunks): Implementation summaries from Ultan (cdx-core) — build results - Cluster 6 (3 chunks): Semansiation research doc-editing turns @@ -189,11 +192,13 @@ jina-code found the most clusters (17) with the lowest noise (78.6%) — its cod ### bge-small — Fast but Limited **Strengths**: + - 51ms/chunk — by far the fastest - Good code-NL alignment (0.865) - Most clusters found in Run 1 (4 clusters, 24.2% assigned) **Weaknesses exposed at scale**: + - ROC AUC dropped most steeply (0.730 → 0.632) — worst discrimination at scale - **Boilerplate clustering problem**: Cluster 3 groups 4 chunks that all begin with "This session is being continued from a previous conversation..." — the model can only see the first 512 tokens, and these session-continuation messages all start identically. The actual content after the boilerplate is truncated away. 
- Cluster 2 groups commit-related turns ("commit these changes", "let's do a commit") — surface-level similarity rather than deep semantic grouping @@ -204,6 +209,7 @@ jina-code found the most clusters (17) with the lowest noise (78.6%) — its cod ### jina-code — Surprising Cluster Density **Run 2 changed the picture for jina-code**: + - Found 17 clusters (most of any model) with 78.6% noise (lowest) - ROC AUC (0.639) is mediocre, but the clustering tells a different story - Cluster composition is semantically rich: @@ -242,12 +248,12 @@ Some groupings recur across 3+ models, suggesting genuine semantic structure: ### Cluster Size Distribution (Run 2) -| Model | Clusters | Sizes | Total clustered | % clustered | -|-------|----------|-------|-----------------|-------------| -| jina-code | 17 | 6,6,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3 | 63 | 21.4% | -| bge-small | 13 | 6,4,4,4,4,4,4,3,3,3,3,3,3 | 48 | 16.3% | -| jina-small | 7 | 7,7,6,4,4,3,3 | 34 | 11.6% | -| nomic-v1.5 | 2 | 9,3 | 12 | 4.1% | +| Model | Clusters | Sizes | Total clustered | % clustered | +| ---------- | -------- | --------------------------------- | --------------- | ----------- | +| jina-code | 17 | 6,6,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3 | 63 | 21.4% | +| bge-small | 13 | 6,4,4,4,4,4,4,3,3,3,3,3,3 | 48 | 16.3% | +| jina-small | 7 | 7,7,6,4,4,3,3 | 34 | 11.6% | +| nomic-v1.5 | 2 | 9,3 | 12 | 4.1% | jina-code finds the most structure, but jina-small's fewer clusters are individually more coherent (higher silhouette). @@ -257,10 +263,10 @@ jina-code finds the most structure, but jina-small's fewer clusters are individu ### Cross-Model Drift (Run 2) -| Chunk type | Count | Mean drift | -|------------|-------|------------| -| Long (>512 tokens) | 288 | 0.491 | -| Short (≤512 tokens) | 6 | 0.497 | +| Chunk type | Count | Mean drift | +| ------------------- | ----- | ---------- | +| Long (>512 tokens) | 288 | 0.491 | +| Short (≤512 tokens) | 6 | 0.497 | 98% of chunks exceed 512 tokens. The drift metric compares bge-small vs nomic-v1.5 embeddings, which inhabit different spaces, so the ~0.49 values reflect architectural differences rather than truncation impact. @@ -284,12 +290,12 @@ Five targeted experiments were run on jina-small to validate the production reco ### Summary Table -| Experiment | Baseline AUC | Variant AUC | dAUC | Baseline Silh. | Variant Silh. | dSilh. | -|------------|-------------|-------------|------|----------------|---------------|--------| -| Truncation (512 tokens) | 0.715 | 0.671 | **-0.044** | 0.384 | 0.229 | **-0.155** | -| Boilerplate filter | 0.715 | 0.720 | +0.004 | 0.384 | 0.395 | +0.011 | -| Thinking ablation | 0.715 | 0.778 | **+0.063** | 0.384 | 0.376 | -0.009 | -| Code-focused mode | 0.715 | 0.761 | +0.045 | 0.384 | 0.356 | -0.028 | +| Experiment | Baseline AUC | Variant AUC | dAUC | Baseline Silh. | Variant Silh. | dSilh. | +| ----------------------- | ------------ | ----------- | ---------- | -------------- | ------------- | ---------- | +| Truncation (512 tokens) | 0.715 | 0.671 | **-0.044** | 0.384 | 0.229 | **-0.155** | +| Boilerplate filter | 0.715 | 0.720 | +0.004 | 0.384 | 0.395 | +0.011 | +| Thinking ablation | 0.715 | 0.778 | **+0.063** | 0.384 | 0.376 | -0.009 | +| Code-focused mode | 0.715 | 0.761 | +0.045 | 0.384 | 0.356 | -0.028 | ### Experiment 1: Same-Model Truncation Test @@ -305,17 +311,17 @@ Hard-truncated all chunk text to ~512 tokens (1,792 characters) before embedding Swept `minClusterSize` from 2 to 10, re-clustering the same jina-small embeddings each time. 
-| minClusterSize | Clusters | Noise % | Silhouette | -|---------------|----------|---------|------------| -| 2 | 22 | 78.9% | 0.283 | -| 3 | 7 | 88.4% | 0.384 | -| **4** | **6** | **89.8%** | **0.438** | -| 5 | 5 | 87.8% | 0.380 | -| 6 | 6 | 86.1% | 0.373 | -| 7 | 5 | 85.4% | 0.336 | -| 8 | 4 | 88.8% | 0.381 | -| 9 | 3 | 89.1% | 0.356 | -| 10 | 3 | 87.8% | 0.286 | +| minClusterSize | Clusters | Noise % | Silhouette | +| -------------- | -------- | --------- | ---------- | +| 2 | 22 | 78.9% | 0.283 | +| 3 | 7 | 88.4% | 0.384 | +| **4** | **6** | **89.8%** | **0.438** | +| 5 | 5 | 87.8% | 0.380 | +| 6 | 6 | 86.1% | 0.373 | +| 7 | 5 | 85.4% | 0.336 | +| 8 | 4 | 88.8% | 0.381 | +| 9 | 3 | 89.1% | 0.356 | +| 10 | 3 | 87.8% | 0.286 | **Result**: `minClusterSize=4` produces the best silhouette (0.438, up from 0.384 at 3), with 6 clusters instead of 7. The value of 2 produces 22 fragmented clusters with poor cohesion. Values above 5 start losing structure. **Recommendation: use `minClusterSize=4` in production.** @@ -360,6 +366,7 @@ Boilerplate filtering and code-focused mode are smaller, situational improvement ### Auto-Labeled Pairs The labeled pairs are heuristic, not human-validated: + - "Adjacent chunks are related" assumes topical continuity, which may not hold across topic switches - "Cross-project chunks are unrelated" may miss genuinely similar patterns (e.g., git operations, session boilerplate appear across all projects) - 17 code-NL pairs in Run 2 is better but still modest @@ -387,6 +394,7 @@ Model inference is deterministic but HDBSCAN stability with borderline points wa ### Primary: jina-small **Rationale**: + - Highest ROC AUC across both corpus sizes — discrimination scales with project diversity - Highest silhouette — produces the most coherent clusters - 8,192-token context avoids the boilerplate-clustering problem that plagues bge-small @@ -396,12 +404,14 @@ Model inference is deterministic but HDBSCAN stability with borderline points wa - 512 dimensions is a good balance: compact enough for LanceDB storage, rich enough for semantic nuance **Recommended preprocessing** (from follow-up experiments): + - Exclude thinking blocks from chunk text before embedding (`includeThinking: false`) — +0.063 AUC - Strip session-continuation boilerplate — small but consistent improvement - Use `minClusterSize=4` for HDBSCAN — +0.054 silhouette over default of 3 - Keep `renderMode: 'full'` for clustering; consider `'code-focused'` for retrieval **Production integration path**: + - SessionEnd hook: Parse + chunk + embed the session (~30 chunks → ~15s) - Batch reprocessing: Viable at 512ms/chunk for the full corpus - Storage: 512 dims × 4 bytes = 2KB per chunk in LanceDB @@ -409,10 +419,12 @@ Model inference is deterministic but HDBSCAN stability with borderline points wa ### When to Consider Alternatives **bge-small** — if inference latency is critical (real-time retrieval at query time): + - Use as a fast first-pass ranker, with jina-small for re-ranking - Only viable if chunks are pre-truncated to remove session-continuation boilerplate **jina-code** — if cluster density matters more than discrimination: + - Found 17 clusters vs jina-small's 7; may be useful if downstream tasks need fine-grained groupings - Not recommended as primary due to 2.4s/chunk inference and 30MB heap @@ -437,10 +449,10 @@ Model inference is deterministic but HDBSCAN stability with borderline points wa ## Raw Data -| Run | Corpus | JSON | -|-----|--------|------| -| Run 1 | 66 chunks, 2 projects | 
[`benchmark-2026-02-03T14-56-58-170Z.json`](../benchmark-results/benchmark-2026-02-03T14-56-58-170Z.json) | -| Run 2 | 294 chunks, 5 projects | [`benchmark-2026-02-03T16-06-13-827Z.json`](../benchmark-results/benchmark-2026-02-03T16-06-13-827Z.json) | +| Run | Corpus | JSON | +| ----------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------- | +| Run 1 | 66 chunks, 2 projects | [`benchmark-2026-02-03T14-56-58-170Z.json`](../benchmark-results/benchmark-2026-02-03T14-56-58-170Z.json) | +| Run 2 | 294 chunks, 5 projects | [`benchmark-2026-02-03T16-06-13-827Z.json`](../benchmark-results/benchmark-2026-02-03T16-06-13-827Z.json) | | Experiments | 5 follow-up experiments on jina-small | [`experiments-2026-02-03T22-05-59-321Z.json`](../benchmark-results/experiments-2026-02-03T22-05-59-321Z.json) | ### Reproduction diff --git a/docs/research/archive/feasibility-study.md b/docs/research/archive/feasibility-study.md index dcb1843..e38e625 100644 --- a/docs/research/archive/feasibility-study.md +++ b/docs/research/archive/feasibility-study.md @@ -59,13 +59,13 @@ Session → Chunks → Embeddings → Vector Store ### Key Properties -| Property | Description | -|----------|-------------| -| **Local-first** | Runs entirely on developer's machine, no cloud dependency | -| **Privacy-preserving** | Optional hashing/encryption of content | -| **Temporal dynamics** | Memories decay over time, strengthen with use | -| **Associative** | Concepts link organically based on co-occurrence | -| **Claude Code native** | Purpose-built for coding assistant sessions | +| Property | Description | +| ---------------------- | --------------------------------------------------------- | +| **Local-first** | Runs entirely on developer's machine, no cloud dependency | +| **Privacy-preserving** | Optional hashing/encryption of content | +| **Temporal dynamics** | Memories decay over time, strengthen with use | +| **Associative** | Concepts link organically based on co-occurrence | +| **Claude Code native** | Purpose-built for coding assistant sessions | --- @@ -75,15 +75,16 @@ Session → Chunks → Embeddings → Vector Store Sessions are stored locally and fully accessible: -| Data | Location | Format | -|------|----------|--------| -| Transcripts | `~/.claude/projects//.jsonl` | JSON Lines | -| Session index | `~/.claude/projects//sessions-index.json` | JSON | -| Global history | `~/.claude/history.jsonl` | JSON Lines | +| Data | Location | Format | +| -------------- | ----------------------------------------------- | ---------- | +| Transcripts | `~/.claude/projects//.jsonl` | JSON Lines | +| Session index | `~/.claude/projects//sessions-index.json` | JSON | +| Global history | `~/.claude/history.jsonl` | JSON Lines | #### JSONL Message Structure Each line contains: + - `type`: Message type (user, assistant, file-history-snapshot, etc.) 
- `message`: Content with `role` and `content` fields - `uuid`: Unique message identifier @@ -96,12 +97,12 @@ Each line contains: Hooks provide lifecycle integration points: -| Hook | Trigger | Use Case | -|------|---------|----------| -| `SessionStart` | Session begins/resumes | Load relevant memories into context | -| `SessionEnd` | Session terminates | Trigger embedding + graph update | -| `PostToolUse` | After tool execution | Capture context around actions | -| `PreCompact` | Before context compaction | Save important context before loss | +| Hook | Trigger | Use Case | +| -------------- | ------------------------- | ----------------------------------- | +| `SessionStart` | Session begins/resumes | Load relevant memories into context | +| `SessionEnd` | Session terminates | Trigger embedding + graph update | +| `PostToolUse` | After tool execution | Capture context around actions | +| `PreCompact` | Before context compaction | Save important context before loss | Hook configuration in `.claude/settings.json`: @@ -137,6 +138,7 @@ This allows Claude to query the memory graph during conversations. ### Environment Variables Available in hooks: + - `$CLAUDE_SESSION_ID` - Current session identifier - `$CLAUDE_PROJECT_DIR` - Project directory path @@ -146,30 +148,33 @@ Available in hooks: ### Competitor Feature Matrix -| System | Local-First | Temporal Decay | Associative Graph | Memory Evolution | Accuracy | -|--------|-------------|----------------|-------------------|------------------|----------| -| **Mem0** | No (Cloud API) | No versioning | Paid add-on | Mutations only | 66.9% | -| **Cognee** | Self-hostable | None | Triplet extraction | Incremental only | 92.5% | -| **Letta/MemGPT** | Self-hostable | Summarization loss | None | FIFO eviction | 93.4% | -| **Zep** | Enterprise cloud | Bi-temporal | Temporal KG | Episode-based | 94.8% | -| **Supermemory** | Cloudflare | Dual timestamps | Secondary | Unknown | 76.7% | -| **A-MEM** | Research only | None | Zettelkasten | Cross-updates | 2x baseline | -| **GraphRAG** | Self-hostable | Static corpus | Hierarchical | Full rebuilds | N/A | +| System | Local-First | Temporal Decay | Associative Graph | Memory Evolution | Accuracy | +| ---------------- | ---------------- | ------------------ | ------------------ | ---------------- | ----------- | +| **Mem0** | No (Cloud API) | No versioning | Paid add-on | Mutations only | 66.9% | +| **Cognee** | Self-hostable | None | Triplet extraction | Incremental only | 92.5% | +| **Letta/MemGPT** | Self-hostable | Summarization loss | None | FIFO eviction | 93.4% | +| **Zep** | Enterprise cloud | Bi-temporal | Temporal KG | Episode-based | 94.8% | +| **Supermemory** | Cloudflare | Dual timestamps | Secondary | Unknown | 76.7% | +| **A-MEM** | Research only | None | Zettelkasten | Cross-updates | 2x baseline | +| **GraphRAG** | Self-hostable | Static corpus | Hierarchical | Full rebuilds | N/A | ### Detailed System Analysis #### Mem0 **Architecture**: Two-phase extraction/update pipeline + - Phase 1: LLM extracts facts from message pairs with rolling summary context - Phase 2: For each fact, retrieves top 10 similar memories, LLM decides ADD/UPDATE/DELETE/NOOP **Storage**: Triple-store hybrid (Vector + Graph + Key-Value) + - Vector: Qdrant, Pinecone, Chroma, etc. 
- Graph: Neo4j, Memgraph (Mem0g variant, paid) - KV: SQLite for audit trails **Limitations**: + - No true temporal decay - memories mutated in place, no versioning - Graph memory is paid add-on - Missing batch operations (100 memories = 100 API calls) @@ -180,6 +185,7 @@ Available in hooks: #### Cognee **Architecture**: ECL pipeline (Extract-Cognify-Load) + 1. Document classification 2. Permission validation 3. Chunking (200-2000 tokens) @@ -188,11 +194,13 @@ Available in hooks: 6. Embedding generation **Unique Features**: + - 12 search modes (GRAPH_COMPLETION, RAG_COMPLETION, CYPHER, etc.) - Incremental loading (unlike GraphRAG which requires full rebuilds) - Memify Pipeline for post-processing enrichment **Limitations**: + - 100% LLM-dependent extraction (no traditional NLP fallback) - Scalability issues (1GB takes ~40 minutes) - Auto-generated ontologies only in commercial version @@ -203,15 +211,18 @@ Available in hooks: #### Letta/MemGPT **Architecture**: OS-inspired virtual memory + - Main Context (RAM): System instructions + Core Memory blocks + Conversation history - External Context (Disk): Recall Memory + Archival Memory (vector DB) **Unique Features**: + - Self-editing memory via tool calls (agent manages its own memory) - Heartbeat mechanism for multi-step reasoning - Core Memory blocks pinned to context window **Limitations**: + - Recursive summarization is lossy (leads to memory holes) - No explicit temporal decay - No graph structure @@ -222,16 +233,19 @@ Available in hooks: #### Zep **Architecture**: Temporal Knowledge Graph via Graphiti engine + - Bi-temporal model: Timeline T (event order) + Timeline T' (ingestion order) - Episode-based data ingestion - Mirrors human cognition: episodic + semantic memory **Unique Features**: + - Best-in-class temporal reasoning - Multiple reranking strategies (RRF, MMR, graph-based) - AWS Neptune integration for enterprise **Limitations**: + - Enterprise/cloud-focused, not local-first - Requires infrastructure setup (graph DB, text search) - Higher latency than Mem0 (1.29s vs 0.148s p50) @@ -241,11 +255,13 @@ Available in hooks: #### Supermemory **Architecture**: Brain-inspired multi-layer + - Hot/recent data in Cloudflare KV - Deeper memories retrieved on demand - Dual-layer timestamping: `documentDate` vs `eventDate` **Limitations**: + - Cloud-dependent (Cloudflare infrastructure) - No explicit local-first mode - Associative structures secondary to semantic search @@ -255,29 +271,32 @@ Available in hooks: #### A-MEM (Research - NeurIPS 2025) **Architecture**: Zettelkasten-inspired agentic memory + - Interconnected knowledge networks through dynamic indexing - Memory evolution: new memories trigger updates to existing memories - Bidirectional linking between related concepts **Unique Features**: + - Only system with true associative memory evolution - Doubles performance on complex multi-hop reasoning - Runs on Llama 3.2 1B on single GPU **Limitations**: + - Research paper, not production-ready - Not local-first focused - No temporal decay ### Gap Analysis -| Gap | Current State | Opportunity | -|-----|---------------|-------------| -| **Temporal decay** | Only MemOS (research) implements Ebbinghaus-style decay | First production system with biologically-inspired decay | -| **Local-first** | Most require cloud; local options are simplistic | Sophisticated memory on developer's machine | -| **Associative evolution** | Only A-MEM (research) | Productionize for conversations | -| **Claude Code native** | No one targets this | Purpose-built 
integration | -| **Memory portability** | Platform-specific, no transfer | Export/import memory graphs | +| Gap | Current State | Opportunity | +| ------------------------- | ------------------------------------------------------- | -------------------------------------------------------- | +| **Temporal decay** | Only MemOS (research) implements Ebbinghaus-style decay | First production system with biologically-inspired decay | +| **Local-first** | Most require cloud; local options are simplistic | Sophisticated memory on developer's machine | +| **Associative evolution** | Only A-MEM (research) | Productionize for conversations | +| **Claude Code native** | No one targets this | Purpose-built integration | +| **Memory portability** | Platform-specific, no transfer | Export/import memory graphs | --- @@ -326,6 +345,7 @@ Understanding how existing memory systems operate helps clarify where Semansiati ``` **Mem0 API pattern:** + ```python from mem0 import Memory m = Memory() @@ -371,7 +391,7 @@ m.add(f"User: {query}\nAssistant: {response}", # 4. Store └─────────────────────────────────────────────────────────────────┘ ``` -**Key difference**: The LLM *decides* when to read/write memory via tool calls. No external orchestration. +**Key difference**: The LLM _decides_ when to read/write memory via tool calls. No external orchestration. ### Pattern 3: Background Indexing + On-Demand Retrieval @@ -407,13 +427,13 @@ m.add(f"User: {query}\nAssistant: {response}", # 4. Store ### What Triggers Memory Recall? -| Trigger | Systems | How it Works | -|---------|---------|--------------| -| **Every prompt** | Mem0, Zep, Supermemory | Automatic retrieval before each LLM call | -| **Agent decision** | MemGPT/Letta | LLM calls `archival_search` tool when it decides to | -| **Explicit API call** | Cognee, GraphRAG | Developer calls `search()` in their orchestration | -| **Session start** | Some custom impls | Load relevant context at conversation begin | -| **Keyword/entity match** | Zep | Detects entities in query, retrieves related memories | +| Trigger | Systems | How it Works | +| ------------------------ | ---------------------- | ----------------------------------------------------- | +| **Every prompt** | Mem0, Zep, Supermemory | Automatic retrieval before each LLM call | +| **Agent decision** | MemGPT/Letta | LLM calls `archival_search` tool when it decides to | +| **Explicit API call** | Cognee, GraphRAG | Developer calls `search()` in their orchestration | +| **Session start** | Some custom impls | Load relevant context at conversation begin | +| **Keyword/entity match** | Zep | Detects entities in query, retrieves related memories | ### The Typical "Glue Code" Pattern @@ -505,13 +525,13 @@ For Claude Code, the flow differs from typical chat applications: Most systems retrieve on **every prompt**, which has trade-offs: -| Approach | How | Trade-off | -|----------|-----|-----------| -| **Always retrieve** | Every turn queries memory | Simple, but noisy and adds latency | -| **Entity detection** | Only when entities/keywords match | Misses implicit relevance | -| **Embedding similarity gate** | Only if query embedding is close to stored memories | Requires threshold tuning | -| **LLM decides** | Agent calls memory tool when needed | Uses context tokens for tool schema | -| **Intent classification** | Classify query type, retrieve for certain intents | Requires intent model | +| Approach | How | Trade-off | +| ----------------------------- | --------------------------------------------------- | 
----------------------------------- | +| **Always retrieve** | Every turn queries memory | Simple, but noisy and adds latency | +| **Entity detection** | Only when entities/keywords match | Misses implicit relevance | +| **Embedding similarity gate** | Only if query embedding is close to stored memories | Requires threshold tuning | +| **LLM decides** | Agent calls memory tool when needed | Uses context tokens for tool schema | +| **Intent classification** | Classify query type, retrieve for certain intents | Requires intent model | ### Semansiation: Data Capture Timing @@ -541,11 +561,11 @@ The key insight: **PreCompact is a canonical moment** to capture data. It fires **Hook strategy for data capture:** -| Hook | Action | Notes | -|------|--------|-------| -| **PreCompact** | Capture full context snapshot, queue for async processing | Primary capture point; fires before context loss | -| **PostToolUse** | (Optional) Capture context around significant actions | Useful for file edits, test runs | -| **SessionEnd** | Final capture of any remaining unprocessed context | Close out session in graph | +| Hook | Action | Notes | +| --------------- | --------------------------------------------------------- | ------------------------------------------------ | +| **PreCompact** | Capture full context snapshot, queue for async processing | Primary capture point; fires before context loss | +| **PostToolUse** | (Optional) Capture context around significant actions | Useful for file edits, test runs | +| **SessionEnd** | Final capture of any remaining unprocessed context | Close out session in graph | **Critical: Checkpoint tracking to avoid re-ingestion** @@ -554,8 +574,8 @@ PreCompact may fire multiple times per session. Need to track what's been ingest ```typescript interface SessionState { sessionId: string; - lastIngestedOffset: number; // Message index already processed - checkpoints: number[]; // Logical clock values at each PreCompact + lastIngestedOffset: number; // Message index already processed + checkpoints: number[]; // Logical clock values at each PreCompact } async function onPreCompact(transcript: Message[]): Promise { @@ -658,7 +678,7 @@ Semantic memory is available via semansiation tools: - \`explain(topic)\` - What typically leads to this? - \`predict(action)\` - What typically follows this? -Recent clusters for this project: ${recentClusters.map(c => c.label).join(', ')} +Recent clusters for this project: ${recentClusters.map((c) => c.label).join(', ')} `; return { additionalContext: priming }; @@ -670,20 +690,20 @@ Few tokens, but Claude knows memory is there and what domains are active. Claude decides when to query. Three modes matching causal graph structure: -| Tool | Traversal | Use Case | -|------|-----------|----------| -| `recall(query)` | Semantic similarity | "What do I know about X?" | -| `explain(topic)` | Reverse edges | "What typically leads to this error?" | -| `predict(action)` | Forward edges | "What usually follows this refactoring?" | +| Tool | Traversal | Use Case | +| ----------------- | ------------------- | ---------------------------------------- | +| `recall(query)` | Semantic similarity | "What do I know about X?" | +| `explain(topic)` | Reverse edges | "What typically leads to this error?" | +| `predict(action)` | Forward edges | "What usually follows this refactoring?" 
| **Summary: Hooks for capture, MCP for retrieval** -| Concern | Mechanism | -|---------|-----------| -| **Data capture** | Hooks (PreCompact primary, PostToolUse optional, SessionEnd final) | -| **Stable context** | CLAUDE.md (updated periodically between sessions) | -| **Dynamic priming** | SessionStart hook (lightweight, just awareness) | -| **On-demand retrieval** | MCP tools (Claude decides when to query) | +| Concern | Mechanism | +| ----------------------- | ------------------------------------------------------------------ | +| **Data capture** | Hooks (PreCompact primary, PostToolUse optional, SessionEnd final) | +| **Stable context** | CLAUDE.md (updated periodically between sessions) | +| **Dynamic priming** | SessionStart hook (lightweight, just awareness) | +| **On-demand retrieval** | MCP tools (Claude decides when to query) | This avoids the "guess user intent" problem while still providing rich memory access when needed. @@ -704,11 +724,13 @@ UNSOLICITED INJECTION MCP-FIRST ``` Claude is trained to: + 1. **Use tools when needed** — request information at the moment it's relevant 2. **Manage its own context** — decide what's important for the current task 3. **Work within provided capabilities** — leverage available tools appropriately Pushing potentially large amounts of context into the session unsolicited: + - **Forces assumptions** about which memories are relevant (most recent? highest edge weight? same project?) - **Works against Claude's training** — it's not designed to receive pre-loaded context based on heuristic guesses - **Wastes tokens** when the injected context isn't relevant to the actual task @@ -730,7 +752,7 @@ Context overflow: This failure mode has been observed with large PDF files — Claude reads too much content, the context fills completely, compaction can't reduce it enough, and the session becomes unrecoverable. The only option is to clear the context and lose everything. -**This is catastrophic for a memory system.** The very mechanism meant to *help* the session could *kill* it. If Semansiation aggressively injects retrieved memories at SessionStart, it risks: +**This is catastrophic for a memory system.** The very mechanism meant to _help_ the session could _kill_ it. If Semansiation aggressively injects retrieved memories at SessionStart, it risks: 1. **Immediate overflow** — if injection alone exceeds safe limits 2. **Reduced headroom** — less room for actual work before compaction needed @@ -738,6 +760,7 @@ This failure mode has been observed with large PDF files — Claude reads too mu 4. 
**Unrecoverable state** — user forced to clear context, losing the session The MCP-first approach avoids this entirely: + - Claude requests only what it needs, when it needs it - Retrieved content is proportional to actual queries - Context budget stays under user/Claude control @@ -745,13 +768,13 @@ The MCP-first approach avoids this entirely: The clean model: -| Layer | Role | Approach | -|-------|------|----------| -| **CLAUDE.md** | Documentation | "Here are stable facts about this project" | -| **MCP tools** | Capability | "Here's how to query memory if you need it" | -| **NOT** | Presumption | ~~"Here's what I think you need to know"~~ | +| Layer | Role | Approach | +| ------------- | ------------- | ------------------------------------------- | +| **CLAUDE.md** | Documentation | "Here are stable facts about this project" | +| **MCP tools** | Capability | "Here's how to query memory if you need it" | +| **NOT** | Presumption | ~~"Here's what I think you need to know"~~ | -This is ultimately why the MCP-first approach is not just pragmatically better, but *architecturally correct* — it respects Claude's design rather than fighting against it. +This is ultimately why the MCP-first approach is not just pragmatically better, but _architecturally correct_ — it respects Claude's design rather than fighting against it. --- @@ -779,6 +802,7 @@ FormStructuredData (root node) ``` The `children` method generates all possible "one-step-less-specific" variants: + - Remove one category - Remove one entity - Remove one item from a set @@ -822,6 +846,7 @@ case class DecayingTriple(...) { ``` Each weight can have **multiple decay triples with different lifespans**, allowing: + - Fast-decaying "recent" signal (e.g., 1 hour lifespan) - Medium-decaying signal (e.g., 1 day lifespan) - Slow-decaying "historical" signal (e.g., 30 day lifespan) @@ -867,14 +892,14 @@ def normaliseWeightSet(input: FormWeightedValueSet): FormWeightedValueSet = { ### Application to Semansiation -| sbxmlpoc Concept | Semansiation Application | -|------------------|--------------------------| -| **Multi-lifespan decay** | Short-term (1h) + medium-term (24h) + long-term (30d) decay triples on edges | +| sbxmlpoc Concept | Semansiation Application | +| ------------------------ | ---------------------------------------------------------------------------------- | +| **Multi-lifespan decay** | Short-term (1h) + medium-term (24h) + long-term (30d) decay triples on edges | | **Hierarchical lattice** | Semantic clusters as nodes, with parent/child edges to more/less specific clusters | -| **Weight boosting** | Hebbian reinforcement when concepts co-occur—boost the appropriate lifespan triple | -| **Tree inference** | Traverse from specific → general clusters when querying | -| **Normalisation** | Prevent weight explosion in high-activity clusters | -| **Logical clock** | Replace wall-clock `creationTime` with session-based logical clock | +| **Weight boosting** | Hebbian reinforcement when concepts co-occur—boost the appropriate lifespan triple | +| **Tree inference** | Traverse from specific → general clusters when querying | +| **Normalisation** | Prevent weight explosion in high-activity clusters | +| **Logical clock** | Replace wall-clock `creationTime` with session-based logical clock | ### Proposed Multi-Lifespan Edge Weight @@ -883,13 +908,13 @@ Adapting the sbxmlpoc pattern for Semansiation: ```typescript interface DecayingTriple { initialValue: number; - creationClock: number; // Logical clock (session count, not wall time) - 
lifespan: number; // In logical clock units + creationClock: number; // Logical clock (session count, not wall time) + lifespan: number; // In logical clock units } interface AssociationWeight { - triples: DecayingTriple[]; // Multiple decay rates - baseValue: number; // Permanent association strength + triples: DecayingTriple[]; // Multiple decay rates + baseValue: number; // Permanent association strength getValue(currentClock: number): number; boost(lifespan: number): AssociationWeight; @@ -897,10 +922,10 @@ interface AssociationWeight { } // Example lifespans (in session counts): -const IMMEDIATE = 1; // Decays after 1 session -const SHORT_TERM = 5; // Decays over ~5 sessions -const MEDIUM_TERM = 20; // Decays over ~20 sessions -const LONG_TERM = 100; // Decays over ~100 sessions +const IMMEDIATE = 1; // Decays after 1 session +const SHORT_TERM = 5; // Decays over ~5 sessions +const MEDIUM_TERM = 20; // Decays over ~20 sessions +const LONG_TERM = 100; // Decays over ~100 sessions ``` ### Hierarchical Cluster Model @@ -934,6 +959,7 @@ Apply the lattice structure to semantic clusters: ``` When querying: + 1. Find the most specific matching cluster 2. If insufficient data (low edge weights), recurse to parent clusters 3. Combine results weighted by cluster specificity @@ -996,10 +1022,10 @@ Causality naturally creates **directed edges**: The same graph supports two traversal modes: -| Mode | Traversal | Query | Use Case | -|------|-----------|-------|----------| -| **Explanatory** | Reverse edges | "What led me here?" | Debugging, root cause analysis | -| **Predictive** | Forward edges | "Where does this go?" | Planning, anticipating next steps | +| Mode | Traversal | Query | Use Case | +| --------------- | ------------- | --------------------- | --------------------------------- | +| **Explanatory** | Reverse edges | "What led me here?" | Debugging, root cause analysis | +| **Predictive** | Forward edges | "Where does this go?" | Planning, anticipating next steps | **Real developer workflows:** @@ -1019,17 +1045,17 @@ PREDICTIVE (forward traversal): Forward and reverse weights can diverge based on observed patterns: -| Pattern | Meaning | -|---------|---------| -| Strong forward, weak reverse | "X reliably causes Y, but Y has many causes" | +| Pattern | Meaning | +| ---------------------------- | ------------------------------------------------------------------------- | +| Strong forward, weak reverse | "X reliably causes Y, but Y has many causes" | | Weak forward, strong reverse | "X sometimes leads to Y, but when Y happens, X almost always preceded it" | ```typescript interface DirectionalEdge { from: ClusterId; to: ClusterId; - forwardWeight: DecayingWeight; // from predicts to - reverseWeight: DecayingWeight; // to explains from + forwardWeight: DecayingWeight; // from predicts to + reverseWeight: DecayingWeight; // to explains from } ``` @@ -1053,13 +1079,13 @@ Series converges naturally. Cycles contribute, but diminishingly. 
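A quick worked example with illustrative weights: if A→B has weight 0.8, B→C has 0.5, and a cycle edge C→A has 0.3, the direct path A→B→C contributes 0.8 × 0.5 = 0.4 to A's influence on C. Going once around the loop multiplies in a further 0.3 × 0.8 × 0.5 = 0.12, adding ≈ 0.048; a second loop adds ≈ 0.006. Each extra traversal of the cycle scales the contribution by the same factor below 1, so the total influence is a convergent geometric series — here ≈ 0.4 / (1 − 0.12) ≈ 0.45.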
**Analogy: Perturbation theory / Feynman diagrams** -| Perturbation Theory | Semantic Graph | -|---------------------|----------------| -| Coupling constant α < 1 | Edge weight ∈ [0,1] | +| Perturbation Theory | Semantic Graph | +| -------------------------------------- | --------------------------------------- | +| Coupling constant α < 1 | Edge weight ∈ [0,1] | | Higher-order diagrams suppressed by αⁿ | Longer paths suppressed by w₁×w₂×...×wₙ | -| Sum over all diagrams | Sum over all paths | -| Renormalization handles infinities | Normalisation keeps weights bounded | -| Loop diagrams finite | Cycles attenuate naturally | +| Sum over all diagrams | Sum over all paths | +| Renormalization handles infinities | Normalisation keeps weights bounded | +| Loop diagrams finite | Cycles attenuate naturally | ### Implementation @@ -1070,23 +1096,17 @@ function computeInfluence( target: ClusterId, direction: 'forward' | 'reverse', maxDepth: number = 5, - minSignal: number = 0.01 // Cutoff for negligible contributions + minSignal: number = 0.01, // Cutoff for negligible contributions ): number { - - function propagate( - current: ClusterId, - signal: number, - depth: number - ): number { + function propagate(current: ClusterId, signal: number, depth: number): number { if (current === target) return signal; if (depth === 0 || signal < minSignal) return 0; - const edges = direction === 'forward' - ? graph.forwardEdges(current) - : graph.reverseEdges(current); + const edges = + direction === 'forward' ? graph.forwardEdges(current) : graph.reverseEdges(current); return edges.reduce((sum, edge) => { - const newSignal = signal * edge.weight; // Attenuation + const newSignal = signal * edge.weight; // Attenuation return sum + propagate(edge.to, newSignal, depth - 1); }, 0); } @@ -1121,22 +1141,22 @@ type RetrievalIntent = 'explanatory' | 'predictive' | 'exploratory'; function buildContext( currentCluster: ClusterId, intent: RetrievalIntent, - graph: CausalGraph + graph: CausalGraph, ): SemanticChunk[] { switch (intent) { case 'explanatory': // What led here? Traverse reverse edges - return traverseReverse(graph, currentCluster, depth=3); + return traverseReverse(graph, currentCluster, (depth = 3)); case 'predictive': // Where does this go? Traverse forward edges - return traverseForward(graph, currentCluster, depth=3); + return traverseForward(graph, currentCluster, (depth = 3)); case 'exploratory': // Balanced - both directions return [ - ...traverseReverse(graph, currentCluster, depth=2), - ...traverseForward(graph, currentCluster, depth=2) + ...traverseReverse(graph, currentCluster, (depth = 2)), + ...traverseForward(graph, currentCluster, (depth = 2)), ]; } } @@ -1152,7 +1172,7 @@ function buildContext( ### The D-T-D Model (Data-Transformation-Data) -This section defines *when* causal edges are created, grounded in the structure of conversational data. +This section defines _when_ causal edges are created, grounded in the structure of conversational data. 
#### Data-Transformation-Data Alternation @@ -1163,6 +1183,7 @@ A thread of sequential thought follows an alternating pattern: ``` Where: + - **D** (Data) = an observable output blob — one or more chunks constituting a coherent response or prompt - **T** (Transformation) = a processing step that is not directly observable — Claude's inference, or a human's thinking before typing @@ -1194,7 +1215,7 @@ For D₁ → T → D₂, causal edges are created as **one edge from each chunk - Even an associatively "weak" chunk in D₁ may have changed the entire output — thoughts are information-dense and not necessarily stable under perturbation - **Analogy**: Mathematical notation can change meaning completely with a single symbol change, while spoken language is more resilient but less information-dense. Session data is closer to mathematical notation in its sensitivity. -Each of these all-pairs edges **boosts the weight** on the corresponding cluster-to-cluster link in the causal graph. If D₁ has *m* chunks and D₂ has *n* chunks, a single transformation creates *m × n* edge boosts. In practice, typical data blobs contain 3-8 chunks, so the cross product is 9-64 edges per transformation — manageable. +Each of these all-pairs edges **boosts the weight** on the corresponding cluster-to-cluster link in the causal graph. If D₁ has _m_ chunks and D₂ has _n_ chunks, a single transformation creates _m × n_ edge boosts. In practice, typical data blobs contain 3-8 chunks, so the cross product is 9-64 edges per transformation — manageable. #### Edge Weight Normalisation @@ -1229,17 +1250,17 @@ REVERSE NORMALISATION (same graph, traversed backwards): This normalisation interacts naturally with the existing path attenuation and decay mechanisms. Multi-hop paths still attenuate as the product of edge weights along the path, and decay still causes indirect paths to fade faster than direct ones. -**Key insight**: This direct edge-weight approach makes vector clocks unnecessary. The original motivation for vector clocks was tracking causal distance across independent semantic domains — but that information is already encoded in the graph's edge weights and path attenuation. Edge accumulation encodes frequency of co-occurrence, decay encodes recency, and path products encode causal distance. The graph *is* the clock. +**Key insight**: This direct edge-weight approach makes vector clocks unnecessary. The original motivation for vector clocks was tracking causal distance across independent semantic domains — but that information is already encoded in the graph's edge weights and path attenuation. Edge accumulation encodes frequency of co-occurrence, decay encodes recency, and path products encode causal distance. The graph _is_ the clock. #### Mapping to Session Data -| Session Element | D-T-D Role | Observable? | -|----------------|------------|-------------| -| User prompt | D (data blob) | Yes — text in JSONL | -| Claude's inference | T (transformation) | No — internal processing | -| Assistant response | D (data blob) | Yes — text in JSONL | -| Human thinking before next prompt | T (transformation) | No — unobservable | -| Tool execution + result | T→D (transformation producing data) | Partially — result is observable | +| Session Element | D-T-D Role | Observable? 
| +| --------------------------------- | ----------------------------------- | -------------------------------- | +| User prompt | D (data blob) | Yes — text in JSONL | +| Claude's inference | T (transformation) | No — internal processing | +| Assistant response | D (data blob) | Yes — text in JSONL | +| Human thinking before next prompt | T (transformation) | No — unobservable | +| Tool execution + result | T→D (transformation producing data) | Partially — result is observable | A single conversational turn maps to: `D_user → T_claude → D_assistant`. @@ -1250,6 +1271,7 @@ A multi-turn exchange is: `D_user₁ → T → D_asst₁ → T_human → D_user The human transformation T_human between D_assistant and D_user_next raises a question: is the new prompt a **causal continuation** of the preceding output, or does it signal a **new thread of thought**? This matters because: + - **Continuation**: The all-pairs causal edges should connect D_assistant chunks to D_user_next chunks (normal edge creation) - **Topic switch**: The new prompt starts a fresh causal chain; connecting it to the preceding output would create false causal links @@ -1323,6 +1345,7 @@ Parent Context ``` **Causal relationships:** + - `a1 < a2 < a3` — sequential within Agent A (D-T-D edges within agent) - `b1 < b2 < b3` — sequential within Agent B (D-T-D edges within agent) - `a1 ∥ b1` — **concurrent** (no edges between them — they never appear in the same D₁→D₂ pair) @@ -1355,6 +1378,7 @@ If both agents touch the same cluster (e.g., `[testing]`), they each build separ #### Data Structure **Main Session Transcript** (`~/.claude/projects//.jsonl`): + ```json // Task tool invocation spawning a subagent { @@ -1384,6 +1408,7 @@ If both agents touch the same cluster (e.g., `[testing]`), they each build separ ``` **Subagent Transcripts** (separate files: `/subagents/agent-.jsonl`): + ```json { "agentId": "ad9c1a0", @@ -1397,13 +1422,13 @@ If both agents touch the same cluster (e.g., `[testing]`), they each build separ #### Key Fields for Parallelism Detection -| Field | Location | Purpose | -|-------|----------|---------| -| `agentId` | Progress events, subagent files | Unique identifier per subagent | -| `parentToolUseID` | Progress events | Links to spawning Task call | -| `timestamp` | All entries | ISO timestamps for ordering | -| `isSidechain: true` | Subagent transcripts | Marks as subagent | -| `sessionId` | All entries | Ties parent and children together | +| Field | Location | Purpose | +| ------------------- | ------------------------------- | --------------------------------- | +| `agentId` | Progress events, subagent files | Unique identifier per subagent | +| `parentToolUseID` | Progress events | Links to spawning Task call | +| `timestamp` | All entries | ISO timestamps for ordering | +| `isSidechain: true` | Subagent transcripts | Marks as subagent | +| `sessionId` | All entries | Ties parent and children together | #### Detecting Parallel Execution @@ -1420,6 +1445,7 @@ Parallel agents show **interleaved progress events** in the main transcript: ``` **Detection algorithm**: + 1. Collect all progress events grouped by `agentId` 2. For each agent, determine active time range: `[first_timestamp, last_timestamp]` 3. 
Agents with overlapping ranges were concurrent @@ -1427,13 +1453,13 @@ Parallel agents show **interleaved progress events** in the main transcript: #### What We Can Track -| Aspect | How | -|--------|-----| -| **Which agents ran in parallel** | Overlapping timestamp ranges | -| **Parent-child relationship** | `parentToolUseID` links agent to Task call | -| **Full agent content** | Separate transcript in `subagents/agent-.jsonl` | -| **Causal order within agent** | Sequential `parentUuid` chain | -| **Merge point** | When parent transcript continues after agent completes | +| Aspect | How | +| -------------------------------- | ------------------------------------------------------ | +| **Which agents ran in parallel** | Overlapping timestamp ranges | +| **Parent-child relationship** | `parentToolUseID` links agent to Task call | +| **Full agent content** | Separate transcript in `subagents/agent-.jsonl` | +| **Causal order within agent** | Sequential `parentUuid` chain | +| **Merge point** | When parent transcript continues after agent completes | #### Implementation Implications @@ -1444,7 +1470,7 @@ interface AgentContext { parentSessionId: string; startTime: Date; endTime?: Date; - concurrentWith: Set; // Other agentIds running in parallel + concurrentWith: Set; // Other agentIds running in parallel } function detectConcurrency(progressEvents: ProgressEvent[]): Map { @@ -1507,12 +1533,12 @@ Parent Context **Why this matters:** -| Without Causal Isolation | With Causal Isolation | -|--------------------------|----------------------| -| All agents' edges mixed into one graph | Each agent's edges form a distinct subgraph | +| Without Causal Isolation | With Causal Isolation | +| ------------------------------------------------------ | -------------------------------------------------- | +| All agents' edges mixed into one graph | Each agent's edges form a distinct subgraph | | Testing agent's edges interfere with refactoring edges | Specialists' edge weights accumulate independently | -| Intermediate reasoning conflated | Detailed work preserved in own causal chain | -| Query results mix all specialist contexts | Can query specialist context specifically | +| Intermediate reasoning conflated | Detailed work preserved in own causal chain | +| Query results mix all specialist contexts | Can query specialist context specifically | **Briefing and debriefing as causal boundaries:** @@ -1527,20 +1553,21 @@ Parent Context ```typescript // Query the testing specialist's semantic context specifically const testingContext = await recall({ - query: "test failure patterns", - agentScope: "testing-agent-a1517fe", // Scope to specialist's subgraph - direction: "reverse" // What led to these failures? + query: 'test failure patterns', + agentScope: 'testing-agent-a1517fe', // Scope to specialist's subgraph + direction: 'reverse', // What led to these failures? 
}); // Query across all specialists (merged view — follow edges past merge point) const mergedContext = await recall({ - query: "test failure patterns", - agentScope: "all", - direction: "reverse" + query: 'test failure patterns', + agentScope: 'all', + direction: 'reverse', }); ``` **This aligns with how human specialist teams work:** + - The QA specialist doesn't need to track every detail of the architect's work - They share context at handoffs (briefings, standups, reviews) - Each specialist's detailed knowledge is preserved but not forced into shared timeline @@ -1559,11 +1586,11 @@ Because a specialist agent's semantic context is self-contained (its own cluster **Example scenarios:** -| Scenario | How It Works | -|----------|--------------| +| Scenario | How It Works | +| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- | | **Reusable testing expertise** | A testing specialist's memory of failure patterns, debugging strategies, and test structures could transfer between projects using similar frameworks | -| **Domain specialist sharing** | A specialist trained on AWS infrastructure patterns could be "briefed" into projects needing that expertise | -| **Team knowledge transfer** | When onboarding to a new codebase, import relevant specialist memories from experienced team members | +| **Domain specialist sharing** | A specialist trained on AWS infrastructure patterns could be "briefed" into projects needing that expertise | +| **Team knowledge transfer** | When onboarding to a new codebase, import relevant specialist memories from experienced team members | **What makes this possible:** @@ -1632,28 +1659,28 @@ The centroid works for **matching** ("is this query near this cluster?") but fai Human memory exhibits a similar duality: -| Aspect | What it does | Analog | -|--------|--------------|--------| -| **Familiarity** | "This feels related to things I know" | Centroid matching | -| **Recall** | "Here's what I specifically remember" | Exemplar retrieval | +| Aspect | What it does | Analog | +| --------------- | ------------------------------------- | ------------------ | +| **Familiarity** | "This feels related to things I know" | Centroid matching | +| **Recall** | "Here's what I specifically remember" | Exemplar retrieval | -We don't recall every chair we've seen. We have a *concept* of "chair" (prototype/centroid), but can also recall *specific chairs* when prompted (exemplars). +We don't recall every chair we've seen. We have a _concept_ of "chair" (prototype/centroid), but can also recall _specific chairs_ when prompted (exemplars). ### Representation Options -| Approach | For Matching | For Retrieval | Trade-off | -|----------|--------------|---------------|-----------| -| **Centroid only** | Centroid | ??? | Can't retrieve coherent text | -| **Exemplar (nearest to centroid)** | Exemplar | Return exemplar | Single point may miss breadth | -| **K exemplars** | Centroid | Return top-k | More coverage, uses more tokens | -| **LLM synthesis** | Centroid | Generate summary | Expensive, non-deterministic | -| **LLM-generated label** | Centroid | Return label + exemplars | Best of both worlds | +| Approach | For Matching | For Retrieval | Trade-off | +| ---------------------------------- | ------------ | ------------------------ | ------------------------------- | +| **Centroid only** | Centroid | ??? 
| Can't retrieve coherent text | +| **Exemplar (nearest to centroid)** | Exemplar | Return exemplar | Single point may miss breadth | +| **K exemplars** | Centroid | Return top-k | More coverage, uses more tokens | +| **LLM synthesis** | Centroid | Generate summary | Expensive, non-deterministic | +| **LLM-generated label** | Centroid | Return label + exemplars | Best of both worlds | ### The LLM-Mediated Approach The insight: **embedding similarity forms clusters (cheap, scalable), but semantic meaning is refined by an LLM (expensive, batched)**. -We can't have an LLM process every incoming chunk against all embeddings—horrible scaling. However, we *can* have the LLM periodically: +We can't have an LLM process every incoming chunk against all embeddings—horrible scaling. However, we _can_ have the LLM periodically: 1. **Redraw semantic boundaries** between clusters 2. **Generate semantic labels** that describe what each cluster represents @@ -1668,12 +1695,12 @@ interface SemanticCluster { exemplars: ChunkReference[]; // Semantic (LLM-generated, periodically refreshed) - label: string; // "Error handling in async TypeScript" - description: string; // "Patterns for handling errors in Promise-based code..." + label: string; // "Error handling in async TypeScript" + description: string; // "Patterns for handling errors in Promise-based code..." contrastiveFeatures: string; // "Unlike sync error handling, focuses on..." // Freshness tracking - lastLLMRefresh: number; // Logical clock + lastLLMRefresh: number; // Logical clock exemplarCountAtRefresh: number; } ``` @@ -1686,24 +1713,24 @@ A background process periodically refines cluster semantics: async function refreshClusterSemantics( cluster: SemanticCluster, neighbors: SemanticCluster[], - llm: LLM + llm: LLM, ): Promise { // Sample exemplars from this cluster - const samples = sampleExemplars(cluster, k=10); - const sampleTexts = samples.map(s => s.text); + const samples = sampleExemplars(cluster, (k = 10)); + const sampleTexts = samples.map((s) => s.text); // Sample from neighboring clusters for contrast - const neighborSamples = neighbors.flatMap(n => - sampleExemplars(n, k=3).map(s => ({ cluster: n.label, text: s.text })) + const neighborSamples = neighbors.flatMap((n) => + sampleExemplars(n, (k = 3)).map((s) => ({ cluster: n.label, text: s.text })), ); const result = await llm.complete(` Analyze this cluster of related memories: - ${sampleTexts.map(t => `- ${t}`).join('\n')} + ${sampleTexts.map((t) => `- ${t}`).join('\n')} Neighboring clusters contain: - ${neighborSamples.map(s => `[${s.cluster}]: ${s.text}`).join('\n')} + ${neighborSamples.map((s) => `[${s.cluster}]: ${s.text}`).join('\n')} Provide: 1. A concise label (3-6 words) for this cluster @@ -1776,16 +1803,18 @@ async function retrieveClusterMemory( ### The Invariant Question -What *is* the semantic invariant of a cluster? Not the centroid (geometric mean), but: +What _is_ the semantic invariant of a cluster? Not the centroid (geometric mean), but: > **The pattern that survives across instances—what's common to all members.** Like recognizing "chair-ness" not by averaging all chairs, but by extracting invariants: + - Has a seat - Has support structure - Meant for sitting The LLM-generated description attempts to capture this invariant through: + 1. **Induction**: Looking at exemplars and extracting common themes 2. 
**Contrast**: Defining what makes this cluster distinct from neighbors @@ -1803,7 +1832,7 @@ This is computationally expensive, so it's done periodically rather than on ever ## Chunk Assignment Model -> *Note: This section captures exploratory thinking—design may evolve.* +> _Note: This section captures exploratory thinking—design may evolve._ ### Ingestion Flow @@ -1830,9 +1859,8 @@ Rather than comparing against centroids (which may not correspond to real conten ```typescript function assignChunk( chunk: Chunk, - clusters: SemanticCluster[] + clusters: SemanticCluster[], ): { cluster: SemanticCluster; isNewExemplar: boolean } { - let bestMatch: { cluster: SemanticCluster; exemplar: Chunk; distance: number } | null = null; for (const cluster of clusters) { @@ -1851,7 +1879,7 @@ function assignChunk( if (bestMatch) { // Assign to existing cluster bestMatch.cluster.members.push(chunk.ref); - boostEdges(bestMatch.cluster.id); // D-T-D edge weight accumulation + boostEdges(bestMatch.cluster.id); // D-T-D edge weight accumulation return { cluster: bestMatch.cluster, isNewExemplar: false }; } else { // Create new cluster with chunk as exemplar @@ -1874,32 +1902,31 @@ Compute threshold from actual cluster extent: ```typescript function clusterThreshold(cluster: SemanticCluster): number { if (cluster.exemplars.length < 2) { - return DEFAULT_THRESHOLD; // Bootstrap value for new clusters + return DEFAULT_THRESHOLD; // Bootstrap value for new clusters } // Threshold based on max angular distance among exemplars to centroid - const distances = cluster.exemplars.map(e => - angularDistance(e.vector, cluster.centroid) - ); + const distances = cluster.exemplars.map((e) => angularDistance(e.vector, cluster.centroid)); // Use max distance with margin, or could use percentile - return Math.max(...distances) * 1.2; // 20% margin + return Math.max(...distances) * 1.2; // 20% margin } ``` ### Compound Clusters: Flattening Overlapping Invariants **The Problem**: A chunk like "dog sleeping on bed" might semantically belong to both `[dog]` and `[bed]` clusters. Naively, this requires: + - Multi-cluster assignment - Weighted edge boosts across clusters - Complex bookkeeping **The Solution**: Don't try to decompose into primitive invariants. Instead, treat unique **sets** of semantic invariants as distinct clusters: -| Chunk | Naive Approach | Compound Approach | -|-------|----------------|-------------------| -| "my dog barks" | Assign to `[dog]` | Assign to `[dog]` | -| "comfortable bed" | Assign to `[bed]` | Assign to `[bed]` | +| Chunk | Naive Approach | Compound Approach | +| --------------------- | ------------------------------ | --------------------------------------- | +| "my dog barks" | Assign to `[dog]` | Assign to `[dog]` | +| "comfortable bed" | Assign to `[bed]` | Assign to `[bed]` | | "dog sleeping on bed" | Assign to `[dog]` AND `[bed]`? | Assign to `[dog∩bed]` — its own cluster | **Why this works**: @@ -1913,6 +1940,7 @@ Chunk: "dog sleeping on bed" ``` The clusters that **actually emerge** reflect reality: + - `[dog]` — content purely about dogs - `[bed]` — content purely about beds - `[dog, bed]` — content distinctly about both together @@ -1926,6 +1954,7 @@ The clusters that **actually emerge** reflect reality: 5. 
**No empty combinations** — intersection clusters only exist if content exists there **Trade-off**: Potentially more clusters, but in practice: + - Many combinations won't occur - Combinations that DO occur are semantically meaningful - Cluster count is bounded by actual content diversity, not combinatorics @@ -1938,8 +1967,8 @@ Track activity per cluster to prioritize LLM semantic recalibration: interface SemanticCluster { // ... existing fields ... - activityCount: number; // Ticks since last LLM refresh - lastRefreshClock: number; // Logical clock at last refresh + activityCount: number; // Ticks since last LLM refresh + lastRefreshClock: number; // Logical clock at last refresh } // Increment on each chunk assignment @@ -1949,11 +1978,9 @@ function onChunkAssigned(clusterId: ClusterId): void { } // Prioritize high-activity clusters for LLM refresh -function prioritizeClustersForRefresh( - clusters: SemanticCluster[] -): SemanticCluster[] { +function prioritizeClustersForRefresh(clusters: SemanticCluster[]): SemanticCluster[] { return clusters - .filter(c => c.activityCount > ACTIVITY_THRESHOLD) + .filter((c) => c.activityCount > ACTIVITY_THRESHOLD) .sort((a, b) => b.activityCount - a.activityCount); } @@ -2014,46 +2041,47 @@ function onClusterRefreshed(cluster: SemanticCluster): void { ### Embedding Models (Local) -| Model | Library | Size | Speed | Best For | -|-------|---------|------|-------|----------| -| `bge-small-en-v1.5` | FastEmbed | 33MB | 12x faster than PyTorch | CPU-only, fast | -| `potion-base-8M` | Model2Vec | 30MB | 500x faster | Minimal resources | -| `nomic-embed-text` | Ollama | 274MB | Good | Easy setup, long context | -| `all-MiniLM-L6-v2` | sentence-transformers | 90MB | 14.7ms/1K tokens | Real-time apps | +| Model | Library | Size | Speed | Best For | +| ------------------- | --------------------- | ----- | ----------------------- | ------------------------ | +| `bge-small-en-v1.5` | FastEmbed | 33MB | 12x faster than PyTorch | CPU-only, fast | +| `potion-base-8M` | Model2Vec | 30MB | 500x faster | Minimal resources | +| `nomic-embed-text` | Ollama | 274MB | Good | Easy setup, long context | +| `all-MiniLM-L6-v2` | sentence-transformers | 90MB | 14.7ms/1K tokens | Real-time apps | **Recommendation**: FastEmbed with `bge-small-en-v1.5` for best speed/quality tradeoff on CPU. ### Vector Stores (Local) -| Store | Backend | TypeScript | Performance | Best For | -|-------|---------|------------|-------------|----------| -| **LanceDB** | Apache Arrow | Native embedded | Sub-100ms on 1B vectors | Primary choice | -| **Qdrant** | Rust | SDK available | Excellent | Complex filtering | -| **ChromaDB** | SQLite | Client-server | Good (<10ms on 1M) | Rapid prototyping | -| **sqlite-vec** | SQLite | Via bindings | Moderate | Vectors + relations | +| Store | Backend | TypeScript | Performance | Best For | +| -------------- | ------------ | --------------- | ----------------------- | ------------------- | +| **LanceDB** | Apache Arrow | Native embedded | Sub-100ms on 1B vectors | Primary choice | +| **Qdrant** | Rust | SDK available | Excellent | Complex filtering | +| **ChromaDB** | SQLite | Client-server | Good (<10ms on 1M) | Rapid prototyping | +| **sqlite-vec** | SQLite | Via bindings | Moderate | Vectors + relations | **Recommendation**: LanceDB for primary vector storage. Only vector DB with native embedded TypeScript SDK. 
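To make the recommendation concrete, here is a minimal sketch of chunk storage and nearest-neighbour lookup with LanceDB's TypeScript SDK. The package name (`@lancedb/lancedb`), the `chunks` table, and its row fields are illustrative assumptions, and the result-materialisation call differs between SDK versions.

```typescript
// Minimal sketch — assumes the `@lancedb/lancedb` package; the "chunks"
// table and its fields (id, text, clusterId) are illustrative only.
import * as lancedb from '@lancedb/lancedb';

async function demo(queryVector: number[]): Promise<void> {
  const db = await lancedb.connect('./memory/lancedb');

  // Each row stores a chunk's embedding plus metadata used to join back
  // to the causal graph (chunk id, owning cluster).
  const chunks = await db.createTable('chunks', [
    { id: 'chunk-001', vector: [0.12, 0.08, 0.44], text: 'fix flaky auth test', clusterId: 'testing' },
  ]);

  // Embedded ANN search: nearest stored chunks to the query embedding.
  const hits = await chunks.search(queryVector).limit(5).toArray();
  console.log(hits);
}
```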
### Graph Storage (Local) -| Store | Type | Concurrency | Performance | Best For | -|-------|------|-------------|-------------|----------| -| **Kuzu** | Embedded | File-locked | Fast OLAP | Primary choice | -| **NetworkX** | In-memory | N/A | Good for <100K nodes | Prototyping | -| **Neo4j** | Server | Full | Production-grade | If scaling needed | -| **SQLite** | Adjacency list | File-locked | Moderate | Simple hierarchies | +| Store | Type | Concurrency | Performance | Best For | +| ------------ | -------------- | ----------- | -------------------- | ------------------ | +| **Kuzu** | Embedded | File-locked | Fast OLAP | Primary choice | +| **NetworkX** | In-memory | N/A | Good for <100K nodes | Prototyping | +| **Neo4j** | Server | Full | Production-grade | If scaling needed | +| **SQLite** | Adjacency list | File-locked | Moderate | Simple hierarchies | **Recommendation**: Kuzu for embedded graph storage (DuckDB philosophy for graphs). Fall back to NetworkX for prototyping. ### Clustering Algorithms -| Algorithm | Type | Best For | -|-----------|------|----------| -| **HDBSCAN** | Density-based | Semantic clusters in embedding space | -| **Leiden** | Community detection | Structural communities in graph | -| **Agglomerative** | Hierarchical | Multi-resolution clustering | +| Algorithm | Type | Best For | +| ----------------- | ------------------- | ------------------------------------ | +| **HDBSCAN** | Density-based | Semantic clusters in embedding space | +| **Leiden** | Community detection | Structural communities in graph | +| **Agglomerative** | Hierarchical | Multi-resolution clustering | **Recommendation**: Dual clustering approach: + 1. HDBSCAN on embeddings → semantic clusters 2. Leiden on graph topology → community detection @@ -2063,7 +2091,7 @@ These provide complementary views (semantic similarity vs structural connectivit #### Edge-Weight Decay via D-T-D Transitions -A simple global logical clock is insufficient — it treats all semantic domains as evolving together, when in reality they're causally independent. The D-T-D model solves this naturally: edges only decay relative to *their own cluster's activity*, because edge weights are boosted by D-T-D transitions touching that cluster. A flurry of `[git]` activity creates and boosts `[git]`-related edges without affecting `[error-handling]` edges at all — they simply aren't part of those D-T-D transitions. +A simple global logical clock is insufficient — it treats all semantic domains as evolving together, when in reality they're causally independent. The D-T-D model solves this naturally: edges only decay relative to _their own cluster's activity_, because edge weights are boosted by D-T-D transitions touching that cluster. A flurry of `[git]` activity creates and boosts `[git]`-related edges without affecting `[error-handling]` edges at all — they simply aren't part of those D-T-D transitions. Decay is driven by edge age relative to ongoing activity on the same cluster pair. Edges that are repeatedly reinforced by new D-T-D transitions stay strong; edges that stop being reinforced fade. 
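One way to read "decay relative to the cluster's own activity" is as a per-cluster logical clock: a counter advanced only by D-T-D transitions that touch that cluster, which then serves as "now" when evaluating edge decay. The sketch below is illustrative only — names such as `onDtdTransition` and `edgeNow` are assumptions, not part of the design.

```typescript
// Illustrative sketch of a cluster-local logical clock (hypothetical names).
// Each cluster's counter advances only when a D-T-D transition touches it,
// so edges in inactive semantic domains do not age.
type ClusterId = string;

const clusterClocks = new Map<ClusterId, number>();

function onDtdTransition(touchedClusters: Set<ClusterId>): void {
  for (const id of touchedClusters) {
    clusterClocks.set(id, (clusterClocks.get(id) ?? 0) + 1);
  }
}

// "Now" for an edge is the local time of the clusters it connects — this is
// the currentTransitionCount that the decay calculation below consumes.
function edgeNow(from: ClusterId, to: ClusterId): number {
  return Math.max(clusterClocks.get(from) ?? 0, clusterClocks.get(to) ?? 0);
}
```

Taking the max of the two endpoint clocks is just one possible convention; the point is only that "now" is defined by cluster activity rather than wall time.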
@@ -2074,13 +2102,13 @@ Based on prior art, use **multiple decay triples** on each edge: ```typescript interface DecayingTriple { initialValue: number; - creationTime: number; // Logical time at creation (D-T-D transition count) - lifespan: number; // In transition units + creationTime: number; // Logical time at creation (D-T-D transition count) + lifespan: number; // In transition units } interface AssociationWeight { triples: DecayingTriple[]; - baseValue: number; // Permanent component + baseValue: number; // Permanent component } // Lifespan constants (in D-T-D transition units) @@ -2090,10 +2118,7 @@ const MEDIUM_TERM = 20; const LONG_TERM = 100; // Decay is relative to the number of transitions that have occurred -function getValue( - weight: AssociationWeight, - currentTransitionCount: number -): number { +function getValue(weight: AssociationWeight, currentTransitionCount: number): number { const tripleSum = weight.triples.reduce((sum, t) => { const elapsed = currentTransitionCount - t.creationTime; const decayed = Math.max(0, t.initialValue - elapsed / t.lifespan); @@ -2103,24 +2128,25 @@ function getValue( } function boost(weight: AssociationWeight, lifespan: number, now: number): AssociationWeight { - const existing = weight.triples.find(t => t.lifespan === lifespan); + const existing = weight.triples.find((t) => t.lifespan === lifespan); if (existing) { return { ...weight, - triples: weight.triples.map(t => - t === existing ? { ...t, initialValue: t.initialValue + 1 } : t - ) + triples: weight.triples.map((t) => + t === existing ? { ...t, initialValue: t.initialValue + 1 } : t, + ), }; } else { return { ...weight, - triples: [...weight.triples, { initialValue: 1, creationTime: now, lifespan }] + triples: [...weight.triples, { initialValue: 1, creationTime: now, lifespan }], }; } } ``` Benefits over simple exponential decay: + - **Natural memory consolidation**: Boost short-term on first occurrence, medium-term on repetition, long-term on consistent use - **Logical clock**: Session-based timing better matches developer workflows than wall time - **Multiple decay curves**: Same edge can have fast-decaying "recent" signal AND slow-decaying "historical" signal @@ -2233,22 +2259,22 @@ def reinforce_edge(edge: Edge, ### Technology Stack -| Component | Technology | Rationale | -|-----------|------------|-----------| -| Embeddings | FastEmbed + bge-small-en | Fast, CPU-only, 33MB | -| Vectors | LanceDB | Embedded, TypeScript native, fast | -| Graph | Kuzu | Embedded, DuckDB-like philosophy | -| Clustering | HDBSCAN + Leiden | Complementary semantic + structural | -| Language | TypeScript/Python | Match Claude Code ecosystem | +| Component | Technology | Rationale | +| ---------- | ------------------------ | ----------------------------------- | +| Embeddings | FastEmbed + bge-small-en | Fast, CPU-only, 33MB | +| Vectors | LanceDB | Embedded, TypeScript native, fast | +| Graph | Kuzu | Embedded, DuckDB-like philosophy | +| Clustering | HDBSCAN + Leiden | Complementary semantic + structural | +| Language | TypeScript/Python | Match Claude Code ecosystem | ### Resource Requirements -| Resource | Estimate | -|----------|----------| -| RAM | <2GB for embedding + graph operations | -| Disk | <5GB for substantial memory (100K+ chunks) | -| CPU | Any modern CPU, no GPU required | -| Startup | <1s for embedded databases | +| Resource | Estimate | +| -------- | ------------------------------------------ | +| RAM | <2GB for embedding + graph operations | +| Disk | <5GB for substantial memory 
(100K+ chunks) | +| CPU | Any modern CPU, no GPU required | +| Startup | <1s for embedded databases | --- @@ -2282,34 +2308,35 @@ def reinforce_edge(edge: Edge, ### Why This Wins -| Competitor | What They Lack | -|------------|----------------| -| Zep | Not local, no decay dynamics | -| Mem0 | Cloud-centric, graph is paid, no decay | -| A-MEM | Research-only, not local-first, no temporal | -| Cognee | No temporal, scalability issues | -| All | No Claude Code integration | +| Competitor | What They Lack | +| ---------- | ------------------------------------------- | +| Zep | Not local, no decay dynamics | +| Mem0 | Cloud-centric, graph is paid, no decay | +| A-MEM | Research-only, not local-first, no temporal | +| Cognee | No temporal, scalability issues | +| All | No Claude Code integration | ### Key Technical Differentiators -| Feature | Implementation | Research Basis | -|---------|----------------|----------------| -| **Causal directed graph** | Forward (predictive) + reverse (explanatory) edges | Causal inference theory | -| **D-T-D edge-weight accumulation** | All-pairs edges between adjacent data blobs; normalised weights encode causal distance directly in the graph | D-T-D model / causal inference | -| **Path attenuation** | Influence = Σ paths, each path = ∏ edge weights | Perturbation theory / Feynman diagrams | -| Multi-lifespan decay | Multiple decay triples per edge (1/5/20/100 cluster ticks) | sbxmlpoc PoC | -| Hierarchical clusters | Lattice structure with parent/child clusters | sbxmlpoc marginalisation | -| Bounded Hebbian edges | Oja's rule saturation | Prevents runaway weights | -| Memory triggers | New memories update related existing | A-MEM (NeurIPS 2025) | -| PMI-weighted edges | Point-wise Mutual Information | GCN text classification | -| Dual clustering | HDBSCAN + Leiden | Complementary views | -| Code-aware chunking | Preserve code blocks, stack traces | Novel for Claude Code | +| Feature | Implementation | Research Basis | +| ---------------------------------- | ------------------------------------------------------------------------------------------------------------ | -------------------------------------- | +| **Causal directed graph** | Forward (predictive) + reverse (explanatory) edges | Causal inference theory | +| **D-T-D edge-weight accumulation** | All-pairs edges between adjacent data blobs; normalised weights encode causal distance directly in the graph | D-T-D model / causal inference | +| **Path attenuation** | Influence = Σ paths, each path = ∏ edge weights | Perturbation theory / Feynman diagrams | +| Multi-lifespan decay | Multiple decay triples per edge (1/5/20/100 cluster ticks) | sbxmlpoc PoC | +| Hierarchical clusters | Lattice structure with parent/child clusters | sbxmlpoc marginalisation | +| Bounded Hebbian edges | Oja's rule saturation | Prevents runaway weights | +| Memory triggers | New memories update related existing | A-MEM (NeurIPS 2025) | +| PMI-weighted edges | Point-wise Mutual Information | GCN text classification | +| Dual clustering | HDBSCAN + Leiden | Complementary views | +| Code-aware chunking | Preserve code blocks, stack traces | Novel for Claude Code | --- ## Implementation Roadmap ### Phase 1: MVP (1-2 weeks) + - [ ] Parse session JSONL files - [ ] Generate embeddings with FastEmbed - [ ] Store in LanceDB @@ -2317,12 +2344,14 @@ def reinforce_edge(edge: Edge, - [ ] Basic similarity search via MCP tool ### Phase 2: Associative Graph (2-3 weeks) + - [ ] Add Kuzu for graph storage - [ ] Implement co-occurrence detection 
(sliding window) - [ ] Basic edge weights without decay - [ ] HDBSCAN clustering on embeddings ### Phase 3: Memory Dynamics (2-3 weeks) + - [ ] Two-phase temporal decay - [ ] Bounded Hebbian reinforcement - [ ] Cross-cluster edge strengthening @@ -2330,6 +2359,7 @@ def reinforce_edge(edge: Edge, - [ ] Background pruning of decayed edges ### Phase 4: Polish (1-2 weeks) + - [ ] Optional encryption layer - [ ] SessionStart context injection - [ ] Performance optimization @@ -2344,7 +2374,7 @@ def reinforce_edge(edge: Edge, 1. ~~**Chunking strategy**: Sentence-level? Paragraph? Turn-based? Code-block aware?~~ **RESOLVED**: Turn-based, code-block-aware chunking implemented and validated. Thinking blocks should be excluded before embedding (+0.063 AUC). See [benchmark results](embedding-benchmark-results.md#follow-up-experiments). 2. **Decay parameters**: What lifespan values (in cluster ticks) work best? Start with 1/5/20/100? -3. ~~**Vector clock tick granularity**: Tick per chunk? Per message? Per session touching the cluster?~~ **RESOLVED**: Vector clocks eliminated entirely. The D-T-D model creates all-pairs edges directly between chunks in adjacent data blobs. Edge weight accumulation and decay encode causal distance — the graph *is* the clock. See [The D-T-D Model](#the-d-t-d-model-data-transformation-data). +3. ~~**Vector clock tick granularity**: Tick per chunk? Per message? Per session touching the cluster?~~ **RESOLVED**: Vector clocks eliminated entirely. The D-T-D model creates all-pairs edges directly between chunks in adjacent data blobs. Edge weight accumulation and decay encode causal distance — the graph _is_ the clock. See [The D-T-D Model](#the-d-t-d-model-data-transformation-data). 4. **Linear vs exponential decay**: sbxmlpoc used linear; exponential may be more biologically accurate 5. ~~**Cold start**: How to bootstrap useful clusters without history?~~ **RESOLVED**: Not a real problem. Within a session, the full conversation is in context until compaction — the memory system has no role until then. Across sessions, the first session runs normally, gets indexed at SessionEnd, and memory is available for subsequent sessions. There is no gap that needs filling. 6. **Cross-project memory**: Share associations across projects or isolate? diff --git a/docs/research/archive/pre-implementation-plan.md b/docs/research/archive/pre-implementation-plan.md index d4d22a3..ba7ab3e 100644 --- a/docs/research/archive/pre-implementation-plan.md +++ b/docs/research/archive/pre-implementation-plan.md @@ -10,11 +10,13 @@ ## Executive Summary The research phase has validated three critical components: + - **Topic continuity detection**: 0.998 AUC with lexical features (30-min time gap threshold) - **Embedding model selection**: jina-small (0.715 ROC AUC, 0.384 silhouette) - **Temporal decay modeling**: Delayed Linear with 30-min hold period (+45% MRR over exponential) Before implementation, we need to address **~25 open questions** grouped into 6 categories. This document prioritizes them into: + - **P0**: Must resolve before any implementation (blockers) - **P1**: Must resolve before MVP (affects core architecture) - **P2**: Can defer to iteration (optimize later) @@ -26,23 +28,26 @@ Before implementation, we need to address **~25 open questions** grouped into 6 These decisions affect the fundamental shape of the implementation. ### 1.1 Graph Storage Backend + **Question**: Where to persist the D-T-D graph? 
-| Option | Pros | Cons | -|--------|------|------| -| SQLite | Simple, embedded, portable | Schema migrations, no native graph queries | -| SQLite + JSON blobs | Flexible schema | Harder to query | -| LanceDB | Vector-native, embedded | Less mature | -| File-based JSON | Simplest, human-readable | No concurrency, slow for large graphs | +| Option | Pros | Cons | +| ------------------- | -------------------------- | ------------------------------------------ | +| SQLite | Simple, embedded, portable | Schema migrations, no native graph queries | +| SQLite + JSON blobs | Flexible schema | Harder to query | +| LanceDB | Vector-native, embedded | Less mature | +| File-based JSON | Simplest, human-readable | No concurrency, slow for large graphs | **Recommendation**: Start with SQLite + dedicated tables for nodes, edges, clusters. Migrate later if needed. **Action**: Design schema before implementing. ### 1.2 Embedding Storage + **Question**: Store embeddings in graph DB or separate vector store? **Options**: + 1. **Inline in SQLite**: Simple but bloats DB, no ANN queries 2. **Separate LanceDB**: Native ANN, but two stores to sync 3. **Hybrid**: Cluster centroids in SQLite, chunk embeddings in LanceDB @@ -52,21 +57,24 @@ These decisions affect the fundamental shape of the implementation. **Action**: Prototype both approaches, measure query latency. ### 1.3 MCP Server vs Hooks + **Question**: Primary integration mechanism? -| Approach | Use Case | -|----------|----------| -| **Hooks** | Capture session data (PreCompact), inject context (SessionStart) | -| **MCP Server** | On-demand recall, explain, predict tools for Claude | +| Approach | Use Case | +| -------------- | ---------------------------------------------------------------- | +| **Hooks** | Capture session data (PreCompact), inject context (SessionStart) | +| **MCP Server** | On-demand recall, explain, predict tools for Claude | **Recommendation**: Both — hooks for passive capture, MCP for active retrieval. **Action**: Implement hooks first (simpler), add MCP tools incrementally. ### 1.4 Cluster Identity + **Question**: What defines a cluster's identity over time? When clusters split, merge, or drift: + - Does the cluster keep its ID? - Do edges to that cluster stay valid? - How to handle cluster "death"? @@ -82,13 +90,16 @@ When clusters split, merge, or drift: These need empirical answers but don't block architecture. ### 2.1 Cluster Assignment Threshold + **Question**: What angular distance threshold assigns a chunk to a cluster? **Current knowledge**: + - Embedding benchmark used HDBSCAN (no explicit threshold) - jina-small shows ~0.384 silhouette — moderate separation **Experiment needed**: + - Compute pairwise angular distances within same-cluster vs cross-cluster - Find threshold that balances precision/recall - Test: 0.3, 0.4, 0.5, 0.6 radians @@ -96,11 +107,13 @@ These need empirical answers but don't block architecture. **Action**: Run cluster assignment threshold sweep experiment. ### 2.2 Path Depth Cutoff + **Question**: How deep to traverse causal graph for context retrieval? **RESOLVED**: maxDepth=20 **Experiment results** (depth sweep with sum-product traversal): + - maxDepth=5: 3.05x augmentation - maxDepth=10: 3.57x augmentation - maxDepth=15: 3.87x augmentation @@ -109,32 +122,38 @@ These need empirical answers but don't block architecture. Diminishing returns start at depth=15 (< 1% gain per depth unit). ### 2.3 Signal Threshold + **Question**: What minimum edge weight is "negligible"? 
**Options**: 0.1, 0.01, 0.001 **Trade-off**: + - Too high: Prune too aggressively, lose long-range connections - Too low: Keep noise, bloat traversal **Action**: Measure edge weight distribution from decay experiments, pick 10th percentile. ### 2.4 Decay Tier Configuration + **Question**: Optimal tier weights and timescales for production? **Current recommendation** (from experiments): + ``` Retrieval: Delayed Linear, 30-min hold, 4-hour decay Prediction: Exponential, 10-min half-life ``` **Remaining questions**: + - Multi-linear slow vs delayed linear for retrieval? - Optimal hold period (30 min validated, test 15/45/60)? **Action**: Run parameter sweep on hold period (15/30/45/60 min). ### 2.5 Initial Edge Weight by Type + **Question**: Should edge weights vary by reference type? **Reference type distribution** (from experiments): @@ -155,9 +174,11 @@ Prediction: Exponential, 10-min half-life How clusters are described and retrieved affects retrieval quality. ### 3.1 Refresh Frequency + **Question**: How often to regenerate cluster descriptions via LLM? **Options**: + - Per N sessions - On significant membership change (>20% new chunks) - On-demand when retrieved @@ -167,9 +188,11 @@ How clusters are described and retrieved affects retrieval quality. **Action**: Implement stale detection (membership hash), measure LLM calls. ### 3.2 Exemplar Count + **Question**: How many exemplars per cluster? **Trade-off**: + - Few (3-5): Cheaper embedding, but may miss diversity - Many (10-20): Better representation, but slower @@ -178,9 +201,11 @@ How clusters are described and retrieved affects retrieval quality. **Action**: Test retrieval quality at 3/5/10 exemplars. ### 3.3 Retrieval Mode Selection + **Question**: Return summary, exemplars, or both? **Options**: + 1. **Summary only**: Compact, good for priming 2. **Exemplars only**: Concrete, good for detailed recall 3. **Both**: Comprehensive but verbose @@ -190,9 +215,11 @@ How clusters are described and retrieved affects retrieval quality. **Action**: Implement both modes, let retrieval context decide. ### 3.4 Semantic Drift Detection + **Question**: How to know when a cluster's LLM description no longer matches its exemplars? **Approach**: + 1. Embed the LLM-generated summary 2. Compare to cluster centroid 3. If distance > threshold, mark stale @@ -206,6 +233,7 @@ How clusters are described and retrieved affects retrieval quality. How the system integrates with Claude Code. ### 4.1 PreCompact Hook Frequency + **Question**: How often does PreCompact fire? **Need to verify**: Does it fire once at end of session, or periodically? @@ -213,9 +241,11 @@ How the system integrates with Claude Code. **Action**: Test with actual Claude Code sessions, log hook invocations. ### 4.2 CLAUDE.md Size Budget + **Question**: How many tokens of auto-generated memory content? **Constraints**: + - CLAUDE.md is always loaded - Too large = context waste - Too small = useless @@ -225,9 +255,11 @@ How the system integrates with Claude Code. **Action**: Prototype and measure impact on session quality. ### 4.3 SessionStart Priming Content + **Question**: What to inject at session start? **Options**: + 1. Top N recently active clusters (by tick count) 2. Clusters related to current project 3. Cross-project relevant clusters @@ -237,11 +269,13 @@ How the system integrates with Claude Code. **Action**: Implement and test retrieval relevance. ### 4.4 MCP Tool Latency Budget + **Question**: What's acceptable latency for recall/explain/predict? 
**Target**: <500ms for good UX **Components**: + - Embedding query: ~50ms (jina-small) - Graph traversal: ~10ms (with indices) - LLM description: ~500ms (if not cached) @@ -249,6 +283,7 @@ How the system integrates with Claude Code. **Action**: Benchmark end-to-end latency, optimize hot paths. ### 4.5 PostToolUse Selectivity + **Question**: Which tool uses warrant memory capture? **High value**: File reads, writes, edits (code context) @@ -263,9 +298,11 @@ How the system integrates with Claude Code. ## 5. Additional Experiments Needed (P1) ### 5.1 Cluster Assignment Threshold Sweep + **Purpose**: Find optimal angular distance for assigning chunks to clusters. **Method**: + 1. Take embedding benchmark corpus (294 chunks) 2. Run HDBSCAN to get cluster assignments 3. Compute angular distance from each chunk to assigned cluster centroid @@ -275,9 +312,11 @@ How the system integrates with Claude Code. **Output**: Recommended threshold with confidence interval. ### 5.2 Edge Weight by Reference Type + **Purpose**: Test if type-weighted edges improve retrieval. **Method**: + 1. Re-run edge decay experiment with type-weighted initial weights 2. Compare MRR to uniform weights 3. Stratify by context distance @@ -285,11 +324,13 @@ How the system integrates with Claude Code. **Output**: Optimal weight mapping or confirmation that uniform is fine. ### 5.3 Non-Coding Session Validation + **Purpose**: Verify models work for non-coding conversations. **Gap**: Current experiments heavily weight coding sessions (Ultan, cdx-core, etc.) **Method**: + 1. Run topic continuity on pde-book (10 sessions, math writing) 2. Run edge decay on Personal-advice 3. Compare metrics to coding baseline @@ -297,9 +338,11 @@ How the system integrates with Claude Code. **Output**: Confirmation or adjusted parameters for non-coding. ### 5.4 Cross-Session Memory Relevance + **Purpose**: Test if memories from session N are useful in session N+1. **Method**: + 1. Take multi-session project (apolitical-assistant: 86 sessions) 2. For each session N, retrieve context from sessions 3 turns) @@ -323,41 +368,49 @@ How the system integrates with Claude Code. These can be addressed after initial implementation. ### 6.1 Cross-Project Memory + **Question**: Share associations across projects or isolate? **Defer because**: Start isolated, add cross-project later if valuable. ### 6.2 Cluster Hierarchy Depth + **Question**: How many levels of abstraction? **Defer because**: Start flat, add hierarchy if clusters get too numerous. ### 6.3 Long Inactivity Handling + **Question**: Should wall time factor in after months of inactivity? **Defer because**: Unlikely to matter in initial deployment. ### 6.4 Visualization UI + **Question**: Should there be a UI to explore the memory graph? **Defer because**: Developer tooling, not core functionality. ### 6.5 Manual Curation + **Question**: Allow users to pin/delete/edit memories? **Defer because**: Power user feature, not MVP. ### 6.6 Export Format + **Question**: What format for memory graph portability? **Defer because**: Solve when needed. ### 6.7 Traversal Transparency + **Question**: Show users "why" certain context was retrieved? **Defer because**: Nice-to-have for debugging, not core. ### 6.8 Reinforcement on Access + **Question**: Should accessing a memory strengthen it? **Defer because**: Adds complexity, current decay model sufficient. @@ -368,16 +421,16 @@ These can be addressed after initial implementation. These have been answered through experiments. 
-| Question | Answer | Evidence | -|----------|--------|----------| -| Topic continuity detection | Lexical features (0.998 AUC), 30-min time gap threshold | Topic continuity experiment | -| Embedding model selection | jina-small (0.715 AUC, 0.384 silhouette) | Embedding benchmark | -| Decay curve type | Delayed Linear for retrieval, Exponential for prediction | Edge decay experiments | -| Directional asymmetry | Yes — +0.64 MRR delta for delayed linear | Forward prediction experiment | -| Thinking block handling | Remove before embedding (+0.063 AUC) | Ablation study | -| Chunk strategy | Turn-based, code-block aware | Parser implementation | -| Cold start problem | Not real — full context until compaction | Design analysis | -| Parallelism detection | Via parentToolUseID + timestamps | Session data inspection | +| Question | Answer | Evidence | +| -------------------------- | -------------------------------------------------------- | ----------------------------- | +| Topic continuity detection | Lexical features (0.998 AUC), 30-min time gap threshold | Topic continuity experiment | +| Embedding model selection | jina-small (0.715 AUC, 0.384 silhouette) | Embedding benchmark | +| Decay curve type | Delayed Linear for retrieval, Exponential for prediction | Edge decay experiments | +| Directional asymmetry | Yes — +0.64 MRR delta for delayed linear | Forward prediction experiment | +| Thinking block handling | Remove before embedding (+0.063 AUC) | Ablation study | +| Chunk strategy | Turn-based, code-block aware | Parser implementation | +| Cold start problem | Not real — full context until compaction | Design analysis | +| Parallelism detection | Via parentToolUseID + timestamps | Session data inspection | --- @@ -386,31 +439,37 @@ These have been answered through experiments. Based on dependencies and risk, implement in this order: ### Phase 1: Core Infrastructure + 1. **Schema design** — SQLite tables for nodes, edges, clusters 2. **Embedding store** — LanceDB integration for chunk vectors 3. **Session ingestion** — Parse sessions, create chunks, embed, store ### Phase 2: Graph Construction + 4. **Topic continuity** — Apply lexical classifier to detect turn boundaries 5. **Edge creation** — All-pairs edges between adjacent D chunks 6. **Decay application** — Query-time weight calculation ### Phase 3: Cluster Detection + 7. **HDBSCAN integration** — Periodic clustering of chunks 8. **Cluster metadata** — Store centroids, exemplars, descriptions 9. **Assignment threshold** — Implement threshold-based assignment ### Phase 4: Retrieval + 10. **Basic recall** — Query by embedding similarity 11. **Graph traversal** — Follow edges with decay weights 12. **Context assembly** — Build retrieval response ### Phase 5: Integration + 13. **Hooks** — PreCompact capture, SessionStart injection 14. **MCP tools** — recall, explain, predict 15. **CLAUDE.md generation** — Auto-generate memory section ### Phase 6: Optimization + 16. **LLM refresh** — Semantic cluster descriptions 17. **Performance tuning** — Index optimization, caching 18. 
**Parameter tuning** — Based on production metrics @@ -433,11 +492,11 @@ After these, we'll have high confidence in all P0/P1 decisions and can start Pha ## Appendix: Data Available -| Dataset | Size | Sessions | Use | -|---------|------|----------|-----| -| Full corpus | 3.5 GB | 251 | Production | -| Embedding benchmark | 294 chunks | 12 | Cluster experiments | -| Topic continuity | 2,817 transitions | 75 | Validated | -| Edge decay | 9,361 references | 75 | Validated | -| Non-coding (pde-book) | 312 MB | 10 | Validation needed | -| Large project (apolitical) | 751 MB | 86 | Cross-session testing | +| Dataset | Size | Sessions | Use | +| -------------------------- | ----------------- | -------- | --------------------- | +| Full corpus | 3.5 GB | 251 | Production | +| Embedding benchmark | 294 chunks | 12 | Cluster experiments | +| Topic continuity | 2,817 transitions | 75 | Validated | +| Edge decay | 9,361 references | 75 | Validated | +| Non-coding (pde-book) | 312 MB | 10 | Validation needed | +| Large project (apolitical) | 751 MB | 86 | Cross-session testing | diff --git a/docs/research/archive/session-data-inventory.md b/docs/research/archive/session-data-inventory.md index 886c575..5c32241 100644 --- a/docs/research/archive/session-data-inventory.md +++ b/docs/research/archive/session-data-inventory.md @@ -9,62 +9,64 @@ ## Summary -| Metric | Value | -|--------|-------| -| Total project directories | 32 | -| Top-level session files | 251 | -| Subagent task files | 1,421 | -| Total JSONL files | 1,672 | -| Total disk usage | 3.5 GB | -| Date range | 2026-01-05 to 2026-02-06 | -| Largest single session | 121 MB (Ultan) | -| Most sessions | apolitical-assistant (86) | +| Metric | Value | +| ------------------------- | ------------------------- | +| Total project directories | 32 | +| Top-level session files | 251 | +| Subagent task files | 1,421 | +| Total JSONL files | 1,672 | +| Total disk usage | 3.5 GB | +| Date range | 2026-01-05 to 2026-02-06 | +| Largest single session | 121 MB (Ultan) | +| Most sessions | apolitical-assistant (86) | --- ## Projects by Size -| # | Project | Sessions | Total Size | Type | Notes | -|---|---------|----------|------------|------|-------| -| 1 | Ultan | 28 | 1.0 GB | Coding | Swift, bibliography management app | -| 2 | apolitical-assistant | 86 | 751 MB | Coding | TypeScript, engineering leadership tool | -| 3 | katanalog-website | 9 | 639 MB | Coding | Large sessions (109 MB max) | -| 4 | **pde-book** | 10 | 312 MB | **Non-coding** | Mathematical/academic writing | -| 5 | apolitical-dev-analytics | 8 | 267 MB | Coding | Data/analytics, TypeScript | -| 6 | codex-file-format-spec | 29 | 213 MB | Coding | Spec/design work | -| 7 | cdx-core | 13 | 178 MB | Coding | TypeScript, document format tooling | -| 8 | speed-read | 8 | 50 MB | Coding | TypeScript, EPUB/PDF reader | -| 9 | iksium | 4 | 37 MB | Coding | | -| 10 | semansiation | 4 | 29 MB | Coding | This project (research/NL-heavy) | -| 11 | cdx-pandoc | 8 | 24 MB | Coding | Pandoc integration | -| 12 | baykenClaude | 7 | 19 MB | Coding | | -| 13 | apolitical-bug-triage | 5 | 17 MB | Coding | Bug triage tool | -| 14 | ghanalytics | 2 | 9 MB | Coding | GitHub analytics | -| 15 | **analytic-methods-in-pde** | 3 | 8 MB | **Non-coding** | Mathematical research | -| 16 | file-format | 2 | 5 MB | Coding | File format spec | -| 17 | Dev (root) | 1 | 4 MB | Mixed | | -| 18 | rust-pdf-poc | 2 | 4 MB | Coding | Rust, PDF processing | -| 19 | kanpii | 1 | 3 MB | Coding | | -| 20 | thylacine | 1 | 2 MB | 
Coding | | -| 21 | box-packing | 2 | 2 MB | Coding | Algorithm/optimization | -| 22 | bengal-stm | 3 | 1 MB | Coding | | -| 23 | **Personal-advice** | 1 | 820 KB | **Non-coding** | Personal/relationship guidance | -| 24 | data-v2 | 4 | 772 KB | Mixed | | -| 25 | bayken-data | 1 | 740 KB | Mixed | | -| 26 | platform-v2 | 1 | 596 KB | Coding | | -| 27 | thought-stream | 1 | 352 KB | Mixed | | -| 28 | claude-global-skills | 2 | 216 KB | Coding | | -| 29 | gvn (home) | 2 | 184 KB | Mixed | | -| 30 | pi-hole | 1 | 68 KB | Coding | Network config | -| 31 | iterm-temp | 1 | 20 KB | Coding | | -| 32 | Apolitical (root) | 1 | 12 KB | Mixed | | +| # | Project | Sessions | Total Size | Type | Notes | +| --- | --------------------------- | -------- | ---------- | -------------- | --------------------------------------- | +| 1 | Ultan | 28 | 1.0 GB | Coding | Swift, bibliography management app | +| 2 | apolitical-assistant | 86 | 751 MB | Coding | TypeScript, engineering leadership tool | +| 3 | katanalog-website | 9 | 639 MB | Coding | Large sessions (109 MB max) | +| 4 | **pde-book** | 10 | 312 MB | **Non-coding** | Mathematical/academic writing | +| 5 | apolitical-dev-analytics | 8 | 267 MB | Coding | Data/analytics, TypeScript | +| 6 | codex-file-format-spec | 29 | 213 MB | Coding | Spec/design work | +| 7 | cdx-core | 13 | 178 MB | Coding | TypeScript, document format tooling | +| 8 | speed-read | 8 | 50 MB | Coding | TypeScript, EPUB/PDF reader | +| 9 | iksium | 4 | 37 MB | Coding | | +| 10 | semansiation | 4 | 29 MB | Coding | This project (research/NL-heavy) | +| 11 | cdx-pandoc | 8 | 24 MB | Coding | Pandoc integration | +| 12 | baykenClaude | 7 | 19 MB | Coding | | +| 13 | apolitical-bug-triage | 5 | 17 MB | Coding | Bug triage tool | +| 14 | ghanalytics | 2 | 9 MB | Coding | GitHub analytics | +| 15 | **analytic-methods-in-pde** | 3 | 8 MB | **Non-coding** | Mathematical research | +| 16 | file-format | 2 | 5 MB | Coding | File format spec | +| 17 | Dev (root) | 1 | 4 MB | Mixed | | +| 18 | rust-pdf-poc | 2 | 4 MB | Coding | Rust, PDF processing | +| 19 | kanpii | 1 | 3 MB | Coding | | +| 20 | thylacine | 1 | 2 MB | Coding | | +| 21 | box-packing | 2 | 2 MB | Coding | Algorithm/optimization | +| 22 | bengal-stm | 3 | 1 MB | Coding | | +| 23 | **Personal-advice** | 1 | 820 KB | **Non-coding** | Personal/relationship guidance | +| 24 | data-v2 | 4 | 772 KB | Mixed | | +| 25 | bayken-data | 1 | 740 KB | Mixed | | +| 26 | platform-v2 | 1 | 596 KB | Coding | | +| 27 | thought-stream | 1 | 352 KB | Mixed | | +| 28 | claude-global-skills | 2 | 216 KB | Coding | | +| 29 | gvn (home) | 2 | 184 KB | Mixed | | +| 30 | pi-hole | 1 | 68 KB | Coding | Network config | +| 31 | iterm-temp | 1 | 20 KB | Coding | | +| 32 | Apolitical (root) | 1 | 12 KB | Mixed | | --- ## Session Type Classification ### Coding Sessions (Majority) + Standard software development conversations involving: + - Code generation, debugging, refactoring - Tool use (Read, Edit, Bash, Grep, etc.) 
- Technical planning and architecture @@ -74,13 +76,14 @@ Standard software development conversations involving: ### Non-Coding Sessions (New) -| Project | Sessions | Size | Content Type | -|---------|----------|------|--------------| -| **pde-book** | 10 | 312 MB | Mathematical writing, LaTeX, PDE theory | -| **analytic-methods-in-pde** | 3 | 8 MB | Mathematical research, proofs | -| **Personal-advice** | 1 | 820 KB | Relationship guidance, mental health discussion | +| Project | Sessions | Size | Content Type | +| --------------------------- | -------- | ------ | ----------------------------------------------- | +| **pde-book** | 10 | 312 MB | Mathematical writing, LaTeX, PDE theory | +| **analytic-methods-in-pde** | 3 | 8 MB | Mathematical research, proofs | +| **Personal-advice** | 1 | 820 KB | Relationship guidance, mental health discussion | **Characteristics**: + - Low/no tool_use activity - Longer natural language exchanges - Different topic continuity patterns (fewer explicit file references) @@ -98,13 +101,13 @@ Standard software development conversations involving: The topic continuity experiment (Run 1) used **30 sessions, 1,538 transitions** from coding projects only. -| Metric | Value | -|--------|-------| -| Sessions | 30 | -| Transitions | 1,538 | -| Valid (with prior context) | 1,407 | -| Continuations | 1,428 (93%) | -| New topics | 110 (7%) | +| Metric | Value | +| -------------------------- | ----------- | +| Sessions | 30 | +| Transitions | 1,538 | +| Valid (with prior context) | 1,407 | +| Continuations | 1,428 (93%) | +| New topics | 110 (7%) | **Gap**: Non-coding sessions not yet included. Should add pde-book and Personal-advice for diversity. @@ -114,13 +117,13 @@ The topic continuity experiment (Run 1) used **30 sessions, 1,538 transitions** The embedding benchmark uses **5 projects, 12 sessions, 294 chunks** — less than 1% of available data. -| Project | Sessions Used | Sessions Available | -|---------|--------------|-------------------| -| speed-read | 3 | 8 | -| semansiation | 2 | 4 | -| Ultan | 2 | 28 | -| cdx-core | 2 | 13 | -| apolitical-assistant | 3 | 86 | +| Project | Sessions Used | Sessions Available | +| -------------------- | ------------- | ------------------ | +| speed-read | 3 | 8 | +| semansiation | 2 | 4 | +| Ultan | 2 | 28 | +| cdx-core | 2 | 13 | +| apolitical-assistant | 3 | 86 | --- @@ -155,11 +158,13 @@ The embedding benchmark uses **5 projects, 12 sessions, 294 chunks** — less th ``` These represent parallel agent execution (Task tool invocations). Currently unused in experiments. 
Relevant for: + - Testing the causal graph's parallel agent handling - Validating clock partitioning and merge point detection - Understanding nested parallelism patterns Projects with heaviest subagent usage: + - apolitical-assistant: 257 files - speed-read: 246 files - Ultan: 171 files diff --git a/docs/research/archive/topic-continuity-results.md b/docs/research/archive/topic-continuity-results.md index f8d25f0..04ef244 100644 --- a/docs/research/archive/topic-continuity-results.md +++ b/docs/research/archive/topic-continuity-results.md @@ -8,34 +8,36 @@ This experiment evaluates classifiers for detecting whether a user's message **c **Source**: 75 Claude Code sessions from `~/.claude/projects/` -| Metric | Count | -|--------|-------| -| Total transitions | 3,179 | -| Valid (with prior context) | 2,817 | -| Continuations | 2,952 (93%) | -| New topics | 227 (7%) | -| High confidence labels | 1,554 (49%) | +| Metric | Count | +| -------------------------- | ----------- | +| Total transitions | 3,179 | +| Valid (with prior context) | 2,817 | +| Continuations | 2,952 (93%) | +| New topics | 227 (7%) | +| High confidence labels | 1,554 (49%) | ### Label Distribution by Source -| Source | Count | Label | Confidence | -|--------|-------|-------|------------| -| Same-session adjacent | 1,470 | continuation | medium | -| Tool/file references | 772 | continuation | high | -| Explicit continuation markers | 710 | continuation | high | -| Time gap (>30 min) | 155 | new_topic | medium | -| Session boundaries | 45 | new_topic | high | -| Explicit shift markers | 27 | new_topic | high | +| Source | Count | Label | Confidence | +| ----------------------------- | ----- | ------------ | ---------- | +| Same-session adjacent | 1,470 | continuation | medium | +| Tool/file references | 772 | continuation | high | +| Explicit continuation markers | 710 | continuation | high | +| Time gap (>30 min) | 155 | new_topic | medium | +| Session boundaries | 45 | new_topic | high | +| Explicit shift markers | 27 | new_topic | high | The dataset is imbalanced (93% continuations), reflecting the reality that most adjacent turns in coding sessions continue the same topic. ## Classifiers Evaluated ### 1. Embedding-Only + - Compute angular distance between user text embedding and previous assistant output embedding - Score = 1 - distance (higher = more likely continuation) ### 2. Lexical-Only + - Time gap threshold (>30 minutes suggests new topic) - Topic-shift markers: `/^actually,?\s*(let's|can we)/i`, `/^(new|different) (question|topic)/i`, etc. - Continuation markers: `/^(yes|no|right|correct)/i`, `/^(the|your) (error|output|result)/i`, etc. @@ -43,6 +45,7 @@ The dataset is imbalanced (93% continuations), reflecting the reality that most - Keyword overlap (Jaccard coefficient) ### 3. 
Hybrid + - Weighted combination of embedding distance and lexical features - Default weights: embedding (0.5), topic-shift (0.2), continuation (0.15), time-gap (0.05), paths (0.05), keywords (0.05) @@ -50,12 +53,12 @@ The dataset is imbalanced (93% continuations), reflecting the reality that most All 4 registered embedding models were evaluated: -| Model | Dims | Embedding AUC | Embedding F1 | Hybrid AUC | Hybrid F1 | -|-------|------|--------------|--------------|------------|-----------| -| nomic-v1.5 | 768 | 0.574 | 0.532 | 0.944 | 0.937 | -| **jina-small** | 512 | 0.541 | 0.498 | **0.946** | **0.979** | -| bge-small | 384 | 0.558 | 0.548 | 0.927 | 0.965 | -| jina-code | 768 | 0.551 | 0.582 | 0.883 | 0.904 | +| Model | Dims | Embedding AUC | Embedding F1 | Hybrid AUC | Hybrid F1 | +| -------------- | ---- | ------------- | ------------ | ---------- | --------- | +| nomic-v1.5 | 768 | 0.574 | 0.532 | 0.944 | 0.937 | +| **jina-small** | 512 | 0.541 | 0.498 | **0.946** | **0.979** | +| bge-small | 384 | 0.558 | 0.548 | 0.927 | 0.965 | +| jina-code | 768 | 0.551 | 0.582 | 0.883 | 0.904 | **Lexical-only** achieves **0.998 AUC** (F1=0.999) across all configurations. @@ -72,18 +75,18 @@ All 4 registered embedding models were evaluated: Using nomic-v1.5 as the embedding baseline (comprehensive 75-session run): -| Configuration | ROC AUC | Delta vs Embedding | -|---------------|---------|-------------------| -| All lexical features | 1.000 | +0.426 | -| **Time gap only** | **0.921** | **+0.347** | -| Embedding + time gap | 0.860 | +0.287 | -| Embedding + markers | 0.745 | +0.172 | -| All features (hybrid) | 0.944 | +0.370 | -| Continuation markers only | 0.605 | +0.031 | -| Shift markers only | 0.573 | -0.001 | -| Embedding only | 0.574 | baseline | -| Keyword overlap only | 0.564 | -0.010 | -| File path overlap only | 0.484 | -0.090 | +| Configuration | ROC AUC | Delta vs Embedding | +| ------------------------- | --------- | ------------------ | +| All lexical features | 1.000 | +0.426 | +| **Time gap only** | **0.921** | **+0.347** | +| Embedding + time gap | 0.860 | +0.287 | +| Embedding + markers | 0.745 | +0.172 | +| All features (hybrid) | 0.944 | +0.370 | +| Continuation markers only | 0.605 | +0.031 | +| Shift markers only | 0.573 | -0.001 | +| Embedding only | 0.574 | baseline | +| Keyword overlap only | 0.564 | -0.010 | +| File path overlap only | 0.484 | -0.090 | ### Feature Importance Ranking @@ -97,19 +100,22 @@ Using nomic-v1.5 as the embedding baseline (comprehensive 75-session run): ## Threshold Analysis ### Time Gap Threshold + The experiment used 30 minutes as the default time gap threshold. 
Results confirm this is highly effective: + - 155 transitions labeled as `new_topic` due to time gap - Time gap alone achieves 0.921 AUC - The 30-minute threshold is the single most valuable signal in the dataset ### Classification Threshold + Optimal thresholds found via Youden's J statistic: -| Classifier | Optimal Threshold | F1 | -|------------|------------------|-----| -| Lexical-only | 0.400 | 0.999 | -| Hybrid (jina-small) | varies | 0.979 | -| Embedding-only | varies | 0.498-0.582 | +| Classifier | Optimal Threshold | F1 | +| ------------------- | ----------------- | ----------- | +| Lexical-only | 0.400 | 0.999 | +| Hybrid (jina-small) | varies | 0.979 | +| Embedding-only | varies | 0.498-0.582 | ## Recommendations @@ -124,11 +130,7 @@ Optimal thresholds found via Youden's J statistic: ```typescript // Simple, effective approach -function isTopicContinuation( - prevTurn: Turn, - nextTurn: Turn, - timeGapMs: number, -): boolean { +function isTopicContinuation(prevTurn: Turn, nextTurn: Turn, timeGapMs: number): boolean { const timeGapMinutes = timeGapMs / (1000 * 60); // Large time gap strongly suggests new topic @@ -151,6 +153,7 @@ function isTopicContinuation( ### For Edge Detection in D-T-D When creating edges in the Document-Turn-Document model: + - **Continuation** → Create causal edges from previous assistant chunks to user chunk - **New topic** → No backward edges; user chunk starts a new subgraph @@ -170,10 +173,10 @@ When creating edges in the Document-Turn-Document model: ## Experiment History -| Run | Sessions | Transitions | Valid | Lexical AUC | Best Hybrid AUC | -|-----|----------|-------------|-------|-------------|-----------------| -| Initial | 30 | 1,538 | 1,407 | 0.999 | 0.934 (nomic-v1.5) | -| Comprehensive | 75 | 3,179 | 2,817 | 0.998 | 0.946 (jina-small) | +| Run | Sessions | Transitions | Valid | Lexical AUC | Best Hybrid AUC | +| ------------- | -------- | ----------- | ----- | ----------- | ------------------ | +| Initial | 30 | 1,538 | 1,407 | 0.999 | 0.934 (nomic-v1.5) | +| Comprehensive | 75 | 3,179 | 2,817 | 0.998 | 0.946 (jina-small) | Results are consistent across runs — lexical features dominate, embeddings add marginal value. diff --git a/docs/research/archive/vector-clocks.md b/docs/research/archive/vector-clocks.md index 6a48054..0352af5 100644 --- a/docs/research/archive/vector-clocks.md +++ b/docs/research/archive/vector-clocks.md @@ -138,11 +138,11 @@ Forward (predictive): ## Comparison -| Approach | Monday 9am -> Tuesday 9am | -|----------|---------------------------| -| Wall-clock | 24 hours (seems old) | +| Approach | Monday 9am -> Tuesday 9am | +| ------------- | --------------------------------- | +| Wall-clock | 24 hours (seems old) | | Session-based | 2 sessions (ignores relationship) | -| Vector clock | 1 hop (recognizes continuation) | +| Vector clock | 1 hop (recognizes continuation) | ## Results diff --git a/docs/research/decisions.md b/docs/research/decisions.md index abaf29e..d07e18b 100644 --- a/docs/research/decisions.md +++ b/docs/research/decisions.md @@ -95,6 +95,7 @@ A chronological narrative of every major design decision in Causantic's developm **Why**: The causal graph should encode only causal structure. Semantic association (what topics are related) is vector search and clustering's job. 
**Key design choices**: + - 4 structural roles: within-chain (1.0), cross-session (0.7), brief (0.9), debrief (0.9) - m×n all-pairs at each consecutive turn boundary (no edges within the same turn) - Topic-shift gating: time gap > 30min or explicit shift markers → no edges @@ -177,6 +178,7 @@ A chronological narrative of every major design decision in Causantic's developm **What replaced it**: Sequential linked-list edges (each chunk links to the next in its session), walked by `chain-walker.ts`. Chain walking follows edges forward or backward from vector/keyword seeds, scoring each step by direct cosine similarity against the query. This produces ordered episodic narratives rather than ranked disconnected chunks. **Key design choices**: + - Sequential edges: 1-to-1 (not m×n), preserving session order - Cosine-similarity scoring per hop (not multiplicative path products) - `search-assembler.ts` replaces `context-assembler.ts` as the retrieval pipeline diff --git a/docs/research/experiments/cluster-threshold.md b/docs/research/experiments/cluster-threshold.md index 19a374e..00bde12 100644 --- a/docs/research/experiments/cluster-threshold.md +++ b/docs/research/experiments/cluster-threshold.md @@ -28,14 +28,14 @@ Angular distance threshold affects clustering quality. There exists an optimal t ## Results -| Threshold | Precision | Recall | F1 | -|-----------|-----------|--------|-----| -| 0.05 | 1.000 | 0.712 | 0.832 | -| 0.07 | 1.000 | 0.823 | 0.903 | -| 0.09 | **1.000** | **0.887** | **0.940** | -| 0.11 | 0.982 | 0.901 | 0.940 | -| 0.13 | 0.954 | 0.923 | 0.938 | -| 0.15 | 0.921 | 0.945 | 0.933 | +| Threshold | Precision | Recall | F1 | +| --------- | --------- | --------- | --------- | +| 0.05 | 1.000 | 0.712 | 0.832 | +| 0.07 | 1.000 | 0.823 | 0.903 | +| 0.09 | **1.000** | **0.887** | **0.940** | +| 0.11 | 0.982 | 0.901 | 0.940 | +| 0.13 | 0.954 | 0.923 | 0.938 | +| 0.15 | 0.921 | 0.945 | 0.933 | **Winner**: Threshold 0.09 (F1=0.940, 100% precision, 88.7% recall) @@ -103,7 +103,7 @@ Chunks are assigned to clusters based on centroid distance: ```typescript function assignToCluster(chunk: Chunk, clusters: Cluster[]): Cluster | null { let bestCluster = null; - let bestDistance = threshold; // 0.09 + let bestDistance = threshold; // 0.09 for (const cluster of clusters) { const distance = angularDistance(chunk.embedding, cluster.centroid); @@ -113,7 +113,7 @@ function assignToCluster(chunk: Chunk, clusters: Cluster[]): Cluster | null { } } - return bestCluster; // null if no cluster within threshold + return bestCluster; // null if no cluster within threshold } ``` diff --git a/docs/research/experiments/embedding-models.md b/docs/research/experiments/embedding-models.md index d0d7a0b..b242428 100644 --- a/docs/research/experiments/embedding-models.md +++ b/docs/research/experiments/embedding-models.md @@ -10,13 +10,13 @@ Different embedding models offer trade-offs between quality, size, and inference ### Models Tested -| Model | Dimensions | Size | -|-------|------------|------| -| jina-small | 512 | 33M | -| jina-base | 768 | 137M | -| all-MiniLM-L6 | 384 | 22M | -| bge-small | 384 | 33M | -| e5-small | 384 | 33M | +| Model | Dimensions | Size | +| ------------- | ---------- | ---- | +| jina-small | 512 | 33M | +| jina-base | 768 | 137M | +| all-MiniLM-L6 | 384 | 22M | +| bge-small | 384 | 33M | +| e5-small | 384 | 33M | ### Metrics @@ -35,23 +35,23 @@ Different embedding models offer trade-offs between quality, size, and inference ### Quality Metrics -| Model | Silhouette | MRR | ROC AUC | 
-|-------|------------|-----|---------| -| jina-small | 0.412 | 0.687 | 0.891 | -| jina-base | 0.438 | 0.712 | 0.903 | -| all-MiniLM-L6 | 0.389 | 0.654 | 0.867 | -| bge-small | 0.401 | 0.671 | 0.882 | -| e5-small | 0.395 | 0.668 | 0.879 | +| Model | Silhouette | MRR | ROC AUC | +| ------------- | ---------- | ----- | ------- | +| jina-small | 0.412 | 0.687 | 0.891 | +| jina-base | 0.438 | 0.712 | 0.903 | +| all-MiniLM-L6 | 0.389 | 0.654 | 0.867 | +| bge-small | 0.401 | 0.671 | 0.882 | +| e5-small | 0.395 | 0.668 | 0.879 | ### Performance Metrics -| Model | Embeddings/sec | Memory (MB) | -|-------|---------------|-------------| -| jina-small | 145 | 180 | -| jina-base | 67 | 420 | -| all-MiniLM-L6 | 312 | 95 | -| bge-small | 178 | 150 | -| e5-small | 189 | 145 | +| Model | Embeddings/sec | Memory (MB) | +| ------------- | -------------- | ----------- | +| jina-small | 145 | 180 | +| jina-base | 67 | 420 | +| all-MiniLM-L6 | 312 | 95 | +| bge-small | 178 | 150 | +| e5-small | 189 | 145 | ## Analysis @@ -74,6 +74,7 @@ Quality (MRR) **Winner**: jina-small Rationale: + - 97% of jina-base quality at 2.2x speed - Best quality-to-speed ratio - Reasonable memory footprint @@ -81,11 +82,11 @@ Rationale: ### When to Consider Alternatives -| Use Case | Recommendation | -|----------|---------------| -| Maximum quality | jina-base | -| Minimum resources | all-MiniLM-L6 | -| Balanced | jina-small (default) | +| Use Case | Recommendation | +| ----------------- | -------------------- | +| Maximum quality | jina-base | +| Minimum resources | all-MiniLM-L6 | +| Balanced | jina-small (default) | ## Implementation @@ -94,10 +95,7 @@ Causantic uses jina-small via Hugging Face Transformers: ```typescript import { pipeline } from '@huggingface/transformers'; -const embedder = await pipeline( - 'feature-extraction', - 'jinaai/jina-embeddings-v2-small-en' -); +const embedder = await pipeline('feature-extraction', 'jinaai/jina-embeddings-v2-small-en'); async function embed(text: string): Promise { const result = await embedder(text, { @@ -123,12 +121,12 @@ Note: Changing models requires re-embedding all chunks. Five targeted experiments on jina-small validated the production recommendation: -| Experiment | Baseline AUC | Variant AUC | Delta AUC | Baseline Silh. | Variant Silh. | Delta Silh. | -|------------|-------------|-------------|-----------|----------------|---------------|-------------| -| Truncation (512 tokens) | 0.715 | 0.671 | **-0.044** | 0.384 | 0.229 | **-0.155** | -| Boilerplate filter | 0.715 | 0.720 | +0.004 | 0.384 | 0.395 | +0.011 | -| Thinking block ablation | 0.715 | 0.778 | **+0.063** | 0.384 | 0.376 | -0.009 | -| Code-focused mode | 0.715 | 0.761 | +0.045 | 0.384 | 0.356 | -0.028 | +| Experiment | Baseline AUC | Variant AUC | Delta AUC | Baseline Silh. | Variant Silh. | Delta Silh. 
| +| ----------------------- | ------------ | ----------- | ---------- | -------------- | ------------- | ----------- | +| Truncation (512 tokens) | 0.715 | 0.671 | **-0.044** | 0.384 | 0.229 | **-0.155** | +| Boilerplate filter | 0.715 | 0.720 | +0.004 | 0.384 | 0.395 | +0.011 | +| Thinking block ablation | 0.715 | 0.778 | **+0.063** | 0.384 | 0.376 | -0.009 | +| Code-focused mode | 0.715 | 0.761 | +0.045 | 0.384 | 0.356 | -0.028 | ### Key Findings diff --git a/docs/research/experiments/graph-traversal.md b/docs/research/experiments/graph-traversal.md index c9570d5..b22564b 100644 --- a/docs/research/experiments/graph-traversal.md +++ b/docs/research/experiments/graph-traversal.md @@ -31,34 +31,34 @@ Graph traversal from vector search results can find additional relevant context ### Cross-Project Experiment (492 queries, 25 sessions) -| Metric | Value | -|--------|-------| -| Weighted Average Augmentation | **4.65×** | -| Median | 4.54× | -| Range | 3.60× - 5.87× | -| Total Chunks | 6,243 | -| Total Queries | 492 | +| Metric | Value | +| ----------------------------- | ------------- | +| Weighted Average Augmentation | **4.65×** | +| Median | 4.54× | +| Range | 3.60× - 5.87× | +| Total Chunks | 6,243 | +| Total Queries | 492 | **Key Finding**: Graph-augmented retrieval consistently provides 4-6× the context vs vector search alone, with even the worst-performing session achieving 3.60× augmentation. ### Single-Project Baseline (10 queries) -| Metric | Vector-Only | Graph-Augmented | Improvement | -|--------|-------------|-----------------|-------------| -| Augmentation | 1.0× | 3.88× | 3.88× | -| Avg Chunks Added | 10 | 28.8 | 2.88× | -| Paths Explored | — | 239 | — | +| Metric | Vector-Only | Graph-Augmented | Improvement | +| ---------------- | ----------- | --------------- | ----------- | +| Augmentation | 1.0× | 3.88× | 3.88× | +| Avg Chunks Added | 10 | 28.8 | 2.88× | +| Paths Explored | — | 239 | — | ### Depth Sweep Results | maxDepth | Chunks Added | Augmentation | Efficiency | -|----------|--------------|--------------|------------| -| 3 | 13.8 | 2.38x | 0.279 | -| 5 | 20.5 | 3.05x | 0.166 | -| 7 | 23.9 | 3.39x | 0.138 | -| 10 | 25.7 | 3.57x | 0.122 | -| 15 | 28.7 | 3.87x | 0.120 | -| 20 | 28.8 | 3.88x | 0.121 | +| -------- | ------------ | ------------ | ---------- | +| 3 | 13.8 | 2.38x | 0.279 | +| 5 | 20.5 | 3.05x | 0.166 | +| 7 | 23.9 | 3.39x | 0.138 | +| 10 | 25.7 | 3.57x | 0.122 | +| 15 | 28.7 | 3.87x | 0.120 | +| 20 | 28.8 | 3.88x | 0.121 | - **Diminishing returns** start at depth=15 (< 1% gain per depth unit after) - **Recommended**: maxDepth=20 matches forward decay (dies at 20 hops) @@ -66,6 +66,7 @@ Graph traversal from vector search results can find additional relevant context ## Traversal Algorithm The traverser uses **sum-product rules** inspired by Feynman diagrams: + - **Product rule**: Weights multiply along paths (w₁ × w₂ × ... × wₙ) - **Sum rule**: Multiple paths to a node contribute additively @@ -104,12 +105,12 @@ See [The Role of Entropy](/docs/research/approach/role-of-entropy.md) for the th > This data was collected with the original 9 semantic edge types. Since v0.3, edges use 2 structural roles (within-chain, cross-session) with sequential 1-to-1 topology. 
-| Edge Type | Augmentation Contribution | -|-----------|--------------------------| -| file-path | 48% | -| adjacent | 31% | -| topic | 15% | -| cross-session | 6% | +| Edge Type | Augmentation Contribution | +| ------------- | ------------------------- | +| file-path | 48% | +| adjacent | 31% | +| topic | 15% | +| cross-session | 6% | File-path edges were the most valuable for finding related context under the semantic model. The v0.3 structural model replaces all edge types with sequential within-chain edges — the graph provides structural ordering (what came before/after), while vector+keyword search handles relevance ranking. @@ -118,21 +119,21 @@ File-path edges were the most valuable for finding related context under the sem ### minWeight Sweep (fixed depth=20) | minWeight | Chunks Added | Augmentation | -|-----------|--------------|--------------| -| 0.1 | 8.9 | 1.89x | -| 0.05 | 19.9 | 2.99x | -| 0.01 | 28.8 | 3.88x | -| 0.005 | 28.8 | 3.88x | -| 0.001 | 28.8 | 3.88x | +| --------- | ------------ | ------------ | +| 0.1 | 8.9 | 1.89x | +| 0.05 | 19.9 | 2.99x | +| 0.01 | 28.8 | 3.88x | +| 0.005 | 28.8 | 3.88x | +| 0.001 | 28.8 | 3.88x | **Finding**: minWeight=0.01 captures all reachable context. Lower thresholds add computational cost without benefit. ### Default Configuration -| Parameter | Value | Rationale | -|-----------|-------|-----------| -| maxDepth | 20 | Matches forward decay (dies at 20 hops) | -| minWeight | 0.01 | Captures full context without noise | +| Parameter | Value | Rationale | +| --------- | ----- | --------------------------------------- | +| maxDepth | 20 | Matches forward decay (dies at 20 hops) | +| minWeight | 0.01 | Captures full context without noise | ## v0.3 Results: Chain Walking Augmentation @@ -140,19 +141,20 @@ v0.3.0 replaced sum-product graph traversal with **chain walking** — following ### Cross-Project Chain Walking (297 queries, 15 projects) -| Metric | v0.2 (sum-product) | v0.3 (chain walking) | -|--------|-------------------|---------------------| -| Weighted Average Augmentation | 4.65× | **2.46×** | -| Queries | 492 | 297 | -| Projects | 25 | 15 | -| Queries producing chains | N/A | 100% | -| Mean chain length | N/A | 3.8 chunks | +| Metric | v0.2 (sum-product) | v0.3 (chain walking) | +| ----------------------------- | ------------------ | -------------------- | +| Weighted Average Augmentation | 4.65× | **2.46×** | +| Queries | 492 | 297 | +| Projects | 25 | 15 | +| Queries producing chains | N/A | 100% | +| Mean chain length | N/A | 3.8 chunks | ### Why the Number Dropped The v0.2 4.65× figure counted all chunks reachable through m×n edges via sum-product traversal — including chunks only distantly related to the query. The v0.3 2.46× counts additional unique chunks found by walking sequential chains from the same vector seeds. Key differences: + 1. **Fewer edges**: Sequential linked-list (7,631 edges) vs m×n all-pairs (19,338 edges) 2. **Ordered output**: Chain walking produces chronologically ordered narratives, not ranked scores 3. **Quality over quantity**: Chain chunks are sequentially connected to seeds, not just reachable through any path @@ -161,13 +163,13 @@ Key differences: The 2.46× number understates the value because it measures the same thing as v0.2 (additional chunks found). Chain walking's real contribution is **episodic ordering** — turning a bag of ranked results into a coherent narrative. 
The collection benchmark captures this better: -| Metric | Value | -|--------|-------| -| Chain coverage | 97% of queries produce episodic chains | -| Mean chain length | 4.3 chunks per narrative | -| Token efficiency | 127% (returned context is relevant) | -| p95 recall latency | 3,314ms (down from 16,952ms) | -| Fallback rate | 3% (fall back to search-style results) | +| Metric | Value | +| ------------------ | -------------------------------------- | +| Chain coverage | 97% of queries produce episodic chains | +| Mean chain length | 4.3 chunks per narrative | +| Token efficiency | 127% (returned context is relevant) | +| p95 recall latency | 3,314ms (down from 16,952ms) | +| Fallback rate | 3% (fall back to search-style results) | ### Conclusion diff --git a/docs/research/experiments/lessons-learned.md b/docs/research/experiments/lessons-learned.md index 5185620..5221180 100644 --- a/docs/research/experiments/lessons-learned.md +++ b/docs/research/experiments/lessons-learned.md @@ -39,7 +39,7 @@ Hop-based distance (traversal depth / turn count difference) instead of wall-clo Same decay function for backward and forward edges: ```typescript -weight = 1 - (hops / 15); // Linear, same for both directions +weight = 1 - hops / 15; // Linear, same for both directions ``` ### What Happened @@ -60,6 +60,7 @@ Backward and forward edges have different semantics: ### The Fix Direction-specific decay: + - Backward: Linear, dies at 10 hops - Forward: Delayed linear, 5-hop hold, dies at 20 hops @@ -86,7 +87,8 @@ O(n² × k) complexity due to `Array.includes()`: // Problematic code in hdbscan-ts for (const point of points) { for (const cluster of clusters) { - if (cluster.includes(point)) { // O(n) lookup + if (cluster.includes(point)) { + // O(n) lookup // ... } } @@ -147,7 +149,7 @@ Assign each chunk to nearest cluster centroid: ```typescript function assignCluster(chunk: Chunk): Cluster { return clusters.reduce((best, cluster) => - distance(chunk, cluster) < distance(chunk, best) ? cluster : best + distance(chunk, cluster) < distance(chunk, best) ? cluster : best, ); } ``` @@ -181,16 +183,16 @@ Chunks beyond threshold remain unclustered (noise). 
These questions were identified as open before implementation and resolved through experiments: -| Question | Answer | Evidence | -|----------|--------|----------| -| Topic continuity detection | Lexical features (0.998 AUC), 30-min time gap threshold | Topic continuity experiment | -| Embedding model selection | jina-small (0.715 AUC, 0.384 silhouette) | Embedding benchmark | -| Decay curve type | Delayed linear for retrieval, exponential for prediction | Edge decay experiments | -| Directional asymmetry | Yes — +0.64 MRR delta for delayed linear | Forward prediction experiment | -| Thinking block handling | Remove before embedding (+0.063 AUC) | Ablation study | -| Chunk strategy | Turn-based, code-block aware | Parser implementation | -| Cold start problem | Not real — full context until compaction | Design analysis | -| Parallelism detection | Via parentToolUseID + timestamps | Session data inspection | +| Question | Answer | Evidence | +| -------------------------- | -------------------------------------------------------- | ----------------------------- | +| Topic continuity detection | Lexical features (0.998 AUC), 30-min time gap threshold | Topic continuity experiment | +| Embedding model selection | jina-small (0.715 AUC, 0.384 silhouette) | Embedding benchmark | +| Decay curve type | Delayed linear for retrieval, exponential for prediction | Edge decay experiments | +| Directional asymmetry | Yes — +0.64 MRR delta for delayed linear | Forward prediction experiment | +| Thinking block handling | Remove before embedding (+0.063 AUC) | Ablation study | +| Chunk strategy | Turn-based, code-block aware | Parser implementation | +| Cold start problem | Not real — full context until compaction | Design analysis | +| Parallelism detection | Via parentToolUseID + timestamps | Session data inspection | ## Sum-Product Graph Traversal at Scale (v0.3.0) @@ -235,6 +237,7 @@ Sequential 1-to-1 edges. Each chunk links to the next chunk in its session, pres The graph's value is **structural ordering** — what came before and after — not **semantic ranking**. Vector search and BM25 are better at "what's relevant to this query." The graph is better at "given something relevant, what's the surrounding narrative?" 
This separation of concerns led to the current architecture: + - **Semantic discovery**: Hybrid BM25 + vector search (fast, accurate, query-driven) - **Structural context**: Chain walking along sequential edges (episodic, narrative, seed-driven) - **Topic grouping**: HDBSCAN clustering (browsing, organization) diff --git a/docs/research/experiments/topic-continuity.md b/docs/research/experiments/topic-continuity.md index a2ae3c0..fe53890 100644 --- a/docs/research/experiments/topic-continuity.md +++ b/docs/research/experiments/topic-continuity.md @@ -18,13 +18,13 @@ Topic transitions in conversation sessions can be detected using lexical and str We evaluated multiple feature types for boundary detection: -| Feature Type | Description | -|--------------|-------------| -| Lexical overlap | Jaccard similarity of tokens | -| Reference continuity | Shared file paths mentioned | -| Turn length delta | Change in message length | -| Tool usage pattern | Same/different tools used | -| Time gap | Seconds between turns | +| Feature Type | Description | +| -------------------- | ---------------------------- | +| Lexical overlap | Jaccard similarity of tokens | +| Reference continuity | Shared file paths mentioned | +| Turn length delta | Change in message length | +| Tool usage pattern | Same/different tools used | +| Time gap | Seconds between turns | ### Metrics @@ -37,12 +37,12 @@ We evaluated multiple feature types for boundary detection: ### Feature Comparison -| Feature Set | AUC | Precision | Recall | F1 | -|-------------|-----|-----------|--------|-----| -| Lexical only | **0.998** | 0.95 | 0.94 | 0.945 | -| Reference only | 0.923 | 0.89 | 0.85 | 0.869 | -| Time gap only | 0.712 | 0.68 | 0.71 | 0.695 | -| Combined (all) | 0.997 | 0.96 | 0.93 | 0.945 | +| Feature Set | AUC | Precision | Recall | F1 | +| -------------- | --------- | --------- | ------ | ----- | +| Lexical only | **0.998** | 0.95 | 0.94 | 0.945 | +| Reference only | 0.923 | 0.89 | 0.85 | 0.869 | +| Time gap only | 0.712 | 0.68 | 0.71 | 0.695 | +| Combined (all) | 0.997 | 0.96 | 0.93 | 0.945 | **Winner**: Lexical features alone achieve near-perfect AUC (0.998). 
@@ -50,13 +50,13 @@ We evaluated multiple feature types for boundary detection: Optimal Jaccard similarity threshold for boundary detection: -| Threshold | Precision | Recall | F1 | -|-----------|-----------|--------|-----| -| 0.1 | 0.78 | 0.98 | 0.869 | -| 0.2 | 0.89 | 0.95 | 0.919 | -| 0.3 | 0.95 | 0.94 | **0.945** | -| 0.4 | 0.97 | 0.89 | 0.928 | -| 0.5 | 0.98 | 0.82 | 0.893 | +| Threshold | Precision | Recall | F1 | +| --------- | --------- | ------ | --------- | +| 0.1 | 0.78 | 0.98 | 0.869 | +| 0.2 | 0.89 | 0.95 | 0.919 | +| 0.3 | 0.95 | 0.94 | **0.945** | +| 0.4 | 0.97 | 0.89 | 0.928 | +| 0.5 | 0.98 | 0.82 | 0.893 | **Optimal threshold**: 0.3 (F1=0.945) @@ -87,11 +87,13 @@ Boundary (high boundary score): ### Edge Cases **False positives** (predicted boundary, actually continuation): + - Long explanations with new vocabulary - Copy-pasted code blocks with different content - Multi-step debugging with different error messages **False negatives** (missed boundary): + - Quick topic switches within same file - Related topics (e.g., auth → session management) @@ -104,7 +106,7 @@ function shouldChunk(prevTurn: Turn, currTurn: Turn): boolean { const prevTokens = new Set(tokenize(prevTurn.content)); const currTokens = new Set(tokenize(currTurn.content)); - const intersection = [...currTokens].filter(t => prevTokens.has(t)); + const intersection = [...currTokens].filter((t) => prevTokens.has(t)); const union = new Set([...prevTokens, ...currTokens]); const jaccard = intersection.length / union.size; @@ -140,14 +142,14 @@ Results are saved to `benchmark-results/topic-continuity/`. The comprehensive 75-session run revealed where topic labels come from: -| Source | Count | Label | Confidence | -|--------|-------|-------|------------| -| Same-session adjacent | 1,470 | continuation | medium | -| Tool/file references | 772 | continuation | high | -| Explicit continuation markers | 710 | continuation | high | -| Time gap (>30 min) | 155 | new_topic | medium | -| Session boundaries | 45 | new_topic | high | -| Explicit shift markers | 27 | new_topic | high | +| Source | Count | Label | Confidence | +| ----------------------------- | ----- | ------------ | ---------- | +| Same-session adjacent | 1,470 | continuation | medium | +| Tool/file references | 772 | continuation | high | +| Explicit continuation markers | 710 | continuation | high | +| Time gap (>30 min) | 155 | new_topic | medium | +| Session boundaries | 45 | new_topic | high | +| Explicit shift markers | 27 | new_topic | high | The dataset is imbalanced (93% continuations), reflecting the reality that most adjacent turns in coding sessions continue the same topic. diff --git a/docs/research/future-work.md b/docs/research/future-work.md index 2492c9d..be05012 100644 --- a/docs/research/future-work.md +++ b/docs/research/future-work.md @@ -25,6 +25,7 @@ These items were previously listed as future work and have since been implemente **Current State**: Chain walker follows sequential edges with cosine-similarity scoring. Depth limit of 50. **Potential Improvements**: + 1. Adaptive depth — stop walking when cosine similarity drops below threshold for N consecutive hops 2. Bidirectional walk merging — combine backward and forward walks more intelligently 3. Branch-aware walking — follow brief/debrief edges into sub-agent chains when relevant @@ -34,6 +35,7 @@ These items were previously listed as future work and have since been implemente **Goal**: Learn from user interactions to improve retrieval quality. 
**Approach**: + - Track which retrieved chunks the user actually references in conversation - Use implicit feedback (chunks that appear in subsequent tool calls) as positive signal - Adjust chain walking scoring or seed selection based on feedback patterns @@ -43,11 +45,13 @@ These items were previously listed as future work and have since been implemente **Goal**: Demonstrate cost savings from smarter context retrieval. **Approach**: + - Track context window usage before/after memory augmentation - Measure compression ratio (raw session tokens vs retrieved context) - Dashboard showing memory efficiency metrics **Implementation Ideas**: + - Hook into Claude Code's token reporting - Log baseline vs augmented queries - Calculate effective compression ratio @@ -61,6 +65,7 @@ These items were previously listed as future work and have since been implemente **Challenge**: HDBSCAN is not inherently incremental. **Potential Approaches**: + 1. Approximate nearest cluster assignment for new points 2. Periodic full re-clustering with incremental updates between 3. Explore online clustering algorithms (DBSTREAM, DenStream) @@ -72,11 +77,13 @@ These items were previously listed as future work and have since been implemente **Goal**: Make embedding model configurable. **Options**: + - jina-small (current default) - jina-base (higher quality) - Custom fine-tuned model **Considerations**: + - Changing models requires re-embedding all chunks - Model-specific clustering thresholds - Storage for multiple embedding versions @@ -86,6 +93,7 @@ These items were previously listed as future work and have since been implemente **Goal**: Direct integration without MCP. **Features**: + - Inline memory suggestions - Memory explorer panel - Query interface in editor @@ -97,6 +105,7 @@ These items were previously listed as future work and have since been implemente **Goal**: Share memory across team members. **Considerations**: + - Privacy (redact sensitive content) - Encryption (secure transport) - Conflict resolution (merge strategies) @@ -107,6 +116,7 @@ These items were previously listed as future work and have since been implemente **Goal**: Handle images, diagrams, and other media. **Challenges**: + - Embedding non-text content - Storage format - Retrieval across modalities @@ -120,6 +130,7 @@ These items were previously listed as future work and have since been implemente **Current**: Code-aware chunking based on structure. **To Explore**: + - Fixed token counts - Semantic boundaries - Adaptive sizing based on content @@ -131,6 +142,7 @@ These items were previously listed as future work and have since been implemente **Current**: Based on structural cross-session edges. The [collection benchmark suite](../guides/benchmarking.md) can now measure cross-session bridging quality — run `npx causantic benchmark-collection --full` to evaluate. **To Measure**: + - Precision of cross-session edges - User validation of suggested links - A/B testing of linking strategies @@ -142,6 +154,7 @@ These items were previously listed as future work and have since been implemente **Current**: Fixed curves based on aggregate data. **To Explore**: + - User-specific curve fitting - Project-type differences (frontend vs backend) - Adaptive decay based on retrieval feedback