From 2a533114ac1f6698475e34c611f1cdedcea162ad Mon Sep 17 00:00:00 2001
From: Baseline User
Date: Mon, 2 Mar 2026 12:56:36 +0530
Subject: [PATCH 1/4] chore: migrate pre-commit config to non-deprecated stage
names
Co-Authored-By: Claude Opus 4.6
---
.pre-commit-config.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 93fbf7b..a52b986 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -7,7 +7,7 @@ repos:
name: Gitleaks - Detect secrets
entry: gitleaks detect --source . -c .gitleaks.toml
language: system
- stages: [commit]
+ stages: [pre-commit]
pass_filenames: false
always_run: true
From f8aa5cb0b70ebca1f09188f8313472f4db54f08e Mon Sep 17 00:00:00 2001
From: Baseline User
Date: Mon, 2 Mar 2026 13:59:09 +0530
Subject: [PATCH 2/4] docs: overhaul documentation structure and tone
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Rewrite README with vision-first framing — leads with the agentic
memory problem, origin story, and Ingest→Categorize→Recall→Search
as the central concept
- Add docs/cli.md as the complete command reference (moved out of README)
- Add docs/search.md as a user-facing guide to search vs recall
- Rewrite all user-facing docs (getting-started, team-sharing,
configuration, architecture) to match README tone — direct, honest,
opens with context before diving into mechanics
- Reorganize docs structure: kebab-case throughout, internal planning
docs move to docs/internal/, personal writing gitignored via
docs/writing/
- Rename: CI_HARDENING_EXECUTION_PLAN → internal/ci-hardening,
DESIGN → internal/design, WORKFLOW_AUTOMATION → internal/workflow-automation,
e2e-dev-release-flow-test → internal/e2e-release-flow,
search-recall-architecture → internal/search-analysis
- Update .gitignore: add docs/writing/, .letta/, zsh plugins
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta
---
.gitignore | 10 +
README.md | 555 +++++-------------
docs/architecture.md | 238 ++++----
docs/cli.md | 300 ++++++++--
docs/configuration.md | 104 ++--
docs/getting-started.md | 98 +++-
.../ci-hardening.md} | 0
docs/{DESIGN.md => internal/design.md} | 0
.../e2e-release-flow.md} | 0
.../search-analysis.md} | 0
docs/{ => internal}/website.md | 0
.../workflow-automation.md} | 0
docs/search.md | 134 +++++
docs/team-sharing.md | 168 +++---
14 files changed, 914 insertions(+), 693 deletions(-)
rename docs/{CI_HARDENING_EXECUTION_PLAN.md => internal/ci-hardening.md} (100%)
rename docs/{DESIGN.md => internal/design.md} (100%)
rename docs/{e2e-dev-release-flow-test.md => internal/e2e-release-flow.md} (100%)
rename docs/{search-recall-architecture.md => internal/search-analysis.md} (100%)
rename docs/{ => internal}/website.md (100%)
rename docs/{WORKFLOW_AUTOMATION.md => internal/workflow-automation.md} (100%)
create mode 100644 docs/search.md
diff --git a/.gitignore b/.gitignore
index 659c46f..2e3da59 100644
--- a/.gitignore
+++ b/.gitignore
@@ -43,3 +43,13 @@ temp/
.smriti/CLAUDE.md
.smriti/knowledge/
.smriti/index.json
+
+# Personal writing / local-only notes
+docs/writing/
+
+# Letta Code agent state
+.letta/
+
+# Zsh plugins (should not be in project repo)
+zsh-autosuggestions/
+zsh-syntax-highlighting/
diff --git a/README.md b/README.md
index ba9e970..ad3ee91 100644
--- a/README.md
+++ b/README.md
@@ -2,19 +2,40 @@
-Built on top of [QMD](https://github.com/tobi/qmd) by Tobi Lütke.
+
+ An exploration of memory in the agentic world
+
---
-## The Problem
+The agentic world is moving fast. Every team is shipping with AI — Claude Code,
+Cursor, Codex, Cline. The agents are getting better. The tooling is maturing.
+
+But there's a gap nobody has fully closed:
+
+> **Agents don't remember.**
+
+Not from yesterday. Not from each other. Not from your teammates. Every session
+starts from zero, no matter how much your team has already figured out.
+
+This isn't just a developer experience problem. It's a foundational gap in how
+agents work. As they get more capable and longer-running, memory becomes a
+prerequisite — not a feature. Without it, knowledge stays buried in chat
+histories. Teams re-discover what they've already figured out. Decisions get
+made twice.
+
+The answer, I think, mirrors how our own memory works:
-Your team ships code with AI agents every day — Claude Code, Cursor, Codex. But
-every agent has a blind spot:
+> **Ingest → Categorize → Recall → Search**
-> **They don't remember anything.** Not from yesterday. Not from each other. Not
-> from your teammates.
+That's the brain. That's what **Smriti** (Sanskrit: _memory_) is building
+toward.
-Here's what that looks like:
+---
+
+## The Problem, Up Close
+
+Here's what the gap looks like in practice:
| Monday | Tuesday |
| ------------------------------------------------------------- | --------------------------------------------------- |
@@ -31,16 +52,17 @@ The result:
- **Zero continuity** — each session starts from scratch, no matter how much
your team has already figured out
-The agents are brilliant. But they're amnesic. **This is the biggest gap in
-AI-assisted development today.**
+The agents are brilliant. But they're amnesic. **This is the biggest unsolved
+gap in agentic AI today.**
+
+---
## What Smriti Does
-**Smriti** (Sanskrit: _memory_) is a shared memory layer that sits underneath
-all your AI agents.
+Smriti is a shared memory layer that sits underneath your AI agents.
-Every conversation → automatically captured → indexed →
-searchable. One command to recall what matters.
+Every conversation → automatically captured → indexed → searchable. One command
+to recall what matters.
```bash
# What did we figure out about the auth migration?
@@ -53,12 +75,67 @@ smriti list --project myapp
smriti search "rate limiting strategy" --project api-service
```
-> **20,000 tokens** of past conversations → **500 tokens** of relevant
-> context. Your agents get what they need without blowing up your token budget.
+> **20,000 tokens** of past conversations → **500 tokens** of relevant context.
+> Your agents get what they need without blowing up your token budget.
-## The Workflow
+Built on top of [QMD](https://github.com/tobi/qmd) by Tobi Lütke. Everything
+runs locally — no cloud, no accounts, no telemetry.
+
+---
+
+## Install
+
+**macOS / Linux:**
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.sh | bash
+```
+
+**Windows** (PowerShell):
+
+```powershell
+irm https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.ps1 | iex
+```
+
+Both installers will:
+
+- Install [Bun](https://bun.sh) if you don't have it
+- Clone Smriti to `~/.smriti`
+- Set up the `smriti` CLI on your PATH
+- Configure the Claude Code auto-save hook
+
+**Requirements:** macOS, Linux, or Windows 10+ · Git · Bun ≥ 1.1
+(auto-installed) · Ollama (optional, for synthesis)
+
+```bash
+smriti upgrade # update to latest
+```
+
+---
+
+## Quick Start
+
+```bash
+# 1. Ingest your recent Claude Code sessions
+smriti ingest claude
+
+# 2. Search what your team has discussed
+smriti search "database connection pooling"
+
+# 3. Recall with synthesis into one coherent summary (requires Ollama)
+smriti recall "how did we handle rate limiting" --synthesize
+
+# 4. Share knowledge with your team through git
+smriti share --project myapp
+git add .smriti && git commit -m "chore: share session knowledge"
+
+# 5. Teammates pull it in
+smriti sync --project myapp
+```
+
+---
-Here's what changes when your team runs Smriti:
+## The Workflow
**1. Conversations are captured automatically**
@@ -101,49 +178,11 @@ teammates pull it and import it into their local memory. No cloud service, no
account, no sync infrastructure — just git.
```bash
-# Share what you've learned
smriti share --project myapp --category decision
-
-# Pull in what others have shared
smriti sync --project myapp
```
-## Install
-
-**macOS / Linux:**
-
-```bash
-curl -fsSL https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.sh | bash
-```
-
-**Windows** (PowerShell):
-
-```powershell
-irm https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.ps1 | iex
-```
-
-Both installers will:
-
-- Install [Bun](https://bun.sh) if you don't have it
-- Clone Smriti to `~/.smriti`
-- Set up the `smriti` CLI on your PATH
-- Configure the Claude Code auto-save hook
-
-### Requirements
-
-- **macOS, Linux, or Windows 10+**
-- **Git**
-- **Bun** >= 1.1 (installed automatically)
-- **Ollama** (optional — for local summarization and synthesis)
-
-### Upgrade
-
-```bash
-smriti upgrade
-```
-
-Pulls the latest version from GitHub and reinstalls dependencies. Equivalent to
-re-running the install script.
+---
## Commands
@@ -175,8 +214,8 @@ smriti categorize # Auto-categorize sessions
smriti projects # List all tracked projects
smriti upgrade # Update smriti to the latest version
-# Context and comparison
-smriti context # Generate project context for .smriti/CLAUDE.md
+# Context injection
+smriti context # Generate project context → .smriti/CLAUDE.md
smriti context --dry-run # Preview without writing
smriti compare --last # Compare last 2 sessions (tokens, tools, files)
smriti compare # Compare specific sessions
@@ -187,6 +226,8 @@ smriti sync # Import teammates' shared knowledge
smriti team # View team contributions
```
+---
+
## How It Works
```
@@ -221,308 +262,7 @@ Claude Code Cursor Codex Other Agents
Everything runs locally. Your conversations never leave your machine. The SQLite
database, the embeddings, the search indexes — all on disk, all yours.
-## Ingest Architecture
-
-Smriti ingest uses a layered pipeline:
-
-1. `parsers/*` extract agent transcripts into normalized messages (no DB writes).
-2. `session-resolver` derives project/session state, including incremental offsets.
-3. `store-gateway` persists messages, sidecars, session meta, and costs.
-4. `ingest/index.ts` orchestrates the flow with per-session error isolation.
-
-This keeps parser logic, resolution logic, and persistence logic separated and testable.
-See `INGEST_ARCHITECTURE.md` and `src/ingest/README.md` for implementation details.
-
-## Tagging & Categories
-
-Sessions and messages are automatically tagged into a hierarchical category
-tree. Tags flow through every command — search, recall, list, and share — so you
-can slice your team's knowledge by topic.
-
-### Default Category Tree
-
-Smriti ships with 7 top-level categories and 22 subcategories:
-
-| Category | Subcategories |
-| -------------- | ----------------------------------------------------------------------- |
-| `code` | `code/implementation`, `code/pattern`, `code/review`, `code/snippet` |
-| `architecture` | `architecture/design`, `architecture/decision`, `architecture/tradeoff` |
-| `bug` | `bug/report`, `bug/fix`, `bug/investigation` |
-| `feature` | `feature/requirement`, `feature/design`, `feature/implementation` |
-| `project` | `project/setup`, `project/config`, `project/dependency` |
-| `decision` | `decision/technical`, `decision/process`, `decision/tooling` |
-| `topic` | `topic/learning`, `topic/explanation`, `topic/comparison` |
-
-### Auto-Classification
-
-Smriti uses a two-stage pipeline to classify messages:
-
-1. **Rule-based** — 24 keyword patterns with weighted confidence scoring. Each
- pattern targets a specific subcategory (e.g., words like "crash",
- "stacktrace", "panic" map to `bug/report`). Confidence is calculated from
- keyword density and rule weight.
-2. **LLM fallback** — When rule confidence falls below the threshold (default
- `0.5`, configurable via `SMRITI_CLASSIFY_THRESHOLD`), Ollama classifies the
- message. Only activated when you pass `--llm`.
-
-The most frequent category across a session's messages becomes the session-level
-tag.
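The scoring idea can be sketched roughly like this — the rules, keywords, and weights below are illustrative stand-ins, not Smriti's actual 24 patterns:

```typescript
// Hypothetical sketch of the rule-based stage: weighted keyword rules
// produce a confidence score; below the threshold, the caller would
// fall back to the LLM. Rule shapes here are illustrative only.
type Rule = { category: string; keywords: string[]; weight: number };

const rules: Rule[] = [
  { category: "bug/report", keywords: ["crash", "stacktrace", "panic"], weight: 1.0 },
  { category: "decision/tooling", keywords: ["chose", "switched to", "adopted"], weight: 0.8 },
];

function classify(text: string, threshold = 0.5) {
  const words = text.toLowerCase();
  let best = { category: "uncategorized", confidence: 0 };
  for (const rule of rules) {
    const hits = rule.keywords.filter((k) => words.includes(k)).length;
    // Confidence grows with keyword density, scaled by the rule's weight.
    const confidence = Math.min(1, (hits / rule.keywords.length) * rule.weight);
    if (confidence > best.confidence) best = { category: rule.category, confidence };
  }
  return { ...best, needsLlm: best.confidence < threshold };
}
```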
-
-```bash
-# Auto-categorize all uncategorized sessions (rule-based)
-smriti categorize
-
-# Include LLM fallback for ambiguous sessions
-smriti categorize --llm
-
-# Categorize a specific session
-smriti categorize --session <session-id>
-```
-
-### Manual Tagging
-
-Override or supplement auto-classification with manual tags:
-
-```bash
-smriti tag <session-id> <category>
-
-# Examples
-smriti tag abc123 decision/technical
-smriti tag abc123 bug/fix
-```
-
-Manual tags are stored with confidence `1.0` and source `"manual"`.
-
-### Custom Categories
-
-Add your own categories to extend the default tree:
-
-```bash
-# List the full category tree
-smriti categories
-
-# Add a top-level category
-smriti categories add ops --name "Operations"
-
-# Add a nested category under an existing parent
-smriti categories add ops/incident --name "Incident Response" --parent ops
-
-# Include a description
-smriti categories add ops/runbook --name "Runbooks" --parent ops --description "Operational runbook sessions"
-```
-
-### How Tags Filter Commands
-
-The `--category` flag works across search, recall, list, and share:
-
-| Command | Effect of `--category` |
-| --------------- | ------------------------------------------------------------------------------- |
-| `smriti list` | Shows categories column; filters sessions to matching category |
-| `smriti search` | Filters full-text search results to matching category |
-| `smriti recall` | Filters recall context; works with `--synthesize` |
-| `smriti share` | Controls which sessions are exported; files organized into `.smriti/knowledge/` |
-| `smriti status` | Shows session count per category (no filter flag — always shows all) |
-
-**Hierarchical filtering** — Filtering by a parent category automatically
-includes all its children. `--category decision` matches `decision/technical`,
-`decision/process`, and `decision/tooling`.
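Assuming the slash-delimited category paths shown above, the parent-includes-children rule reduces to a prefix check — a hypothetical sketch, not Smriti's actual filter code:

```typescript
// A parent filter matches itself and any slash-delimited descendant.
function matchesCategory(tag: string, filter: string): boolean {
  return tag === filter || tag.startsWith(filter + "/");
}

// --category decision matches every decision/* subcategory:
const tags = ["decision/technical", "decision/process", "code/review"];
const matched = tags.filter((t) => matchesCategory(t, "decision"));
// matched keeps the two decision/* tags and drops code/review
```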
-
-### Categories in Share & Sync
-
-**Categories survive the share/sync roundtrip exactly.** What gets serialized
-during `smriti share` is exactly what gets deserialized during `smriti sync` —
-the same category ID goes in, the same category ID comes out. No
-reclassification, no transformation, no loss. The category a session was tagged
-with on one machine is the category it will be indexed under on every other
-machine that syncs it.
-
-When you share sessions, the category is embedded in YAML frontmatter inside
-each exported markdown file:
-
-```yaml
----
-id: 2e5f420a-e376-4ad4-8b35-ad94838cbc42
-category: project
-project: smriti
-agent: claude-code
-author: zero8
-shared_at: 2026-02-10T11:29:44.501Z
-tags: ["project", "project/dependency"]
----
-```
-
-When a teammate runs `smriti sync`, the frontmatter is parsed and the category
-is restored into their local `smriti_session_tags` table — indexed as `project`,
-searchable as `project`, filterable as `project`. The serialization and
-deserialization are symmetric: `share` writes `category: project` → `sync` reads
-`category: project` → `tagSession(db, sessionId, "project", 1.0, "team")`. No
-intermediate step reinterprets the value.
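That symmetry can be sketched as a trivial write/read pair — simplified stand-ins for the real serializer and parser, kept here only to show that no step reinterprets the value:

```typescript
// Simplified sketch: what share writes into frontmatter is exactly
// what sync reads back. Field name mirrors the example above; the
// real serializer and parser handle the full frontmatter, not just
// this one field.
function serializeCategory(category: string): string {
  return `---\ncategory: ${category}\n---\n`;
}

function parseCategory(frontmatter: string): string | null {
  const match = frontmatter.match(/^category:\s*(.+)$/m);
  return match ? match[1].trim() : null;
}

// Roundtrip: category in === category out, no transformation.
const restored = parseCategory(serializeCategory("project")); // "project"
```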
-
-Files are organized into subdirectories by primary category (e.g.,
-`.smriti/knowledge/project/`, `.smriti/knowledge/decision/`), but sync reads the
-category from frontmatter, not the directory path.
-
-> **Note:** Currently only the primary `category` field is restored on sync.
-> Secondary tags in the `tags` array are serialized in the frontmatter but not
-> yet imported. If a session had multiple tags (e.g., `project` +
-> `decision/tooling`), only the primary tag survives the roundtrip.
-
-```bash
-# Share decisions — category metadata travels with the files
-smriti share --project myapp --category decision
-
-# Teammate syncs — categories restored exactly from frontmatter
-smriti sync --project myapp
-```
-
-### Examples
-
-```bash
-# All architectural decisions
-smriti search "database" --category architecture
-
-# Recall only bug-related context
-smriti recall "connection timeout" --category bug --synthesize
-
-# List feature sessions for a specific project
-smriti list --category feature --project myapp
-
-# Share only decision sessions
-smriti share --project myapp --category decision
-```
-
-## Context: Token Reduction (North Star)
-
-Every new Claude Code session starts from zero — no awareness of what happened
-yesterday, which files were touched, what decisions were made. `smriti context`
-generates a compact project summary (~200-300 tokens) and injects it into
-`.smriti/CLAUDE.md`, which Claude Code auto-discovers.
-
-```bash
-smriti context # auto-detect project, write .smriti/CLAUDE.md
-smriti context --dry-run # preview without writing
-smriti context --project myapp # explicit project
-smriti context --days 14 # 14-day lookback (default: 7)
-```
-
-The output looks like this:
-
-```markdown
-## Project Context
-
-> Auto-generated by `smriti context` on 2026-02-11. Do not edit manually.
-
-### Recent Sessions (last 7 days)
-
-- **2h ago** Enriched ingestion pipeline (12 turns) [code]
-- **1d ago** Search & recall pipeline (8 turns) [feature]
-
-### Hot Files
-
-`src/db.ts` (14 ops), `src/ingest/claude.ts` (11 ops), `src/search/index.ts` (8
-ops)
-
-### Git Activity
-
-- commit `main`: "Fix auth token refresh" (2026-02-10)
-
-### Usage
-
-5 sessions, 48 turns, ~125K input / ~35K output tokens
-```
-
-No Ollama, no network calls, no model loading. Pure SQL queries against sidecar
-tables, rendered as markdown. Runs in < 100ms.
-
-### Measuring the Impact
-
-Does this actually save tokens? Honestly — we don't know yet. We built the tools
-to measure it, ran A/B tests, and the results so far are... humbling. Claude is
-annoyingly good at finding the right files even without help.
-
-But this is the north star, not the destination. We believe context injection
-will matter most on large codebases without detailed docs, ambiguous tasks that
-require exploration, and multi-session continuity. We just need the data to
-prove it (or disprove it and try something else).
-
-So we're shipping the measurement tools and asking you to help. Run A/B tests on
-your projects, paste the results in
-[Issue #13](https://github.com/zero8dotdev/smriti/issues/13), and let's figure
-this out together.
-
-#### A/B Testing Guide
-
-```bash
-# Step 1: Baseline session (no context)
-mv .smriti/CLAUDE.md .smriti/CLAUDE.md.bak
-# Start a Claude Code session, give it a task, let it finish, exit
-
-# Step 2: Context session
-mv .smriti/CLAUDE.md.bak .smriti/CLAUDE.md
-smriti context
-# Start a new session, give the EXACT same task, let it finish, exit
-
-# Step 3: Ingest and compare
-smriti ingest claude
-smriti compare --last --project myapp
-```
-
-#### Compare Command
-
-```bash
-smriti compare # by session ID (supports partial IDs)
-smriti compare --last # last 2 sessions for current project
-smriti compare --last --project myapp # last 2 sessions for specific project
-smriti compare --last --json # machine-readable output
-```
-
-Output:
-
-```
-Session A: Fix auth bug (no context)
-Session B: Fix auth bug (with context)
-
-Metric A B Diff
-----------------------------------------------------------------
-Turns 12 8 -4 (-33%)
-Total tokens 45K 32K -13000 (-29%)
-Tool calls 18 11 -7 (-39%)
-File reads 10 4 -6 (-60%)
-
-Tool breakdown:
- Bash 4 3
- Glob 3 0
- Read 10 4
- Write 1 4
-```
-
-#### What We've Tested So Far
-
-| Task Type | Context Impact | Notes |
-| ----------------------------------------- | -------------- | ---------------------------------------------------------------------- |
-| Knowledge questions ("how does X work?") | Minimal | Both sessions found the right files immediately from project CLAUDE.md |
-| Implementation tasks ("add --since flag") | Minimal | Small, well-scoped tasks don't need exploration |
-| Ambiguous/exploration tasks | Untested | Expected sweet spot — hot files guide Claude to the right area |
-| Large codebases (no project CLAUDE.md) | Untested | Expected sweet spot — context replaces missing documentation |
-
-**We need your help.** If you run A/B tests on your projects, please share your
-results in [GitHub Issues](https://github.com/zero8dotdev/smriti/issues).
-Include the `smriti compare` output and a description of the task. This data
-will help us understand where context injection actually matters.
-
-### Token Savings (Search & Recall)
-
-Separate from context injection, Smriti's search and recall pipeline compresses
-past conversations:
-
-| Scenario | Raw Conversations | Via Smriti | Reduction |
-| ----------------------------------- | ----------------- | ----------- | --------- |
-| Relevant context from past sessions | ~20,000 tokens | ~500 tokens | **40x** |
-| Multi-session recall + synthesis | ~10,000 tokens | ~200 tokens | **50x** |
-| Full project conversation history | 50,000+ tokens | ~500 tokens | **100x** |
-
-Lower token spend, faster responses, more room for the actual work in your
-context window.
## Privacy
@@ -534,68 +274,87 @@ Smriti is local-first by design. No cloud, no telemetry, no accounts.
- Synthesis via local [Ollama](https://ollama.ai) (optional)
- Team sharing happens through git — you control what gets committed
+---
+
## FAQ
-**When does knowledge get captured?** Automatically. Smriti hooks into your AI
-coding tool (Claude Code, Cursor, etc.) and captures every session without any
-manual step. You just code normally and `smriti ingest` pulls in the
-conversations.
+**When does knowledge get captured?** Automatically. Smriti hooks into Claude
+Code and captures every session without any manual step. For other agents, run
+`smriti ingest all` to pull in conversations on demand.
**Who has access to my data?** Only you. Everything lives in a local SQLite
-database (`~/.cache/qmd/index.sqlite`). There's no cloud, no accounts, no
-telemetry. Team sharing is explicit — you run `smriti share` to export, commit
-the `.smriti/` folder to git, and teammates run `smriti sync` to import.
+database. There's no cloud, no accounts, no telemetry. Team sharing is
+explicit — you run `smriti share`, commit the `.smriti/` folder, and teammates
+run `smriti sync`.
**Can AI agents query the knowledge base?** Yes. `smriti recall "query"` returns
-relevant past context that agents can use. When you run `smriti share`, it
-generates a `.smriti/CLAUDE.md` index so Claude Code automatically discovers
-shared knowledge. Agents can search, grep, and recall from the full knowledge
-base.
+relevant past context. `smriti share` generates a `.smriti/CLAUDE.md` so Claude
+Code automatically discovers shared knowledge at the start of every session.
**How do multiple projects stay separate?** Each project gets its own `.smriti/`
-folder in its repo root. Sessions are tagged with project IDs in the central
-database. Search works cross-project by default, but you can scope to a single
-project with `--project <name>`. Knowledge shared via git stays within that
-project's repo.
+folder. Sessions are tagged with project IDs in the central database. Search
+works cross-project by default, scoped with `--project <name>`.
**Does this work with Jira or other issue trackers?** Not yet — Smriti is
-git-native today. Issue tracker integrations are on the roadmap. If you have
-ideas, open a discussion in
-[GitHub Issues](https://github.com/zero8dotdev/smriti/issues).
+git-native today. Issue tracker integrations are on the roadmap.
-**How does this help preserve existing features during changes?** The reasoning
-behind each code change is captured and searchable. When an AI agent starts a
-new session, it can recall _why_ something was built a certain way — reducing
-the chance of accidentally breaking existing behavior.
+**Further reading:** See [docs/cli.md](./docs/cli.md) for the full command
+reference, [INGEST_ARCHITECTURE.md](./INGEST_ARCHITECTURE.md) for the ingestion
+pipeline, and [CLAUDE.md](./CLAUDE.md) for the database schema and
+architecture.
-## Uninstall
+---
-```bash
-curl -fsSL https://raw.githubusercontent.com/zero8dotdev/smriti/main/uninstall.sh | bash
-```
+## About
-To also remove hook state, prepend `SMRITI_PURGE=1` to the command.
+I've been coding with AI agents for about 8 months. At some point the
+frustration became impossible to ignore — every new session, you start from
+zero. Explaining the same context, the same decisions, the same constraints.
+That's not a great developer experience.
+
+I started small: custom prompts to export Claude sessions. That grew old fast. I
+needed categorization. Found QMD. Started building on top of it. Dogfooded it.
+Hit walls. Solved one piece at a time.
+
+At some point it worked well enough that I shared it with some friends. Some
+used it, some ignored it — fair, the AI tooling space is noisy. But I kept
+exploring, and found others building toward the same problem: Claude-mem, Letta,
+a growing community of people who believe memory is the next foundational layer
+for AI.
-## Documentation
+That's what Smriti is, really. An exploration. The developer tool is one layer.
+But the deeper question is: what does memory for autonomous agents actually need
+to look like? The answer probably mirrors how our own brain works — **Ingest →
+Categorize → Recall → Search**. We're figuring that out, one piece at a time.
-See [CLAUDE.md](./CLAUDE.md) for the full reference — API docs, database schema,
-architecture details, and troubleshooting.
+I come from the developer tooling space. Bad tooling bothers me. There's always
+a better way. This is that project.
+
+---
## Special Thanks
Smriti is built on top of [QMD](https://github.com/tobi/qmd) — a beautifully
designed local search engine for markdown files created by
-[Tobi Lütke](https://github.com/tobi), CEO of Shopify.
+[Tobi Lütke](https://github.com/tobi), CEO of Shopify. QMD gave us fast,
+local-first SQLite with full-text search, vector embeddings, and
+content-addressable hashing — all on your machine, zero cloud dependencies.
+Instead of rebuilding that infrastructure from scratch, we focused entirely on
+the memory layer, multi-agent ingestion, and team sharing.
-QMD gave us the foundation we needed: a fast, local-first SQLite store with
-full-text search, vector embeddings, and content-addressable hashing — all
-running on your machine with zero cloud dependencies. Instead of rebuilding that
-infrastructure from scratch, we were able to focus entirely on the memory layer,
-multi-agent ingestion, and team sharing that makes Smriti useful.
+Thank you, Tobi, for open-sourcing it.
+
+---
-Thank you, Tobi, for open-sourcing QMD. It's a reminder that the best tools are
-often the ones that quietly do the hard work so others can build something new
-on top.
+## Uninstall
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/zero8dotdev/smriti/main/uninstall.sh | bash
+```
+
+To also remove hook state, prepend `SMRITI_PURGE=1` to the command.
+
+---
## License
diff --git a/docs/architecture.md b/docs/architecture.md
index 5ec1d33..1740953 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,149 +1,139 @@
# Architecture
-## Overview
+Smriti's architecture follows the same pattern as memory in your brain:
+**Ingest → Categorize → Recall → Search**.
-```
- Claude Code Cursor Codex Other Agents
- | | | |
- v v v v
- ┌──────────────────────────────────────────┐
- │ Smriti Ingestion Layer │
- │ │
- │ src/ingest/claude.ts (JSONL parser) │
- │ src/ingest/codex.ts (JSONL parser) │
- │ src/ingest/cursor.ts (JSON parser) │
- │ src/ingest/generic.ts (file import) │
- └──────────────────┬───────────────────────┘
- │
- v
- ┌──────────────────────────────────────────┐
- │ QMD Core (via src/qmd.ts) │
- │ │
- │ addMessage() content-addressed │
- │ searchMemoryFTS() BM25 full-text │
- │ searchMemoryVec() vector similarity │
- │ recallMemories() dedup + synthesis │
- └──────────────────┬───────────────────────┘
- │
- v
- ┌──────────────────────────────────────────┐
- │ SQLite Database │
- │ ~/.cache/qmd/index.sqlite │
- │ │
- │ QMD tables: │
- │ memory_sessions memory_messages │
- │ memory_fts content_vectors │
- │ │
- │ Smriti tables: │
- │ smriti_session_meta (agent, project) │
- │ smriti_projects (registry) │
- │ smriti_categories (taxonomy) │
- │ smriti_session_tags (categorization) │
- │ smriti_message_tags (categorization) │
- │ smriti_shares (team dedup) │
- └──────────────────────────────────────────┘
-```
+Every layer has one job. Parsers extract conversations. The resolver maps
+them to projects. The store persists them. Search retrieves them. Nothing
+crosses those boundaries.
-## QMD Integration
+---
-Smriti builds on top of [QMD](https://github.com/tobi/qmd), a local-first search engine. QMD provides:
-
-- **Content-addressable storage** — Messages are SHA256-hashed, no duplicates
-- **FTS5 full-text search** — BM25 ranking with Porter stemming
-- **Vector embeddings** — 384-dim vectors via embeddinggemma (node-llama-cpp)
-- **Reciprocal Rank Fusion** — Combines FTS and vector results
+## System Overview
-All QMD imports go through a single re-export hub at `src/qmd.ts`:
-
-```ts
-// Every file imports from here, never from qmd directly
-import { addMessage, searchMemoryFTS, recallMemories } from "./qmd";
-import { hashContent } from "./qmd";
-import { ollamaRecall } from "./qmd";
+```
+Claude Code Cursor Codex Cline Copilot
+ | | | | |
+ v v v v v
+┌──────────────────────────────────────────────┐
+│ Smriti Ingestion Layer │
+│ │
+│ parsers/claude.ts (JSONL) │
+│ parsers/codex.ts (JSONL) │
+│ parsers/cursor.ts (JSON) │
+│ parsers/cline.ts (task files) │
+│ parsers/copilot.ts (VS Code storage) │
+│ parsers/generic.ts (file import) │
+│ │
+│ session-resolver.ts (project detection) │
+│ store-gateway.ts (persistence) │
+└──────────────────┬───────────────────────────┘
+ │
+ v
+┌──────────────────────────────────────────────┐
+│ QMD Core (via src/qmd.ts) │
+│ │
+│ addMessage() content-addressed │
+│ searchMemoryFTS() BM25 full-text │
+│ searchMemoryVec() vector similarity │
+│ recallMemories() dedup + synthesis │
+└──────────────────┬───────────────────────────┘
+ │
+ v
+┌──────────────────────────────────────────────┐
+│ SQLite (~/.cache/qmd/index.sqlite) │
+│ │
+│ QMD tables: │
+│ memory_sessions memory_messages │
+│ memory_fts (BM25) content_vectors │
+│ │
+│ Smriti tables: │
+│ smriti_session_meta (agent, project) │
+│ smriti_projects (registry) │
+│ smriti_categories (taxonomy) │
+│ smriti_session_tags (categorization) │
+│ smriti_message_tags (categorization) │
+│ smriti_shares (team dedup) │
+└──────────────────────────────────────────────┘
```
-This creates a clean boundary — if QMD's API changes, only `src/qmd.ts` needs updating.
+Everything runs locally. Nothing leaves your machine.
-## Ingestion Pipeline
+---
-Each agent has a dedicated parser. The flow:
+## Built on QMD
-1. **Discover** — Glob for session files in agent-specific log directories
-2. **Deduplicate** — Check `smriti_session_meta` for already-ingested session IDs
-3. **Parse** — Agent-specific parsing into a common `ParsedMessage[]` format
-4. **Store** — Save via QMD's `addMessage()` (content-addressed, SHA256 hashed)
-5. **Annotate** — Attach Smriti metadata (agent ID, project ID) to `smriti_session_meta`
+Smriti builds on [QMD](https://github.com/tobi/qmd) — a local-first search
+engine for markdown files by Tobi Lütke. QMD handles the hard parts:
-### Project Detection (Claude Code)
+- **Content-addressable storage** — messages are SHA256-hashed, no duplicates
+- **FTS5 full-text search** — BM25 ranking with Porter stemming
+- **Vector embeddings** — 384-dim via EmbeddingGemma (node-llama-cpp),
+ computed entirely on-device
+- **Reciprocal Rank Fusion** — combines FTS and vector results
-Claude Code stores sessions in `~/.claude/projects/<encoded-dir>/`. The directory name encodes the filesystem path with `-` replacing `/`:
+All QMD imports go through a single re-export hub at `src/qmd.ts`. No file
+in the codebase imports from QMD directly — only through this hub. If QMD's
+API changes, one file needs updating.
+```ts
+import { addMessage, searchMemoryFTS, recallMemories } from "./qmd";
+import { hashContent, ollamaRecall } from "./qmd";
```
--Users-zero8-zero8.dev-openfga → /Users/zero8/zero8.dev/openfga
-```
-
-Since folder names can also contain dashes, `deriveProjectPath()` uses greedy `existsSync()` matching: it tries candidate paths from left to right, picking the longest existing directory at each step.
-`deriveProjectId()` then strips the configured `PROJECTS_ROOT` (default `~/zero8.dev`) to produce a clean project name like `openfga` or `avkash/regulation-hub`.
+---
-## Search Architecture
-
-### Filtered Search
+## Ingestion Pipeline
-`searchFiltered()` in `src/search/index.ts` extends QMD's FTS5 search with JOINs to Smriti's metadata tables:
+Ingestion is a four-stage pipeline with clean separation between stages:
-```sql
-FROM memory_fts mf
-JOIN memory_messages mm ON mm.rowid = mf.rowid
-JOIN memory_sessions ms ON ms.id = mm.session_id
-LEFT JOIN smriti_session_meta sm ON sm.session_id = mm.session_id
-WHERE mf.content MATCH ?
- AND sm.project_id = ? -- project filter
- AND sm.agent_id = ? -- agent filter
- AND EXISTS (...) -- category filter via smriti_message_tags
-```
+1. **Parse** — agent-specific parsers extract conversations into a normalized
+ `ParsedMessage[]` format. No DB writes, no side effects. Pure functions.
+2. **Resolve** — `session-resolver.ts` maps sessions to projects, handles
+ incremental ingestion (picks up where it left off), derives clean project
+ IDs from agent-specific path formats.
+3. **Store** — `store-gateway.ts` persists messages, session metadata,
+ sidecars, and cost data. All writes go through here.
+4. **Orchestrate** — `ingest/index.ts` drives the flow with per-session error
+ isolation. One broken session doesn't stop the rest.
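
In sketch form, the orchestration stage reduces to a loop with per-session
error isolation. Names and signatures here are illustrative, not the real
API:

```ts
// Illustrative orchestrator loop; the real pipeline lives in ingest/index.ts.
type ParsedMessage = { role: string; content: string };
type Session = { id: string; messages: ParsedMessage[] };

function ingestAll(
  sessions: Session[],
  store: (s: Session) => void, // stand-in for the store gateway
): { ok: number; failed: string[] } {
  let ok = 0;
  const failed: string[] = [];
  for (const session of sessions) {
    try {
      store(session);
      ok++;
    } catch {
      // Per-session isolation: record the failure and keep going.
      failed.push(session.id);
    }
  }
  return { ok, failed };
}
```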
-### Recall
+### Project Detection
-`recall()` in `src/search/recall.ts` wraps search with:
+Claude Code encodes project paths into directory names like
+`-Users-zero8-zero8.dev-openfga` (slashes become dashes). Since folder
+names can also contain real dashes, `deriveProjectPath()` uses greedy
+`existsSync()` matching — trying candidate paths left to right, picking the
+longest valid directory at each step.
-1. **Session deduplication** — Keep only the best-scoring result per session
-2. **Optional synthesis** — Sends results to Ollama's `ollamaRecall()` for a coherent summary
+`deriveProjectId()` then strips `SMRITI_PROJECTS_ROOT` to produce a clean
+name: `openfga`, `avkash/regulation-hub`.
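
The greedy matching can be sketched as follows. This is an illustration of
the approach, not the real implementation; `exists` is injected here so the
logic is testable, where the actual code calls `existsSync`:

```ts
// Sketch of greedy path reconstruction for names like
// "-Users-zero8-zero8.dev-openfga". Each dash is ambiguous: either a path
// separator or a literal dash inside a folder name. At every step, extend
// the current path with the longest run of segments (rejoined with "-")
// that names an existing directory.
function deriveProjectPath(
  dirName: string,
  exists: (p: string) => boolean,
): string {
  const segments = dirName.replace(/^-/, "").split("-");
  let path = "";
  let i = 0;
  while (i < segments.length) {
    let matched = false;
    // Try the longest candidate first.
    for (let j = segments.length; j > i; j--) {
      const candidate = path + "/" + segments.slice(i, j).join("-");
      if (exists(candidate)) {
        path = candidate;
        i = j;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Nothing on disk matches: fall back to a single segment.
      path = path + "/" + segments[i];
      i++;
    }
  }
  return path;
}
```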
-When no filters are specified, it delegates directly to QMD's native `recallMemories()`.
+---
-## Team Sharing
+## Search
-### Export (`smriti share`)
+Smriti adds a metadata filter layer on top of QMD's native search:
-Sessions are exported as markdown files with YAML frontmatter:
-
-```
-.smriti/
-├── config.json
-├── index.json # Manifest of all shared files
-└── knowledge/
- ├── decision/
- │ └── 2026-02-10_auth-migration-approach.md
- └── bug/
- └── 2026-02-09_connection-pool-fix.md
-```
+**`smriti search`** — FTS5 full-text with JOINs to Smriti's metadata tables.
+Filters by project, agent, and category without touching the vector index.
+Fast, synchronous, no model loading.
-Each file contains:
-- YAML frontmatter (session ID, category, project, agent, author, tags)
-- Session title as heading
-- Summary (if available)
-- Full conversation in `**role**: content` format
+**`smriti recall`** — Two paths depending on whether filters are applied:
-Content hashes prevent re-exporting the same content.
+- *No filters* → delegates to QMD's native `recallMemories()`: FTS + vector
+ + Reciprocal Rank Fusion + session dedup. Full hybrid pipeline.
+- *With filters* → filtered FTS search + session dedup. Vector search is
+ currently bypassed when filters are active. (This is a known gap — see
+ [search.md](./search.md) for details.)
-### Import (`smriti sync`)
+**`smriti embed`** — builds vector embeddings for all unembedded messages.
+Required before vector search works. Runs locally via node-llama-cpp.
-Reads markdown files from `.smriti/knowledge/`, parses frontmatter and conversation, and imports via `addMessage()`. Content hashing prevents duplicate imports.
+---
## Database Schema
-### QMD Tables (not modified by Smriti)
+### QMD Tables
| Table | Purpose |
|-------|---------|
@@ -156,10 +146,26 @@ Reads markdown files from `.smriti/knowledge/`, parses frontmatter and conversat
| Table | Purpose |
|-------|---------|
-| `smriti_agents` | Agent registry (claude-code, codex, cursor) |
+| `smriti_agents` | Agent registry (claude-code, codex, cursor...) |
| `smriti_projects` | Project registry (id, filesystem path) |
| `smriti_session_meta` | Maps sessions to agents and projects |
| `smriti_categories` | Hierarchical category taxonomy |
-| `smriti_session_tags` | Category tags on sessions (with confidence) |
-| `smriti_message_tags` | Category tags on messages (with confidence) |
+| `smriti_session_tags` | Category tags on sessions (with confidence score) |
+| `smriti_message_tags` | Category tags on messages (with confidence score) |
| `smriti_shares` | Deduplication tracking for team sharing |
+
+---
+
+## Team Sharing
+
+Export (`smriti share`) converts sessions to markdown with YAML frontmatter
+and writes them to `.smriti/knowledge/`, organized by category. The YAML
+carries session ID, category, project, agent, author, and tags — enough to
+reconstruct the full metadata on import.
+
+Import (`smriti sync`) parses frontmatter, restores categories, and inserts
+via `addMessage()`. Content hashing prevents duplicate imports. The
+roundtrip is symmetric: what gets written during share is exactly what gets
+read during sync.
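
The dedup mechanism is content addressing. A minimal sketch of the idea
(the real storage layer lives in QMD, not in this code):

```ts
import { createHash } from "node:crypto";

// Content-addressed dedup sketch: identical content always hashes to the
// same key, so importing it a second time is a no-op.
const seen = new Set<string>();

function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns true if the message was stored, false if it was a duplicate.
function importMessage(content: string): boolean {
  const hash = hashContent(content);
  if (seen.has(hash)) return false;
  seen.add(hash);
  return true;
}
```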
+
+See [team-sharing.md](./team-sharing.md) for the workflow.
diff --git a/docs/cli.md b/docs/cli.md
index 25b5071..13f1116 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -1,40 +1,82 @@
-# CLI Reference
+# Smriti CLI Reference
+
+Everything you can do with `smriti`. For the big picture, see the
+[README](../README.md).
+
+---
+
+## Global Flags
+
+```bash
+smriti --version # Print version
+smriti --help # Print command overview
+smriti help # Same as --help
+```
+
+---
+
+## Global Filters
+
+These flags work across `search`, `recall`, `list`, and `share`:
+
+| Flag | Description |
+|------|-------------|
+| `--category <id>` | Filter by category (e.g. `decision`, `bug/fix`) |
+| `--project <id>` | Filter by project ID |
+| `--agent <id>` | Filter by agent (`claude-code`, `codex`, `cursor`, `cline`, `copilot`) |
+| `--limit <n>` | Max results returned |
+| `--json` | Machine-readable JSON output |
+
+Hierarchical category filtering: `--category decision` matches `decision`,
+`decision/technical`, `decision/process`, and `decision/tooling`.
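
The prefix rule behind this is straightforward. As a sketch (not the actual
query logic):

```ts
// A category filter matches the category itself and any descendant,
// where hierarchy levels are separated by "/".
function categoryMatches(category: string, filter: string): boolean {
  return category === filter || category.startsWith(filter + "/");
}
```

The `"/"` suffix matters: a bare prefix check would let `decision` match an
unrelated `decisions` category.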
+
+---
## Ingestion
### `smriti ingest <agent>`
-Import conversations from an AI agent into Smriti's memory.
+Pull conversations from an AI agent into Smriti's memory.
-| Agent | Source | Format |
-|-------|--------|--------|
-| `claude` / `claude-code` | `~/.claude/projects/*/*.jsonl` | JSONL |
-| `codex` | `~/.codex/**/*.jsonl` | JSONL |
-| `cursor` | `.cursor/**/*.json` (requires `--project-path`) | JSON |
-| `file` / `generic` | Any file path | Chat or JSONL |
-| `all` | All known agents at once | — |
+| Agent | Source |
+|-------|--------|
+| `claude` / `claude-code` | `~/.claude/projects/*/*.jsonl` |
+| `codex` | `~/.codex/**/*.jsonl` |
+| `cline` | `~/.cline/tasks/**` |
+| `copilot` | VS Code `workspaceStorage` (auto-detected per OS) |
+| `cursor` | `.cursor/**/*.json` (requires `--project-path`) |
+| `file` / `generic` | Any file path |
+| `all` | All known agents at once |
```bash
smriti ingest claude
smriti ingest codex
+smriti ingest cline
+smriti ingest copilot
smriti ingest cursor --project-path /path/to/project
smriti ingest file ~/transcript.txt --title "Planning Session" --format chat
smriti ingest all
```
**Options:**
-- `--project-path ` — Project directory (required for Cursor)
-- `--file ` — File path (for generic ingest)
-- `--format ` — File format (default: `chat`)
-- `--title ` — Session title
-- `--session ` — Custom session ID
-- `--project ` — Assign to a project
-## Search
+| Flag | Description |
+|------|-------------|
+| `--project-path <path>` | Project directory (required for Cursor) |
+| `--file <path>` | File path (alternative to positional arg for generic ingest) |
+| `--format <format>` | File format (default: `chat`) |
+| `--title <title>` | Session title override |
+| `--session <id>` | Custom session ID |
+| `--project <id>` | Assign ingested sessions to a specific project |
+
+---
+
+## Search & Recall
### `smriti search <query>`
-Hybrid search across all memory using BM25 full-text and vector similarity.
+Full-text (BM25) search across all memory. Returns ranked results with
+session and message context. Fast, synchronous, no model loading.
```bash
smriti search "rate limiting"
@@ -43,51 +85,59 @@ smriti search "deployment" --category decision --limit 10
smriti search "API design" --json
```
-**Options:**
-- `--category ` — Filter by category
-- `--project ` — Filter by project
-- `--agent ` — Filter by agent (`claude-code`, `codex`, `cursor`)
-- `--limit ` — Max results (default: 20)
-- `--json` — JSON output
+**Options:** All global filters apply.
+
+---
### `smriti recall <query>`
-Smart recall: searches, deduplicates by session, and optionally synthesizes results into a coherent summary.
+Like search, but deduplicates results by session and optionally synthesizes
+them into a single coherent summary via Ollama.
```bash
smriti recall "how did we handle caching"
smriti recall "database setup" --synthesize
smriti recall "auth flow" --synthesize --model qwen3:0.5b --max-tokens 200
-smriti recall "deployment" --project api --json
+smriti recall "deployment" --category decision --project api --json
```
**Options:**
-- `--synthesize` — Synthesize results into one summary via Ollama
-- `--model ` — Ollama model for synthesis (default: `qwen3:8b-tuned`)
-- `--max-tokens ` — Max synthesis output tokens
-- All filter options from `search`
+
+| Flag | Description |
+|------|-------------|
+| `--synthesize` | Synthesize results into one summary via Ollama (requires Ollama running) |
+| `--model <model>` | Ollama model to use (default: `qwen3:8b-tuned`) |
+| `--max-tokens <n>` | Max tokens for synthesized output |
+| All global filters | `--category`, `--project`, `--agent`, `--limit`, `--json` |
+
+---
## Sessions
### `smriti list`
-List recent sessions with optional filtering.
+List recent sessions with filtering.
```bash
smriti list
smriti list --project myapp --agent claude-code
smriti list --category decision --limit 20
-smriti list --all --json
+smriti list --all
+smriti list --json
```
**Options:**
-- `--all` — Include inactive sessions
-- `--json` — JSON output
-- All filter options from `search`
+
+| Flag | Description |
+|------|-------------|
+| `--all` | Include inactive/archived sessions |
+| All global filters | `--category`, `--project`, `--agent`, `--limit`, `--json` |
+
+---
### `smriti show <session-id>`
-Display all messages in a session.
+Display all messages in a session. Supports partial session IDs.
```bash
smriti show abc12345
@@ -95,15 +145,27 @@ smriti show abc12345 --limit 10
smriti show abc12345 --json
```
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--limit <n>` | Max messages to display |
+| `--json` | JSON output |
+
+---
+
### `smriti status`
-Memory statistics: session counts, message counts, agent breakdowns, project breakdowns, category distribution.
+Memory statistics: total sessions, messages, agent breakdown, project
+breakdown, category distribution.
```bash
smriti status
smriti status --json
```
+---
+
### `smriti projects`
List all registered projects.
@@ -113,11 +175,28 @@ smriti projects
smriti projects --json
```
+---
+
+## Embeddings
+
+### `smriti embed`
+
+Build vector embeddings for all unembedded messages. Required for semantic
+(vector) search to work. Runs locally via `node-llama-cpp` — no network
+calls.
+
+```bash
+smriti embed
+```
+
+---
+
## Categorization
### `smriti categorize`
-Auto-categorize uncategorized sessions using rule-based matching and optional LLM classification.
+Auto-categorize uncategorized sessions using rule-based keyword matching with
+an optional LLM fallback for ambiguous cases.
```bash
smriti categorize
@@ -126,60 +205,159 @@ smriti categorize --llm
```
**Options:**
-- `--session ` — Categorize a specific session only
-- `--llm` — Use Ollama LLM for ambiguous classifications
+
+| Flag | Description |
+|------|-------------|
+| `--session <id>` | Categorize a specific session only |
+| `--llm` | Enable Ollama LLM fallback for low-confidence classifications |
+
+---
### `smriti tag <session-id> <category>`
-Manually tag a session with a category.
+Manually assign a category to a session. Stored with confidence `1.0` and
+source `"manual"`.
```bash
smriti tag abc12345 decision/technical
smriti tag abc12345 bug/fix
```
+---
+
### `smriti categories`
-Show the category tree.
+Display the full category tree.
```bash
smriti categories
```
+**Default categories:**
+
+| Category | Subcategories |
+|----------|---------------|
+| `code` | `code/implementation`, `code/pattern`, `code/review`, `code/snippet` |
+| `architecture` | `architecture/design`, `architecture/decision`, `architecture/tradeoff` |
+| `bug` | `bug/report`, `bug/fix`, `bug/investigation` |
+| `feature` | `feature/requirement`, `feature/design`, `feature/implementation` |
+| `project` | `project/setup`, `project/config`, `project/dependency` |
+| `decision` | `decision/technical`, `decision/process`, `decision/tooling` |
+| `topic` | `topic/learning`, `topic/explanation`, `topic/comparison` |
+
+---
+
### `smriti categories add <id>`
-Add a custom category.
+Add a custom category to the tree.
```bash
-smriti categories add infra/monitoring --name "Monitoring" --parent infra --description "Monitoring and observability"
+smriti categories add ops --name "Operations"
+smriti categories add ops/incident --name "Incident Response" --parent ops
+smriti categories add ops/runbook --name "Runbooks" --parent ops --description "Operational runbook sessions"
```
-## Embeddings
+**Options:**
-### `smriti embed`
+| Flag | Description |
+|------|-------------|
+| `--name <name>` | Display name (required) |
+| `--parent <id>` | Parent category ID |
+| `--description <text>` | Optional description |
-Build vector embeddings for all unembedded messages. Required for semantic search.
+---
+
+## Context & Compare
+
+### `smriti context`
+
+Generate a compact project summary (~200–300 tokens) and write it to
+`.smriti/CLAUDE.md`. Claude Code auto-discovers this file at session start.
+
+Runs entirely from SQL — no Ollama, no network, no model loading. Typically
+completes in under 100ms.
```bash
-smriti embed
+smriti context
+smriti context --dry-run
+smriti context --project myapp
+smriti context --days 14
+smriti context --json
```
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--project <id>` | Project to generate context for (auto-detected from `cwd` if omitted) |
+| `--days <n>` | Lookback window in days (default: `7`) |
+| `--dry-run` | Print output to stdout without writing the file |
+| `--json` | JSON output |
+
+---
+
+### `smriti compare <id1> <id2>`
+
+Compare two sessions across turns, tokens, tool calls, and file reads. Useful
+for A/B testing context injection impact.
+
+```bash
+smriti compare abc123 def456
+smriti compare --last
+smriti compare --last --project myapp
+smriti compare --last --json
+```
+
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--last` | Compare the two most recent sessions (for current project) |
+| `--project <id>` | Project scope for `--last` |
+| `--json` | JSON output |
+
+Partial session IDs are supported (first 7+ characters).
+
+---
+
## Team Sharing
### `smriti share`
-Export sessions as markdown files to a `.smriti/` directory for git-based sharing.
+Export sessions as clean markdown files to `.smriti/knowledge/` for
+git-based team sharing. Generates LLM reflections via Ollama by default.
+Also writes `.smriti/CLAUDE.md` so Claude Code auto-discovers shared
+knowledge.
```bash
smriti share --project myapp
smriti share --category decision
smriti share --session abc12345
smriti share --output /custom/path
+smriti share --no-reflect
+smriti share --reflect-model llama3.2
+smriti share --segmented --min-relevance 7
```
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--project <id>` | Export sessions for a specific project |
+| `--category <id>` | Export only sessions with this category |
+| `--session <id>` | Export a single session |
+| `--output <dir>` | Custom output directory (default: `.smriti/`) |
+| `--no-reflect` | Skip LLM reflections (reflections are on by default) |
+| `--reflect-model <model>` | Ollama model for reflections |
+| `--segmented` | Use 3-stage segmentation pipeline — beta |
+| `--min-relevance <n>` | Relevance threshold for segmented mode (default: `6`) |
+
+---
+
### `smriti sync`
-Import team knowledge from a `.smriti/` directory.
+Import team knowledge from a `.smriti/knowledge/` directory into local
+memory. Deduplicates by content hash — same content won't import twice.
```bash
smriti sync
@@ -187,10 +365,32 @@ smriti sync --project myapp
smriti sync --input /custom/path
```
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--project <id>` | Scope sync to a specific project |
+| `--input <dir>` | Custom input directory (default: `.smriti/`) |
+
+---
+
### `smriti team`
-View team contributions (authors, counts, categories).
+View team contributions: authors, session counts, and category breakdown.
```bash
smriti team
```
+
+---
+
+## Maintenance
+
+### `smriti upgrade`
+
+Pull the latest version from GitHub and reinstall dependencies. Equivalent to
+re-running the install script.
+
+```bash
+smriti upgrade
+```
diff --git a/docs/configuration.md b/docs/configuration.md
index 713e8c9..0941099 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1,81 +1,113 @@
# Configuration
-Smriti uses environment variables for configuration. Bun auto-loads `.env` files, so you can set these in a `.env.local` file in the smriti directory.
+Smriti uses environment variables for configuration. Bun auto-loads `.env`
+files, so you can put these in `~/.smriti/.env` and they'll be picked up
+automatically — no need to set them in your shell profile.
+
+Most people never need to touch these. The defaults work. The ones you're
+most likely to change are `SMRITI_PROJECTS_ROOT` (to match where your
+projects actually live) and `QMD_MEMORY_MODEL` (if you want a lighter Ollama
+model).
+
+---
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
-| `QMD_DB_PATH` | `~/.cache/qmd/index.sqlite` | Path to the shared SQLite database |
-| `CLAUDE_LOGS_DIR` | `~/.claude/projects` | Claude Code session logs directory |
-| `CODEX_LOGS_DIR` | `~/.codex` | Codex CLI session logs directory |
-| `SMRITI_PROJECTS_ROOT` | `~/zero8.dev` | Root directory for project detection |
+| `QMD_DB_PATH` | `~/.cache/qmd/index.sqlite` | SQLite database path |
+| `CLAUDE_LOGS_DIR` | `~/.claude/projects` | Claude Code session logs |
+| `CODEX_LOGS_DIR` | `~/.codex` | Codex CLI session logs |
+| `CLINE_LOGS_DIR` | `~/.cline/tasks` | Cline CLI tasks |
+| `COPILOT_STORAGE_DIR` | auto-detected per OS | VS Code workspaceStorage root |
+| `SMRITI_PROJECTS_ROOT` | `~/zero8.dev` | Root for project ID derivation |
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | Ollama API endpoint |
-| `QMD_MEMORY_MODEL` | `qwen3:8b-tuned` | Ollama model for synthesis/summarization |
-| `SMRITI_CLASSIFY_THRESHOLD` | `0.5` | Confidence below which LLM classification triggers |
+| `QMD_MEMORY_MODEL` | `qwen3:8b-tuned` | Ollama model for synthesis |
+| `SMRITI_CLASSIFY_THRESHOLD` | `0.5` | LLM classification trigger threshold |
| `SMRITI_AUTHOR` | `$USER` | Author name for team sharing |
+| `SMRITI_DAEMON_DEBOUNCE_MS` | `30000` | File-stability wait before auto-ingest |
+
+---
## Projects Root
-The `SMRITI_PROJECTS_ROOT` variable controls how Smriti derives project IDs from Claude Code session paths.
+`SMRITI_PROJECTS_ROOT` is the most commonly changed setting. It controls how
+Smriti derives clean project IDs from Claude Code session paths.
-Claude Code encodes project paths in directory names like `-Users-zero8-zero8.dev-openfga`. Smriti reconstructs the real path and strips the projects root prefix:
+Claude Code encodes project paths into directory names like
+`-Users-zero8-zero8.dev-openfga`. Smriti reconstructs the real filesystem
+path and strips the projects root prefix to produce a readable ID:
-| Claude Dir Name | Derived Project ID |
-|----------------|-------------------|
-| `-Users-zero8-zero8.dev-openfga` | `openfga` |
-| `-Users-zero8-zero8.dev-avkash-regulation-hub` | `avkash/regulation-hub` |
-| `-Users-zero8-zero8.dev` | `zero8.dev` |
+| Claude dir name | Projects root | Derived ID |
+|-----------------|---------------|------------|
+| `-Users-zero8-zero8.dev-openfga` | `~/zero8.dev` | `openfga` |
+| `-Users-zero8-zero8.dev-avkash-regulation-hub` | `~/zero8.dev` | `avkash/regulation-hub` |
+| `-Users-alice-code-myapp` | `~/code` | `myapp` |
-To change the projects root:
+If your projects live under `~/code` instead of `~/zero8.dev`:
```bash
-export SMRITI_PROJECTS_ROOT="$HOME/projects"
+export SMRITI_PROJECTS_ROOT="$HOME/code"
```
+---
+
## Database Location
-By default, Smriti shares QMD's database at `~/.cache/qmd/index.sqlite`. This means your QMD document search and Smriti memory search share the same vector index — no duplication.
+By default, Smriti shares QMD's database at `~/.cache/qmd/index.sqlite`.
+This means QMD document search and Smriti memory search share the same vector
+index — one embedding store, no duplication.
-To use a separate database:
+To keep them separate:
```bash
export QMD_DB_PATH="$HOME/.cache/smriti/memory.sqlite"
```
+---
+
## Ollama Setup
-Ollama is optional. It's used for:
-- `smriti recall --synthesize` — Synthesize recalled context into a summary
-- `smriti categorize --llm` — LLM-assisted categorization
+Ollama is optional. Everything core — ingestion, search, recall, sharing —
+works without it. Ollama only powers the features that require a language
+model:
+
+- `smriti recall --synthesize` — Compress recalled context into a summary
+- `smriti share` — Generate session reflections (skip with `--no-reflect`)
+- `smriti categorize --llm` — LLM fallback for ambiguous categorization
-Install and start Ollama:
+Install and start:
```bash
-# Install (macOS)
+# macOS
brew install ollama
-
-# Start the server
ollama serve
# Pull the default model
ollama pull qwen3:8b-tuned
```
-To use a different model:
+The default model (`qwen3:8b-tuned`) is good but large (~4.7GB). For a
+lighter option:
```bash
-export QMD_MEMORY_MODEL="mistral:7b"
+export QMD_MEMORY_MODEL="qwen3:0.5b"
+ollama pull qwen3:0.5b
```
-## Claude Code Hook
+To point at a remote Ollama instance:
+
+```bash
+export OLLAMA_HOST="http://192.168.1.100:11434"
+```
-The install script sets up an auto-save hook at `~/.claude/hooks/save-memory.sh`. This requires:
+---
-- **jq** — for parsing the hook's JSON input
-- **Claude Code** — must be installed with hooks support
+## Claude Code Hook
-The hook is configured in `~/.claude/settings.json`:
+The install script creates `~/.claude/hooks/save-memory.sh` and registers it
+in `~/.claude/settings.json`. This is what captures sessions automatically
+when you end a Claude Code conversation.
```json
{
@@ -86,7 +118,7 @@ The hook is configured in `~/.claude/settings.json`:
"hooks": [
{
"type": "command",
- "command": "/path/to/.claude/hooks/save-memory.sh",
+ "command": "/Users/you/.claude/hooks/save-memory.sh",
"timeout": 30,
"async": true
}
@@ -97,4 +129,8 @@ The hook is configured in `~/.claude/settings.json`:
}
```
-To disable the hook, remove the entry from `settings.json` or set `SMRITI_NO_HOOK=1` during install.
+**Requires `jq`** — the hook parses JSON input from Claude Code. Install with
+`brew install jq` or `apt install jq`.
+
+To disable: remove the entry from `settings.json`. To skip hook setup during
+install, set `SMRITI_NO_HOOK=1` before running the installer.
diff --git a/docs/getting-started.md b/docs/getting-started.md
index a4fd4bc..8d328d3 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -1,17 +1,30 @@
# Getting Started
+You're about to give your AI agents memory.
+
+By the end of this guide, your Claude Code sessions will be automatically
+saved, searchable, and shareable — across sessions, across days, across your
+team.
+
+---
+
## Install
+**macOS / Linux:**
+
```bash
curl -fsSL https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.sh | bash
```
-The installer will:
-1. Check for (and install) [Bun](https://bun.sh)
-2. Clone Smriti to `~/.smriti`
-3. Install dependencies
-4. Create the `smriti` CLI at `~/.local/bin/smriti`
-5. Set up the Claude Code auto-save hook
+**Windows** (PowerShell):
+
+```powershell
+irm https://raw.githubusercontent.com/zero8dotdev/smriti/main/install.ps1 | iex
+```
+
+The installer: checks for Bun (installs it if missing) → clones Smriti to
+`~/.smriti` → creates the `smriti` CLI → sets up the Claude Code auto-save
+hook.
### Verify
@@ -22,63 +35,104 @@ smriti help
If `smriti` is not found, add `~/.local/bin` to your PATH:
```bash
-echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
-source ~/.zshrc
+echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
```
+---
+
## First Run
-### 1. Ingest your Claude Code conversations
+### 1. Pull in your Claude Code sessions
```bash
smriti ingest claude
```
-This scans `~/.claude/projects/` for all session transcripts and imports them.
+Scans `~/.claude/projects/` and imports every conversation. On first run this
+might take a moment if you've been coding with Claude for a while.
-### 2. Check what was imported
+### 2. See what you have
```bash
smriti status
```
-Output shows session count, message count, and per-agent/per-project breakdowns.
+Session counts, message counts, breakdown by project and agent. This is your
+memory — everything Smriti knows about your past work.
-### 3. Search your memory
+### 3. Search it
```bash
smriti search "authentication"
```
+Keyword search across every session. Try something you remember working
+through in a past conversation.
+
### 4. Recall with context
```bash
smriti recall "how did we set up the database"
```
-This searches, deduplicates by session, and returns the most relevant snippets.
+Like search, but smarter — deduplicates by session and surfaces the most
+relevant snippets. Add `--synthesize` to compress results into a single
+coherent summary (requires Ollama).
-### 5. Build embeddings for semantic search
+### 5. Turn on semantic search
```bash
smriti embed
```
-After embedding, searches find semantically similar content — not just keyword matches.
+Builds vector embeddings locally. After this, searches find semantically
+similar content — not just keyword matches. "auth flow" starts surfacing
+results that talk about "login mechanism."
-## Auto-Save (Claude Code)
+---
-If the installer set up the hook, every Claude Code conversation is saved automatically. No action needed — just code as usual.
+## Auto-Save
-To verify the hook is active:
+If the install completed cleanly, you're done — every Claude Code session is
+saved automatically when you end it. No manual step, no copy-pasting.
+
+Verify the hook is active:
```bash
cat ~/.claude/settings.json | grep save-memory
```
+If the hook isn't there, re-run the installer or set it up manually — see
+[Configuration](./configuration.md#claude-code-hook).
+
+---
+
+## Share with Your Team
+
+Once you've built up memory, share the useful parts through git:
+
+```bash
+smriti share --project myapp --category decision
+cd ~/projects/myapp
+git add .smriti/ && git commit -m "Share auth migration decisions"
+git push
+```
+
+A teammate imports it:
+
+```bash
+git pull && smriti sync --project myapp
+smriti recall "auth migration" --project myapp
+```
+
+Their agent now has your context. See [Team Sharing](./team-sharing.md) for
+the full guide.
+
+---
+
## Next Steps
-- [CLI Reference](./cli.md) — All commands and options
-- [Team Sharing](./team-sharing.md) — Share knowledge via git
-- [Configuration](./configuration.md) — Environment variables and customization
+- [CLI Reference](./cli.md) — Every command and option
+- [Team Sharing](./team-sharing.md) — Share knowledge through git
+- [Configuration](./configuration.md) — Customize paths, models, and behavior
- [Architecture](./architecture.md) — How Smriti works under the hood
diff --git a/docs/CI_HARDENING_EXECUTION_PLAN.md b/docs/internal/ci-hardening.md
similarity index 100%
rename from docs/CI_HARDENING_EXECUTION_PLAN.md
rename to docs/internal/ci-hardening.md
diff --git a/docs/DESIGN.md b/docs/internal/design.md
similarity index 100%
rename from docs/DESIGN.md
rename to docs/internal/design.md
diff --git a/docs/e2e-dev-release-flow-test.md b/docs/internal/e2e-release-flow.md
similarity index 100%
rename from docs/e2e-dev-release-flow-test.md
rename to docs/internal/e2e-release-flow.md
diff --git a/docs/search-recall-architecture.md b/docs/internal/search-analysis.md
similarity index 100%
rename from docs/search-recall-architecture.md
rename to docs/internal/search-analysis.md
diff --git a/docs/website.md b/docs/internal/website.md
similarity index 100%
rename from docs/website.md
rename to docs/internal/website.md
diff --git a/docs/WORKFLOW_AUTOMATION.md b/docs/internal/workflow-automation.md
similarity index 100%
rename from docs/WORKFLOW_AUTOMATION.md
rename to docs/internal/workflow-automation.md
diff --git a/docs/search.md b/docs/search.md
new file mode 100644
index 0000000..d09204c
--- /dev/null
+++ b/docs/search.md
@@ -0,0 +1,134 @@
+# Search & Recall
+
+Smriti has two ways to retrieve memory: `search` and `recall`. They use
+different retrieval strategies and are optimized for different situations.
+
+---
+
+## search vs recall
+
+| | `smriti search` | `smriti recall` |
+|--|-----------------|-----------------|
+| **Retrieval** | Full-text (BM25) | Full-text + vector (hybrid) |
+| **Deduplication** | None — all matching messages | One best result per session |
+| **Synthesis** | No | Yes, with `--synthesize` |
+| **Best for** | Finding specific text, scanning results | Getting context before starting work |
+
+Use **search** when you know roughly what you're looking for and want to scan
+results. Use **recall** when you want the most relevant context from your
+history, deduplicated and optionally compressed.
+
+---
+
+## How Search Works
+
+`smriti search` runs a BM25 full-text query against every ingested message.
+It's fast, synchronous, and returns ranked results immediately — no model
+loading.
+
+```bash
+smriti search "rate limiting"
+smriti search "auth" --project myapp --agent claude-code
+smriti search "deployment" --category decision --limit 10
+```
+
+Filters (`--project`, `--category`, `--agent`) narrow results with SQL JOINs
+against Smriti's metadata tables. They compose — all filters apply together.
+
+---
+
+## How Recall Works
+
+`smriti recall` goes further. It runs full-text search, deduplicates results
+so you get at most one snippet per session (the highest-scoring one), and
+optionally synthesizes everything into a single coherent summary.
+
+```bash
+smriti recall "how did we handle rate limiting"
+smriti recall "database setup" --synthesize
+smriti recall "auth flow" --synthesize --model qwen3:0.5b
+```
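
The dedup step can be sketched like this (illustrative, not the real
`recall()` code): keep the highest-scoring hit from each session, then
re-rank by score.

```ts
type Hit = { sessionId: string; score: number; snippet: string };

// Keep only the best-scoring hit per session, ordered best-first.
function dedupeBySession(hits: Hit[]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hit of hits) {
    const current = best.get(hit.sessionId);
    if (!current || hit.score > current.score) best.set(hit.sessionId, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```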
+
+**Without filters:** recall uses QMD's full hybrid pipeline — BM25 +
+vector embeddings + Reciprocal Rank Fusion. Semantic matches work here: "auth
+flow" can surface results that talk about "login mechanism."
+
+**With filters:** recall currently uses full-text search only. The hybrid
+pipeline is bypassed when `--project`, `--category`, or `--agent` is applied.
+This is a known limitation — filtered recall loses semantic matching. It's
+on the roadmap to fix.
+
+---
+
+## Synthesis
+
+`--synthesize` sends the recalled context to Ollama and asks it to produce a
+single coherent summary. This is the difference between getting 10 raw
+snippets and getting a paragraph that distills what matters.
+
+```bash
+smriti recall "connection pooling decisions" --synthesize
+```
+
+Requires Ollama running locally. See [Configuration](./configuration.md#ollama-setup)
+for setup. Use `--model` to pick a lighter model if the default is too slow.
+
+---
+
+## Vector Search
+
+Vector search finds semantically similar content — results that mean the same
+thing even if they don't share the same words. It requires embeddings to be
+built first:
+
+```bash
+smriti embed
+```
+
+This runs locally via node-llama-cpp and EmbeddingGemma. It can take a few
+minutes on a large history, but only processes new messages — subsequent runs
+are fast.
+
+Once embeddings exist, unfiltered `smriti recall` automatically uses the full
+hybrid pipeline (BM25 + vector + RRF). Filtered recall and `smriti search`
+currently use BM25 only.
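+
+A minimal sketch of that routing, using the commands above (the queries are
+illustrative):
+
+```bash
+smriti embed                                     # index new messages (incremental)
+smriti recall "login mechanism"                  # unfiltered: BM25 + vector + RRF
+smriti recall "login mechanism" --project myapp  # filtered: BM25 full-text only, for now
+```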
+
+---
+
+## Filtering
+
+All filters compose and work across both commands:
+
+```bash
+# Scope to a project
+smriti recall "auth" --project myapp
+
+# Scope to a specific agent
+smriti search "deployment" --agent cursor
+
+# Scope to a category
+smriti recall "why did we choose postgres" --category decision
+
+# Combine them
+smriti search "migration" --project api --category decision --limit 5
+```
+
+Category filtering is hierarchical — `--category decision` matches
+`decision`, `decision/technical`, `decision/process`, and
+`decision/tooling`.
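+
+For example (queries illustrative; the subcategory form assumes `search` and
+`recall` accept full category paths, as `smriti share` does):
+
+```bash
+# Matches decision plus all of its subcategories
+smriti recall "schema design" --category decision
+
+# Narrow to a single subcategory
+smriti search "schema design" --category decision/technical
+```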
+
+---
+
+## Token Compression
+
+The point of recall isn't just finding relevant content — it's making that
+content usable in a new session without blowing up the context window.
+
+| Scenario | Raw | Via Smriti | Reduction |
+|----------|-----|------------|-----------|
+| Relevant context from past sessions | ~20,000 tokens | ~500 tokens | **40x** |
+| Multi-session recall + synthesis | ~10,000 tokens | ~200 tokens | **50x** |
+| Full project conversation history | 50,000+ tokens | ~500 tokens | **100x** |
+
+That's what `--synthesize` is for — not a summary for you to read, but
+compressed context for your next agent session to start with.
diff --git a/docs/team-sharing.md b/docs/team-sharing.md
index 22c81ba..e9228fa 100644
--- a/docs/team-sharing.md
+++ b/docs/team-sharing.md
@@ -1,60 +1,107 @@
# Team Sharing
-Smriti's team sharing works through git — no cloud service, no accounts, no sync infrastructure.
+When you work through something hard with an AI agent — a tricky migration,
+an architectural decision, a bug that took three hours to trace — that
+knowledge shouldn't stay locked in your chat history. Smriti lets you export
+it, commit it to git, and make it available to every agent your team uses.
-## How It Works
+No cloud service. No accounts. No sync infrastructure. Just git.
-1. **Export** knowledge from your local memory to a `.smriti/` directory
-2. **Commit** the `.smriti/` directory to your project repo
-3. **Teammates pull** and import the shared knowledge into their local memory
+---
+
+## The Flow
+
+1. **Export** — `smriti share` converts your sessions into clean markdown
+ files and writes them to `.smriti/knowledge/` in your project directory
+2. **Commit** — you push `.smriti/` to your project repo like any other file
+3. **Import** — teammates run `smriti sync` to pull the knowledge into their
+ local memory
+4. **Recall** — any agent on the team can now recall that context
+
+---
+
+## End-to-End Example
+
+**Alice finishes a productive session on auth:**
+
+```bash
+smriti share --project myapp --category decision
-The `.smriti/` directory lives inside your project repo alongside your code.
+cd ~/projects/myapp
+git add .smriti/
+git commit -m "Share auth migration decisions"
+git push
+```
+
+**Bob starts a new session the next morning:**
+
+```bash
+cd ~/projects/myapp
+git pull
+smriti sync --project myapp
+
+smriti recall "auth migration" --project myapp
+```
+
+Bob's agent now has Alice's full context — the decisions made, the approaches
+considered and rejected, the trade-offs. Alice didn't have to explain
+anything. Bob didn't have to ask.
+
+---
## Exporting Knowledge
-### Share by project
+### By project
```bash
smriti share --project myapp
```
-This exports all sessions tagged with project `myapp` to the project's `.smriti/knowledge/` directory.
+Exports all sessions tagged to that project.
-### Share by category
+### By category
```bash
smriti share --category decision
smriti share --category architecture/design
```
-### Share a specific session
+Export only what matters. Decision sessions tend to have the highest
+signal — they capture the *why* behind code choices, not just the *what*.
+
+### A single session
```bash
smriti share --session abc12345
```
-### Custom output directory
+### Options
-```bash
-smriti share --project myapp --output /path/to/.smriti
-```
+| Flag | Description |
+|------|-------------|
+| `--no-reflect` | Skip LLM session reflections (reflections are on by default and require Ollama) |
+| `--reflect-model <model>` | Ollama model to use for reflections |
+| `--output <dir>` | Custom output directory |
+| `--segmented` | Use the 3-stage segmentation pipeline (beta) |
+
+---
-## Output Format
+## What Gets Exported
```
.smriti/
-├── config.json # Sharing configuration
-├── index.json # Manifest of all shared files
+├── config.json
+├── index.json
└── knowledge/
├── decision/
│ └── 2026-02-10_auth-migration-approach.md
- ├── bug-fix/
+ ├── bug/
│ └── 2026-02-09_connection-pool-fix.md
└── uncategorized/
└── 2026-02-08_initial-setup.md
```
-Each knowledge file is markdown with YAML frontmatter:
+Each file is clean markdown with YAML frontmatter:
```markdown
---
@@ -69,31 +116,36 @@ tags: ["decision", "decision/technical"]
# Auth migration approach
-> Summary of the session if available
+> Generated reflection: Alice and the agent decided on a phased migration
+> approach, starting with read-path only to reduce risk...
**user**: How should we handle the auth migration?
**assistant**: I'd recommend a phased approach...
```
-## Importing Knowledge
+The reflection at the top is generated by Ollama — a short synthesis of what
+was decided and why. Use `--no-reflect` to skip it.
-When a teammate has shared knowledge:
+---
+
+## Importing Knowledge
```bash
-git pull # Get the latest .smriti/ files
-smriti sync --project myapp # Import into local memory
+git pull
+smriti sync --project myapp
```
-Or import from a specific directory:
+Content is hashed before import — the same session imported twice creates no
+duplicates. Run `smriti sync` as often as you like.
+
+Import from a specific directory:
```bash
smriti sync --input /path/to/.smriti
```
-### Deduplication
-
-Content is hashed before import. If the same knowledge has already been imported, it's skipped automatically. You can safely run `smriti sync` repeatedly.
+---
## Viewing Contributions
@@ -101,61 +153,31 @@ Content is hashed before import. If the same knowledge has already been imported
smriti team
```
-Shows who has shared what:
-
```
Author Shared Categories Latest
alice 12 decision, bug/fix 2026-02-10
bob 8 architecture, code 2026-02-09
```
-## Git Integration
-
-Add `.smriti/` to your repo:
-
-```bash
-cd /path/to/myapp
-git add .smriti/
-git commit -m "Share auth migration knowledge"
-git push
-```
-
-### `.gitignore` Recommendations
-
-The `config.json` and `index.json` should be committed. If you want to be selective:
-
-```gitignore
-# Commit everything in .smriti/
-!.smriti/
-```
-
-## Workflow Example
+---
-### Alice (shares knowledge)
+## Claude Code Auto-Discovery
-```bash
-# Alice had a productive session about auth
-smriti share --project myapp --category decision
+When you run `smriti share`, it writes a `.smriti/CLAUDE.md` index file.
+Claude Code auto-discovers this at the start of every session — giving it
+immediate awareness of your team's shared knowledge without any manual
+prompting.
-# Commit to the project repo
-cd ~/projects/myapp
-git add .smriti/
-git commit -m "Share auth migration decisions"
-git push
-```
-
-### Bob (imports knowledge)
+---
-```bash
-# Bob pulls the latest
-cd ~/projects/myapp
-git pull
+## Notes
-# Import Alice's shared knowledge
-smriti sync --project myapp
+**Categories survive the roundtrip.** The category a session was tagged with
+on one machine is the category it's indexed under on every machine that syncs
+it — no reclassification, no loss.
-# Now Bob can recall Alice's context
-smriti recall "auth migration" --project myapp
-```
+**Only the primary category is restored on sync.** If a session had multiple
+tags, only the primary one survives. Known limitation.
-Bob's AI agent now has access to Alice's decisions without Alice needing to explain anything.
+**You control what gets shared.** Nothing is exported unless you explicitly
+run `smriti share`. Your local memory stays local until you decide otherwise.
From b9b93ce0d7b1828b07cffb47d2eb5bc96d17f21c Mon Sep 17 00:00:00 2001
From: Baseline User
Date: Mon, 2 Mar 2026 14:14:59 +0530
Subject: [PATCH 3/4] =?UTF-8?q?chore:=20clean=20up=20project=20root=20?=
=?UTF-8?q?=E2=80=94=20move=20docs=20to=20docs/internal/,=20remove=20clutt?=
=?UTF-8?q?er?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Move 9 root-level docs to docs/internal/ with kebab-case rename:
IMPLEMENTATION.md → docs/internal/implementation.md
IMPLEMENTATION_CHECKLIST.md → docs/internal/implementation-checklist.md
PHASE1_IMPLEMENTATION.md → docs/internal/phase1-implementation.md
INGEST_ARCHITECTURE.md → docs/internal/ingest-architecture.md
DEMO_RESULTS.md → docs/internal/demo-results.md
RULES_QUICK_REFERENCE.md → docs/internal/rules-quick-reference.md
QUICKSTART.md → docs/internal/segmentation-quickstart.md
majestic-sauteeing-papert.md → docs/internal/qmd-deep-dive.md
streamed-humming-curry.md → docs/internal/ingest-refactoring.md
Remove:
issues.json — GitHub issues export, not source or documentation
zsh-autosuggestions/, zsh-syntax-highlighting/ — personal zsh plugins,
unrelated to the project (already gitignored)
Update references to INGEST_ARCHITECTURE.md in README.md and CLAUDE.md.
Project root now contains only what belongs there: README, LICENSE,
CHANGELOG, CLAUDE.md, package.json, install/uninstall scripts, and
source directories.
---
CLAUDE.md | 2 +-
README.md | 2 +-
DEMO_RESULTS.md => docs/internal/demo-results.md | 0
.../internal/implementation-checklist.md | 0
IMPLEMENTATION.md => docs/internal/implementation.md | 0
INGEST_ARCHITECTURE.md => docs/internal/ingest-architecture.md | 0
.../internal/ingest-refactoring.md | 0
.../internal/phase1-implementation.md | 0
majestic-sauteeing-papert.md => docs/internal/qmd-deep-dive.md | 0
.../internal/rules-quick-reference.md | 0
QUICKSTART.md => docs/internal/segmentation-quickstart.md | 0
issues.json | 1 -
12 files changed, 2 insertions(+), 3 deletions(-)
rename DEMO_RESULTS.md => docs/internal/demo-results.md (100%)
rename IMPLEMENTATION_CHECKLIST.md => docs/internal/implementation-checklist.md (100%)
rename IMPLEMENTATION.md => docs/internal/implementation.md (100%)
rename INGEST_ARCHITECTURE.md => docs/internal/ingest-architecture.md (100%)
rename streamed-humming-curry.md => docs/internal/ingest-refactoring.md (100%)
rename PHASE1_IMPLEMENTATION.md => docs/internal/phase1-implementation.md (100%)
rename majestic-sauteeing-papert.md => docs/internal/qmd-deep-dive.md (100%)
rename RULES_QUICK_REFERENCE.md => docs/internal/rules-quick-reference.md (100%)
rename QUICKSTART.md => docs/internal/segmentation-quickstart.md (100%)
delete mode 100644 issues.json
diff --git a/CLAUDE.md b/CLAUDE.md
index 39aa414..9627a49 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -104,7 +104,7 @@ get a clean name like `openfga`.
4. Store message/meta/sidecars/costs (store gateway)
5. Aggregate results and continue on per-session errors (orchestrator)
-See `INGEST_ARCHITECTURE.md` for details.
+See `docs/internal/ingest-architecture.md` for details.
### Search
diff --git a/README.md b/README.md
index ad3ee91..b8515c7 100644
--- a/README.md
+++ b/README.md
@@ -299,7 +299,7 @@ works cross-project by default, scoped with `--project `.
git-native today. Issue tracker integrations are on the roadmap.
**Further reading:** See [docs/cli.md](./docs/cli.md) for the full command
-reference, [INGEST_ARCHITECTURE.md](./INGEST_ARCHITECTURE.md) for the ingestion
+reference, [docs/internal/ingest-architecture.md](./docs/internal/ingest-architecture.md) for the ingestion
pipeline, and [CLAUDE.md](./CLAUDE.md) for the database schema and
architecture.
diff --git a/DEMO_RESULTS.md b/docs/internal/demo-results.md
similarity index 100%
rename from DEMO_RESULTS.md
rename to docs/internal/demo-results.md
diff --git a/IMPLEMENTATION_CHECKLIST.md b/docs/internal/implementation-checklist.md
similarity index 100%
rename from IMPLEMENTATION_CHECKLIST.md
rename to docs/internal/implementation-checklist.md
diff --git a/IMPLEMENTATION.md b/docs/internal/implementation.md
similarity index 100%
rename from IMPLEMENTATION.md
rename to docs/internal/implementation.md
diff --git a/INGEST_ARCHITECTURE.md b/docs/internal/ingest-architecture.md
similarity index 100%
rename from INGEST_ARCHITECTURE.md
rename to docs/internal/ingest-architecture.md
diff --git a/streamed-humming-curry.md b/docs/internal/ingest-refactoring.md
similarity index 100%
rename from streamed-humming-curry.md
rename to docs/internal/ingest-refactoring.md
diff --git a/PHASE1_IMPLEMENTATION.md b/docs/internal/phase1-implementation.md
similarity index 100%
rename from PHASE1_IMPLEMENTATION.md
rename to docs/internal/phase1-implementation.md
diff --git a/majestic-sauteeing-papert.md b/docs/internal/qmd-deep-dive.md
similarity index 100%
rename from majestic-sauteeing-papert.md
rename to docs/internal/qmd-deep-dive.md
diff --git a/RULES_QUICK_REFERENCE.md b/docs/internal/rules-quick-reference.md
similarity index 100%
rename from RULES_QUICK_REFERENCE.md
rename to docs/internal/rules-quick-reference.md
diff --git a/QUICKSTART.md b/docs/internal/segmentation-quickstart.md
similarity index 100%
rename from QUICKSTART.md
rename to docs/internal/segmentation-quickstart.md
diff --git a/issues.json b/issues.json
deleted file mode 100644
index 58eb634..0000000
--- a/issues.json
+++ /dev/null
@@ -1 +0,0 @@
-[{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"# Smriti: Building Intelligent Memory for AI Agents\n\n## The Problem\nWhen Claude Code, Cline, or Aider run for months, they produce 1000s of sessions. But without proper categorization, that memory is just noise. You can't find \"that time we fixed the auth bug\" or \"our decision on Redis vs Memcached\" — it's all one big undifferentiated pile of text.\n\nMost teams treat categorization as an afterthought: hardcoded regex patterns, one-size-fits-all rules, no ability to adapt.\n\n## Our Approach: Categorization as First-Class Citizen\n\nWe've built **Smriti** — a unified memory layer for AI teams that makes categorization fast, accurate, and *evolving*.\n\n### ✅ What We Just Shipped (MVP)\n\n**3-Tier Rule System** — flexible, not rigid\n- **Tier 1 (Base)**: Language-specific rules (TypeScript, Python, Rust, Go)\n- **Tier 2 (Custom)**: Project-specific tweaks (git-tracked, team-shared)\n- **Tier 3 (Runtime)**: CLI overrides for experimentation\n\n**Language Detection** — automatic, no config needed\n- Detects your tech stack from filesystem markers\n- Identifies frameworks (Next.js, FastAPI, Axum, etc.)\n- Confidence scoring to know when we're guessing\n\n**Performance**\n- <50ms to categorize a message\n- Rules cached in memory (not re-parsing YAML every time)\n- GitHub rule cache with fallback (works offline)\n\n**27 Tests, 100% Pass Rate**\n- Language detection working on 5 languages\n- 3-tier merge logic verified\n- Backward compatible — existing projects work unchanged\n\n### 🚀 What's Coming (Phase 1.5 & 2)\n\n**Next 2 weeks**:\n- [ ] Language-specific rule sets (TypeScript, Python, Rust, Go, JavaScript)\n- [ ] `smriti init` command to auto-detect & set up project rules\n- [ ] `smriti rules` CLI for teams to add/validate custom rules\n- [ ] Framework-specific rules (Next.js, FastAPI patterns)\n\n**Months ahead**:\n- [ ] Community rule repository on 
GitHub\n- [ ] Auto-update checking (\"new rules available for TypeScript\")\n- [ ] A/B testing framework for rule accuracy\n- [ ] Entity extraction (people, projects, errors) for richer context\n\n### 💡 Why This Matters\n\n**For solo developers**: \"Find everything we discussed about authentication\" — instant, accurate\n\n**For teams**: Shared rules in git means everyone uses the same categorization schema. Knowledge transfer, not knowledge hoarding.\n\n**For AI agents**: Agents can search categorized memory, leading to better context and fewer hallucinations.\n\n### 🎯 Design Principles\n\n✓ **Not hardcoded** — YAML rules, easy to modify \n✓ **Evolving** — add/override rules without touching code \n✓ **Language-aware** — TypeScript rules ≠ Python rules \n✓ **Offline-first** — caches GitHub rules, works offline \n✓ **Testable** — 27 tests, clear precedence rules\n\n---\n\n**Status**: MVP complete, ready for real-world testing.\n\n**Related**: Issue #18 (Technical tracking) \n**Commit**: f15c532 (Phase 1 MVP implementation)\n\n**Building memory infrastructure for the agentic era.**\n\n#AI #DevTools #Memory #Categorization #Agents\n","comments":[{"id":"IC_kwDORM6Bzs7oi3Cz","author":{"login":"pankajmaurya"},"authorAssociation":"NONE","body":"Thanks for this","createdAt":"2026-02-14T08:45:22Z","includesCreatedEdit":false,"isMinimized":false,"minimizedReason":"","reactionGroups":[{"content":"ROCKET","users":{"totalCount":1}}],"url":"https://github.com/zero8dotdev/smriti/issues/19#issuecomment-3901452467","viewerDidAuthor":false}],"createdAt":"2026-02-14T08:20:40Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH7A","name":"documentation","description":"Improvements or additions to documentation","color":"0075ca"}],"number":19,"state":"OPEN","title":"📢 Progress Writeup: Rule-Based Engine MVP Complete","updatedAt":"2026-02-14T08:45:22Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## Overview\n\nImplement a 
flexible 3-tier rule system for message classification, replacing hardcoded regex patterns with YAML-based rules that support language-specific and project-specific customization.\n\n## Status\n\n### ✅ Phase 1: MVP (COMPLETE)\n- [x] Language detection (TypeScript, Python, Rust, Go, JavaScript)\n- [x] Framework detection (Next.js, FastAPI, Axum, Django, Actix)\n- [x] YAML rule loader with 3-tier merge logic\n- [x] Migrated 26 hardcoded rules to general.yml\n- [x] Pattern compilation and caching\n- [x] GitHub rule fetching with database cache\n- [x] Comprehensive test coverage (27 tests passing)\n- [x] Database schema extensions\n- [x] Backward compatibility maintained\n\n**Commit**: f15c532 - \"Implement Phase 1: 3-Tier Rule-Based Engine (MVP Complete)\"\n\n### 📋 Phase 1.5: Language-Specific Rules (Next)\n- [ ] Create TypeScript-specific rule set\n- [ ] Create JavaScript-specific rule set\n- [ ] Create Python-specific rule set\n- [ ] Create Rust-specific rule set\n- [ ] Create Go-specific rule set\n- [ ] Implement `smriti init` command with auto-detection\n- [ ] Implement `smriti rules add` command\n- [ ] Implement `smriti rules validate` command\n- [ ] Implement `smriti rules list` command\n\n### 📋 Phase 2: Auto-Update & Versioning\n- [ ] Implement `smriti rules update` command\n- [ ] Auto-check for rule updates on categorize\n- [ ] Add `--no-update` flag\n- [ ] Display changelog before update\n- [ ] Version tracking in database\n\n### 📋 Phase 4+: Community\n- [ ] GitHub community rule repository\n- [ ] Community-contributed rule sets\n- [ ] Plugin marketplace integration\n\n## Architecture\n\n### 3-Tier Rule System\n```\nTier 3 (Runtime Override) ← CLI flags, programmatic\n ↓ (highest precedence)\nTier 2 (Project Custom) ← .smriti/rules/custom.yml\n ↓ (overrides base)\nTier 1 (Base) ← general.yml (GitHub or local)\n (lowest precedence)\n```\n\n## Key Files\n- `src/detect/language.ts` - Language/framework detection\n- `src/categorize/rules/loader.ts` - YAML loader 
+ 3-tier merge\n- `src/categorize/rules/github.ts` - GitHub fetcher + cache\n- `src/categorize/rules/general.yml` - 26 general rules\n- `PHASE1_IMPLEMENTATION.md` - Technical documentation\n- `RULES_QUICK_REFERENCE.md` - Developer guide\n\n## Test Results (Phase 1)\n- ✅ 27/27 new tests passing\n- ✅ 63 assertions verified\n- ✅ All existing categorization tests still working\n\n## Performance (Phase 1)\n- Language Detection: 20-50ms\n- Rule Loading: 50-100ms (cached)\n- Classification: 2-5ms per message\n\n## Related Issues\n- None yet","comments":[],"createdAt":"2026-02-14T08:10:57Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf1zw","name":"phase-2","description":"Phase 2: New agent parsers","color":"1D76DB"}],"number":18,"state":"OPEN","title":"Rule-Based Engine: 3-Tier YAML Rule System","updatedAt":"2026-02-14T08:10:57Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## TL;DR\n\nFine-tuned [EmbeddingGemma-300M](https://huggingface.co/google/embeddinggemma-300m) — the embedding model powering QMD search — on 420 Smriti coding sessions. Generated 1,700 training triplets using Gemini 2.0 Flash, trained on a free-tier Colab T4 GPU after failing on local M3 Pro (MPS OOM). Result: **accuracy 87.3% → 91.5% (+4.2pp), margin +43% relative**. The model now understands domain terms like \"LoRA rank\", \"RRF fusion\", and \"OpenFGA\" instead of treating them as generic text.\n\n## The Idea\n\nQMD uses a generic 300M-parameter embedding model. It doesn't know what \"LoRA rank\" means, or that \"RRF\" is about search fusion, or that when you say \"auth\" you mean OpenFGA — not OAuth. `smriti recall` and `smriti search` suffer because of this vocabulary mismatch.\n\nFine-tuning on actual sessions teaches the model *our* vocabulary. 
We generate (query, relevant passage, hard negative) triplets from real sessions, then train the model to push relevant results closer together and irrelevant ones apart.\n\n## Timeline\n\n| When | What |\n|------|------|\n| **Feb 12, 4:44 PM** | Built the full pipeline: export sessions → generate triplets → validate → train → eval → convert GGUF. First commit [`29df52b`](https://github.com/zero8dotdev/smriti-getting-smarter/commit/29df52b). |\n| **Feb 12, evening** | Tried Ollama (`qwen3:8b`) for triplet generation. Too slow for 420 sessions — would take hours locally. |\n| **Feb 12–13** | Switched to Gemini 2.0 Flash API. Fast and cheap. Generated 2,069 raw triplets → 1,700 after validation/dedup. |\n| **Feb 13, morning** | Attempted local training on M3 Pro (18GB). OOM immediately with `seq_length: 512, batch_size: 8`. Reduced batch size, seq length, disabled fp16, switched loss function. Still OOM. |\n| **Feb 13, ~10:00 AM** | Pivoted to Google Colab (T4 GPU, 15GB VRAM, free tier) |\n| **Feb 13, 10:00–10:44 AM** | 6+ failed Colab runs. T4 OOM with initial settings. Progressively lowered seq_length (512→256→128), added gradient checkpointing, tuned mini_batch_size, set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. |\n| **Feb 13, 10:44 AM** | First successful training run. Commit [`6af8a2b`](https://github.com/zero8dotdev/smriti-getting-smarter/commit/6af8a2b). |\n| **Feb 13, shortly after** | Evaluation: accuracy 87.3% → 91.5%, margin +43% relative. 
|\n\n## What Failed & What Fixed It\n\n| Failure | Root Cause | Fix |\n|---------|-----------|-----|\n| Ollama triplet generation too slow | `qwen3:8b` running locally on CPU, 420 sessions | Switched to Gemini 2.0 Flash API |\n| MPS OOM on M3 Pro (18GB) | `seq_length: 512`, `batch_size: 8`, fp16 on MPS | Reduced to `seq_length: 256`, `batch_size: 2`, disabled fp16, added gradient accumulation |\n| Still OOM on MPS after reductions | MPS memory management fundamentally limited for training | Pivoted to Colab T4 |\n| T4 OOM on Colab (attempts 1–6) | `seq_length: 256`, no gradient checkpointing, mini_batch too large | `seq_length: 128`, gradient checkpointing, `mini_batch_size: 4`, `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` |\n\n## The Pipeline\n\n```\nsmriti DB (420 sessions)\n → export_sessions.py → sessions.jsonl (7.9 MB)\n → generate_triplets.py (Gemini 2.0 Flash) → triplets.jsonl (2,069 triplets)\n → validate_data.py → train.jsonl (1,700) + val.jsonl (165)\n → train.py (sentence-transformers + CachedMNRL loss) → fine-tuned model\n → eval.py → metrics comparison\n → convert_gguf.py → GGUF for QMD\n```\n\nEach triplet contains:\n- **Query**: 2–8 word search query (what a user would type into `smriti search`)\n- **Positive**: 50–300 word relevant passage from the session\n- **Hard negative**: A passage from the *same* conversation that's topically related but answers a different question\n\nTrain/val split is by session (not by triplet) to prevent data leakage.\n\n## Results\n\n```\n Base Model Fine-Tuned Change\nAccuracy 0.8727 0.9152 +0.0424 (+4.9%)\nMargin 0.1716 0.2452 +0.0736 (+42.9%)\nPositive Sim 0.5608 0.5226 -0.0382\nNegative Sim 0.3893 0.2774 -0.1119\n```\n\nBoth positive and negative similarity dropped, but **negative similarity dropped 3x harder** (0.39 → 0.28 vs 0.56 → 0.52). The model learned to push irrelevant results far apart while keeping relevant ones close. 
This is exactly what you want for retrieval — fewer false positives, cleaner separation.\n\n### Final Working Colab Config\n\n| Parameter | Value |\n|-----------|-------|\n| `max_seq_length` | 128 |\n| `per_device_train_batch_size` | 4 |\n| `gradient_accumulation_steps` | 16 (effective batch = 64) |\n| `mini_batch_size` (CachedMNRL) | 4 |\n| `num_train_epochs` | 3 |\n| `learning_rate` | 2e-5 |\n| `gradient_checkpointing` | true |\n| `fp16` | true |\n\n## What's Next\n\nThe end state isn't a separate repo — it's `smriti finetune`:\n\n- **`smriti finetune`** — Subcommand that retrains the embedding model on accumulated sessions. Run after a week of coding, on a cron, or as a post-ingest hook.\n- **`smriti finetune --incremental`** — Don't retrain from scratch. Keep the last checkpoint and continue on new sessions only. The model accumulates knowledge over time.\n- **`smriti finetune --team`** — Pull sessions from teammates via `smriti sync`, train a shared model. The team's collective vocabulary becomes the model's vocabulary.\n- **Reranker fine-tuning** — QMD uses a 0.6B reranker (Qwen3-Reranker). Same triplet data, different training objective. Would compound the embedding improvements.\n- **Automatic quality signals** — Use implicit signals from actual usage (clicked results = positive, reformulated queries = hard negatives) instead of synthetic LLM-generated triplets.\n- **Per-project adapters** — Train project-specific LoRA adapters (~8MB each) that QMD swaps based on active project.\n- **Scheduled retraining** — Weekly cron that runs `smriti finetune --incremental --deploy`. 
Search silently gets better every Monday.\n\n## Repo\n\nhttps://github.com/zero8dotdev/smriti-getting-smarter","comments":[],"createdAt":"2026-02-13T08:24:57Z","labels":[],"number":17,"state":"OPEN","title":"Fine-tuned EmbeddingGemma-300M on Smriti sessions — journey, results, and next steps","updatedAt":"2026-02-13T08:24:57Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## Overview\n\nAdded multi-layered secret detection system to prevent accidental credential commits and ensure repository security.\n\n## Components Implemented\n\n### 1. Local Pre-commit Hook\n- **Tool**: Gitleaks v8.18.0\n- **Trigger**: Runs on every `git commit`\n- **Config**: `.pre-commit-config.yaml` with auto-installation\n- **Status**: ✅ All tests pass\n\n### 2. Gitleaks Configuration\n- **File**: `.gitleaks.toml`\n- **Features**:\n - Detects JWTs, API keys, passwords, private keys\n - Allowlist for test/demo tokens in `.smriti/knowledge/` documentation\n - Regex patterns to ignore common test emails (@test.com, @acme.com)\n - Scans full git history\n\n### 3. GitHub Actions CI Pipeline\n- **File**: `.github/workflows/secret-scan.yml`\n- **Runs on**: Push to main/staging and all PRs\n- **Tools**:\n - Gitleaks (primary detection)\n - detect-secrets (secondary verification)\n- **Features**:\n - Automated scanning on every push\n - Comments on PRs with findings\n - Blocks merges if secrets detected\n\n### 4. Additional Hooks\nVia pre-commit framework:\n- Detect private keys in code\n- Check for merge conflicts\n- Validate YAML files\n- Prevent large file commits (>500KB)\n\n## Setup & Usage\n\n### Installation\nThe setup is automatic when developers clone the repo:\n```bash\npre-commit install # (auto-runs on first commit)\n```\n\n### Manual Scanning\n```bash\n# Scan current directory\ngitleaks detect --source . -c .gitleaks.toml\n\n# Scan git history\ngitleaks detect --source . 
-c .gitleaks.toml --verbose\n\n# Run all pre-commit hooks\npre-commit run --all-files\n```\n\n## Configuration Details\n\n### .gitleaks.toml\n- **Paths allowlist**: Excludes `.smriti/knowledge/` and `test/` directories\n- **Regex allowlist**: Ignores test email patterns\n- **Entropy detection**: Enabled for high-entropy strings\n\n### Pre-commit Stages\n- **Default**: Runs on commits (prevent push of secrets)\n- **CI**: GitHub Actions validate on push and PRs\n\n## Testing\n\n✅ All hooks validated:\n- Gitleaks: PASSED\n- Detect private key: PASSED \n- Merge conflict detection: PASSED\n- YAML validation: PASSED\n- File size limits: PASSED\n- Trailing whitespace: PASSED\n\nBaseline established for knowledge base files containing test tokens.\n\n## Security Benefits\n\n1. **Prevention**: Stops secrets from entering git history\n2. **Detection**: Multi-tool approach catches edge cases\n3. **Automation**: No manual intervention required\n4. **CI/CD Integration**: Repository-wide enforcement\n5. **Documentation**: Clear ignoring patterns for legitimate test data\n\n## Future Enhancements\n\n- [ ] Setup GitGuardian API integration for real-time alerts\n- [ ] Add SAST scanning (static analysis)\n- [ ] Email notifications on secret detection\n- [ ] Automated rotation of compromised credentials\n- [ ] Team policy configuration\n\n## Related\n\nImplements response to security alert about exposed credentials. 
Prevents similar incidents through automated scanning.","comments":[],"createdAt":"2026-02-12T05:42:37Z","labels":[],"number":16,"state":"OPEN","title":"Implement comprehensive secret scanning infrastructure","updatedAt":"2026-02-12T05:42:37Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## Overview\n\nThis branch implements a **3-stage prompt architecture** for the `smriti share` command that intelligently segments sessions into distinct knowledge units, generates category-specific documentation, and exports team knowledge to `.smriti/` directories.\n\n## Architecture Stages\n\n### Stage 1: Segment\n- **Purpose**: Analyze sessions and extract distinct knowledge units\n- **Process**: LLM analyzes session content, identifies topics, categories, and relevance scores\n- **Metadata Injection**: Tool usage, files modified, git operations, and errors are extracted and injected into prompts for better context\n- **Output**: `KnowledgeUnit[]` with categories, relevance (1-10), and entity tags\n\n### Stage 2: Document \n- **Purpose**: Generate polished markdown documentation for each unit\n- **Process**: Select category-specific templates and apply unit content\n- **Categories Supported**:\n - `bug/*` - Symptoms → Root Cause → Investigation → Fix → Prevention\n - `architecture/*` / `decision/*` - Context → Options → Decision → Consequences\n - `code/*` - Implementation → Key Decisions → Gotchas\n - `feature/*` - Requirements → Design → Implementation Notes\n - `topic/*` - Concept → Relevance → Examples → Resources\n - `project/*` - What Changed → Why → Steps → Verification\n- **Output**: Markdown files organized in `.smriti/knowledge//`\n\n### Stage 3: Defer\n- **Purpose**: Metadata enrichment (phase 2)\n- **Future**: Entity extraction, freshness detection, version tracking\n\n## Key Design Patterns\n\n1. **Graceful Degradation**: Stage 1 fails → fallback to single unit → Stage 2 still generates docs\n2. 
**Category Validation**: LLM suggestions validated against `smriti_categories` table\n3. **Unit-Level Deduplication**: Hash(content + category + entities) prevents re-sharing\n4. **Sequential Processing**: Units processed one-by-one (safety) not in parallel\n5. **Template Flexibility**: Checks `.smriti/prompts/` first before using built-in templates\n\n## Implementation Details\n\n### Files Created\n- `src/team/types.ts` - Type definitions\n- `src/team/segment.ts` - Stage 1 segmentation logic\n- `src/team/document.ts` - Stage 2 documentation generation\n- `src/team/prompts/stage1-segment.md` - Segmentation prompt\n- `src/team/prompts/stage2-*.md` (7 templates) - Category-specific templates\n- `test/team-segmented.test.ts` - Comprehensive test suite (14 tests)\n\n### Files Modified\n- `src/db.ts` - Extended `smriti_shares` table with `unit_id`, `relevance_score`, `entities`\n- `src/team/share.ts` - Added `shareSegmentedKnowledge()` function + flag routing\n- `src/index.ts` - Added CLI flags: `--segmented`, `--min-relevance`\n\n## Usage\n\n```bash\n# Legacy (unchanged)\nsmriti share --project myapp\n\n# New 3-stage pipeline\nsmriti share --project myapp --segmented\n\n# With custom relevance threshold (default: 6/10)\nsmriti share --project myapp --segmented --min-relevance 7\n```\n\n## Testing\n\n- 14 unit tests covering:\n - Graceful fallback logic\n - Unit validation and filtering\n - Relevance thresholding\n - Edge cases\n- All tests passing\n- Uses in-memory DB (no external dependencies)\n\n## Backward Compatibility\n\n✅ No breaking changes - legacy `smriti share` behavior unchanged. 
New flags are optional.\n\n## Future Phases\n\n- **Phase 2**: Entity extraction, freshness detection, tech version tracking\n- **Phase 3**: Relationship graphs, contradiction detection, `smriti conflicts` command\n\n## Related Issues\n\nRelated to discussion of knowledge organization and team sharing workflows.\n","comments":[],"createdAt":"2026-02-12T05:23:04Z","labels":[],"number":14,"state":"OPEN","title":"3-Stage Knowledge Segmentation Pipeline for smriti share","updatedAt":"2026-02-12T05:23:04Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What is this?\n\n`smriti context` generates a compact project summary (~200-300 tokens) from your session history and injects it into `.smriti/CLAUDE.md`, which Claude Code auto-discovers. The idea is that new sessions start with awareness of recent work — hot files, git activity, recent sessions — instead of re-discovering everything from scratch.\n\n**We don't know yet if this actually saves tokens.** Our initial tests show mixed results, and we need data from real projects to understand where context injection matters.\n\n## How to test\n\n### Prerequisites\n\n```bash\nsmriti ingest claude # make sure sessions are ingested\n```\n\n### Step 1: Baseline session (no context)\n\n```bash\nmv .smriti/CLAUDE.md .smriti/CLAUDE.md.bak\n```\n\nStart a new Claude Code session, give it a task, let it finish, exit.\n\n### Step 2: Context session\n\n```bash\nmv .smriti/CLAUDE.md.bak .smriti/CLAUDE.md\nsmriti context\n```\n\nStart a new Claude Code session, give the **exact same task**, let it finish, exit.\n\n### Step 3: Compare\n\n```bash\nsmriti ingest claude\nsmriti compare --last\n```\n\n## What to share\n\nPost a comment here with:\n\n1. **The task prompt** you used (same for both sessions)\n2. **The `smriti compare` output** (copy-paste the table)\n3. **Project size** — rough number of files, whether you have a detailed `CLAUDE.md` in the repo\n4. 
**Your observations** — did the context-aware session behave differently? Fewer exploratory reads? Better first attempt?\n\n## What we've found so far\n\n| Task Type | Context Impact | Notes |\n|-----------|---------------|-------|\n| Knowledge questions (\"how does X work?\") | Minimal | Both sessions found the right files immediately from project CLAUDE.md |\n| Implementation tasks (\"add --since flag\") | Minimal | Small, well-scoped tasks don't need exploration |\n| Ambiguous/exploration tasks | Untested | Expected sweet spot — hot files guide Claude to the right area |\n| Large codebases (no project CLAUDE.md) | Untested | Expected sweet spot — context replaces missing documentation |\n\n## Good task prompts to try\n\nThese should stress-test whether context helps:\n\n- **Ambiguous bug fix**: \"There's a bug in the search results, fix it\" (forces exploration)\n- **Cross-cutting feature**: \"Add logging to all database operations\" (needs to find all DB touchpoints)\n- **Continuation task**: \"Continue the refactoring we started yesterday\" (tests session memory)\n- **Large codebase, no CLAUDE.md**: Any implementation task on a project without a detailed CLAUDE.md\n\n## Tips\n\n- Use `smriti compare --json` for machine-readable output\n- You can compare any two sessions: `smriti compare ` (supports partial IDs)\n- Run `smriti context --dry-run` to see what context your sessions will get","comments":[],"createdAt":"2026-02-11T11:14:43Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowIDw","name":"help wanted","description":"Extra attention is needed","color":"008672"}],"number":13,"state":"OPEN","title":"Help wanted: A/B test smriti context on your projects","updatedAt":"2026-02-11T11:14:43Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\n\nTransform Smriti from flat text ingestion to a **structured, queryable memory pipeline** — where every tool call, file edit, git operation, error, and thinking 
block is parsed, typed, stored in sidecar tables, and available for analytics, search, and team sharing.\n\n## Why\n\nCurrently Smriti drops 80%+ of the structured data in AI coding sessions. A Claude Code transcript contains tool calls with typed inputs, file diffs, command outputs, git operations, token costs, and thinking blocks — but the flat text parser reduces all of this to a single string. This means:\n\n- **No file tracking**: Can't answer \"what files did I edit this week?\"\n- **No error analysis**: Can't find sessions where builds failed or tests broke\n- **No cost visibility**: No token/cost tracking across sessions or projects\n- **No git correlation**: Can't link sessions to commits, branches, or PRs\n- **No cross-agent view**: Different agents (Claude, Cline, Aider) can't share a unified memory\n- **No security layer**: Secrets in sessions get shared without redaction\n\nThis roadmap addresses all of these gaps across 5 phases.\n\n## Sub-Issues\n\n- #5 **[DONE]** Enriched Claude Code Parser — Structured block extraction, 13 block types, 6 sidecar tables\n- #6 Cline + Aider Agent Parsers — New agent support for unified cross-tool memory\n- #7 Auto-Ingestion Watch Daemon — `smriti watch` with fs.watch for real-time ingestion\n- #8 Enhanced Search & Analytics on Structured Data — Query sidecar tables, activity timelines, cost tracking\n- #9 Secret Redaction & Policy Engine — Detect and redact secrets before storage and sharing\n- #10 Telemetry & Metrics Collection — Local-only opt-in usage metrics\n- #11 Real User Testing & Performance Validation — Benchmarks, stress tests, security tests\n\n## Phase Overview\n\n| Phase | Deliverable | Status |\n|-------|------------|--------|\n| **Phase 1** | Enriched Claude Code Parser (#5) | **Done** — 13 block types, 6 sidecar tables, 142 tests |\n| **Phase 2** | Cline + Aider Parsers (#6) | Planned |\n| **Phase 3** | Watch Daemon (#7) + Search & Analytics (#8) | Planned |\n| **Phase 4** | Secret Redaction & Policy 
(#9) | Planned |\n| **Phase 5** | Telemetry (#10) + Testing & Perf (#11) | Planned |\n\n## Storage Inventory\n\nComplete map of every data type, where it lives, and whether it's indexed:\n\n| Data | Source | Table | Key Columns | Indexed? |\n|------|--------|-------|-------------|----------|\n| Session text (FTS) | All agents | `memory_fts` (QMD) | content | FTS5 full-text |\n| Session metadata | Ingestion | `smriti_session_meta` | session_id, agent_id, project_id | Yes (agent, project) |\n| Project registry | Path derivation | `smriti_projects` | id, path, description | PK |\n| Agent registry | Seed data | `smriti_agents` | id, parser, log_pattern | PK |\n| Tool usage | Block extraction | `smriti_tool_usage` | message_id, tool_name, success, duration_ms | Yes (session, tool_name) |\n| File operations | Block extraction | `smriti_file_operations` | message_id, operation, file_path, project_id | Yes (session, path) |\n| Commands | Block extraction | `smriti_commands` | message_id, command, exit_code, is_git | Yes (session, is_git) |\n| Git operations | Block extraction | `smriti_git_operations` | message_id, operation, branch, pr_url | Yes (session, operation) |\n| Errors | Block extraction | `smriti_errors` | message_id, error_type, message | Yes (session, type) |\n| Token costs | Metadata accumulation | `smriti_session_costs` | session_id, model, input/output/cache tokens, cost | PK |\n| Category tags (session) | Categorization | `smriti_session_tags` | session_id, category_id, confidence, source | Yes (category) |\n| Category tags (message) | Categorization | `smriti_message_tags` | message_id, category_id, confidence, source | Yes (category) |\n| Category taxonomy | Seed data | `smriti_categories` | id, name, parent_id | PK |\n| Share tracking | Team sharing | `smriti_shares` | session_id, content_hash, author | Yes (hash) |\n| Vector embeddings | `smriti embed` | `content_vectors` + `vectors_vec` (QMD) | content_hash, embedding | Virtual table |\n| Telemetry 
events | Opt-in collection | `~/.smriti/telemetry.json` | timestamp, event, data | N/A (JSONL file) |\n| Structured blocks | Block extraction | `memory_messages.metadata.blocks` (JSON) | MessageBlock[] | No (JSON blob) |\n| Message metadata | Parsing | `memory_messages.metadata` (JSON) | cwd, gitBranch, model, tokenUsage | No (JSON blob) |\n\n## Block Type Reference\n\nThe 13 `MessageBlock` types extracted during ingestion:\n\n| Block Type | Fields | Stored In |\n|-----------|--------|-----------|\n| `text` | text | FTS (via plainText) |\n| `thinking` | thinking, budgetTokens | JSON blob only |\n| `tool_call` | toolId, toolName, input | `smriti_tool_usage` |\n| `tool_result` | toolId, success, output, error, durationMs | Updates tool_usage success |\n| `file_op` | operation, path, diff, pattern | `smriti_file_operations` |\n| `command` | command, cwd, exitCode, stdout, stderr, isGit | `smriti_commands` |\n| `search` | searchType, pattern, path, url, resultCount | JSON blob only |\n| `git` | operation, branch, message, files, prUrl, prNumber | `smriti_git_operations` |\n| `error` | errorType, message, retryable | `smriti_errors` |\n| `image` | mediaType, path, dataHash | JSON blob only |\n| `code` | language, code, filePath, lineStart | JSON blob only |\n| `system_event` | eventType, data | Cost accumulation |\n| `control` | controlType, command | JSON blob only |\n\n## Real User Testing Plan\n\n| Scenario | What to Measure | Risk if Untested |\n|----------|----------------|-----------------|\n| Fresh install + first ingest | Time-to-first-search, error quality | Bad first impression, confusing errors |\n| 500+ sessions accumulated | Search latency, DB file size, `smriti status` accuracy | Performance cliff after months of use |\n| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution for sessions |\n| Team sharing (2+ devs) | Sync conflicts, dedup accuracy, content hash stability | Duplicate or lost knowledge 
articles |\n| Long-running session (4+ hrs) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data at end of session |\n| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting sessions |\n| Agent switch mid-task | Cross-agent file tracking, unified timeline | Gaps in activity log |\n| Secret in session | Detection rate, redaction completeness, share blocking | Leaked credentials in `.smriti/` |\n| Large JSONL file (50MB+) | Parse time, memory usage, incremental ingest | Crash or multi-minute ingest |\n| Corrupt/truncated files | Error messages, graceful skip, no data loss | Silent data corruption |\n\n## Configuration Reference\n\n| Env Var | Default | Phase | Description |\n|---------|---------|-------|-------------|\n| `QMD_DB_PATH` | `~/.cache/qmd/index.sqlite` | — | Database path |\n| `CLAUDE_LOGS_DIR` | `~/.claude/projects` | 1 | Claude Code logs |\n| `CODEX_LOGS_DIR` | `~/.codex` | — | Codex CLI logs |\n| `SMRITI_PROJECTS_ROOT` | `~/zero8.dev` | 1 | Projects root for ID derivation |\n| `OLLAMA_HOST` | `http://127.0.0.1:11434` | — | Ollama endpoint |\n| `QMD_MEMORY_MODEL` | `qwen3:8b-tuned` | — | Ollama model for synthesis |\n| `SMRITI_CLASSIFY_THRESHOLD` | `0.5` | — | LLM classification trigger |\n| `SMRITI_AUTHOR` | `$USER` | — | Git author for team sharing |\n| `SMRITI_WATCH_DEBOUNCE_MS` | `2000` | 3 | Watch daemon debounce interval |\n| `SMRITI_TELEMETRY` | `0` | 5 | Enable telemetry collection |\n\n## Current State\n\nPhase 1 is complete:\n- 13 structured block types defined in `src/ingest/types.ts`\n- Block extraction engine in `src/ingest/blocks.ts`\n- Enriched Claude parser in `src/ingest/claude.ts`\n- 6 sidecar tables in `src/db.ts` with indexes and insert helpers\n- 142 tests passing, 415 expect() calls across 9 test files","comments":[],"createdAt":"2026-02-11T10:22:11Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or 
request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf3mg","name":"epic","description":"Epic / parent issue","color":"B60205"}],"number":12,"state":"OPEN","title":"Structured Memory Pipeline — Full Roadmap","updatedAt":"2026-02-11T10:22:11Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nA comprehensive testing and benchmarking plan that validates Smriti against real-world usage scenarios: large databases, concurrent access, cross-agent queries, and performance under load.\n\n## Why\nUnit tests verify correctness in isolation, but real usage involves hundreds of sessions, thousands of messages, multiple agents writing simultaneously, and databases that grow over months. We need to validate performance doesn't degrade and structured data stays consistent at scale.\n\n## Tasks\n\n### Correctness Testing\n- [ ] **Round-trip fidelity**: ingest → search → recall → share produces accurate, complete results\n- [ ] **Cross-agent dedup**: same session referenced by multiple agents doesn't create duplicates\n- [ ] **Sidecar consistency**: every tool_call block has a matching \\`smriti_tool_usage\\` row\n- [ ] **Category integrity**: hierarchical categories maintain parent-child relationships after bulk operations\n- [ ] **Share/sync round-trip**: \\`smriti share\\` → \\`smriti sync\\` on another machine restores all metadata\n\n### Performance Benchmarks\n- [ ] **Ingestion throughput**: time to ingest 100/500/1000 sessions\n- [ ] **Search latency**: FTS query time at 1k/10k/50k messages (target: < 50ms at 10k)\n- [ ] **Vector search latency**: embedding search at 1k/10k vectors (target: < 200ms at 10k)\n- [ ] **Sidecar query speed**: analytics queries on sidecar tables at scale\n- [ ] **Database size**: measure SQLite file size at 1k/10k/50k messages\n- [ ] **Memory usage**: peak RSS during ingestion of large sessions (target: < 256MB)\n- [ ] **Watch daemon overhead**: CPU/memory when idle vs during 
active session\n\n### Stress Testing\n- [ ] **Large session files**: JSONL files > 50MB (long coding sessions)\n- [ ] **Many small sessions**: 1000+ sessions with < 10 messages each\n- [ ] **Concurrent ingestion**: two agents writing to DB simultaneously\n- [ ] **Corrupt data handling**: malformed JSONL, truncated files, missing fields\n- [ ] **Disk space**: behavior when SQLite DB approaches filesystem limits\n\n### Security Testing\n- [ ] **Secret detection coverage**: test against curated list of real secret patterns\n- [ ] **Redaction completeness**: no secrets survive ingestion → search → share pipeline\n- [ ] **Path traversal**: crafted file paths in tool calls don't escape expected directories\n- [ ] **SQL injection**: category names, project IDs with special characters\n\n## Files\n- \\`test/benchmark.test.ts\\` — **new** Performance benchmarks\n- \\`test/stress.test.ts\\` — **new** Stress and edge case tests\n- \\`test/security.test.ts\\` — **new** Security validation tests\n- \\`test/e2e.test.ts\\` — **new** End-to-end round-trip tests\n- \\`test/fixtures/large/\\` — **new** Large synthetic test data\n- \\`scripts/generate-fixtures.ts\\` — **new** Test data generator\n\n## Acceptance Criteria\n- [ ] All correctness tests pass on a clean install\n- [ ] Ingestion throughput: ≥ 50 sessions/second\n- [ ] FTS search: < 50ms at 10k messages\n- [ ] Vector search: < 200ms at 10k vectors\n- [ ] No memory leaks during 1-hour watch daemon run\n- [ ] Zero secrets survive the full pipeline in security tests\n- [ ] Corrupt/malformed input produces clear error messages, never crashes\n\n## Real User Testing Plan\n\n| Scenario | What to Measure | Risk if Untested |\n|----------|----------------|-----------------|\n| Fresh install + first ingest | Time-to-first-search, error messages | Bad first impression |\n| 500+ sessions accumulated | Search latency, DB size, \\`smriti status\\` accuracy | Performance cliff |\n| Multi-project workspace | Project ID derivation 
accuracy, cross-project search | Wrong project attribution |\n| Team sharing (2+ developers) | Sync conflicts, dedup accuracy, content hash stability | Duplicate/lost knowledge |\n| Long-running session (4+ hours) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data |\n| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting |\n| Agent switch mid-task | Cross-agent file operation tracking, timeline accuracy | Gaps in activity log |\n\n## Testing\n```bash\nbun test test/benchmark.test.ts # Performance benchmarks\nbun test test/stress.test.ts # Stress tests\nbun test test/security.test.ts # Security validation\nbun test test/e2e.test.ts # End-to-end round-trips\nbun run scripts/generate-fixtures.ts # Generate large test data\n```","comments":[],"createdAt":"2026-02-11T10:21:18Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf2xw","name":"phase-5","description":"Phase 5: Telemetry & validation","color":"5319E7"}],"number":11,"state":"OPEN","title":"Real User Testing & Performance Validation","updatedAt":"2026-02-11T10:21:18Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nOpt-in local telemetry that collects usage metrics to \\`~/.smriti/telemetry.json\\` — session counts, tool frequencies, search patterns, ingestion performance, and error rates. No network calls, fully local.\n\n## Why\nWithout telemetry, we're flying blind on how Smriti is actually used: which commands are popular, how large databases get, whether search is fast enough, and what errors users hit. 
Local-only collection respects privacy while enabling data-driven improvements.\n\n## Tasks\n- [ ] **Telemetry store**: append-only \\`~/.smriti/telemetry.json\\` (JSONL format)\n- [ ] **Automatic collection** (opt-in via \\`SMRITI_TELEMETRY=1\\` or \\`smriti telemetry --enable\\`):\n - Command invocations: which CLI commands are run, how often\n - Ingestion metrics: sessions ingested, messages processed, duration, errors\n - Search metrics: query count, result count, latency, filter usage\n - Database size: total sessions, messages, sidecar table row counts\n - Embedding metrics: vectors built, search latency\n- [ ] **\\`smriti telemetry\\`** command:\n - \\`smriti telemetry --enable\\` / \\`--disable\\` to toggle collection\n - \\`smriti telemetry --show\\` to view collected metrics\n - \\`smriti telemetry --clear\\` to delete collected data\n - \\`smriti telemetry --export\\` to dump as JSON for analysis\n- [ ] **Event structure**: \\`{ timestamp, event, data, version }\\`\n- [ ] **Rotation**: auto-rotate when file exceeds 10MB\n- [ ] **Privacy**: never collect message content, file paths, or search queries — only counts and durations\n- [ ] **Performance**: telemetry writes must not impact CLI latency (async append)\n\n## Files\n- \\`src/telemetry/collector.ts\\` — **new** Event collection and storage\n- \\`src/telemetry/events.ts\\` — **new** Event type definitions\n- \\`src/telemetry/report.ts\\` — **new** Telemetry reporting/export\n- \\`src/index.ts\\` — Add \\`telemetry\\` command, instrument existing commands\n- \\`src/config.ts\\` — Add \\`SMRITI_TELEMETRY\\` config\n- \\`test/telemetry.test.ts\\` — **new** Telemetry collection tests\n\n## Data We Collect\n\n| Metric | Example Value | Purpose |\n|--------|--------------|---------|\n| \\`command_invoked\\` | \\`{ command: \"search\", flags: [\"--agent\"] }\\` | Command popularity |\n| \\`ingest_completed\\` | \\`{ agent: \"claude-code\", sessions: 5, messages: 120, durationMs: 340 }\\` | Ingestion 
performance |\n| \\`search_executed\\` | \\`{ resultCount: 8, latencyMs: 12, hasFilters: true }\\` | Search performance |\n| \\`db_stats\\` | \\`{ sessions: 200, messages: 15000, toolUsage: 8500 }\\` | Database growth |\n| \\`error_occurred\\` | \\`{ command: \"ingest\", errorType: \"parse_error\" }\\` | Error tracking |\n| \\`embed_completed\\` | \\`{ vectors: 500, latencyMs: 2100 }\\` | Embedding performance |\n\n## Acceptance Criteria\n- [ ] Telemetry is off by default — requires explicit opt-in\n- [ ] \\`smriti telemetry --enable\\` starts collecting, \\`--disable\\` stops\n- [ ] \\`smriti telemetry --show\\` displays human-readable summary\n- [ ] No message content, file paths, or search queries are ever recorded\n- [ ] Telemetry writes don't add > 1ms to CLI command latency\n- [ ] File auto-rotates at 10MB\n- [ ] \\`smriti telemetry --clear\\` completely removes all collected data\n\n## Testing\n```bash\nbun test test/telemetry.test.ts # Collection + rotation tests\nSMRITI_TELEMETRY=1 smriti ingest claude # Verify metrics recorded\nsmriti telemetry --show # View collected data\nsmriti telemetry --clear # Verify deletion\n```","comments":[],"createdAt":"2026-02-11T10:21:13Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf2xw","name":"phase-5","description":"Phase 5: Telemetry & validation","color":"5319E7"}],"number":10,"state":"OPEN","title":"Telemetry & Metrics Collection","updatedAt":"2026-02-11T10:21:13Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nA configurable policy engine that detects and redacts secrets, PII, and sensitive data during ingestion and before team sharing, with configurable rules and audit logging.\n\n## Why\nAI coding sessions routinely contain API keys, database passwords, auth tokens, and internal URLs — either typed by the user or surfaced in tool outputs. 
Without redaction, \\`smriti share\\` could leak secrets into git-committed \\`.smriti/\\` knowledge files, and even local search results could expose credentials.\n\n## Tasks\n- [ ] **Built-in secret patterns**: AWS keys, GitHub tokens, JWT, API keys, private keys, database URLs, .env values\n- [ ] **PII detection**: email addresses, IP addresses, phone numbers (configurable)\n- [ ] **Redaction during ingestion**: scan \\`plainText\\` and block content before storage\n- [ ] **Redaction during sharing**: additional pass before \\`smriti share\\` writes to \\`.smriti/\\`\n- [ ] **Policy configuration**: \\`.smriti/policy.json\\` or env vars to customize rules\n - Enable/disable specific pattern categories\n - Add custom regex patterns\n - Allowlist specific values (e.g., public test keys)\n- [ ] **Audit log**: record what was redacted, when, in which session (without storing the secret)\n- [ ] **\\`smriti scan\\`** command: dry-run that reports potential secrets without redacting\n- [ ] **Pre-commit hook support**: \\`smriti scan --check .smriti/\\` for CI pipelines\n- [ ] **Redaction format**: \\`[REDACTED:aws-key]\\`, \\`[REDACTED:github-token]\\` — preserves context while removing value\n\n## Files\n- \\`src/policy/patterns.ts\\` — **new** Built-in secret detection patterns\n- \\`src/policy/redactor.ts\\` — **new** Redaction engine\n- \\`src/policy/config.ts\\` — **new** Policy configuration loader\n- \\`src/policy/audit.ts\\` — **new** Audit log writer\n- \\`src/ingest/claude.ts\\` — Hook redactor into ingestion pipeline\n- \\`src/team/share.ts\\` — Hook redactor into share pipeline\n- \\`src/index.ts\\` — Add \\`scan\\` command\n- \\`test/redactor.test.ts\\` — **new** Redaction tests\n- \\`test/fixtures/secrets/\\` — **new** Test fixtures with fake secrets\n\n## Acceptance Criteria\n- [ ] AWS access keys (\\`AKIA...\\`) are redacted to \\`[REDACTED:aws-key]\\` during ingestion\n- [ ] GitHub tokens (\\`ghp_\\`, \\`gho_\\`, \\`github_pat_\\`) are detected and 
redacted\n- [ ] \\`smriti scan\\` reports potential secrets without modifying data\n- [ ] Custom patterns in \\`.smriti/policy.json\\` are applied alongside built-ins\n- [ ] Redacted content is still searchable by surrounding context (not the secret itself)\n- [ ] Audit log records redaction events with session ID, pattern name, and timestamp\n- [ ] Zero false positives on common code patterns (hex colors, UUIDs, base64 test data)\n- [ ] \\`smriti share\\` refuses to export if unredacted secrets are detected (unless \\`--force\\`)\n\n## Testing\n```bash\nbun test test/redactor.test.ts # Pattern matching + redaction tests\nsmriti scan # Dry-run secret detection\nsmriti ingest claude # Verify redaction during ingestion\nsmriti share --project smriti # Verify redaction before export\n```","comments":[],"createdAt":"2026-02-11T10:21:03Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf2WQ","name":"phase-4","description":"Phase 4: Security & policy","color":"FBCA04"}],"number":9,"state":"OPEN","title":"Secret Redaction & Policy Engine","updatedAt":"2026-02-11T10:21:03Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nQuery APIs and CLI commands that leverage the sidecar tables (tool usage, file operations, commands, git operations, errors, costs) for analytics, filtering, and intelligent recall.\n\n## Why\nThe sidecar tables from Phase 1 store rich structured data but there's no way to query them yet. 
Developers should be able to ask \"what files did I edit today?\", \"show me all failed commands in project X\", or \"which sessions cost the most tokens\".\n\n## Tasks\n- [ ] **File activity queries**: \"what files were touched in session X\" / \"most-edited files this week\"\n- [ ] **Tool usage analytics**: tool frequency, success rates, average duration per tool\n- [ ] **Error analysis**: error type distribution, most common errors, sessions with highest error rate\n- [ ] **Git activity**: commits per session, PR creation timeline, branch activity\n- [ ] **Cost tracking**: token usage per session/project/day, cost trends, cache hit rates\n- [ ] **Search filters**: extend \\`smriti search\\` with \\`--tool\\`, \\`--file\\`, \\`--error-type\\`, \\`--git-op\\` flags\n- [ ] **\\`smriti stats\\`** command overhaul: show sidecar table summaries alongside existing stats\n- [ ] **\\`smriti activity\\`** command: timeline of file operations + commands for a session\n- [ ] **Recall enrichment**: include sidecar data in recall context (e.g., \"this session edited 5 files and ran 12 commands\")\n- [ ] JSON output for all analytics queries (\\`--format json\\`)\n\n## Files\n- \\`src/search/index.ts\\` — Add sidecar-aware search filters\n- \\`src/search/analytics.ts\\` — **new** Analytics query functions\n- \\`src/search/recall.ts\\` — Enrich recall with sidecar context\n- \\`src/index.ts\\` — Add \\`stats\\`, \\`activity\\` CLI commands\n- \\`src/format.ts\\` — Format analytics output (table, JSON, CSV)\n- \\`test/analytics.test.ts\\` — **new** Analytics query tests\n\n## Acceptance Criteria\n- [ ] \\`smriti search \"auth\" --tool Bash\\` returns only sessions where Bash tool was used\n- [ ] \\`smriti search \"auth\" --file \"src/auth.ts\"\\` returns sessions that touched that file\n- [ ] \\`smriti stats\\` shows tool usage, error rates, and cost summaries\n- [ ] \\`smriti activity \\` shows chronological timeline of operations\n- [ ] \\`smriti recall \"query\" 
--synthesize\\` includes sidecar context in synthesis\n- [ ] All analytics queries return results in < 100ms for databases with 10k+ messages\n\n## Testing\n```bash\nbun test test/analytics.test.ts # Analytics query tests\nsmriti stats # Overview with sidecar data\nsmriti activity # Session activity timeline\nsmriti search \"fix bug\" --tool Bash --format json\n```","comments":[],"createdAt":"2026-02-11T10:17:44Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf2Ag","name":"phase-3","description":"Phase 3: Auto-ingestion & search","color":"D93F0B"}],"number":8,"state":"OPEN","title":"Enhanced Search & Analytics on Structured Data","updatedAt":"2026-02-11T10:17:44Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nA \\`smriti watch\\` command that monitors agent log directories via \\`fs.watch()\\` and auto-ingests new/changed sessions in real-time.\n\n## Why\nCurrently ingestion is manual (\\`smriti ingest claude\\`). Developers forget to run it, or run it too late after context is cold. 
Auto-ingestion means Smriti always has the latest session data available for search and recall.\n\n## Tasks\n- [ ] Implement \\`smriti watch\\` CLI command with graceful start/stop\n- [ ] Use \\`fs.watch()\\` (or Bun's equivalent) to monitor \\`~/.claude/projects/\\` and other agent log dirs\n- [ ] Debounce file change events (JSONL files get appended to frequently during active sessions)\n- [ ] Incremental ingestion: track file size/mtime, only re-parse appended content\n- [ ] Handle session file rotation (new session creates new file)\n- [ ] PID file at \\`~/.smriti/watch.pid\\` for single-instance enforcement\n- [ ] \\`smriti watch --daemon\\` for background mode (detached process)\n- [ ] \\`smriti watch --stop\\` to kill running daemon\n- [ ] \\`smriti watch --status\\` to check if daemon is running\n- [ ] Optional auto-embed: trigger embedding generation after ingestion\n- [ ] Optional auto-categorize: trigger categorization after ingestion\n- [ ] Configurable debounce interval via \\`SMRITI_WATCH_DEBOUNCE_MS\\` (default: 2000)\n\n## Files\n- \\`src/watch.ts\\` — **new** Watch daemon implementation\n- \\`src/index.ts\\` — Add \\`watch\\` command to CLI\n- \\`src/config.ts\\` — Add watch-related config vars\n- \\`test/watch.test.ts\\` — **new** Watch daemon tests (using temp directories)\n\n## Acceptance Criteria\n- [ ] \\`smriti watch\\` starts monitoring and logs ingestion events\n- [ ] New Claude sessions appear in \\`smriti search\\` within seconds of creation\n- [ ] Appending to existing session files triggers incremental re-ingestion\n- [ ] Only one watch daemon runs at a time (PID file enforcement)\n- [ ] \\`smriti watch --stop\\` cleanly terminates the daemon\n- [ ] CPU usage stays below 1% when idle (no busy polling)\n- [ ] Handles agent log directory not existing (waits for creation)\n\n## Testing\n```bash\nbun test test/watch.test.ts # Unit tests with temp dirs\nsmriti watch # Manual: start watching\n# In another terminal, use Claude Code — sessions 
should auto-ingest\nsmriti watch --status # Check daemon status\nsmriti watch --stop # Stop cleanly\n```","comments":[],"createdAt":"2026-02-11T10:17:19Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf2Ag","name":"phase-3","description":"Phase 3: Auto-ingestion & search","color":"D93F0B"}],"number":7,"state":"OPEN","title":"Auto-Ingestion Watch Daemon","updatedAt":"2026-02-11T10:17:19Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nAdd ingestion parsers for Cline (VS Code extension) and Aider (terminal-based coding agent) conversation logs, producing the same `StructuredMessage` format as the Claude parser.\n\n## Why\nTeams using multiple AI agents lose cross-tool visibility. A developer might debug with Aider, implement with Claude Code, and review with Cline — all touching the same files. Without unified ingestion, Smriti only captures one agent's perspective.\n\n## Tasks\n- [ ] Research Cline log format (VS Code extension storage, `.cline/` or workspace-level)\n- [ ] Implement `parseClineSession()` → `StructuredMessage[]`\n- [ ] Map Cline tool calls to `MessageBlock` types (file edits, terminal commands, browser actions)\n- [ ] Research Aider log format (`.aider.chat.history.md`, `.aider.input.history`)\n- [ ] Implement `parseAiderSession()` → `StructuredMessage[]`\n- [ ] Extract Aider-specific data: `/commands`, edit format (diff/whole/architect), lint results\n- [ ] Add `cline` and `aider` to `smriti_agents` seed data\n- [ ] Session discovery for both agents (`discoverClineSessions()`, `discoverAiderSessions()`)\n- [ ] Register parsers in `src/ingest/index.ts` orchestrator\n- [ ] Test with real session files from both agents\n\n## Files\n- `src/ingest/cline.ts` — **new** Cline parser\n- `src/ingest/aider.ts` — **new** Aider parser\n- `src/ingest/index.ts` — Register new agents in 
ingest orchestrator\n- `src/db.ts` — Add `cline`/`aider` to `DEFAULT_AGENTS`\n- `test/cline.test.ts` — **new** Cline parser tests\n- `test/aider.test.ts` — **new** Aider parser tests\n- `test/fixtures/cline/` — **new** Sample Cline session files\n- `test/fixtures/aider/` — **new** Sample Aider session files\n\n## Acceptance Criteria\n- [ ] `smriti ingest cline` ingests Cline sessions with structured blocks\n- [ ] `smriti ingest aider` ingests Aider sessions with structured blocks\n- [ ] `smriti ingest all` includes both new agents\n- [ ] File operations, commands, and errors populate sidecar tables\n- [ ] Cross-agent search returns results from all three agents\n- [ ] No regressions in existing Claude parser tests\n\n## Testing\n```bash\nbun test test/cline.test.ts # Cline parser unit tests\nbun test test/aider.test.ts # Aider parser unit tests\nbun test # Full suite — no regressions\nsmriti ingest all # Real ingestion of all agents\nsmriti search \"fix auth\" --agent cline # Cross-agent search\n```","comments":[],"createdAt":"2026-02-11T10:17:14Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf1zw","name":"phase-2","description":"Phase 2: New agent parsers","color":"1D76DB"}],"number":6,"state":"OPEN","title":"Cline + Aider Agent Parsers","updatedAt":"2026-02-11T10:17:14Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## What\nStructured block extraction from Claude Code JSONL transcripts — every tool call, file operation, git command, error, and thinking block is parsed into typed `MessageBlock` objects and stored in queryable sidecar tables.\n\n## Why\nPreviously Smriti ingested sessions as flat text, losing 80%+ of structured data: which files were edited, what commands ran, token costs, git operations, and error patterns. 
This phase makes that data queryable.\n\n## Tasks\n- [x] Define `StructuredMessage` and `MessageBlock` union type with 13 block types (`src/ingest/types.ts`)\n- [x] Implement block extraction from raw Claude API content blocks (`src/ingest/blocks.ts`)\n- [x] Git command detection and parsing (commit messages, branches, PR creation)\n- [x] `gh pr create` detection via `parseGhPrCommand()`\n- [x] Storage limits and truncation for all block types\n- [x] `flattenBlocksToText()` for backward-compatible FTS indexing\n- [x] System event parsing (turn_duration, pr-link, file-history-snapshot)\n- [x] Enriched `parseClaudeJsonlStructured()` parser alongside legacy `parseClaudeJsonl()`\n- [x] Sidecar table schema: `smriti_tool_usage`, `smriti_file_operations`, `smriti_commands`, `smriti_errors`, `smriti_git_operations`, `smriti_session_costs`\n- [x] Sidecar table population during ingestion pipeline\n- [x] Token/cost accumulation via `upsertSessionCosts()`\n- [x] Full test coverage for block extraction, git parsing, structured parsing, and sidecar inserts\n\n## Files\n- `src/ingest/types.ts` — `StructuredMessage`, `MessageBlock` union, `MessageMetadata`, storage limits\n- `src/ingest/blocks.ts` — `extractBlocks()`, `toolCallToBlocks()`, `parseGitCommand()`, `flattenBlocksToText()`\n- `src/ingest/claude.ts` — `parseClaudeJsonlStructured()`, enriched `ingestClaude()` with sidecar population\n- `src/ingest/index.ts` — Updated orchestrator types\n- `src/db.ts` — 6 new sidecar tables + indexes + insert helpers\n- `test/blocks.test.ts` — Block extraction tests\n- `test/structured-ingest.test.ts` — End-to-end structured parsing tests\n- `test/team.test.ts` — Updated for new schema\n\n## Acceptance Criteria\n- [x] All 13 block types extracted from real Claude JSONL transcripts\n- [x] Git commands parsed into structured `GitBlock` with operation, branch, message\n- [x] Tool calls decomposed into both generic `ToolCallBlock` + domain-specific blocks\n- [x] Sidecar tables populated 
atomically during ingestion\n- [x] Legacy `parseClaudeJsonl()` still works unchanged\n- [x] 142 tests passing, 415 expect() calls\n\n## Testing\n```bash\nbun test # All 142 tests pass\nbun test test/blocks.test.ts # Block extraction unit tests\nbun test test/structured-ingest.test.ts # Structured parsing integration\n```","comments":[],"createdAt":"2026-02-11T10:16:02Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXwf1eQ","name":"phase-1","description":"Phase 1: Enriched ingestion","color":"0E8A16"},{"id":"LA_kwDORM6Bzs8AAAACXwf3Ng","name":"done","description":"Completed work","color":"0E8A16"}],"number":5,"state":"OPEN","title":"[DONE] Enriched Claude Code Parser","updatedAt":"2026-02-11T10:16:02Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"Ideas to explore:\n\n1. **Searchable auto-generated documentation** — Use ingested sessions to auto-generate searchable project documentation from the knowledge base.\n\n2. **Onboarding-driven prompt generation** — During onboarding, talk to the user to understand their team's ethos and coding philosophy, then auto-generate category-specific prompts that reflect those values.\n\n3. **Further token cost optimization** — Explore more aggressive deduplication, smarter context selection, and compression strategies to push token savings even further.\n\n4. **Open exploration** — What else can a persistent, searchable AI memory layer enable? Plugin system? IDE integrations beyond Claude Code? Cross-team knowledge graphs?\n\n---\n\n> I have to stop building anything on this and start reaching out to devs to try this out. Happy coding. Happy vibe coding, let ideas flow. 
See ya!","comments":[],"createdAt":"2026-02-10T18:40:13Z","labels":[],"number":4,"state":"OPEN","title":"Future ideas & possibilities","updatedAt":"2026-02-10T18:40:13Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## The question\n\nWhen smriti shares a session about a bug fix, should the resulting article look the same as one about an architecture decision? Or a code pattern?\n\nRight now, every session — regardless of category — goes through the same reflection prompt and produces the same 5-section structure. That works, but it means a bug investigation article emphasizes the same things as a design tradeoff article. They probably shouldn't.\n\n## What exists today\n\nThe `share --reflect` pipeline works like this:\n\n1. Sessions are categorized into one of 7 top-level categories (with 21 subcategories): `bug`, `code`, `architecture`, `decision`, `feature`, `project`, `topic`\n2. When sharing, **all categories** go through the same prompt template: `src/team/prompts/share-reflect.md`\n3. That prompt produces 5 fixed sections: **Summary**, **Changes**, **Decisions**, **Insights**, **Context**\n4. Projects can override the prompt by placing a custom `share-reflect.md` at `.smriti/prompts/share-reflect.md` — but that's a single override for the whole project, not per-category\n\nThe prompt loading in `reflect.ts` is straightforward — `loadPromptTemplate()` checks for a project-level override, then falls back to the built-in default. There's no category awareness in the resolution path.\n\n## The idea\n\nWhat if prompt templates were resolved per-category? Something like:\n\n```\n.smriti/prompts/\n├── share-reflect.md # default fallback (exists today)\n├── bug/\n│ └── share-reflect.md # bug-specific template\n├── architecture/\n│ └── share-reflect.md # architecture-specific template\n└── code/\n └── share-reflect.md # code-specific template\n```\n\nThe resolution order would be:\n\n1. 
`.smriti/prompts/{category}/share-reflect.md` — project + category override\n2. Built-in category default (shipped with smriti)\n3. `.smriti/prompts/share-reflect.md` — project-wide override\n4. Built-in default (what exists today)\n\n## Concrete examples\n\nHere's how different categories might benefit from different section structures:\n\n**Bug fix** (`bug/fix`):\n\n```markdown\n### Summary\n### Root Cause\n### Reproduction Steps\n### Fix Applied\n### Verification\n### Related Areas\n```\n\nThe emphasis is on *what went wrong and how to prevent it*. \"Decisions\" and \"Insights\" from the generic template don't guide the LLM toward root cause analysis.\n\n**Architecture decision** (`architecture/decision`):\n\n```markdown\n### Summary\n### Problem Statement\n### Options Considered\n### Decision & Rationale\n### Tradeoffs Accepted\n### Implications\n```\n\nHere the value is in *capturing alternatives that were rejected and why*. The generic \"Decisions\" section doesn't explicitly prompt for alternatives considered.\n\n**Code pattern** (`code/pattern`):\n\n```markdown\n### Summary\n### Pattern Description\n### When to Use\n### Usage Example\n### Gotchas\n```\n\nA code pattern article should be *reference material* — something you can skim and apply. The generic template's \"Changes\" and \"Context\" sections add noise here.\n\n## Possible directions\n\nA few ways this could work — not mutually exclusive:\n\n**1. Hierarchical prompt resolution**\nExtend `loadPromptTemplate()` to accept a category ID and walk up the hierarchy: `bug/fix` → `bug` → default. This is the minimal change — mostly just path resolution logic.\n\n**2. Category-specific section structures**\nShip built-in prompt templates for each top-level category. The `parseSynthesis()` function would need to become more flexible — instead of looking for hardcoded `### Summary`, `### Changes`, etc., it would parse whatever `###` sections the template defines.\n\n**3. 
Category-specific sanitization**\nDifferent categories might also benefit from different content filtering. A bug session might want to preserve error messages and stack traces that the current sanitizer strips. A code pattern might want to preserve more code blocks. This is a secondary concern but worth thinking about alongside prompt templates.\n\n**4. Template inheritance / composition**\nInstead of fully separate templates, allow templates to extend a base. E.g., a bug template could say \"use the default sections, but add Root Cause after Summary and rename Changes to Fix Applied.\" This is more complex but avoids template drift.\n\n## Open questions\n\nThese are the things I'm not sure about — would love input:\n\n- **Is per-category the right granularity?** Should it be per top-level category (`bug`), per subcategory (`bug/fix` vs `bug/investigation`), or something else entirely?\n- **Should sections vary or stay fixed?** There's a simplicity argument for keeping the same 5 sections but changing the *instructions within each section* per category. Versus fully different section structures per category.\n- **How should subcategories resolve?** If `bug/fix` doesn't have a template, should it fall back to `bug`, then to default? Or is one level enough?\n- **Built-in vs user-only?** Should smriti ship opinionated per-category templates, or just provide the mechanism for users to create their own?\n- **What about the parser?** `parseSynthesis()` currently looks for 5 specific section headers. If sections vary by category, the parser needs to become dynamic. 
What's the right abstraction?\n\n## Current extension points\n\nFor anyone who wants to prototype this, here's where things connect:\n\n- **Prompt loading**: `src/team/reflect.ts` → `loadPromptTemplate(projectSmritiDir?)` — this is where category-aware resolution would go\n- **Prompt template**: `src/team/prompts/share-reflect.md` — the `{{conversation}}` placeholder and section structure\n- **Synthesis parsing**: `src/team/reflect.ts` → `parseSynthesis(text)` — hardcoded section headers that would need to flex\n- **Category info**: `src/categorize/schema.ts` — category IDs and hierarchy\n- **Share entry point**: `src/team/share.ts` → `shareKnowledge()` — where category is known and could be passed to `synthesizeSession()`\n- **Session tags**: `smriti_session_tags` table — maps sessions to categories with confidence scores\n\nThe minimal prototype would be: pass the session's category ID into `loadPromptTemplate()`, check for `prompts/{category}/share-reflect.md` before the default, and see if the output quality improves for a few specific categories.","comments":[],"createdAt":"2026-02-10T18:00:48Z","labels":[{"id":"LA_kwDORM6Bzs8AAAACXowH-Q","name":"enhancement","description":"New feature or request","color":"a2eeef"},{"id":"LA_kwDORM6Bzs8AAAACXrxBJA","name":"discussion","description":"Open-ended discussion or RFC","color":"c2e0c6"}],"number":3,"state":"OPEN","title":"RFC: Per-category prompt templates for knowledge representation","updatedAt":"2026-02-10T18:00:48Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## Problem\n\nCustom categories are per-machine only. They live in each user's local SQLite `smriti_categories` table and never travel with the repo.\n\nWhen a team defines custom categories to organize their codebase (e.g., `client/web-ui`, `infra/k8s`, `ops/incident`), every teammate has to manually recreate them. 
Worse — if someone shares a session tagged with a custom category, `smriti sync` writes the tag into `smriti_session_tags` but the category doesn't exist in the importing user's `smriti_categories` table. The tag becomes an orphan: it exists in the tags table but can't be filtered, listed, or validated.\n\n### Current state of `.smriti/config.json`\n\nThe file already exists — `share.ts` creates it at line 331-344:\n\n```json\n{\n \"version\": 1,\n \"allowedCategories\": [\"*\"],\n \"autoSync\": false\n}\n```\n\nBut it's **write-only**: `sync.ts` never reads it. It has no category definitions.\n\n## Proposal\n\nExtend `.smriti/config.json` to be the team's shared configuration file. It gets committed to git with the rest of `.smriti/` and is read by `smriti sync` to bootstrap the importing user's environment.\n\n### Config format\n\n```json\n{\n \"version\": 2,\n \"categories\": [\n {\n \"id\": \"client\",\n \"name\": \"Client-side\",\n \"description\": \"Frontend and client-side development\"\n },\n {\n \"id\": \"client/web-ui\",\n \"name\": \"Web UI\",\n \"parent\": \"client\"\n },\n {\n \"id\": \"client/mobile\",\n \"name\": \"Mobile\",\n \"parent\": \"client\"\n },\n {\n \"id\": \"infra\",\n \"name\": \"Infrastructure\"\n },\n {\n \"id\": \"infra/k8s\",\n \"name\": \"Kubernetes\",\n \"parent\": \"infra\"\n }\n ],\n \"allowedCategories\": [\"*\"],\n \"autoSync\": false\n}\n```\n\nOnly custom categories need to be listed — the 7 built-in top-level categories and 21 subcategories are always present (seeded in `db.ts`).\n\n## Implementation Plan\n\n### 1. 
Define config schema (`src/team/config.ts` — new file)\n\n```ts\ninterface SmritiConfig {\n version: number;\n categories?: CustomCategoryDef[];\n allowedCategories?: string[];\n autoSync?: boolean;\n}\n\ninterface CustomCategoryDef {\n id: string;\n name: string;\n parent?: string;\n description?: string;\n}\n```\n\nAdd functions:\n- `readConfig(projectPath: string): SmritiConfig` — reads and validates `.smriti/config.json`\n- `writeConfig(projectPath: string, config: SmritiConfig)` — writes config (used by share)\n- `mergeCategories(db: Database, categories: CustomCategoryDef[])` — idempotently ensures all listed categories exist in the local DB\n\n### 2. Update `share.ts` to export custom categories\n\nDuring `smriti share`, query `smriti_categories` for any categories **not** in the built-in `DEFAULT_CATEGORIES` list. Write them into the `categories` array in `config.json`.\n\n```ts\n// Pseudocode\nconst builtinIds = new Set(DEFAULT_CATEGORIES.flatMap(c => [c.id, ...c.children.map(ch => ch.id)]));\nconst custom = db.prepare(\n `SELECT id, name, parent_id, description FROM smriti_categories WHERE id NOT IN (${[...builtinIds].map(() => '?').join(',')})`\n).all(...builtinIds);\n\nconfig.categories = custom.map(c => ({\n id: c.id,\n name: c.name,\n parent: c.parent_id || undefined,\n description: c.description || undefined,\n}));\n```\n\nBump version to `2` when categories are present.\n\n### 3. 
Update `sync.ts` to import custom categories\n\nBefore importing knowledge files, read `.smriti/config.json` and call `mergeCategories()`:\n\n```ts\nconst config = readConfig(smritiDir);\nif (config.categories?.length) {\n mergeCategories(db, config.categories);\n}\n// Then proceed with existing file import...\n```\n\n`mergeCategories` should:\n- Sort categories so parents come before children (topological order)\n- For each category, call `createCategory()` if it doesn't already exist (use `INSERT OR IGNORE` semantics)\n- Skip categories that already exist with the same ID (idempotent)\n- Log newly created categories so the user sees what was added\n\n### 4. Add CLI command to manage team config\n\n```bash\n# Initialize .smriti/config.json in the current project\nsmriti config init\n\n# Add a custom category to the team config (writes to .smriti/config.json)\nsmriti config add-category --name [--parent ] [--description ]\n\n# Show current team config\nsmriti config show\n```\n\n`smriti config add-category` should both:\n- Add the category to the local SQLite DB (so it's immediately usable)\n- Append it to `.smriti/config.json` (so it travels with git)\n\nThis gives teams a single command to define a shared custom category.\n\n### 5. Backward compatibility\n\n- `version: 1` configs (no `categories` field) continue to work — sync just skips category import\n- `version: 2` configs are forward-compatible — unknown fields are ignored\n- The existing `allowedCategories` and `autoSync` fields are preserved\n\n### 6. Update classifier to include custom categories (`src/categorize/classifier.ts`)\n\nCurrently `classifyByLLM()` sends only `ALL_CATEGORY_IDS` (built-in) in its prompt. After this change:\n- Query the DB for all categories (built-in + custom)\n- Include custom category IDs in the LLM prompt so Ollama can classify into them\n- Custom categories won't have rule-based patterns (no keyword rules), so they'll rely on LLM classification or manual tagging\n\n### 7. 
Tests\n\n| Test | File | What it verifies |\n|------|------|-----------------|\n| Config roundtrip | `test/team.test.ts` | Write config with categories → read it back → same data |\n| Sync imports categories | `test/team.test.ts` | Sync from a `.smriti/` with custom categories → categories exist in local DB |\n| Idempotent merge | `test/team.test.ts` | Sync twice with same config → no duplicates, no errors |\n| Share exports custom cats | `test/team.test.ts` | Add custom category → share → config.json contains it |\n| Parent ordering | `test/team.test.ts` | Config with child before parent → merge still works (topological sort) |\n| Version 1 compat | `test/team.test.ts` | Sync with v1 config (no categories) → no errors |\n\n## Files to Modify\n\n| File | Change |\n|------|--------|\n| `src/team/config.ts` | **New** — Config schema, read/write/merge functions |\n| `src/team/share.ts` | Export custom categories to config.json |\n| `src/team/sync.ts` | Read config.json and import categories before syncing files |\n| `src/index.ts` | Add `smriti config` subcommand |\n| `src/categorize/classifier.ts` | Include custom categories in LLM classification prompt |\n| `test/team.test.ts` | Config roundtrip, sync, idempotency, backward compat tests |\n\n## End-to-End Example\n\n```bash\n# Alice sets up custom categories for her team\nsmriti categories add client --name \"Client-side\"\nsmriti categories add client/web-ui --name \"Web UI\" --parent client\n\n# Alice shares — custom categories are written to .smriti/config.json\nsmriti share --project myapp\n\n# Alice commits\ngit add .smriti/ && git commit -m \"Share team knowledge\"\ngit push\n\n# Bob pulls and syncs\ngit pull\nsmriti sync --project myapp\n# Output:\n# Imported 2 custom categories: client, client/web-ui\n# Imported 5 sessions from .smriti/knowledge/\n\n# Bob can now filter by the team's custom categories\nsmriti list --category client\nsmriti search \"button styling\" --category 
client/web-ui\n```","comments":[],"createdAt":"2026-02-10T17:46:45Z","labels":[],"number":2,"state":"OPEN","title":"Add .smriti/config.json as team-shared config with custom categories","updatedAt":"2026-02-10T17:46:45Z"},{"author":{"id":"MDQ6VXNlcjc5MjY2NjE=","is_bot":false,"login":"ashu17706","name":"Ashutosh Tripathi"},"body":"## Problem\n\nWhen sessions are shared via `smriti share`, **all** category tags are serialized into the YAML frontmatter — the primary category as a scalar `category` field and all tags (including secondary ones) as a `tags` array:\n\n```yaml\n---\ncategory: project\ntags: [\"project\", \"project/dependency\", \"decision/tooling\"]\n---\n```\n\nHowever, when a teammate runs `smriti sync`, **only the primary `category` field is read**. The `tags` array is ignored entirely. This means secondary tags are silently lost during the roundtrip.\n\n### Example\n\nA session tagged with `project`, `project/dependency`, and `decision/tooling`:\n\n| Stage | Tags |\n|-------|------|\n| Before share | `project`, `project/dependency`, `decision/tooling` |\n| In frontmatter | `category: project` + `tags: [\"project\", \"project/dependency\", \"decision/tooling\"]` |\n| After sync | `project` only |\n\n## Goal\n\nMake serialization and deserialization symmetric — every tag written by `share` must be restored by `sync`.\n\n## Implementation Plan\n\n### 1. Fix `parseFrontmatter()` array parsing (`src/team/sync.ts`)\n\nThe current `parseFrontmatter()` is a naive key-value parser that treats every value as a plain string. 
It does not handle JSON-style arrays like `[\"project\", \"project/dependency\"]`.\n\n**Changes:**\n- After splitting on the first `:`, detect if the trimmed value starts with `[` and ends with `]`\n- If so, parse the array elements (split by `,`, trim whitespace and quotes from each element)\n- Return the parsed array instead of the raw string\n\n```ts\n// Before\nmeta[key] = value.replace(/^[\"']|[\"']$/g, \"\");\n\n// After\nif (value.startsWith(\"[\") && value.endsWith(\"]\")) {\n meta[key] = value\n .slice(1, -1)\n .split(\",\")\n .map((s) => s.trim().replace(/^[\"']|[\"']$/g, \"\"));\n} else {\n meta[key] = value.replace(/^[\"']|[\"']$/g, \"\");\n}\n```\n\n### 2. Restore all tags during sync (`src/team/sync.ts`)\n\nCurrently sync only calls `tagSession()` once for `meta.category`. After parsing `meta.tags` as an array, iterate and restore each tag.\n\n**Changes** (around line 191-193 in `sync.ts`):\n\n```ts\n// Before\nif (meta.category) {\n tagSession(db, sessionId, meta.category, 1.0, \"team\");\n}\n\n// After\nif (meta.tags && Array.isArray(meta.tags)) {\n for (const tag of meta.tags) {\n if (isValidCategory(db, tag)) {\n tagSession(db, sessionId, tag, 1.0, \"team\");\n }\n }\n} else if (meta.category) {\n // Fallback for older exports that only have the scalar field\n tagSession(db, sessionId, meta.category, 1.0, \"team\");\n}\n```\n\nThis is backward-compatible: older shared files without a `tags` array still work via the `category` fallback.\n\n### 3. Validate tags on import\n\nUse `isValidCategory(db, tag)` (already exists in `src/categorize/schema.ts`) to skip any tag IDs that don't exist in the importing user's category tree. This prevents sync from crashing if the sharer had custom categories the importer hasn't added yet.\n\nOptionally log a warning: `\"Skipping unknown category: ops/incident\"` so the user knows to run `smriti categories add` if needed.\n\n### 4. 
Add tests (`test/team.test.ts`)\n\n- **Roundtrip test**: Create a session with multiple tags → share → sync into a fresh DB → assert all tags are present\n- **Backward compat test**: Sync a file with only `category:` (no `tags:` array) → assert primary tag is restored\n- **Invalid tag test**: Sync a file with a `tags` array containing an unknown category → assert valid tags are restored and invalid ones are skipped with a warning\n- **Frontmatter parser test**: Verify `parseFrontmatter()` correctly parses `tags: [\"a\", \"b/c\", \"d\"]` into a string array\n\n## Files to Modify\n\n| File | Change |\n|------|--------|\n| `src/team/sync.ts` | Update `parseFrontmatter()` to handle arrays; restore all tags from `meta.tags` |\n| `test/team.test.ts` | Add roundtrip, backward-compat, and invalid-tag tests |\n\n## Notes\n\n- No changes needed to `share.ts` — it already serializes all tags correctly\n- The `confidence` and `source` fields are not preserved in the roundtrip (hardcoded to `1.0` and `\"team\"` on import). This is acceptable — team-imported tags should be high-confidence by definition. Could be revisited separately if needed.","comments":[],"createdAt":"2026-02-10T17:40:27Z","labels":[],"number":1,"state":"OPEN","title":"Sync should restore all secondary category tags from frontmatter","updatedAt":"2026-02-10T17:40:27Z"}]
From 4b415c7a3212502ae1510b7ce698eab3c46cd64b Mon Sep 17 00:00:00 2001
From: Baseline User
Date: Mon, 2 Mar 2026 14:26:01 +0530
Subject: [PATCH 4/4] feat(claude): add proactive memory behavior directives to
CLAUDE.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Adds a Memory section at the top of CLAUDE.md that instructs Claude Code
to use Smriti actively — not passively. Modeled on the Loop pattern:
action-first, not acknowledgment-first.
- Recall at session start before writing code
- Recognize decision/finalization moments and act immediately
- Concrete wrong/right example with actual bash commands
- Category guide for correct tagging
- No asking permission — just save it
---
CLAUDE.md | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/CLAUDE.md b/CLAUDE.md
index 9627a49..99f56dc 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -3,6 +3,118 @@
Shared memory layer for AI-powered engineering teams. Built on
[QMD](https://github.com/tobi/qmd).
+## Memory
+
+You have Smriti. USE IT.
+
+### Start of Every Session
+
+Before writing a single line of code, run:
+
+```bash
+smriti recall "[what we're about to work on]"
+```
+
+Use what comes back. If there's prior context on this topic, it changes how
+you approach the work. Don't ignore it.
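+
+For example, if the upcoming work touches search filtering in this repo,
+scope the recall with the flags recall supports (the query here is a
+placeholder — use whatever describes the actual task):
+
+```bash
+smriti recall "vector search filtering" --project smriti
+```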
+
+### Recognize These Moments — Then Act Immediately
+
+No permission needed. No asking "should I save this?" Just do it.
+
+- Someone says "this works" or "let's go with this"
+- An approach is chosen after comparing options
+- A bug is fixed and the root cause is understood
+- A constraint is discovered that will affect future work
+- An architectural decision is made
+
+**WRONG**: Continue the conversation without saving.
+**RIGHT**: Write the decision record, ingest it, tag it — then respond.
+
+### How to Save a Decision
+
+**Step 1 — Write the record:**
+
+```bash
+cat > /tmp/smriti-note.md << 'EOF'
+# [Title]
+
+## Decision
+[One sentence — the actual choice made.]
+
+## Why
+[The reasoning. What was considered. What was ruled out and why.]
+
+## Context
+[What problem this solves. What breaks if someone reverses this later.]
+EOF
+```
+
+**Step 2 — Ingest:**
+
+```bash
+smriti ingest file /tmp/smriti-note.md \
+ --title "[Title]" \
+ --project smriti
+```
+
+**Step 3 — Tag:**
+
+```bash
+smriti tag [session-id] [category]
+```
+
+### Categories
+
+| What happened | Tag |
+|---------------|-----|
+| Architecture choice, approach finalized | `decision/technical` |
+| Library or tool selected | `decision/tooling` |
+| Bug fixed, root cause identified | `bug/fix` |
+| Structural design settled | `architecture/design` |
+| Feature approach finalized | `feature/implementation` |
+| Config or setup figured out | `project/config` |
+
+### Concrete Example
+
+User says: *"Okay let's use post-filtering for vector search with filters"*
+
+**WRONG**:
+> "Got it, I'll use post-filtering."
+
+**RIGHT**:
+
+```bash
+cat > /tmp/smriti-note.md << 'EOF'
+# Vector search with filters: post-filter approach
+
+## Decision
+Use post-filter strategy for applying metadata filters to vector search
+results, rather than combining sqlite-vec queries with EXISTS subqueries.
+
+## Why
+sqlite-vec's two-step query pattern hangs when combined with Smriti's
+EXISTS subqueries on metadata tables. Post-filtering runs vector search
+unfiltered (3x overfetch), then filters results against session metadata
+in a separate batch lookup — no sqlite-vec changes needed.
+
+## Context
+Affects smriti recall when --project, --category, or --agent flags are
+used. Without this, filtered recall loses semantic search entirely and
+falls back to BM25 only.
+EOF
+
+smriti ingest file /tmp/smriti-note.md \
+ --title "Vector search with filters: post-filter approach" \
+ --project smriti
+
+smriti tag [session-id] decision/technical
+```
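+
+Optionally confirm the note is retrievable before moving on:
+
+```bash
+smriti recall "post-filter vector search"
+```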
+
+Then respond to the user.
+
+---
+
## Quick Reference
```bash