Merged
37 commits
a45d8c7
Improve OpenAI embedding error handling and model selection
sakrut Feb 1, 2026
6a66f2b
Add next-milestone PRD, Task Master plan, and token-efficient docs
sakrut Feb 1, 2026
5d36113
Expand next-milestone tasks (64-73) with subtasks
sakrut Feb 2, 2026
2cd6f93
Add v1→v2 changelog section to next-milestone PRD (task 64)
sakrut Feb 3, 2026
b6e1311
Add --format compact|table|json|csv with compact as default (task 65)
sakrut Feb 3, 2026
60b22bf
Add --id option for stable method selection (task 66)
sakrut Feb 3, 2026
401fed9
Add status command with staleness detection (task 67)
sakrut Feb 3, 2026
7324583
Add --stages core|full option for pipeline slimming (task 68)
sakrut Feb 3, 2026
298bc28
Streamline docs: compact LLM quickstart + trim README (task 69)
sakrut Feb 3, 2026
de00048
MCP: compact responses, bounded defaults, MethodId in output (task 70)
sakrut Feb 3, 2026
6361807
Add CLI output snapshot tests for regression prevention
sakrut Feb 3, 2026
dd8403f
Add GraphTraversalEngine with configurable strategies
sakrut Feb 3, 2026
5f933a4
Add LayerDetector with architectural pattern matching
sakrut Feb 3, 2026
1cdc2a0
Add TypeLayers storage support for architectural layer detection
sakrut Feb 3, 2026
2ac452a
Add dependency-direction refinement to LayerDetector
sakrut Feb 3, 2026
97b4087
Add layers CLI command for architectural layer assignments
sakrut Feb 3, 2026
b88ed15
Add BlastRadiusAnalyzer for transitive impact computation
sakrut Feb 3, 2026
5a17076
Add BlastRadius and BlastDepth columns to Metrics table
sakrut Feb 3, 2026
a099fbd
Integrate blast radius computation into analysis pipeline
sakrut Feb 3, 2026
c64a3ef
Add blast radius display to hotspots and context commands
sakrut Feb 3, 2026
3f09f8f
Add GraphQuery record hierarchy for unified query schema
sakrut Feb 3, 2026
552c120
Add GraphQueryValidator for query validation
sakrut Feb 3, 2026
d64b3e3
Add GraphQueryExecutor with TraversalEngine bridge
sakrut Feb 3, 2026
8db13da
Add query plan caching for GraphQueryExecutor
sakrut Feb 3, 2026
23f7061
Add JSON serialization and query command for GraphQuery
sakrut Feb 3, 2026
7bf0a58
Add forbidden dependency detection with check-deps command
sakrut Feb 3, 2026
81c5321
Add quick query options to QueryCommand for easier CLI usage
sakrut Feb 3, 2026
95d0ab5
Add cg_query MCP tool for unified graph queries
sakrut Feb 3, 2026
61b4c1a
Add ProtectedZoneManager for 'do not touch' zone marking
sakrut Feb 3, 2026
480639b
Integrate protected zone warnings into context, impact, callgraph, an…
sakrut Feb 3, 2026
8e90c9f
Add architectural summary to context command
sakrut Feb 3, 2026
d76d425
Deprecate token/semantic search in favor of graph query
sakrut Feb 3, 2026
c085ba5
Update TaskMaster tasks status to done
sakrut Feb 3, 2026
dc22428
remove unessery doc
sakrut Feb 4, 2026
7065cc9
0.3.0
sakrut Feb 4, 2026
271176c
Sync all slash commands with CLI commands
sakrut Feb 4, 2026
364c6df
fix windows issue with unicodes.
Feb 4, 2026
11 changes: 11 additions & 0 deletions .claude/commands/cg/check-deps.md
@@ -0,0 +1,11 @@
Check for forbidden dependencies: $ARGUMENTS

Steps:
1. Run `ai-code-graph check-deps --db ./ai-code-graph/graph.db` (use $ARGUMENTS for custom rules if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present any violations of dependency rules:
- Layer violations (e.g., Domain -> Infrastructure)
- Circular dependencies
- Forbidden namespace dependencies
4. For each violation, show the dependency chain and suggest how to fix it
5. If no violations are found, confirm the architecture is clean
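The violation check in step 3 can be sketched as a lookup of dependency edges against a set of forbidden layer pairs. The rule format below is illustrative only; ai-code-graph's actual rule schema may differ, and the `FORBIDDEN` pairs merely mirror the examples above.

```python
# Illustrative forbidden layer pairs (source layer, target layer).
FORBIDDEN = {
    ("Domain", "Infrastructure"),   # Domain must not depend on Infrastructure
    ("Domain", "Presentation"),
    ("Application", "Presentation"),
}

def find_violations(edges, layer_of):
    """edges: (source_type, target_type) pairs; layer_of: type name -> layer name."""
    violations = []
    for src, dst in edges:
        pair = (layer_of.get(src), layer_of.get(dst))
        if pair in FORBIDDEN:
            violations.append((src, dst, f"{pair[0]} -> {pair[1]} is forbidden"))
    return violations

layers = {"Order": "Domain", "SqlOrderRepository": "Infrastructure"}
print(find_violations([("Order", "SqlOrderRepository")], layers))
```

Each violation carries the offending edge, which is what step 4 uses to show the dependency chain.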
11 changes: 11 additions & 0 deletions .claude/commands/cg/layers.md
@@ -0,0 +1,11 @@
Show architectural layer assignments: $ARGUMENTS

Steps:
1. Run `ai-code-graph layers --db ./ai-code-graph/graph.db` (filter by $ARGUMENTS if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the layer assignments showing which namespaces/types belong to which architectural layers:
- Presentation (Controllers, Views, Pages)
- Application (Services, Handlers, UseCases)
- Domain (Entities, ValueObjects, Aggregates)
- Infrastructure (Repositories, DbContexts, External)
4. Highlight any layer violations (e.g., Domain depending on Infrastructure)
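The assignment in step 3 can be approximated by matching type-name suffixes against the four categories above. This is a naive sketch: the real LayerDetector also uses namespaces and dependency direction (see the commit history), and these suffix patterns are assumptions drawn only from the examples listed.

```python
# Naive layer assignment by type-name suffix; patterns are illustrative.
LAYER_PATTERNS = {
    "Presentation": ("Controller", "View", "Page"),
    "Application": ("Service", "Handler", "UseCase"),
    "Domain": ("Entity", "ValueObject", "Aggregate"),
    "Infrastructure": ("Repository", "DbContext"),
}

def detect_layer(type_name):
    for layer, suffixes in LAYER_PATTERNS.items():
        if type_name.endswith(suffixes):
            return layer
    return "Unknown"

print(detect_layer("OrderController"))  # Presentation
print(detect_layer("OrderRepository"))  # Infrastructure
```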
13 changes: 13 additions & 0 deletions .claude/commands/cg/query.md
@@ -0,0 +1,13 @@
Graph-based method retrieval: $ARGUMENTS

Steps:
1. Parse $ARGUMENTS for quick options or JSON query:
- `--callers MethodName` -> find all callers of a method
- `--callees MethodName` -> find all callees of a method
- `--impact MethodName` -> transitive impact analysis
- `--cluster ClusterLabel` -> methods in a cluster
- JSON query for advanced use
2. Run `ai-code-graph query $ARGUMENTS --db ./ai-code-graph/graph.db`
3. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
4. Present the results with method IDs for stable references
5. Use `--format json` for structured output if needed
3 changes: 3 additions & 0 deletions .claude/commands/cg/semantic-search.md
@@ -1,5 +1,8 @@
Search code by semantic meaning: $ARGUMENTS

Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
Use semantic-search as a fallback when you need natural language matching or when query returns no results.

Steps:
1. Run `ai-code-graph semantic-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
11 changes: 11 additions & 0 deletions .claude/commands/cg/status.md
@@ -0,0 +1,11 @@
Show database status and staleness detection.

Steps:
1. Run `ai-code-graph status --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the status information:
- Database path and size
- Last analysis timestamp
- Method/type/namespace counts
- Staleness indicator (files changed since last analysis)
4. If the database is stale, suggest re-running `ai-code-graph analyze`
3 changes: 3 additions & 0 deletions .claude/commands/cg/token-search.md
@@ -1,5 +1,8 @@
Search code by token overlap: $ARGUMENTS

Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
Use token-search as a fallback when query returns no results or when you need fuzzy text matching.

Steps:
1. Run `ai-code-graph token-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3 changes: 3 additions & 0 deletions .gitignore
@@ -23,6 +23,9 @@ Thumbs.db
# AI Code Graph output
ai-code-graph/

# Local benchmark artifacts (generated)
benchmark/

# Test results
TestResults/
*.trx
90 changes: 90 additions & 0 deletions .taskmaster/docs/prd-gpt-direction.md
@@ -0,0 +1,90 @@
# ai-code-graph — Product Direction & Technical Roadmap (GPT PRD)

> Source: user-provided PRD. Assumption: this document is correct and should drive planning.

## 1) What This Repository IS (and IS NOT)

### IS: Semantic Code Intelligence Engine for AI Agents in Legacy .NET
- Roslyn-based semantic graph as the source of truth
- Precomputed, deterministic analysis
- AI agents consume facts, never infer architecture
- CLI / MCP-first integration (Claude Code, Codex, Continue)

### IS NOT
- Not a coding agent
- Not an IDE replacement
- Not a generic RAG framework
- Not a vector-search-first system

## 2) Core Principles (Non-Negotiable)
1. Roslyn > LLM inference
2. Graph-first, AI-second
3. Precompute what is expensive
4. .NET-first focus (avoid multi-language dilution)

## 3) Current Strengths (Keep & Double Down)
- Roslyn semantic graph (accurate symbol resolution, call graphs, dependencies, generics, DI)
- Precomputed graph as a knowledge base (fast, deterministic, stable across sessions)
- MCP / tool interface (`cg:*`) for infra-level integration

## 4) Key Problems to Fix

### 4.1 Token search as primary retrieval
Problem: shallow relevance, no structural understanding.
Direction: replace with graph-first retrieval: graph traversal → ranking → optional vector recall.

### 4.2 No formal query model
Problem: many commands, no unified query abstraction.
Direction: introduce a Graph Query Schema (seed/expand/depth/filters/rank). Benefits: easier for AI, cacheable, testable.

### 4.3 Missing architectural facts
Problem: architecture is implicit.
Direction: precompute architectural primitives:
- layer detection (API/Application/Domain/Infra)
- hotspots (churn + complexity)
- blast radius
- forbidden dependencies
- “do not touch” zones
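Of the primitives above, blast radius is the most mechanical to precompute: it is the set of methods transitively reachable through reverse call edges (everything that could break if the seed changes). A minimal sketch, assuming a simple caller map; BlastRadiusAnalyzer's actual algorithm and storage may differ:

```python
from collections import deque

def blast_radius(callers, seed, max_depth=None):
    """callers: method -> list of methods that call it (reverse call edges)."""
    seen, queue = {seed}, deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    seen.discard(seed)
    return seen  # impacted methods; len(seen) is the radius

calls_to = {"Save": ["PlaceOrder"], "PlaceOrder": ["Checkout", "RetryJob"]}
print(sorted(blast_radius(calls_to, "Save")))  # ['Checkout', 'PlaceOrder', 'RetryJob']
```

Because the graph is static, this BFS can run once per method at analysis time and be stored as a plain integer column, which is exactly what makes it cheap to rank by later.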

## 5) What to explicitly avoid
- Generic vector RAG as the primary approach
- Competing with agents/IDEs via UX/codegen

## 6) Strategic positioning
ai-code-graph = Semantic Code Intelligence Layer for AI agents working in legacy .NET.
Target users: senior devs, tech leads, architects, AI-assisted teams onboarding legacy systems.

## 7) Recommended technical roadmap

### Sprint 1 — Graph-native retrieval
- graph traversal engine
- ranking strategies: blast radius, complexity, coupling
- replace token search as default

### Sprint 2 — Query & architecture layer
- unified query schema
- architectural facts extraction
- layer detection
- dependency violation detection

### Sprint 3 — Hybrid retrieval (optional)
- embeddings per graph node
- vector search only for recall
- graph always decides relevance

### Sprint 4 — Memory integration
- integrate with Zep / Mem0
- store decisions, historical reasons, danger zones

## 8) Ideal AI workflow
1) AI asks high-level question
2) ai-code-graph returns subgraph + architectural facts + ranked nodes
3) AI reasons on stable context
4) coding agent executes changes

## 9) Success criteria
- fewer tokens required
- fewer exploratory calls
- stable understanding across sessions
- safer refactors
- faster onboarding
114 changes: 114 additions & 0 deletions .taskmaster/docs/prd-next.md
@@ -0,0 +1,114 @@
# AI Code Graph — Next Milestone PRD (Token-Efficient Code Navigation for LLMs)

## 0) Intent
Refocus AI Code Graph into a **high-signal / low-token** code navigation layer for LLM agents working on .NET repos.

Primary value proposition: **fast, semantically correct context reconstruction** (call graph + complexity + coupling + dead-code) with minimal output.

## What Changed vs v1

This milestone prioritizes **token economy** over feature breadth. Key shifts:

1. **Compact-first outputs** — Default CLI output is now optimized for LLM consumption: one-line-per-item, bounded lists, no ASCII art tables. Verbose/table formats remain available via `--format`.

2. **Pipeline slimming** — The default `analyze` pipeline (`--stages core`) focuses on high-signal stages (extract, callgraph, metrics). Optional stages (semantic search, advanced clustering) are gated behind `--stages full`.

3. **DB staleness awareness** — New metadata tracks when analysis was run, against which commit, and tool version. A `status` command surfaces staleness so agents avoid stale data.

**Why?** LLM agents pay per token. Every extra line of output is cost and latency. v1 optimized for human readability; v2 optimizes for agent efficiency.

## 1) Problem
LLMs are slow and token-expensive when they have to discover:
- where code lives (structure),
- what depends on what (call graph + interface dispatch),
- what is risky to change (impact, coupling),
- what is worth refactoring (hotspots),
- what can be deleted safely (dead-code).

Pure grep/read exploration is:
- O(N) tool calls,
- noisy (false positives),
- not semantically aware (interface dispatch, overrides),
- very expensive in tokens.

## 2) Goals (next milestone)
### G1 — Token economy as default
- Provide `--compact` output across the CLI.
- Make compact mode the default for agent-facing commands (`context`, `impact`, `callgraph`, `hotspots`, `dead-code`, `coupling`).

### G2 — Make the “agent flow” effortless
- A single recommended workflow: analyze → context → impact/callgraph.
- Clear docs for agent integration.

### G3 — Keep only high-leverage features in the default pipeline
- Make weaker features optional (hash-only semantic search / token-search).
- Ensure the default stages maximize signal-per-token.

### G4 — Reliability & staleness detection
- Make it obvious when the db is out-of-date.
- Provide a cheap staleness check (commit hash + file timestamps).

## 3) Non-goals (this milestone)
- Multi-repo / monorepo federation.
- Runtime tracing.
- Cloud-only dependency (keep local-first).
- Perfect semantic search quality (optional stage).

## 4) Scope / Deliverables
### D1 — Output contract: compact-first
- Add `--format compact|table|json|csv` where applicable.
- `compact` rules:
- one line per row item
- stable identifiers
- no ASCII tables
- bounded lists (top N + “...”) with `--top` / `--max-items`
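The compact rules above can be sketched as a renderer: one line per item, stable identifier first, list truncated at a bound with a trailing "...". Field names (`id`, `complexity`) are illustrative, not the tool's actual output contract.

```python
# Sketch of compact rendering: bounded, one line per item, no tables.
def render_compact(rows, top=3):
    lines = [f"{r['id']} {r['name']} cx={r['complexity']}" for r in rows[:top]]
    if len(rows) > top:
        lines.append(f"... ({len(rows) - top} more)")
    return "\n".join(lines)

rows = [{"id": f"M{i}", "name": f"Method{i}", "complexity": i} for i in range(5)]
print(render_compact(rows, top=3))
```

The bound plus the "(N more)" marker keeps output size constant while still telling the agent the list was truncated, so it can re-query with a larger `--top` when needed.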

### D2 — Method identity & selection
- Consistent, stable `MethodId` in outputs.
- Allow selecting a method by:
- exact fully qualified signature,
- substring match,
- `--id <MethodId>`.
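The three selection modes above reduce to a small resolver over the method table. This is a sketch under the assumption that `MethodId` is an opaque string; the real resolution logic lives in the tool.

```python
# Sketch: resolve a method by id, exact signature, or substring.
def select_methods(methods, method_id=None, signature=None, substring=None):
    """methods: list of dicts with 'id' and 'signature' keys."""
    if method_id is not None:
        return [m for m in methods if m["id"] == method_id]
    if signature is not None:
        return [m for m in methods if m["signature"] == signature]
    return [m for m in methods if substring in m["signature"]]

methods = [
    {"id": "M1", "signature": "OrderService.PlaceOrder(Order)"},
    {"id": "M2", "signature": "OrderService.CancelOrder(int)"},
]
print(select_methods(methods, substring="Cancel"))  # the CancelOrder entry
print(select_methods(methods, method_id="M1"))      # the PlaceOrder entry
```

Selecting by `--id` is what makes agent sessions stable: a substring match can become ambiguous as code changes, but an id returned by a previous command still resolves to exactly one method.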

### D3 — Staleness metadata
- Store analysis metadata in DB:
- analyzedAt
- solution path
- git commit hash (if available)
- tool version
- Add `ai-code-graph status` (or `ai-code-graph db-info`) that prints:
- whether db looks stale
- what solution it was built from
- last analyzed timestamp
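The staleness check D3 describes can be sketched as two cheap comparisons: stored commit hash vs. current HEAD, and newest source mtime vs. the analysis timestamp. Metadata keys mirror the list above; as noted under Risks, this is best-effort, so it returns a reason string rather than a hard verdict.

```python
# Best-effort staleness heuristic over the stored analysis metadata.
def is_stale(meta, current_commit, newest_source_mtime):
    """meta: dict with analyzedAt (unix time) and optional gitCommit."""
    if meta.get("gitCommit") and meta["gitCommit"] != current_commit:
        return True, "HEAD moved since analysis"
    if newest_source_mtime > meta["analyzedAt"]:
        return True, "source files modified after analysis"
    return False, "up to date"

meta = {"analyzedAt": 1_700_000_000, "gitCommit": "a45d8c7", "toolVersion": "0.3.0"}
print(is_stale(meta, current_commit="a45d8c7", newest_source_mtime=1_700_000_500))
# -> (True, 'source files modified after analysis')
```

Both inputs are cheap to obtain (`git rev-parse HEAD` and a file-mtime scan), which keeps `status` fast enough for agents to call before every session.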

### D4 — Feature gating / pipeline slimming
- Introduce a simple stage selector:
- `ai-code-graph analyze ... --stages core` (default)
- `--stages full` (includes optional stages)
- `core` stages should include: extract, callgraph, metrics, (optional) hash-embed only if required by duplicates/clusters.
- Optional stages: token-search/semantic-search improvements.

### D5 — Documentation refresh
- Add a “LLM quickstart” doc focused on minimal context.
- Keep README short; move deep docs to `docs/`.

## 5) User Stories
1. As an LLM agent, I can run `context` and get a small, deterministic summary for a method before editing.
2. As an engineer, I can quickly identify the riskiest modules (coupling/instability) before introducing changes.
3. As an engineer, I can identify top complexity hotspots without reading the entire repo.
4. As an engineer, I can spot likely dead code safely.
5. As an LLM agent, I can detect staleness and avoid using outdated graphs.

## 6) Acceptance Criteria
- `context` output in compact mode is <= ~25 lines for typical methods.
- `hotspots`, `dead-code`, `coupling` have bounded outputs by default.
- `db-info/status` clearly indicates when db is likely stale.
- CLI help documents compact mode and recommended flows.
- No regression in existing command names/options without a compatibility note.

## 7) Risks
- Refactoring CLI output may break scripts → mitigate with `--format json` stability.
- Staleness heuristics can produce false positives → provide “best-effort” and clear messaging.

## 8) Notes
This PRD intentionally optimizes for **signal-per-token**. If a feature does not improve signal-per-token, it should be optional.