
Commit 5886ce5

sakrut, claude, and Krystian Mikrut authored
Major release introducing a unified graph query system, architectural analysis capabilities, and significant CLI improvements for better LLM integration. (#1)
* Improve OpenAI embedding error handling and model selection

* Add next-milestone PRD, Task Master plan, and token-efficient docs

* Expand next-milestone tasks (64-73) with subtasks

* Add v1→v2 changelog section to next-milestone PRD (task 64)

Documents the key shifts: compact-first outputs, pipeline slimming, and DB staleness awareness. Explains why token economy is the priority.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add --format compact|table|json|csv with compact as default (task 65)

- Add OutputFormat enum and OutputOptions shared helper
- Add docs/output-contract.md specifying compact format rules
- Update agent-facing commands to default to compact format:
  - hotspots, dead-code, coupling, callgraph, impact, context
- Compact format: one line per item, bounded lists, stable IDs
- JSON schema uses consistent field names (methodId, items, metadata)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add --id option for stable method selection (task 66)

- Add MethodResolver helper for consistent method resolution
- Add --id option to context, callgraph, impact, similar commands
- Resolution precedence: --id > exact match > substring match
- Context command now prints method ID for agent copy-paste
- Update output-contract.md and LLM-QUICKSTART.md with --id examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add status command with staleness detection (task 67)

- Save analysis metadata: analyzed_at, solution_path, tool_version, git_commit
- Add 'status' command to show db info and staleness check
- Staleness heuristics: git commit change, source file modification times
- Add GitHelpers.GetCurrentCommitHash() and GetLastModifiedTime()
- Compact, table, and JSON output formats supported

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add --stages core|full option for pipeline slimming (task 68)

- Add --stages option to analyze command (default: core)
- core: all stages except clustering (fast, essential features)
- full: all stages including intent clustering
- Save stages metadata for status command
- ClustersCommand shows helpful message when run after core analysis

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Streamline docs: compact LLM quickstart + trim README (task 69)

- Update LLM-QUICKSTART.md with new features (status, --stages, --id)
- Trim README from 329 to 109 lines (67% reduction)
- Add links to detailed docs (output-contract, LLM quickstart)
- Focus README on essentials: install, quick start, command table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MCP: compact responses, bounded defaults, MethodId in output (task 70)

- ContextHandler: Include MethodId in output for agent copy-paste
- QueryHandler hotspots: Compact one-line-per-item format
- QueryHandler dead-code: Add top parameter, compact format
- All handlers now default to bounded outputs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add CLI output snapshot tests for regression prevention

- Add SnapshotTests.cs with golden file comparisons for:
  - hotspots (compact/json)
  - dead-code (compact/json)
  - callgraph (compact/json)
  - impact (compact/json)
  - coupling (compact/json)
  - context (compact/json)
  - tree (compact)
- Add snapshot update workflow: UPDATE_SNAPSHOTS=1 dotnet test
- Add docs/snapshot-testing.md documenting the workflow
- Fix CliCommandTests JSON key expectation (hotspots -> items)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add GraphTraversalEngine with configurable strategies

Implements Task 74: Graph Traversal Engine with Configurable Strategies

Core components:
- TraversalTypes.cs: Direction, Strategy, Ranking enums
- TraversalConfig.cs: Configuration record with validation
- TraversalResult.cs: Node, Edge, and Result records
- FilterConfig.cs: Namespace/type/accessibility filtering
- GraphTraversalEngine.cs: BFS/DFS traversal with ranking

Features:
- BFS and DFS traversal strategies
- Direction control: Callers, Callees, Both
- Configurable depth limits and max results
- Four ranking strategies:
  - BlastRadius: Transitive caller count
  - Complexity: Cognitive complexity from metrics
  - Coupling: Afferent + Efferent coupling
  - Combined: Weighted normalized combination
- Session-level caching for performance
- Filter support for namespaces, types, accessibility

Tests: 37 new tests covering types, traversal, and ranking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add LayerDetector with architectural pattern matching

Implements Task 76.1: Core layer detection infrastructure

- ArchitecturalLayer enum: Presentation, Application, Domain, Infrastructure, Shared, Unknown
- LayerAssignment record for storing detection results
- LayerDetector with pattern-based detection:
  - Default patterns for Clean Architecture/DDD
  - Type name hints (Controller, Repository, Handler suffixes)
  - Confidence scoring (1.0 for exact match, 0.5 for partial)
- Dependency validation rules for Clean Architecture
- 12 unit tests covering detection and validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add TypeLayers storage support for architectural layer detection

Implements Task 76.2: Storage methods for layer assignments

Schema changes:
- Add TypeLayers table (TypeId, Layer, Confidence, Reason)
- Add IX_TypeLayers_Layer index

IStorageService interface additions:
- SaveLayerAssignmentsAsync
- GetLayerAssignmentsAsync
- GetLayerForTypeAsync

StorageService implementation with proper transaction handling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add dependency-direction refinement to LayerDetector

Implements RefineByDependencyDirectionAsync() that analyzes call graph dependencies between types and adjusts layer assignment confidence based on Clean Architecture dependency rules:
- Lowers confidence when types violate allowed dependency directions (e.g., Domain depending on Infrastructure)
- Boosts confidence for low-confidence types with consistent valid deps
- Clamps confidence to minimum 0.1 to prevent negative values
- Adds violation warnings to Reason field

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add layers CLI command for architectural layer assignments

Implements LayersCommand following ICommandHandler pattern:
- Displays type-to-layer assignments with confidence scores
- Supports filtering by layer (--layer) and minimum confidence
- Outputs in compact, table, JSON, or CSV formats
- Shows violations with [!] marker in compact mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add BlastRadiusAnalyzer for transitive impact computation

Implements blast radius computation using BFS on reverse call graph:
- Counts direct and transitive callers for each method
- Computes depth (max distance from entry points)
- Identifies entry points (methods with no callers) that can trigger code
- Performance: handles 1000+ methods in <2 seconds

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add BlastRadius and BlastDepth columns to Metrics table

- Extends schema with BlastRadius and BlastDepth columns
- Adds index on BlastRadius for efficient sorting
- Implements SaveBlastRadiusAsync using UPSERT pattern
- Updates GetMethodMetricsAsync to return blast radius data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Integrate blast radius computation into analysis pipeline

- Adds ComputeBlastRadiusStage to AnalysisStageHelpers
- Runs after StoreResultsStage when call graph is available
- Shows verbose output with max blast radius and high-impact count
- Persists results to Metrics table via SaveBlastRadiusAsync

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add blast radius display to hotspots and context commands

Hotspots command:
- Adds --sort option (complexity|blast-radius|risk)
- Shows blast radius in output when sorting by blast or risk
- Includes risk score: complexity * log(blast_radius + 1)

Context command:
- Shows blast radius with depth and computed risk score

Updates GetHotspotsWithThresholdAsync to include BlastRadius/BlastDepth and support custom sort ordering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add GraphQuery record hierarchy for unified query schema

Defines complete query model with:
- QuerySeed: starting points (MethodId, Pattern, Namespace, Cluster)
- QueryExpand: traversal control (Direction, MaxDepth, Transitive)
- QueryFilter: inclusion/exclusion rules (namespaces, types, complexity)
- QueryRank: result ordering (BlastRadius, Complexity, Coupling, Combined)
- QueryOutput: format and limits (Compact/Json/Table, MaxResults)

Supporting enums: ExpandDirection, RankStrategy, QueryOutputFormat

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add GraphQueryValidator for query validation

Implements validation rules:
- Seed must have at least one non-null property
- MaxDepth bounds (0-100)
- MinComplexity <= MaxComplexity
- No overlapping Include/Exclude namespaces
- MaxResults bounds (1-1000)
- Empty/whitespace checks for all string properties

Includes ValidationResult record and extension method for fluent usage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add GraphQueryExecutor with TraversalEngine bridge

Implements GraphQueryExecutor that:
- Validates GraphQuery via GraphQueryValidator before execution
- Resolves seeds (MethodId, MethodPattern, Namespace, Cluster)
- Translates GraphQuery to TraversalConfig for GraphTraversalEngine
- Handles ExpandDirection.None to return seed-only results
- Applies ranking strategies (Complexity, BlastRadius, Coupling, Combined)
- Formats results with optional metrics and location info
- Supports MaxResults limiting and execution time tracking

Includes 19 comprehensive tests covering all query scenarios.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add query plan caching for GraphQueryExecutor

Implements QueryPlanCache with:
- Thread-safe ConcurrentDictionary storage
- LRU eviction when cache exceeds max size (default 100)
- Time-based expiration (default 5 minutes)
- SHA256-based query hashing (excludes Output settings)
- Hit/miss tracking with GetStats() method

GraphQueryExecutor integration:
- Optional useCache parameter (default true)
- ClearCache() and GetCacheStats() methods
- Caches resolved seeds and expand direction

Includes 13 QueryPlanCache tests and 4 caching integration tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add JSON serialization and query command for GraphQuery

GraphQuerySerializer provides:
- JSON serialization with camelCase naming and enum strings
- Deserialization with TryDeserialize error handling
- JSON Schema generation (draft-07) with full property docs

QueryCommand CLI command:
- Execute queries from --query-file or inline JSON argument
- --schema flag outputs JSON schema
- Supports all output formats (compact, json, table)
- Validates queries before execution

Includes 13 serializer tests covering round-trip, enums, and schema.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add forbidden dependency detection with check-deps command

DependencyRuleEngine provides:
- Glob pattern matching for source/target namespaces
- Built-in Clean Architecture rules (12 default rules)
- Custom rules via JSON file with includeDefaults option
- Violation detection with file:line locations
- Severity levels (Error, Warning, Info)

check-deps CLI command:
- --rules <file> for custom rules
- --show-rules to display loaded rules
- --sample to generate example rules.json
- Groups violations by rule in output
- JSON output format supported

Includes 18 tests covering pattern matching, rule loading, and detection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add quick query options to QueryCommand for easier CLI usage

- Added --seed option for quick method pattern or ID matching
- Added --depth, --direction, --rank, --top options for common parameters
- Kept --json and --query-file for full JSON query support
- Auto-detects pattern vs exact ID based on wildcards (* or ?)
- Updated command description for agent usage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add cg_query MCP tool for unified graph queries

- Exposes GraphQueryExecutor via MCP as cg_query tool
- Supports seed pattern, direction, depth, rank, and top parameters
- Auto-detects wildcards to use pattern vs exact ID lookup
- Excludes test methods by default
- Token-optimized compact response format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add ProtectedZoneManager for 'do not touch' zone marking

- ProtectedZone model with DoNotModify, RequireApproval, Deprecated levels
- JSON config loading from .ai-code-graph/protected-zones.json
- Glob pattern matching for method/namespace/type identification
- Methods for checking protection and filtering protected methods
- 20 unit tests covering all functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Integrate protected zone warnings into context, impact, callgraph, and MCP

- Context command: shows warning if method is in protected zone
- Impact command: lists protected methods in blast radius
- Callgraph command: marks protected methods in call graph
- MCP cg_query: includes protection warnings in results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add architectural summary to context command

- Added layer assignment with confidence score
- Enhanced blast radius with entry points detection
- Added architectural notes section with warnings for:
  - High blast radius (>50 callers)
  - High complexity (CC>15)
  - Protection zone status
  - Deprecated callee calls
  - Layer violation detection
- Updated snapshot tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Deprecate token/semantic search in favor of graph query

- Updated CLI help text for search commands to point to query
- Updated MCP tool descriptions to indicate search is fallback
- Updated slash commands with deprecation notes
- Updated CLAUDE.md with recommended workflow (query first)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update TaskMaster tasks status to done

All tasks completed in this session:
- Task 80: Graph-First Query CLI Command
- Task 81: MCP Graph Query Tool
- Task 79: Protected Zone Marking
- Task 83: Architectural Summary in Context
- Task 82: Deprecate Token Search
- Task 71: Benchmark artifacts gitignore (was already done)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* remove unessery doc

* 0.3.0

* Sync all slash commands with CLI commands

- Add missing slash commands: query, status, layers, check-deps
- Update CLAUDE.md with all 21 user-facing commands
- Update SetupClaudeCommand.cs to generate all command files
- Add content generators for: impact, dead-code, coupling, diff, semantic-search, query, status, layers, check-deps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix windows issue with unicodes.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Krystian Mikrut <krmk@softwaremind.com>
1 parent 4f9bbea commit 5886ce5

163 files changed

Lines changed: 21518 additions & 1231 deletions


.claude/commands/cg/check-deps.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
Check for forbidden dependencies: $ARGUMENTS

Steps:
1. Run `ai-code-graph check-deps --db ./ai-code-graph/graph.db` (use $ARGUMENTS for custom rules if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present any violations of dependency rules:
   - Layer violations (e.g., Domain -> Infrastructure)
   - Circular dependencies
   - Forbidden namespace dependencies
4. For each violation, show the dependency chain and suggest how to fix it
5. If no violations found, confirm the architecture is clean
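
The commit message describes the rule engine behind `check-deps` as glob pattern matching over source and target namespaces. The following is a minimal Python sketch of that idea, not the project's C# implementation: the `Rule` shape, the rule name, and the example namespaces are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: a forbidden source -> target namespace pair."""
    name: str
    source: str  # glob over the namespace that holds the dependency
    target: str  # glob over the namespace being depended on


def find_violations(dependencies, rules):
    """Return (rule name, source, target) for every dependency a rule forbids."""
    return [
        (rule.name, src, dst)
        for src, dst in dependencies
        for rule in rules
        if fnmatchcase(src, rule.source) and fnmatchcase(dst, rule.target)
    ]


# Illustrative data only; these namespaces do not come from the repository.
rules = [Rule("domain-must-not-use-infrastructure", "*.Domain*", "*.Infrastructure*")]
dependencies = [
    ("Shop.Domain.Orders", "Shop.Infrastructure.Sql"),
    ("Shop.Application.Checkout", "Shop.Domain.Orders"),
]
violations = find_violations(dependencies, rules)
```

Only the first dependency is reported here, because it is the only one whose source and target both match the forbidden pair.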

.claude/commands/cg/layers.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
Show architectural layer assignments: $ARGUMENTS

Steps:
1. Run `ai-code-graph layers --db ./ai-code-graph/graph.db` (filter by $ARGUMENTS if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the layer assignments showing which namespaces/types belong to which architectural layers:
   - Presentation (Controllers, Views, Pages)
   - Application (Services, Handlers, UseCases)
   - Domain (Entities, ValueObjects, Aggregates)
   - Infrastructure (Repositories, DbContexts, External)
4. Highlight any layer violations (e.g., Domain depending on Infrastructure)
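
The commit message says layer detection uses name patterns and type-name suffix hints, with a confidence of 1.0 for an exact match and 0.5 for a partial one. The Python sketch below shows one way such scoring could work. The hint tables are hypothetical, and so is the choice to treat a namespace-segment hit as "exact" and a type-suffix hit as "partial".

```python
# Hypothetical hint tables, loosely based on the layer examples listed above.
NAMESPACE_LAYERS = {
    "Controllers": "Presentation", "Views": "Presentation", "Pages": "Presentation",
    "Services": "Application", "Handlers": "Application", "UseCases": "Application",
    "Domain": "Domain", "Entities": "Domain", "ValueObjects": "Domain",
    "Infrastructure": "Infrastructure", "Repositories": "Infrastructure",
}
SUFFIX_LAYERS = {
    "Controller": "Presentation",
    "Handler": "Application",
    "Repository": "Infrastructure",
}


def detect_layer(namespace, type_name):
    """Return (layer, confidence) for a type, preferring namespace evidence."""
    # Assumption: a namespace segment that names a layer counts as an exact match.
    for segment in namespace.split("."):
        if segment in NAMESPACE_LAYERS:
            return NAMESPACE_LAYERS[segment], 1.0
    # Assumption: a type-name suffix alone is only a partial match.
    for suffix, layer in SUFFIX_LAYERS.items():
        if type_name.endswith(suffix):
            return layer, 0.5
    return "Unknown", 0.0
```

For example, `detect_layer("Shop.Web", "OrderController")` falls through to the suffix table and yields a Presentation assignment at the lower confidence.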

.claude/commands/cg/query.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
Graph-based method retrieval: $ARGUMENTS

Steps:
1. Parse $ARGUMENTS for quick options or JSON query:
   - `--callers MethodName` -> find all callers of a method
   - `--callees MethodName` -> find all callees of a method
   - `--impact MethodName` -> transitive impact analysis
   - `--cluster ClusterLabel` -> methods in a cluster
   - JSON query for advanced use
2. Run `ai-code-graph query $ARGUMENTS --db ./ai-code-graph/graph.db`
3. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
4. Present the results with method IDs for stable references
5. Use `--format json` for structured output if needed
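
According to the commit message, the quick `--seed` option decides between pattern matching and exact ID lookup based on whether the seed contains `*` or `?`. A minimal Python sketch of that decision follows. The function names and method IDs are hypothetical, and the real matcher may treat other characters differently.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' and '?' wildcards into a regex; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def resolve_seed(seed, method_ids):
    """Treat the seed as a pattern if it has wildcards, else as an exact ID."""
    if "*" in seed or "?" in seed:
        regex = wildcard_to_regex(seed)
        return [m for m in method_ids if regex.fullmatch(m)]
    return [m for m in method_ids if m == seed]


# Illustrative IDs only; the real method ID format is not shown in this commit view.
method_ids = [
    "Shop.Orders.OrderService.Place(Order)",
    "Shop.Orders.OrderService.Cancel(Guid)",
    "Shop.Billing.Invoice.Send()",
]
```

Translating the wildcards by hand keeps characters such as `(` and `[` literal, which matters when IDs contain parameter lists.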

.claude/commands/cg/semantic-search.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 Search code by semantic meaning: $ARGUMENTS
 
+Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
+Use semantic-search as a fallback when you need natural language matching or when query returns no results.
+
 Steps:
 1. Run `ai-code-graph semantic-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
 2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first

.claude/commands/cg/status.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
Show database status and staleness detection.

Steps:
1. Run `ai-code-graph status --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the status information:
   - Database path and size
   - Last analysis timestamp
   - Method/type/namespace counts
   - Staleness indicator (files changed since last analysis)
4. If database is stale, suggest re-running `ai-code-graph analyze`
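
The commit message lists two staleness heuristics for `status`: a changed git commit and source files modified after the analysis. The Python sketch below combines the two as a pure function so it can be tested without a repository. The `AnalysisMetadata` shape and the message strings are assumptions, not the tool's actual output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisMetadata:
    """Hypothetical subset of the stored metadata."""
    analyzed_at: float          # Unix timestamp of the last analysis
    git_commit: Optional[str]   # commit hash at analysis time, if available


def staleness_reasons(meta, current_commit, source_mtimes):
    """Return human-readable reasons the database looks stale (empty if fresh)."""
    reasons = []
    # Heuristic 1: the repository has moved to a different commit.
    if meta.git_commit and current_commit and meta.git_commit != current_commit:
        reasons.append("git commit changed")
    # Heuristic 2: any source file was modified after the analysis ran.
    changed = [path for path, mtime in source_mtimes.items() if mtime > meta.analyzed_at]
    if changed:
        reasons.append(f"{len(changed)} source file(s) modified since analysis")
    return reasons


meta = AnalysisMetadata(analyzed_at=1000.0, git_commit="4f9bbea")
```

Both checks are best-effort: a rebase or a touched-but-unchanged file can report staleness that does not affect the graph.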

.claude/commands/cg/token-search.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 Search code by token overlap: $ARGUMENTS
 
+Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
+Use token-search as a fallback when query returns no results or when you need fuzzy text matching.
+
 Steps:
 1. Run `ai-code-graph token-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
 2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ Thumbs.db
 # AI Code Graph output
 ai-code-graph/
 
+# Local benchmark artifacts (generated)
+benchmark/
+
 # Test results
 TestResults/
 *.trx
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# ai-code-graph — Product Direction & Technical Roadmap (GPT PDR)

> Source: user-provided PDR. Assumption: this document is correct and should drive planning.

## 1) What This Repository IS (and IS NOT)

### IS: Semantic Code Intelligence Engine for AI Agents in Legacy .NET
- Roslyn-based semantic graph as the source of truth
- Precomputed, deterministic analysis
- AI agents consume facts, never infer architecture
- CLI / MCP-first integration (Claude Code, Codex, Continue)

### IS NOT
- Not a coding agent
- Not an IDE replacement
- Not a generic RAG framework
- Not a vector-search-first system

## 2) Core Principles (Non-Negotiable)
1. Roslyn > LLM inference
2. Graph-first, AI-second
3. Precompute what is expensive
4. .NET-first focus (avoid multi-language dilution)

## 3) Current Strengths (Keep & Double Down)
- Roslyn semantic graph (accurate symbol resolution, call graphs, dependencies, generics, DI)
- Precomputed graph as a knowledge base (fast, deterministic, stable across sessions)
- MCP / tool interface (`cg:*`) for infra-level integration

## 4) Key Problems to Fix

### 4.1 Token search as primary retrieval
Problem: shallow relevance, no structural understanding.
Direction: replace with graph-first retrieval: graph traversal → ranking → optional vector recall.

### 4.2 No formal query model
Problem: many commands, no unified query abstraction.
Direction: introduce a Graph Query Schema (seed/expand/depth/filters/rank). Benefits: easier for AI, cacheable, testable.
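
To make the seed/expand/depth/filters/rank idea concrete, here is a small Python sketch of a query object with validation. The bounds (depth 0 to 100, results 1 to 1000, at least one seed, min complexity not above max) come from the commit message. The flat field layout, the defaults, and the error strings are assumptions and do not mirror the C# record hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraphQuery:
    """Hypothetical flattened query; field defaults are illustrative only."""
    seed_method_id: Optional[str] = None
    seed_pattern: Optional[str] = None
    seed_namespace: Optional[str] = None
    seed_cluster: Optional[str] = None
    direction: str = "callers"
    max_depth: int = 3
    min_complexity: Optional[int] = None
    max_complexity: Optional[int] = None
    rank: str = "blast_radius"
    max_results: int = 20


def validate(query):
    """Return a list of validation errors; an empty list means the query is usable."""
    errors = []
    seeds = (query.seed_method_id, query.seed_pattern,
             query.seed_namespace, query.seed_cluster)
    if not any(s and s.strip() for s in seeds):
        errors.append("seed: at least one non-empty seed is required")
    if not 0 <= query.max_depth <= 100:
        errors.append("maxDepth: must be between 0 and 100")
    if (query.min_complexity is not None and query.max_complexity is not None
            and query.min_complexity > query.max_complexity):
        errors.append("filter: minComplexity must not exceed maxComplexity")
    if not 1 <= query.max_results <= 1000:
        errors.append("maxResults: must be between 1 and 1000")
    return errors
```

Returning every error at once, rather than failing on the first, lets an agent repair a query in a single round trip.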

### 4.3 Missing architectural facts
Problem: architecture is implicit.
Direction: precompute architectural primitives:
- layer detection (API/Application/Domain/Infra)
- hotspots (churn + complexity)
- blast radius
- forbidden dependencies
- “do not touch” zones
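
Of these primitives, blast radius is the easiest to illustrate. The commit message describes it as a BFS over the reverse call graph and gives the risk score as `complexity * log(blast_radius + 1)`. The Python sketch below follows that description. The call-graph dictionary shape and method names are hypothetical, and the natural logarithm is an assumption because the commit does not state the base.

```python
import math
from collections import deque


def blast_radius(calls, method):
    """Count the transitive callers of `method` via BFS on the reversed call graph.

    `calls` maps each caller to the list of methods it calls (hypothetical shape).
    """
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen = {method}
    queue = deque([method])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return len(seen) - 1  # exclude the method itself


def risk_score(complexity, radius):
    """Risk as given in the commit message; natural log is assumed."""
    return complexity * math.log(radius + 1)


# Illustrative graph only.
calls = {
    "Api.Post": ["Svc.Place"],
    "Job.Run": ["Svc.Place"],
    "Svc.Place": ["Repo.Save"],
    "Repo.Save": [],
}
```

In this graph `Repo.Save` has three transitive callers, so a change there carries more risk than the same change in `Api.Post`, which nothing calls.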

## 5) What to explicitly avoid
- Generic vector RAG as the primary approach
- Competing with agents/IDEs via UX/codegen

## 6) Strategic positioning
ai-code-graph = Semantic Code Intelligence Layer for AI agents working in legacy .NET.
Target users: senior devs, tech leads, architects, AI-assisted teams onboarding legacy systems.

## 7) Recommended technical roadmap

### Sprint 1 — Graph-native retrieval
- graph traversal engine
- ranking strategies: blast radius, complexity, coupling
- replace token search as default

### Sprint 2 — Query & architecture layer
- unified query schema
- architectural facts extraction
- layer detection
- dependency violation detection

### Sprint 3 — Hybrid retrieval (optional)
- embeddings per graph node
- vector search only for recall
- graph always decides relevance

### Sprint 4 — Memory integration
- integrate with Zep / Mem0
- store decisions, historical reasons, danger zones

## 8) Ideal AI workflow
1) AI asks high-level question
2) ai-code-graph returns subgraph + architectural facts + ranked nodes
3) AI reasons on stable context
4) coding agent executes changes

## 9) Success criteria
- fewer tokens required
- fewer exploratory calls
- stable understanding across sessions
- safer refactors
- faster onboarding

.taskmaster/docs/prd-next.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# AI Code Graph — Next Milestone PRD (Token-Efficient Code Navigation for LLMs)

## 0) Intent
Refocus AI Code Graph into a **high-signal / low-token** code navigation layer for LLM agents working on .NET repos.

Primary value proposition: **fast, semantically correct context reconstruction** (call graph + complexity + coupling + dead-code) with minimal output.

## What Changed vs v1

This milestone prioritizes **token economy** over feature breadth. Key shifts:

1. **Compact-first outputs** — Default CLI output is now optimized for LLM consumption: one-line-per-item, bounded lists, no ASCII art tables. Verbose/table formats remain available via `--format`.

2. **Pipeline slimming** — The default `analyze` pipeline (`--stages core`) focuses on high-signal stages (extract, callgraph, metrics). Optional stages (semantic search, advanced clustering) are gated behind `--stages full`.

3. **DB staleness awareness** — New metadata tracks when analysis was run, against which commit, and tool version. A `status` command surfaces staleness so agents avoid stale data.

**Why?** LLM agents pay per token. Every extra line of output is cost and latency. v1 optimized for human readability; v2 optimizes for agent efficiency.

## 1) Problem
LLMs are slow and token-expensive when they have to discover:
- where code lives (structure),
- what depends on what (call graph + interface dispatch),
- what is risky to change (impact, coupling),
- what is worth refactoring (hotspots),
- what can be deleted safely (dead-code).

Pure grep/read exploration is:
- O(N) tool calls,
- noisy (false positives),
- not semantically aware (interface dispatch, overrides),
- very expensive in tokens.

## 2) Goals (next milestone)
### G1 — Token economy as default
- Provide `--compact` output across the CLI.
- Make compact mode the default for agent-facing commands (`context`, `impact`, `callgraph`, `hotspots`, `dead-code`, `coupling`).

### G2 — Make the “agent flow” effortless
- A single recommended workflow: analyze → context → impact/callgraph.
- Clear docs for agent integration.

### G3 — Keep only high-leverage features in the default pipeline
- Make weaker features optional (hash-only semantic search / token-search).
- Ensure the default stages maximize signal-per-token.

### G4 — Reliability & staleness detection
- Make it obvious when the db is out-of-date.
- Provide a cheap staleness check (commit hash + file timestamps).

## 3) Non-goals (this milestone)
- Multi-repo / monorepo federation.
- Runtime tracing.
- Cloud-only dependency (keep local-first).
- Perfect semantic search quality (optional stage).

## 4) Scope / Deliverables
### D1 — Output contract: compact-first
- Add `--format compact|table|json|csv` where applicable.
- `compact` rules:
  - one line per row item
  - stable identifiers
  - no ASCII tables
  - bounded lists (top N + “...”) with `--top` / `--max-items`
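
The compact rules above can be sketched in a few lines of Python: one line per item, a stable identifier first, and a bounded list with a trailing marker. The `methodId` key matches the JSON field name mentioned in the commit message. The `cc=` label and the wording of the overflow line are assumptions, since the actual contract lives in `docs/output-contract.md`, which is not shown here.

```python
def format_compact(items, top=10):
    """Render at most `top` items, one per line, then a marker for the remainder."""
    lines = [f"{item['methodId']} cc={item['complexity']}" for item in items[:top]]
    hidden = len(items) - top
    if hidden > 0:
        # Bounded output: say how much was cut instead of printing it.
        lines.append(f"... (+{hidden} more)")
    return "\n".join(lines)


# Illustrative rows only.
hotspots = [
    {"methodId": "m1", "complexity": 31},
    {"methodId": "m2", "complexity": 24},
    {"methodId": "m3", "complexity": 19},
]
compact = format_compact(hotspots, top=2)
```

With `top=2` the third hotspot is folded into the final marker line, so the output size stays predictable regardless of how many rows the query returns.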

### D2 — Method identity & selection
- Consistent, stable `MethodId` in outputs.
- Allow selecting a method by:
  - exact fully qualified signature,
  - substring match,
  - `--id <MethodId>`.
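
The commit message fixes the precedence between these selectors as `--id` > exact match > substring match. A Python sketch of that order is below. The dictionary shape, the IDs, and the signatures are hypothetical, and the real `MethodResolver` may handle ambiguity differently.

```python
def resolve_method(methods, method_id=None, name=None):
    """Resolve a method selection, returning the matching IDs.

    `methods` maps a stable ID to a fully qualified signature (hypothetical shape).
    Precedence: explicit ID, then exact signature, then substring.
    """
    if method_id is not None:
        return [method_id] if method_id in methods else []
    if name is None:
        raise ValueError("either method_id or name is required")
    exact = [mid for mid, signature in methods.items() if signature == name]
    if exact:
        return exact
    return [mid for mid, signature in methods.items() if name in signature]


# Illustrative data only.
methods = {
    "m1": "Shop.Orders.OrderService.Place(Order)",
    "m2": "Shop.Orders.OrderService.PlaceBatch(Order[])",
}
```

A substring such as `Place` is ambiguous here and returns both IDs, which is why the stable ID is the safest selector for an agent to copy between commands.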

### D3 — Staleness metadata
- Store analysis metadata in DB:
  - analyzedAt
  - solution path
  - git commit hash (if available)
  - tool version
- Add `ai-code-graph status` (or `ai-code-graph db-info`) that prints:
  - whether db looks stale
  - what solution it was built from
  - last analyzed timestamp

### D4 — Feature gating / pipeline slimming
- Introduce a simple stage selector:
  - `ai-code-graph analyze ... --stages core` (default)
  - `--stages full` (includes optional stages)
- `core` stages should include: extract, callgraph, metrics, (optional) hash-embed only if required by duplicates/clusters.
- Optional stages: token-search/semantic-search improvements.
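
A stage selector of this kind can be as small as a lookup plus a guard. The Python sketch below uses the core stage names from the PRD and treats clustering as the optional stage, as the commit message for `--stages` does. The stage identifiers and their order are assumptions, not the pipeline's real names.

```python
# Hypothetical stage identifiers; the real pipeline names are not shown in this diff.
CORE_STAGES = ("extract", "callgraph", "metrics")
OPTIONAL_STAGES = ("clustering",)


def select_stages(mode="core"):
    """Return the ordered stages to run for a --stages value."""
    if mode == "core":
        return list(CORE_STAGES)
    if mode == "full":
        return list(CORE_STAGES) + list(OPTIONAL_STAGES)
    raise ValueError(f"unknown stages mode: {mode!r} (expected 'core' or 'full')")
```

Keeping `core` as the default argument mirrors the PRD's intent that the cheaper pipeline is what runs unless the caller asks for more.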

### D5 — Documentation refresh
- Add a “LLM quickstart” doc focused on minimal context.
- Keep README short; move deep docs to `docs/`.

## 5) User Stories
1. As an LLM agent, I can run `context` and get a small, deterministic summary for a method before editing.
2. As an engineer, I can quickly identify the riskiest modules (coupling/instability) before introducing changes.
3. As an engineer, I can identify top complexity hotspots without reading the entire repo.
4. As an engineer, I can spot likely dead code safely.
5. As an LLM agent, I can detect staleness and avoid using outdated graphs.

## 6) Acceptance Criteria
- `context` output in compact mode is <= ~25 lines for typical methods.
- `hotspots`, `dead-code`, `coupling` have bounded outputs by default.
- `db-info/status` clearly indicates when db is likely stale.
- CLI help documents compact mode and recommended flows.
- No regression in existing command names/options without a compatibility note.

## 7) Risks
- Refactoring CLI output may break scripts → mitigate with `--format json` stability.
- Staleness heuristics can produce false positives → provide “best-effort” and clear messaging.

## 8) Notes
This PRD intentionally optimizes for **signal-per-token**. If a feature does not improve signal-per-token, it should be optional.
