Merged
37 commits
a45d8c7
Improve OpenAI embedding error handling and model selection
sakrut Feb 1, 2026
6a66f2b
Add next-milestone PRD, Task Master plan, and token-efficient docs
sakrut Feb 1, 2026
5d36113
Expand next-milestone tasks (64-73) with subtasks
sakrut Feb 2, 2026
2cd6f93
Add v1→v2 changelog section to next-milestone PRD (task 64)
sakrut Feb 3, 2026
b6e1311
Add --format compact|table|json|csv with compact as default (task 65)
sakrut Feb 3, 2026
60b22bf
Add --id option for stable method selection (task 66)
sakrut Feb 3, 2026
401fed9
Add status command with staleness detection (task 67)
sakrut Feb 3, 2026
7324583
Add --stages core|full option for pipeline slimming (task 68)
sakrut Feb 3, 2026
298bc28
Streamline docs: compact LLM quickstart + trim README (task 69)
sakrut Feb 3, 2026
de00048
MCP: compact responses, bounded defaults, MethodId in output (task 70)
sakrut Feb 3, 2026
6361807
Add CLI output snapshot tests for regression prevention
sakrut Feb 3, 2026
dd8403f
Add GraphTraversalEngine with configurable strategies
sakrut Feb 3, 2026
5f933a4
Add LayerDetector with architectural pattern matching
sakrut Feb 3, 2026
1cdc2a0
Add TypeLayers storage support for architectural layer detection
sakrut Feb 3, 2026
2ac452a
Add dependency-direction refinement to LayerDetector
sakrut Feb 3, 2026
97b4087
Add layers CLI command for architectural layer assignments
sakrut Feb 3, 2026
b88ed15
Add BlastRadiusAnalyzer for transitive impact computation
sakrut Feb 3, 2026
5a17076
Add BlastRadius and BlastDepth columns to Metrics table
sakrut Feb 3, 2026
a099fbd
Integrate blast radius computation into analysis pipeline
sakrut Feb 3, 2026
c64a3ef
Add blast radius display to hotspots and context commands
sakrut Feb 3, 2026
3f09f8f
Add GraphQuery record hierarchy for unified query schema
sakrut Feb 3, 2026
552c120
Add GraphQueryValidator for query validation
sakrut Feb 3, 2026
d64b3e3
Add GraphQueryExecutor with TraversalEngine bridge
sakrut Feb 3, 2026
8db13da
Add query plan caching for GraphQueryExecutor
sakrut Feb 3, 2026
23f7061
Add JSON serialization and query command for GraphQuery
sakrut Feb 3, 2026
7bf0a58
Add forbidden dependency detection with check-deps command
sakrut Feb 3, 2026
81c5321
Add quick query options to QueryCommand for easier CLI usage
sakrut Feb 3, 2026
95d0ab5
Add cg_query MCP tool for unified graph queries
sakrut Feb 3, 2026
61b4c1a
Add ProtectedZoneManager for 'do not touch' zone marking
sakrut Feb 3, 2026
480639b
Integrate protected zone warnings into context, impact, callgraph, an…
sakrut Feb 3, 2026
8e90c9f
Add architectural summary to context command
sakrut Feb 3, 2026
d76d425
Deprecate token/semantic search in favor of graph query
sakrut Feb 3, 2026
c085ba5
Update TaskMaster tasks status to done
sakrut Feb 3, 2026
dc22428
remove unessery doc
sakrut Feb 4, 2026
7065cc9
0.3.0
sakrut Feb 4, 2026
271176c
Sync all slash commands with CLI commands
sakrut Feb 4, 2026
364c6df
fix windows issue with unicodes.
Feb 4, 2026
11 changes: 11 additions & 0 deletions .claude/commands/cg/check-deps.md
@@ -0,0 +1,11 @@
Check for forbidden dependencies: $ARGUMENTS

Steps:
1. Run `ai-code-graph check-deps --db ./ai-code-graph/graph.db` (use $ARGUMENTS for custom rules if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present any violations of dependency rules:
- Layer violations (e.g., Domain -> Infrastructure)
- Circular dependencies
- Forbidden namespace dependencies
4. For each violation, show the dependency chain and suggest how to fix it
5. If no violations are found, confirm the architecture is clean
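The violation check in step 3 can be sketched as a lookup of dependency edges against a set of forbidden layer pairs. The rule format below is illustrative only; ai-code-graph's actual rule schema may differ, and the `FORBIDDEN` pairs merely mirror the examples above.

```python
# Illustrative forbidden layer pairs (source layer, target layer).
FORBIDDEN = {
    ("Domain", "Infrastructure"),   # Domain must not depend on Infrastructure
    ("Domain", "Presentation"),
    ("Application", "Presentation"),
}

def find_violations(edges, layer_of):
    """edges: (source_type, target_type) pairs; layer_of: type name -> layer name."""
    violations = []
    for src, dst in edges:
        pair = (layer_of.get(src), layer_of.get(dst))
        if pair in FORBIDDEN:
            violations.append((src, dst, f"{pair[0]} -> {pair[1]} is forbidden"))
    return violations

layers = {"Order": "Domain", "SqlOrderRepository": "Infrastructure"}
print(find_violations([("Order", "SqlOrderRepository")], layers))
```

Each violation carries the offending edge, which is what step 4 uses to show the dependency chain.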
11 changes: 11 additions & 0 deletions .claude/commands/cg/layers.md
@@ -0,0 +1,11 @@
Show architectural layer assignments: $ARGUMENTS

Steps:
1. Run `ai-code-graph layers --db ./ai-code-graph/graph.db` (filter by $ARGUMENTS if provided)
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the layer assignments showing which namespaces/types belong to which architectural layers:
- Presentation (Controllers, Views, Pages)
- Application (Services, Handlers, UseCases)
- Domain (Entities, ValueObjects, Aggregates)
- Infrastructure (Repositories, DbContexts, External)
4. Highlight any layer violations (e.g., Domain depending on Infrastructure)
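The assignment in step 3 can be approximated by matching type-name suffixes against the four categories above. This is a naive sketch: the real LayerDetector also uses namespaces and dependency direction (see the commit history), and these suffix patterns are assumptions drawn only from the examples listed.

```python
# Naive layer assignment by type-name suffix; patterns are illustrative.
LAYER_PATTERNS = {
    "Presentation": ("Controller", "View", "Page"),
    "Application": ("Service", "Handler", "UseCase"),
    "Domain": ("Entity", "ValueObject", "Aggregate"),
    "Infrastructure": ("Repository", "DbContext"),
}

def detect_layer(type_name):
    for layer, suffixes in LAYER_PATTERNS.items():
        if type_name.endswith(suffixes):
            return layer
    return "Unknown"

print(detect_layer("OrderController"))  # Presentation
print(detect_layer("OrderRepository"))  # Infrastructure
```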
13 changes: 13 additions & 0 deletions .claude/commands/cg/query.md
@@ -0,0 +1,13 @@
Graph-based method retrieval: $ARGUMENTS

Steps:
1. Parse $ARGUMENTS for quick options or JSON query:
- `--callers MethodName` -> find all callers of a method
- `--callees MethodName` -> find all callees of a method
- `--impact MethodName` -> transitive impact analysis
- `--cluster ClusterLabel` -> methods in a cluster
- JSON query for advanced use
2. Run `ai-code-graph query $ARGUMENTS --db ./ai-code-graph/graph.db`
3. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
4. Present the results with method IDs for stable references
5. Use `--format json` for structured output if needed
3 changes: 3 additions & 0 deletions .claude/commands/cg/semantic-search.md
@@ -1,5 +1,8 @@
Search code by semantic meaning: $ARGUMENTS

Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
Use semantic-search as a fallback when you need natural language matching or when query returns no results.

Steps:
1. Run `ai-code-graph semantic-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
11 changes: 11 additions & 0 deletions .claude/commands/cg/status.md
@@ -0,0 +1,11 @@
Show database status and staleness detection.

Steps:
1. Run `ai-code-graph status --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3. Present the status information:
- Database path and size
- Last analysis timestamp
- Method/type/namespace counts
- Staleness indicator (files changed since last analysis)
4. If the database is stale, suggest re-running `ai-code-graph analyze`
3 changes: 3 additions & 0 deletions .claude/commands/cg/token-search.md
@@ -1,5 +1,8 @@
Search code by token overlap: $ARGUMENTS

Note: For most use cases, use `/cg:query` instead for graph-based retrieval (faster, deterministic).
Use token-search as a fallback when query returns no results or when you need fuzzy text matching.

Steps:
1. Run `ai-code-graph token-search "$ARGUMENTS" --top 10 --db ./ai-code-graph/graph.db`
2. If the database doesn't exist, inform the user to run `ai-code-graph analyze` first
3 changes: 3 additions & 0 deletions .gitignore
@@ -23,6 +23,9 @@ Thumbs.db
# AI Code Graph output
ai-code-graph/

# Local benchmark artifacts (generated)
benchmark/

# Test results
TestResults/
*.trx
90 changes: 90 additions & 0 deletions .taskmaster/docs/prd-gpt-direction.md
@@ -0,0 +1,90 @@
# ai-code-graph — Product Direction & Technical Roadmap (GPT PRD)

> Source: user-provided PRD. Assumption: this document is correct and should drive planning.

## 1) What This Repository IS (and IS NOT)

### IS: Semantic Code Intelligence Engine for AI Agents in Legacy .NET
- Roslyn-based semantic graph as the source of truth
- Precomputed, deterministic analysis
- AI agents consume facts, never infer architecture
- CLI / MCP-first integration (Claude Code, Codex, Continue)

### IS NOT
- Not a coding agent
- Not an IDE replacement
- Not a generic RAG framework
- Not a vector-search-first system

## 2) Core Principles (Non-Negotiable)
1. Roslyn > LLM inference
2. Graph-first, AI-second
3. Precompute what is expensive
4. .NET-first focus (avoid multi-language dilution)

## 3) Current Strengths (Keep & Double Down)
- Roslyn semantic graph (accurate symbol resolution, call graphs, dependencies, generics, DI)
- Precomputed graph as a knowledge base (fast, deterministic, stable across sessions)
- MCP / tool interface (`cg:*`) for infra-level integration

## 4) Key Problems to Fix

### 4.1 Token search as primary retrieval
Problem: shallow relevance, no structural understanding.
Direction: replace with graph-first retrieval: graph traversal → ranking → optional vector recall.

### 4.2 No formal query model
Problem: many commands, no unified query abstraction.
Direction: introduce a Graph Query Schema (seed/expand/depth/filters/rank). Benefits: easier for AI, cacheable, testable.

### 4.3 Missing architectural facts
Problem: architecture is implicit.
Direction: precompute architectural primitives:
- layer detection (API/Application/Domain/Infra)
- hotspots (churn + complexity)
- blast radius
- forbidden dependencies
- “do not touch” zones
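Of the primitives above, blast radius is the most mechanical to precompute: it is the set of methods transitively reachable through reverse call edges (everything that could break if the seed changes). A minimal sketch, assuming a simple caller map; BlastRadiusAnalyzer's actual algorithm and storage may differ:

```python
from collections import deque

def blast_radius(callers, seed, max_depth=None):
    """callers: method -> list of methods that call it (reverse call edges)."""
    seen, queue = {seed}, deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    seen.discard(seed)
    return seen  # impacted methods; len(seen) is the radius

calls_to = {"Save": ["PlaceOrder"], "PlaceOrder": ["Checkout", "RetryJob"]}
print(sorted(blast_radius(calls_to, "Save")))  # ['Checkout', 'PlaceOrder', 'RetryJob']
```

Because the graph is static, this BFS can run once per method at analysis time and be stored as a plain integer column, which is exactly what makes it cheap to rank by later.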

## 5) What to explicitly avoid
- Generic vector RAG as the primary approach
- Competing with agents/IDEs via UX/codegen

## 6) Strategic positioning
ai-code-graph = Semantic Code Intelligence Layer for AI agents working in legacy .NET.
Target users: senior devs, tech leads, architects, AI-assisted teams onboarding legacy systems.

## 7) Recommended technical roadmap

### Sprint 1 — Graph-native retrieval
- graph traversal engine
- ranking strategies: blast radius, complexity, coupling
- replace token search as default

### Sprint 2 — Query & architecture layer
- unified query schema
- architectural facts extraction
- layer detection
- dependency violation detection

### Sprint 3 — Hybrid retrieval (optional)
- embeddings per graph node
- vector search only for recall
- graph always decides relevance

### Sprint 4 — Memory integration
- integrate with Zep / Mem0
- store decisions, historical reasons, danger zones

## 8) Ideal AI workflow
1) AI asks high-level question
2) ai-code-graph returns subgraph + architectural facts + ranked nodes
3) AI reasons on stable context
4) coding agent executes changes

## 9) Success criteria
- fewer tokens required
- fewer exploratory calls
- stable understanding across sessions
- safer refactors
- faster onboarding
114 changes: 114 additions & 0 deletions .taskmaster/docs/prd-next.md
@@ -0,0 +1,114 @@
# AI Code Graph — Next Milestone PRD (Token-Efficient Code Navigation for LLMs)

## 0) Intent
Refocus AI Code Graph into a **high-signal / low-token** code navigation layer for LLM agents working on .NET repos.

Primary value proposition: **fast, semantically correct context reconstruction** (call graph + complexity + coupling + dead-code) with minimal output.

## What Changed vs v1

This milestone prioritizes **token economy** over feature breadth. Key shifts:

1. **Compact-first outputs** — Default CLI output is now optimized for LLM consumption: one-line-per-item, bounded lists, no ASCII art tables. Verbose/table formats remain available via `--format`.

2. **Pipeline slimming** — The default `analyze` pipeline (`--stages core`) focuses on high-signal stages (extract, callgraph, metrics). Optional stages (semantic search, advanced clustering) are gated behind `--stages full`.

3. **DB staleness awareness** — New metadata tracks when analysis was run, against which commit, and tool version. A `status` command surfaces staleness so agents avoid stale data.

**Why?** LLM agents pay per token. Every extra line of output is cost and latency. v1 optimized for human readability; v2 optimizes for agent efficiency.

## 1) Problem
LLMs are slow and token-expensive when they have to discover:
- where code lives (structure),
- what depends on what (call graph + interface dispatch),
- what is risky to change (impact, coupling),
- what is worth refactoring (hotspots),
- what can be deleted safely (dead-code).

Pure grep/read exploration is:
- O(N) tool calls,
- noisy (false positives),
- not semantically aware (interface dispatch, overrides),
- very expensive in tokens.

## 2) Goals (next milestone)
### G1 — Token economy as default
- Provide `--compact` output across the CLI.
- Make compact mode the default for agent-facing commands (`context`, `impact`, `callgraph`, `hotspots`, `dead-code`, `coupling`).

### G2 — Make the “agent flow” effortless
- A single recommended workflow: analyze → context → impact/callgraph.
- Clear docs for agent integration.

### G3 — Keep only high-leverage features in the default pipeline
- Make weaker features optional (hash-only semantic search / token-search).
- Ensure the default stages maximize signal-per-token.

### G4 — Reliability & staleness detection
- Make it obvious when the db is out-of-date.
- Provide a cheap staleness check (commit hash + file timestamps).

## 3) Non-goals (this milestone)
- Multi-repo / monorepo federation.
- Runtime tracing.
- Cloud-only dependency (keep local-first).
- Perfect semantic search quality (optional stage).

## 4) Scope / Deliverables
### D1 — Output contract: compact-first
- Add `--format compact|table|json|csv` where applicable.
- `compact` rules:
- one line per row item
- stable identifiers
- no ASCII tables
- bounded lists (top N + “...”) with `--top` / `--max-items`
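The compact rules above can be sketched as a renderer: one line per item, stable identifier first, list truncated at a bound with a trailing "...". Field names (`id`, `complexity`) are illustrative, not the tool's actual output contract.

```python
# Sketch of compact rendering: bounded, one line per item, no tables.
def render_compact(rows, top=3):
    lines = [f"{r['id']} {r['name']} cx={r['complexity']}" for r in rows[:top]]
    if len(rows) > top:
        lines.append(f"... ({len(rows) - top} more)")
    return "\n".join(lines)

rows = [{"id": f"M{i}", "name": f"Method{i}", "complexity": i} for i in range(5)]
print(render_compact(rows, top=3))
```

The bound plus the "(N more)" marker keeps output size constant while still telling the agent the list was truncated, so it can re-query with a larger `--top` when needed.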

### D2 — Method identity & selection
- Consistent, stable `MethodId` in outputs.
- Allow selecting a method by:
- exact fully qualified signature,
- substring match,
- `--id <MethodId>`.
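The three selection modes above reduce to a small resolver over the method table. This is a sketch under the assumption that `MethodId` is an opaque string; the real resolution logic lives in the tool.

```python
# Sketch: resolve a method by id, exact signature, or substring.
def select_methods(methods, method_id=None, signature=None, substring=None):
    """methods: list of dicts with 'id' and 'signature' keys."""
    if method_id is not None:
        return [m for m in methods if m["id"] == method_id]
    if signature is not None:
        return [m for m in methods if m["signature"] == signature]
    return [m for m in methods if substring in m["signature"]]

methods = [
    {"id": "M1", "signature": "OrderService.PlaceOrder(Order)"},
    {"id": "M2", "signature": "OrderService.CancelOrder(int)"},
]
print(select_methods(methods, substring="Cancel"))  # the CancelOrder entry
print(select_methods(methods, method_id="M1"))      # the PlaceOrder entry
```

Selecting by `--id` is what makes agent sessions stable: a substring match can become ambiguous as code changes, but an id returned by a previous command still resolves to exactly one method.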

### D3 — Staleness metadata
- Store analysis metadata in DB:
- analyzedAt
- solution path
- git commit hash (if available)
- tool version
- Add `ai-code-graph status` (or `ai-code-graph db-info`) that prints:
- whether db looks stale
- what solution it was built from
- last analyzed timestamp
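The staleness check D3 describes can be sketched as two cheap comparisons: stored commit hash vs. current HEAD, and newest source mtime vs. the analysis timestamp. Metadata keys mirror the list above; as noted under Risks, this is best-effort, so it returns a reason string rather than a hard verdict.

```python
# Best-effort staleness heuristic over the stored analysis metadata.
def is_stale(meta, current_commit, newest_source_mtime):
    """meta: dict with analyzedAt (unix time) and optional gitCommit."""
    if meta.get("gitCommit") and meta["gitCommit"] != current_commit:
        return True, "HEAD moved since analysis"
    if newest_source_mtime > meta["analyzedAt"]:
        return True, "source files modified after analysis"
    return False, "up to date"

meta = {"analyzedAt": 1_700_000_000, "gitCommit": "a45d8c7", "toolVersion": "0.3.0"}
print(is_stale(meta, current_commit="a45d8c7", newest_source_mtime=1_700_000_500))
# -> (True, 'source files modified after analysis')
```

Both inputs are cheap to obtain (`git rev-parse HEAD` and a file-mtime scan), which keeps `status` fast enough for agents to call before every session.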

### D4 — Feature gating / pipeline slimming
- Introduce a simple stage selector:
- `ai-code-graph analyze ... --stages core` (default)
- `--stages full` (includes optional stages)
- `core` stages should include: extract, callgraph, metrics, (optional) hash-embed only if required by duplicates/clusters.
- Optional stages: token-search/semantic-search improvements.

### D5 — Documentation refresh
- Add a “LLM quickstart” doc focused on minimal context.
- Keep README short; move deep docs to `docs/`.

## 5) User Stories
1. As an LLM agent, I can run `context` and get a small, deterministic summary for a method before editing.
2. As an engineer, I can quickly identify the riskiest modules (coupling/instability) before introducing changes.
3. As an engineer, I can identify top complexity hotspots without reading the entire repo.
4. As an engineer, I can spot likely dead code safely.
5. As an LLM agent, I can detect staleness and avoid using outdated graphs.

## 6) Acceptance Criteria
- `context` output in compact mode is <= ~25 lines for typical methods.
- `hotspots`, `dead-code`, `coupling` have bounded outputs by default.
- `db-info/status` clearly indicates when db is likely stale.
- CLI help documents compact mode and recommended flows.
- No regression in existing command names/options without a compatibility note.

## 7) Risks
- Refactoring CLI output may break scripts → mitigate with `--format json` stability.
- Staleness heuristics can produce false positives → provide “best-effort” and clear messaging.

## 8) Notes
This PRD intentionally optimizes for **signal-per-token**. If a feature does not improve signal-per-token, it should be optional.