From 31e5d748bc65681686642e19252282a440785520 Mon Sep 17 00:00:00 2001 From: Giancarlo Erra Date: Sun, 15 Mar 2026 23:47:46 +0000 Subject: [PATCH] feat: add Claude Code plugin with skills, agent, and MCP bundling - Add .claude-plugin/plugin.json with MCP server reference and hooks - Add codebase-exploration skill with search-before-reading workflow - Add codebase-management skill with indexing and troubleshooting guides - Add codebase-explorer delegatable subagent for deep analysis - Add SessionStart hook for duplicate MCP detection - Add .mcp.json for plugin-bundled MCP server config - Update package.json files array to include plugin assets in npm package - Add release-it after:bump hook to sync plugin.json version - Update README with plugin install badge, instructions, and guidance --- .claude-plugin/plugin.json | 16 ++ .mcp.json | 8 + .release-it.json | 3 + README.md | 76 +++++- agents/codebase-explorer.md | 53 ++++ hooks/hooks.json | 14 + package.json | 5 + skills/codebase-exploration/SKILL.md | 86 ++++++ .../references/tool-reference.md | 190 +++++++++++++ skills/codebase-management/SKILL.md | 98 +++++++ .../references/tool-reference.md | 253 ++++++++++++++++++ 11 files changed, 794 insertions(+), 8 deletions(-) create mode 100644 .claude-plugin/plugin.json create mode 100644 .mcp.json create mode 100644 agents/codebase-explorer.md create mode 100644 hooks/hooks.json create mode 100644 skills/codebase-exploration/SKILL.md create mode 100644 skills/codebase-exploration/references/tool-reference.md create mode 100644 skills/codebase-management/SKILL.md create mode 100644 skills/codebase-management/references/tool-reference.md diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json new file mode 100644 index 0000000..855ac2c --- /dev/null +++ b/.claude-plugin/plugin.json @@ -0,0 +1,16 @@ +{ + "name": "socraticode", + "version": "1.0.1", + "description": "Codebase intelligence — semantic search workflows, dependency graph analysis, and context 
artifact exploration for SocratiCode", + "author": { + "name": "Giancarlo Erra", + "email": "giancarlo@altaire.com", + "url": "https://altaire.com" + }, + "homepage": "https://github.com/giancarloerra/socraticode", + "repository": "https://github.com/giancarloerra/socraticode", + "license": "AGPL-3.0-only", + "keywords": ["codebase", "search", "indexing", "dependency-graph", "semantic-search", "mcp"], + "mcpServers": "./.mcp.json", + "hooks": "./hooks/hooks.json" +} diff --git a/.mcp.json b/.mcp.json new file mode 100644 index 0000000..e423d5c --- /dev/null +++ b/.mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "socraticode": { + "command": "npx", + "args": ["-y", "socraticode"] + } + } +} diff --git a/.release-it.json b/.release-it.json index 31491ec..2f5c00d 100644 --- a/.release-it.json +++ b/.release-it.json @@ -1,4 +1,7 @@ { + "hooks": { + "after:bump": "node -e \"const fs=require('fs'); const p=JSON.parse(fs.readFileSync('.claude-plugin/plugin.json','utf8')); p.version='${version}'; fs.writeFileSync('.claude-plugin/plugin.json', JSON.stringify(p,null,2)+'\\n');\"" + }, "git": { "commitMessage": "chore: release v${version}", "tagName": "v${version}", diff --git a/README.md b/README.md index e9a786d..0c5eb9a 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,7 @@

+ Install Claude Code Plugin Install in VS Code Install in VS Code Insiders Install in Cursor @@ -28,16 +29,26 @@ > If SocratiCode has been useful to you, please ⭐ **star this repo** — it helps others discover it — and share it with your dev team and fellow developers! -**One thing, done well: deep codebase intelligence — zero setup, no bloat, fully automatic.** SocratiCode gives AI assistants deep semantic understanding of your codebase — hybrid search, polyglot code dependency graphs, and searchable context artifacts (database schemas, API specs, infra configs, architecture docs). Zero configuration — add it to any MCP host and it manages everything automatically. +**One thing, done well: deep codebase intelligence — zero setup, no bloat, fully automatic.** SocratiCode gives AI assistants deep semantic understanding of your codebase — hybrid search, polyglot code dependency graphs, and searchable context artifacts (database schemas, API specs, infra configs, architecture docs). Zero configuration — add it to **any MCP host**, or install the **Claude Code plugin** for built-in workflow skills. It manages everything automatically. **Production-ready**, battle-tested on **enterprise-level** large repositories (up to and over **~40 million lines of code**). **Batched**, automatic **resumable** indexing checkpoints progress — pauses, crashes, restarts, and interruptions don't lose work. The file watcher keeps the **index automatically updated** at every file change and across sessions. **Multi-agent ready** — multiple AI agents can work on the same codebase simultaneously, sharing a single index with automatic coordination and zero configuration. **Private and local by default** — Docker handles everything, no API keys required, no data leaves your machine. **Cloud ready** for embeddings (OpenAI, Google Gemini) and Qdrant, and a **full suite of configuration options** are all available when you need them. 
-The first Qdrant‑based MCP server that pairs auto‑managed, zero‑config local Docker deployment with **AST‑aware code chunking, hybrid semantic + BM25 (RRF‑fused) code search**, polyglot dependency **graphs** with circular‑dependency visualization, and searchable **infra/API/database artifacts** in a single focused, zero-config and easy to use code intelligence engine. +The first Qdrant‑based MCP/Claude Plugin/Skill that pairs auto‑managed, zero‑config local Docker deployment with **AST‑aware code chunking, hybrid semantic + BM25 (RRF‑fused) code search**, polyglot dependency **graphs** with circular‑dependency visualization, and searchable **infra/API/database artifacts** in a single focused, zero-config and easy to use code intelligence engine. > **Benchmarked on VS Code (2.45M lines):** SocratiCode uses **61% less context**, **84% fewer tool calls**, and is **37x faster** than grep‑based exploration — tested live with Claude Opus 4.6. [See the full benchmark →](#real-world-benchmark-vs-code-245m-lines-of-code-with-claude-opus-46) +

+ [Star History Chart image]

+ ## Contents - [Quick Start](#quick-start) @@ -63,9 +74,10 @@ The first Qdrant‑based MCP server that pairs auto‑managed, zero‑config loc > **Only [Docker](https://www.docker.com/products/docker-desktop/) (running) required.** -**One-click install** — VS Code and Cursor: +**One-click install** — Claude Code, VS Code, and Cursor: -[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install_MCP_Server-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=socraticode&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22socraticode%22%5D%7D) [![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install_MCP_Server-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=socraticode&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22socraticode%22%5D%7D&quality=insiders) [![Install in Cursor](https://img.shields.io/badge/Cursor-Install_MCP_Server-F14C28?style=flat-square&logo=cursor&logoColor=white)](cursor://anysphere.cursor-deeplink/mcp/install?name=socraticode&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsInNvY3JhdGljb2RlIl19) +[![Install Claude Code Plugin](https://img.shields.io/badge/Claude_Code-Install_Plugin-CC785C?style=flat-square&logoColor=white)](#claude-code-plugin-recommended-for-claude-code-users) +[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install_MCP_Server-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=socraticode&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22socraticode%22%5D%7D) [![Install in VS Code 
Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install_MCP_Server-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=socraticode&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22socraticode%22%5D%7D&quality=insiders) [![Install in Cursor](https://img.shields.io/badge/Cursor-Install_MCP_Server-F14C28?style=flat-square&logo=cursor&logoColor=white)](cursor://anysphere.cursor-deeplink/mcp/install?name=socraticode&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsInNvY3JhdGljb2RlIl19) **All MCP hosts** — add the following to your `mcpServers` (Claude Desktop, Windsurf, Cline, Roo Code) or `servers` (VS Code project-local `.vscode/mcp.json`) config: @@ -76,7 +88,13 @@ The first Qdrant‑based MCP server that pairs auto‑managed, zero‑config loc } ``` -**Claude Code** — run this command: +**Claude Code** — install the plugin (recommended, includes workflow skills for best results): + +```bash +claude plugin add --from-github giancarloerra/socraticode +``` + +Or as MCP only (without skills): ```bash claude mcp add socraticode -- npx -y socraticode @@ -100,13 +118,15 @@ Restart your host. On first use SocratiCode automatically pulls Docker images, s > **Recommended**: For best results, add the [Agent Instructions](#agent-instructions) to your AI assistant's system prompt or project instructions file (`CLAUDE.md`, `AGENTS.md`, etc.). The key principle — **search before reading** — helps your AI use SocratiCode's tools effectively and avoid unnecessary file reads. +> **Claude Code users**: If you installed the SocratiCode plugin, the Agent Instructions are included automatically as skills — no need to add them to your `CLAUDE.md`. The plugin also bundles the MCP server, so you don't need a separate `claude mcp add`. + > **Advanced**: cloud embeddings (OpenAI / Google), external Qdrant, remote Ollama, native Ollama, and dozens of tuning options are all available. 
See [Configuration](#configuration) below. ## Why SocratiCode I built SocratiCode because I regularly work on existing, large, and complex codebases across different languages and need to quickly understand them and act. Existing solutions were either too limited, insufficiently tested for production use, or bloated with unnecessary complexity. I wanted a single focused tool that does deep codebase intelligence well — zero setup, no bloat, fully automatic — and gets out of the way. -- **True Zero Configuration** — Just add the MCP server to your AI host config. The server automatically pulls Docker images, starts Qdrant and Ollama containers, and downloads the embedding model on first use. No config files, no YAML, no environment variables to tune, no native dependencies to compile, no commands to type. Works everywhere Docker runs. +- **True Zero Configuration** — Just install the Claude Plugin/Skill or add the MCP server to your AI host config. The server automatically pulls Docker images, starts Qdrant and Ollama containers, and downloads the embedding model on first use. No config files, no YAML, no environment variables to tune, no native dependencies to compile, no commands to type. Works everywhere Docker runs. - **Fully Private & Local by Default** — Everything runs on your machine. Your code never leaves your network. The default Docker setup includes Ollama and Qdrant with no external API calls. Optional cloud providers (Qdrant, OpenAI, Gemini) are available but never required. - **Language-Agnostic** — Works with every programming language, framework, and file type out of the box. No per-language parsers to install, no grammar files to maintain, no "unsupported language" limitations. If your AI can read it, SocratiCode can index it. - **Production-Grade Vector Search** — Built on Qdrant, a purpose-built vector database with HNSW indexing, concurrent read/write, and payload filtering. 
Collections store both a dense vector and a BM25 sparse vector per chunk; the Query API runs both sub-queries in a single round-trip and fuses results with RRF. Designed for scale vector search. @@ -140,7 +160,7 @@ I built SocratiCode because I regularly work on existing, large, and complex cod - **Cross-process safety** — File-based locking (`proper-lockfile`) prevents multiple MCP instances from simultaneously indexing or watching the same project. Stale locks from crashed processes are automatically reclaimed. When another MCP process is already watching a project, `codebase_status` reports "active (watched by another process)" instead of incorrectly showing "inactive." - **Concurrency guards** — Duplicate indexing and graph-build operations are prevented. If you call `codebase_index` while indexing is already running, it returns the current progress instead of starting a second operation. - **Graceful stop** — Long-running indexing operations can be stopped safely with `codebase_stop`. The current batch finishes and checkpoints, preserving all progress. Re-run `codebase_index` to resume from where it left off. -- **Graceful shutdown** — On server shutdown, active indexing operations are given up to 60 seconds to complete, all file watchers are stopped cleanly, and the MCP server closes gracefully. +- **Graceful shutdown** — On server shutdown, active indexing operations are given up to 60 seconds to complete, all file watchers are stopped cleanly, and everything closes gracefully. - **Structured logging** — All operations are logged with structured context for observability. Log level configurable via `SOCRATICODE_LOG_LEVEL`. - **Graceful degradation** — If infrastructure goes down during watch, the watcher backs off and retries instead of crashing. @@ -201,6 +221,8 @@ User: "Are there any circular dependencies?" ## Agent Instructions +> **Claude Code plugin users**: These instructions are included automatically as skills in the SocratiCode plugin. 
You don't need to copy them into `CLAUDE.md`. The section below is for non-Claude Code hosts (VS Code, Cursor, Claude Desktop, etc.). + For best results, add instructions like the following to your AI assistant's system prompt, `CLAUDE.md`, `AGENTS.md`, or equivalent instructions file. The core principle: **search before reading**. The index gives you a map of the codebase in milliseconds; raw file reading is expensive and context-consuming. ```markdown @@ -270,7 +292,45 @@ before reading any files directly. ### Install -#### npx (recommended — no installation) +#### Claude Code plugin (recommended for Claude Code users) + +The SocratiCode plugin bundles both the MCP server and workflow skills that teach Claude how to use the tools effectively. One install gives you everything: + +```bash +claude plugin add --from-github giancarloerra/socraticode +``` + +The plugin includes: +- **MCP server** — all 21 SocratiCode tools (search, graph, context artifacts, etc.) +- **Exploration skill** — teaches Claude the search-before-reading workflow +- **Management skill** — guides setup, indexing, watching, and troubleshooting +- **Explorer agent** — delegatable subagent for deep codebase analysis + +> If you previously installed SocratiCode as a standalone MCP (`claude mcp add socraticode`), remove it after installing the plugin to avoid duplicates: `claude mcp remove socraticode` + +**Configuring environment variables:** SocratiCode works with zero config for most users (local Ollama + managed Qdrant). If you need cloud embeddings, a remote Qdrant, or other customization: + +1. **Claude Code settings** (recommended) — add to `~/.claude/settings.json`: + ```json + { + "env": { + "EMBEDDING_PROVIDER": "openai", + "OPENAI_API_KEY": "sk-..." + } + } + ``` + This works in all environments — CLI, VS Code, and JetBrains. + +2. **Shell profile** — set vars in `~/.zshrc` or `~/.bashrc`: + ```bash + export EMBEDDING_PROVIDER=openai + export OPENAI_API_KEY=sk-... 
+ ``` + Works when Claude Code is launched from a terminal. Note: IDE-launched sessions (e.g. VS Code opened from Finder/Dock) may not inherit shell profile variables — use option 1 instead. + +Restart Claude Code after changing variables. See [Environment Variables](#environment-variables) for all options. + +#### npx (recommended for all other MCP hosts — no installation) Requires Node.js 18+ and Docker (running). Already covered in [Quick Start](#quick-start) above, add the following to your `mcpServers` (Claude Desktop, Windsurf, Cline, Roo Code) or `servers` (VS Code project-local `.vscode/mcp.json`) config: diff --git a/agents/codebase-explorer.md b/agents/codebase-explorer.md new file mode 100644 index 0000000..9eea191 --- /dev/null +++ b/agents/codebase-explorer.md @@ -0,0 +1,53 @@ +--- +name: codebase-explorer +description: >- + Deep codebase exploration using SocratiCode. Combines semantic search, + dependency graphs, and context artifacts to answer questions about code + structure and behavior. Use when delegating complex codebase understanding + tasks that require tracing through multiple files and dependencies. + + + Context: User wants to understand how a complex feature works across multiple files. + user: "How does the authentication system work in this codebase?" + assistant: "I'll use the codebase-explorer agent to trace through the authentication implementation." + + + + Context: User wants an architectural overview of a new codebase. + user: "Give me an overview of this project's architecture" + assistant: "I'll use the codebase-explorer agent for a deep architectural analysis." + +model: sonnet +allowed-tools: Read, Grep, Glob, Bash(git *) +--- + +You are a codebase exploration specialist. You use SocratiCode's MCP tools to understand codebases deeply and efficiently. + +## Core Principle: Search Before Reading + +Never open a file just to check if it's relevant. Always search first. + +## Your Approach + +1. 
**Search broadly first.** Use `codebase_search` with conceptual queries to map the relevant areas of the codebase. A single search returns ranked snippets from the entire codebase in milliseconds. + +2. **Follow the dependency graph.** Use `codebase_graph_query` to understand what a file imports and what depends on it before reading its contents. Use `codebase_graph_stats` for an architectural overview. + +3. **Check for non-code knowledge.** Use `codebase_context` to discover database schemas, API specs, and infrastructure configs. Use `codebase_context_search` to search them. + +4. **Read files only after narrowing down.** Once search results point to 1-3 specific files, read the relevant sections. + +5. **Check for architectural issues.** Use `codebase_graph_circular` to detect circular dependencies when debugging unexpected behavior. + +6. **Synthesize findings.** Present clear, structured answers with specific file paths and line references. Explain the relationships between components. + +## Available SocratiCode Tools + +**Search:** `codebase_search` (hybrid semantic + keyword), `codebase_status` +**Graph:** `codebase_graph_query`, `codebase_graph_stats`, `codebase_graph_circular`, `codebase_graph_visualize`, `codebase_graph_status` +**Context:** `codebase_context`, `codebase_context_search` +**Info:** `codebase_about` + +## When to Use grep Instead + +If you already know the exact identifier, error string, or regex pattern, use grep/ripgrep — it's faster and more precise for exact matches. Use `codebase_search` when exploring conceptually or when you don't know which files to look in. diff --git a/hooks/hooks.json b/hooks/hooks.json new file mode 100644 index 0000000..cc8d070 --- /dev/null +++ b/hooks/hooks.json @@ -0,0 +1,14 @@ +{ + "SessionStart": [ + { + "matcher": ".*", + "hooks": [ + { + "type": "prompt", + "prompt": "IMPORTANT: If you notice that SocratiCode MCP tools appear with BOTH a plugin prefix (e.g. 
mcp__plugin_socraticode_socraticode__codebase_search) AND a standalone prefix (e.g. mcp__socraticode__codebase_search), the user has a duplicate configuration. Advise them to remove the standalone MCP by running: claude mcp remove socraticode — the plugin already provides the MCP server.", + "timeout": 5 + } + ] + } + ] +} diff --git a/package.json b/package.json index 759a030..448ae61 100644 --- a/package.json +++ b/package.json @@ -50,6 +50,11 @@ "license": "AGPL-3.0-only", "files": [ "dist", + ".claude-plugin", + "skills", + "agents", + "hooks", + ".mcp.json", "LICENSE", "LICENSE-COMMERCIAL", "README.md", diff --git a/skills/codebase-exploration/SKILL.md b/skills/codebase-exploration/SKILL.md new file mode 100644 index 0000000..7b8c7a9 --- /dev/null +++ b/skills/codebase-exploration/SKILL.md @@ -0,0 +1,86 @@ +--- +name: codebase-exploration +description: >- + Explore and understand codebases using SocratiCode semantic search, dependency graphs, + and context artifacts. Use when exploring code, understanding architecture, finding + functions/types, analyzing dependencies, searching database schemas or API specs, + or when socraticode/codebase_search tools are available. Activates when the user asks + about code structure, wants to find where a feature lives, or needs to understand + how code is organized. +--- + +# SocratiCode Codebase Exploration + +Use SocratiCode MCP tools to explore codebases efficiently. The core principle: +**search before reading** — the index gives you a map of the codebase in milliseconds; +raw file reading is expensive and context-consuming. + +## Workflow + +### 1. Start most explorations with `codebase_search` + +Hybrid semantic + keyword search (vector + BM25, RRF-fused) runs in a single call. 
+ +- **Broad queries for orientation**: "how is authentication handled", "database connection setup", "error handling patterns" +- **Precise queries for symbol lookup**: exact function names, constants, type names +- Prefer search results to infer which files to read — do not speculatively open files +- Use `fileFilter` to narrow to a specific file path, `languageFilter` for a specific language +- Adjust `minScore` (default 0.10) for precision vs recall — lower for more results, higher for stricter matching + +**When to use grep instead**: If you already know the exact identifier, error string, or regex pattern, grep/ripgrep is faster and more precise — no semantic gap to bridge. Use `codebase_search` when exploring, asking conceptual questions, or when you don't know which files to look in. + +### 2. Follow the graph before following imports + +Use `codebase_graph_query` to see what a file imports and what depends on it **before** diving into its contents. This prevents unnecessary reading of transitive dependencies. + +- **`codebase_graph_query`** — imports and dependents for any file (pass relative path) +- **`codebase_graph_stats`** — architecture overview: total files, edges, most connected files, orphans, language breakdown +- **`codebase_graph_circular`** — find circular dependencies (these cause subtle runtime bugs; check proactively when debugging unexpected behavior) +- **`codebase_graph_visualize`** — Mermaid diagram color-coded by language, circular deps highlighted in red + +The graph is auto-built after indexing. Use `codebase_graph_status` to check if the graph is ready. + +### 3. Read files only after narrowing via search + +Once search results clearly point to 1-3 files, read only the relevant sections. **Never read a file just to find out if it's relevant** — search first. + +A single `codebase_search` call returns ranked, deduplicated snippets from across the entire codebase in milliseconds. 
This gives you a broad map at negligible token cost — far cheaper than opening files speculatively. + +### 4. Leverage context artifacts for non-code knowledge + +Projects can define a `.socraticodecontextartifacts.json` config to expose database schemas, API specs, infrastructure configs, architecture docs, and other project knowledge that lives outside source code. + +- **`codebase_context`** — list available artifacts (names, descriptions, paths, index status) +- **`codebase_context_search`** — semantic search across all artifacts (or filter with `artifactName`) +- Artifacts are auto-indexed on first search and auto-detect staleness + +Run `codebase_context` early to see what's available. Use `codebase_context_search` before asking about database structure, API contracts, or infrastructure. + +### 5. Check status if something seems wrong + +- **`codebase_status`** — check index status, progress, watcher state, graph status +- If search returns no results, the project may not be indexed yet +- If the watcher is inactive, results may be stale — run `codebase_update` or start the watcher + +### 6. 
Get an overview of all tools + +- **`codebase_about`** — quick reference of all SocratiCode tools and a typical workflow + +## Goal → Tool Quick Reference + +| Goal | Tool | +|------|------| +| Understand what a codebase does / where a feature lives | `codebase_search` (broad query) | +| Find a specific function, constant, or type | `codebase_search` (exact name) or grep | +| Find exact error messages, log strings, or regex patterns | grep / ripgrep | +| See what a file imports or what depends on it | `codebase_graph_query` | +| Get architecture overview (files, edges, most connected) | `codebase_graph_stats` | +| Spot circular dependencies | `codebase_graph_circular` | +| Visualize module structure | `codebase_graph_visualize` | +| Check graph build status | `codebase_graph_status` | +| Verify index is up to date | `codebase_status` | +| Discover available schemas, specs, configs | `codebase_context` | +| Find database tables, API endpoints, infra configs | `codebase_context_search` | +| Quick overview of all tools | `codebase_about` | + +For full parameter details on every tool, see [references/tool-reference.md](references/tool-reference.md). diff --git a/skills/codebase-exploration/references/tool-reference.md b/skills/codebase-exploration/references/tool-reference.md new file mode 100644 index 0000000..0f469f6 --- /dev/null +++ b/skills/codebase-exploration/references/tool-reference.md @@ -0,0 +1,190 @@ +# SocratiCode Exploration Tools — Full Reference + +## codebase_search + +Semantic search across an indexed codebase. Only use after `codebase_index` is complete. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `query` | string | yes | — | Natural language search query (e.g. "authentication middleware", "database connection setup") | +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `limit` | number (1-50) | no | 10 | Maximum results. 
Override globally via `SEARCH_DEFAULT_LIMIT` env var | +| `fileFilter` | string | no | — | Filter results to a specific file path (relative) | +| `languageFilter` | string | no | — | Filter results to a specific language (e.g. "typescript", "python") | +| `minScore` | number (0-1) | no | 0.10 | Minimum RRF score threshold. Override via `SEARCH_MIN_SCORE`. Set to 0 to disable | + +**Returns:** Ranked code chunks with file paths, line numbers, language, and RRF scores. + +**Key behaviors:** +- Uses hybrid semantic + keyword (BM25) search with Reciprocal Rank Fusion +- Warns if indexing is in progress (results will be incomplete during full index) +- Warns if file watcher is not active (results may be stale) +- Results below `minScore` are filtered out with a count of omitted results + +--- + +## codebase_status + +Check index status: chunk count, indexing progress, last completed operation, file watcher state. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Detailed status including: +- Collection name and indexed chunk count +- In-progress indexing: phase, file/chunk progress percentage, batch info, elapsed time +- Last completed operation: type, files processed, chunks created, duration +- Incomplete index detection (previous run interrupted) +- Cross-process indexing detection (another process actively indexing) +- File watcher status (active / watched by another process / inactive) +- Code graph status (files, edges, last built, cached in memory) +- Context artifacts status + +**Key behaviors:** +- Call every ~60 seconds during indexing to poll progress AND keep the MCP connection alive +- Detects both same-process and cross-process indexing + +--- + +## codebase_graph_query + +Query the dependency graph for a specific file. 
+ +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `filePath` | string | yes | — | Relative path of the file to query (e.g. "src/index.ts") | + +**Returns:** Two lists: what this file imports (→) and what depends on it (←). + +**Key behaviors:** +- Requires graph to exist (auto-built after indexing, or use `codebase_graph_build`) +- Auto-starts file watcher on query +- Use relative paths (not absolute) + +--- + +## codebase_graph_stats + +Get statistics about the code dependency graph. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Total files, edges, average dependencies per file, circular dependency count, language breakdown, top 10 most connected files, first 20 orphan files. + +--- + +## codebase_graph_circular + +Find circular dependencies in the codebase. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** List of circular dependency chains (up to 20, with total count). + +**Key behaviors:** +- Detects transitive circular dependencies +- Useful for debugging subtle runtime issues caused by import cycles + +--- + +## codebase_graph_visualize + +Generate a Mermaid diagram of the dependency graph. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Mermaid flowchart code block. Nodes color-coded by language, circular dependency edges highlighted in red. + +**Key behaviors:** +- Can be rendered in markdown viewers, VS Code, GitHub, etc. 
+- Shows file count and edge count + +--- + +## codebase_graph_status + +Check graph build status and progress. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** If building: phase, file progress, elapsed time. If ready: node/edge counts, last built timestamp, cache status, last build duration. + +**Key behaviors:** +- Use to poll progress after `codebase_graph_build` +- Shows build errors if last build failed + +--- + +## codebase_context + +List all context artifacts defined in `.socraticodecontextartifacts.json`. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Each artifact's name, description, path, and index status (chunk count, last indexed timestamp, or "not yet indexed"). If no config exists, provides a template. + +--- + +## codebase_context_search + +Semantic search across context artifacts. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `query` | string | yes | — | Natural language query (e.g. "tables related to billing", "authentication endpoints") | +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `artifactName` | string | no | — | Filter to a specific artifact by name. Omit to search all | +| `limit` | number (1-50) | no | 10 | Maximum results | +| `minScore` | number (0-1) | no | 0.10 | Minimum RRF score threshold | + +**Returns:** Artifact content chunks with artifact name, file path, line ranges, and scores. 
+ +**Key behaviors:** +- Auto-indexes artifacts on first use (no manual step needed) +- Auto-detects stale artifacts and re-indexes changed ones +- Works with any text-based artifact: SQL, OpenAPI, Terraform, K8s, YAML, markdown, etc. + +--- + +## codebase_about + +Display information about SocratiCode and all its tools. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| (none) | — | — | — | No parameters | + +**Returns:** Tool summary by category, typical workflow, infrastructure status, version info. + +--- + +## Tips + +### Search tips +- A single `codebase_search` returns ranked snippets from the entire codebase in milliseconds — far cheaper than opening files speculatively +- The RRF score combines semantic similarity and keyword match; higher scores = better relevance +- Use `fileFilter` when you know the area, `languageFilter` when cross-language results are noisy +- Lower `minScore` to 0 when exploring broadly; raise it for precision + +### Graph tips +- The graph is auto-built after `codebase_index` — usually no need to call `codebase_graph_build` manually +- `codebase_graph_visualize` generates Mermaid that renders in GitHub, VS Code, and most markdown viewers +- Check `codebase_graph_circular` when debugging mysterious behavior — circular deps cause subtle issues + +### Context artifact tips +- `codebase_context_search` auto-indexes on first use — just search, no setup needed +- Stale artifacts are auto-detected and re-indexed when content changes +- Use `artifactName` filter to target specific schemas or specs +- Supported types: SQL schemas, OpenAPI/Protobuf specs, Terraform configs, K8s manifests, architecture docs, env configs — any text-based file diff --git a/skills/codebase-management/SKILL.md b/skills/codebase-management/SKILL.md new file mode 100644 index 0000000..bbd067f --- /dev/null +++ b/skills/codebase-management/SKILL.md @@ -0,0 +1,98 @@ +--- +name: codebase-management 
+description: >- + Set up, index, and manage SocratiCode codebase indexing. Use when the user wants to + index a project, check infrastructure health, start/stop file watching, configure + context artifacts, troubleshoot indexing issues, manage the code graph, or any + SocratiCode administrative task. Activates when the user mentions indexing, setting up + search, SocratiCode infrastructure, or managing the codebase index. +--- + +# SocratiCode Management + +Set up, index, and manage SocratiCode codebase indexing, file watching, code graphs, and context artifacts. + +## First-Time Setup + +1. **Check infrastructure**: `codebase_health` — verifies Docker, Qdrant, Ollama/embedding provider, and embedding model +2. **Start indexing**: `codebase_index` — runs in background, returns immediately +3. **Poll progress**: `codebase_status` — call every ~60 seconds until 100% complete + - This also keeps the MCP connection alive (some hosts disconnect idle connections) +4. **Done**: Graph auto-builds after indexing. File watcher auto-starts. Ready to search. + +On first use, SocratiCode automatically pulls Docker images, starts containers, and downloads the embedding model (~5 min one-time setup). + +## Incremental Updates & File Watching + +The file watcher keeps the index automatically updated. It auto-starts after indexing. + +- **`codebase_watch { action: "start" }`** — start the watcher (runs catch-up update first) +- **`codebase_watch { action: "stop" }`** — stop the watcher +- **`codebase_watch { action: "status" }`** — list watched projects (including cross-process) +- **`codebase_update`** — manual incremental update (only changed files, synchronous). Usually not needed if watcher is active. + +## Managing Indexes + +- **`codebase_stop`** — gracefully pause in-progress indexing. Current batch finishes and checkpoints. All progress preserved. Resume with `codebase_index`. +- **`codebase_remove`** — delete entire index (destructive). 
Safely stops watcher, cancels indexing, waits for graph builds. +- **`codebase_list_projects`** — list all indexed projects with metadata, graph info, and artifact status. + +## Managing the Code Graph + +The dependency graph is auto-built after indexing. Manual management is rarely needed. + +- **`codebase_graph_build`** — manually rebuild (background, async). Poll with `codebase_graph_status`. +- **`codebase_graph_remove`** — delete graph (auto-rebuilds on next `codebase_index`) +- **`codebase_graph_status`** — check build progress or graph readiness + +## Context Artifacts Setup + +To index non-code knowledge, create `.socraticodecontextartifacts.json` in the project root: + +```json +{ + "artifacts": [ + { + "name": "database-schema", + "path": "./docs/schema.sql", + "description": "PostgreSQL schema — all tables, indexes, constraints, foreign keys." + } + ] +} +``` + +Supported types: SQL schemas, OpenAPI/Protobuf API specs, Terraform/CloudFormation configs, Kubernetes manifests, architecture docs, environment configs — any text-based file or directory. + +- **`codebase_context_index`** — manually index/re-index all artifacts (usually auto-triggered) +- **`codebase_context_remove`** — remove all indexed artifacts (blocked during indexing) + +## Troubleshooting + +| Problem | Solution | +|---------|----------| +| Docker not available | Install Docker Desktop from https://docker.com, ensure it's running | +| Slow indexing on macOS/Windows | Docker can't use GPU. Install native Ollama from https://ollama.com/download for Metal/CUDA acceleration. Or use cloud embeddings. | +| Want cloud embeddings instead | Set `EMBEDDING_PROVIDER=openai` + `OPENAI_API_KEY`, or `EMBEDDING_PROVIDER=google` + `GOOGLE_API_KEY` | +| Search returns no results | Check `codebase_status` — project may not be indexed. Run `codebase_index`. | +| Stale results | Check if watcher is active (`codebase_status`). Run `codebase_update` or `codebase_watch { action: "start" }`. 
| +| Indexing was interrupted | Run `codebase_index` again — it resumes from the last checkpoint automatically. | +| Another process is indexing | `codebase_status` detects cross-process indexing. Wait for it, or use `codebase_stop`. | + +## Key Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `QDRANT_MODE` | `managed` | `managed` (Docker) or `external` (remote/cloud Qdrant) | +| `QDRANT_URL` | — | Full URL for remote Qdrant (e.g. `https://xyz.cloud.qdrant.io:6333`) | +| `QDRANT_API_KEY` | — | API key for remote Qdrant | +| `EMBEDDING_PROVIDER` | `ollama` | `ollama`, `openai`, or `google` | +| `OPENAI_API_KEY` | — | Required when `EMBEDDING_PROVIDER=openai` | +| `GOOGLE_API_KEY` | — | Required when `EMBEDDING_PROVIDER=google` | +| `OLLAMA_MODE` | `auto` | `auto` (detect native, fallback Docker), `docker`, `external` | +| `EMBEDDING_MODEL` | `nomic-embed-text` | Model name (provider-specific) | +| `SEARCH_DEFAULT_LIMIT` | `10` | Default result limit for codebase_search (1-50) | +| `SEARCH_MIN_SCORE` | `0.10` | Default minimum RRF score threshold (0-1) | +| `MAX_FILE_SIZE_MB` | `5` | Maximum file size for indexing in MB | +| `EXTRA_EXTENSIONS` | — | Additional file extensions to index (e.g. `.tpl,.blade,.hbs`) | + +For full parameter details on every tool, see [references/tool-reference.md](references/tool-reference.md). diff --git a/skills/codebase-management/references/tool-reference.md b/skills/codebase-management/references/tool-reference.md new file mode 100644 index 0000000..fa9fc6a --- /dev/null +++ b/skills/codebase-management/references/tool-reference.md @@ -0,0 +1,253 @@ +# SocratiCode Management Tools — Full Reference + +## codebase_index + +Start indexing a codebase in the background. Returns immediately. 
+ +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `extraExtensions` | string | no | — | Comma-separated additional extensions (e.g. ".tpl,.blade,.hbs"). Also via `EXTRA_EXTENSIONS` env var | + +**Returns:** Confirmation that indexing started, with instructions to poll `codebase_status`. + +**Key behaviors:** +- Runs asynchronously — does NOT block. Returns immediately. +- Auto-starts file watcher upon completion (if not cancelled) +- Ensures Docker/Qdrant/Ollama infrastructure is running first +- Concurrency guard: if already indexing, returns current progress instead of starting again +- Auto-indexes context artifacts defined in `.socraticodecontextartifacts.json` +- Auto-builds code graph after indexing completes +- Batched and resumable: checkpoints after each batch of 50 files. Interruptions don't lose work. + +--- + +## codebase_update + +Incrementally update an existing index. Only re-indexes changed files. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `extraExtensions` | string | no | — | Comma-separated additional extensions | + +**Returns:** Statistics: files added/updated/removed, chunks created. + +**Key behaviors:** +- Runs synchronously (blocking), unlike `codebase_index` +- Only processes files changed since last index (via content hash comparison) +- Auto-starts file watcher if not already active +- Usually not needed if the file watcher is running + +--- + +## codebase_remove + +Remove a project's entire index from the vector database. 
+ +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | **yes** | — | Absolute path to the project directory | + +**Returns:** Confirmation of removal. + +**Key behaviors:** +- **Destructive** — cannot be undone +- Safely stops file watcher (same-process and cross-process) +- Cancels in-progress indexing and drains current batch +- Waits for in-flight graph builds to finish before deletion +- May refuse if indexing batch can't drain within 5 minutes (retry after) + +--- + +## codebase_stop + +Gracefully stop an in-progress indexing operation. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Confirmation with current phase and batch info. + +**Key behaviors:** +- Current batch finishes and checkpoints — all progress preserved +- Re-run `codebase_index` to resume from where it left off +- Handles both same-process and cross-process (orphan) indexing +- Sends SIGTERM to orphan processes holding the lock +- Non-destructive — progress is never lost + +--- + +## codebase_watch + +Start/stop/status of live file watching. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `action` | enum | **yes** | — | `"start"`, `"stop"`, or `"status"` | + +**Returns:** Action result or list of watched projects. 
+ +**Key behaviors:** +- `start`: Runs catch-up incremental update first, then starts debounced file watcher +- `stop`: Stops same-process watcher (cross-process watchers unaffected) +- `status`: Lists all watched projects including cross-process watchers +- Detects if another process already watches the same project +- Auto-started after successful `codebase_index` or `codebase_update` +- Debounced with ~500ms delay to batch rapid file changes + +--- + +## codebase_graph_build + +Build the dependency graph using AST-based static analysis. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | +| `extraExtensions` | string | no | — | Additional extensions included as leaf nodes (dependency targets) | + +**Returns:** Confirmation that build started. Poll with `codebase_graph_status`. + +**Key behaviors:** +- Runs asynchronously in background +- Auto-built during `codebase_index` (usually no need to call manually) +- Concurrency guard: if already building, shows progress +- Uses ast-grep for static import/require/export analysis across 18+ languages +- Skips files larger than 1 MB + +--- + +## codebase_graph_remove + +Remove a project's persisted code graph. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | **yes** | — | Absolute path to the project directory | + +**Returns:** Confirmation of removal. + +**Key behaviors:** +- **Destructive** — cannot be undone +- Waits for in-flight builds before deletion +- Graph auto-rebuilds during next `codebase_index` + +--- + +## codebase_graph_status + +Check graph build status and readiness. 
+ +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** If building: phase, file progress, elapsed time. If ready: node/edge counts, last built timestamp, cache status, build duration. + +--- + +## codebase_context_index + +Index or re-index all context artifacts. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | no | cwd | Absolute path to the project directory | + +**Returns:** Summary: artifacts indexed with chunk counts, any errors. + +**Key behaviors:** +- Runs synchronously (blocking) +- Usually auto-triggered by `codebase_context_search` on first use +- Reports individual artifact errors without stopping the whole operation + +--- + +## codebase_context_remove + +Remove all indexed context artifacts. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `projectPath` | string | **yes** | — | Absolute path to the project directory | + +**Returns:** Confirmation of removal. + +**Key behaviors:** +- **Destructive** — cannot be undone +- Blocked while indexing is in progress (wait for finish or use `codebase_stop`) +- Removes ALL artifacts (not selective) + +--- + +## codebase_health + +Check infrastructure health: Docker, Qdrant, Ollama, embedding model. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| (none) | — | — | — | No parameters | + +**Returns:** Status for each component with [OK]/[MISSING] indicators. 
+ +**Key behaviors:** +- Checks Docker availability +- Checks Qdrant (managed container or external endpoint) +- Checks embedding provider health (Ollama, OpenAI, or Google) +- Suggests fixes for missing components +- Works with both managed (Docker) and external (remote Qdrant) modes + +--- + +## codebase_list_projects + +List all projects that have been indexed. + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| (none) | — | — | — | No parameters | + +**Returns:** List with project paths, collection names, last indexed timestamp, file counts, graph info, context artifact status. Flags incomplete indexes. + +--- + +## Architectural Behaviors + +### Concurrency guards +- Only one indexing operation per project at a time +- Only one graph build per project at a time +- Duplicate operations return current progress instead of starting again + +### Checkpoint & resume +- Indexing checkpoints after each batch of 50 files +- Interrupted indexing resumes from the last checkpoint automatically +- `codebase_stop` preserves all progress — resume with `codebase_index` + +### Cross-process coordination +- File-based locking (`proper-lockfile`) prevents conflicts between multiple MCP instances +- Detects watchers/indexing running in other processes +- Can terminate orphan processes via SIGTERM +- Stale locks from crashed processes are auto-reclaimed + +### Auto-features +- File watcher auto-starts after indexing/updates +- Context artifacts auto-indexed on first `codebase_context_search` +- Stale artifacts auto-detected and re-indexed +- Code graph auto-built after indexing +- Session resume: watcher restarts on first tool use for previously indexed projects + +### Supported file extensions +**Built-in:** `.js`, `.jsx`, `.ts`, `.tsx`, `.mjs`, `.cjs`, `.py`, `.pyw`, `.pyi`, `.java`, `.kt`, `.kts`, `.scala`, `.c`, `.h`, `.cpp`, `.hpp`, `.cc`, `.hh`, `.cxx`, `.cs`, `.go`, `.rs`, `.rb`, `.php`, `.swift`, `.sh`, 
`.bash`, `.zsh`, `.html`, `.htm`, `.css`, `.scss`, `.sass`, `.less`, `.vue`, `.svelte`, `.json`, `.yaml`, `.yml`, `.toml`, `.xml`, `.ini`, `.cfg`, `.md`, `.mdx`, `.rst`, `.txt`, `.sql`, `.dart`, `.lua`, `.r`, `.R`, `.dockerfile` + +**Special files:** `Dockerfile`, `Makefile`, `Rakefile`, `Gemfile`, `Procfile`, `.env.example`, `.gitignore`, `.dockerignore` + +**Custom:** Add via `extraExtensions` parameter or `EXTRA_EXTENSIONS` env var. + +### Chunking defaults +- Chunk size: 100 lines, 10 lines overlap +- Batch size: 50 files per batch (for resumable checkpointing) +- Max chunk chars: 2000 (safety limit) +- Max file size: 5 MB (configurable via `MAX_FILE_SIZE_MB`)
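The chunking defaults above can be illustrated with a minimal sketch. This is not SocratiCode's actual chunker: `chunk_lines` and its parameter names are hypothetical, and only the 100-line chunk size and 10-line overlap are taken from the list. The document does not say how the 2000-character safety limit is applied, so the sketch leaves it out.

```python
from typing import List, Tuple


def chunk_lines(
    lines: List[str], chunk_size: int = 100, overlap: int = 10
) -> List[Tuple[int, int, str]]:
    """Split a file's lines into overlapping windows.

    Returns (start_line, end_line, text) tuples with 1-based, inclusive
    line numbers. Hypothetical helper for illustration only.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each new window starts this many lines later
    chunks: List[Tuple[int, int, str]] = []
    start = 0
    while start < len(lines):
        end = min(start + chunk_size, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # the last window reached the end of the file
        start += step
    return chunks


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(1, 251)]
    for first, last, _ in chunk_lines(sample):
        print(first, last)
```

Under these assumptions, a 250-line file yields three chunks covering lines 1-100, 91-190, and 181-250, so each chunk shares 10 lines with the one before it.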