diff --git a/README.md b/README.md
index 775f875..422191e 100644
--- a/README.md
+++ b/README.md
@@ -3,17 +3,15 @@

-

light weight MCP for code that just works

+

AST-based semantic code search that just works

 ![effect](https://github.com/user-attachments/assets/cb3a4cae-0e1f-49c4-890b-7bb93317ab60)
-
-
-A super light-weight, effective embedded MCP **(AST-based)** that understand and searches your codebase that just works! Using [CocoIndex](https://github.com/cocoindex-io/cocoindex) - an Rust-based ultra performant data transformation engine. No blackbox. Works for Claude, Codex, Cursor - any coding agent.
+A lightweight, effective **(AST-based)** semantic code search tool for your codebase. Built on [CocoIndex](https://github.com/cocoindex-io/cocoindex) — a Rust-based ultra performant data transformation engine. Use it from the CLI, or integrate with Claude, Codex, Cursor — any coding agent — via [Skill](#skill-recommended) or [MCP](#mcp-server).
 - Instant token saving by 70%.
-- **1 min setup** - Just claude/codex mcp add works!
+- **1 min setup** — install and go, zero config needed!
@@ -42,7 +40,9 @@ A super light-weight, effective embedded MCP **(AST-based)** that understand and
-## Get Started - zero config, let's go!!
+## Get Started — zero config, let's go!
+
+### Install
 Using [pipx](https://pipx.pypa.io/stable/installation/):
 ```bash
@@ -55,119 +55,209 @@ Using [uv](https://docs.astral.sh/uv/getting-started/installation/):
 uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
 ```
-### Claude
+### Init, Index, Search
+
+```bash
+ccc init                              # initialize project (creates settings)
+ccc index                             # build the index
+ccc search "authentication logic"     # search!
+```
+
+That's it! The background daemon starts automatically on first use. The default embedding model runs locally ([sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)) — no API key required, completely free.
+
+> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.
+
+## Coding Agent Integration
+
+### Skill (Recommended)
+
+Install the `ccc` skill so your coding agent automatically uses semantic search when needed:
+
+```bash
+npx skills add cocoindex-io/cocoindex-code
+```
+
+This installs the skill into your project's `.claude/skills/` directory. Once installed, the agent automatically triggers semantic code search when it would be helpful — no manual prompting required.
+
+Works with [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and other skill-compatible agents.
+
+### MCP Server
+
+Alternatively, use `ccc mcp` to run as an MCP server:
+
+
+Claude Code
+
 ```bash
-claude mcp add cocoindex-code -- cocoindex-code
+claude mcp add cocoindex-code -- ccc mcp
 ```
+
+
+
+Codex
-### Codex
 ```bash
-codex mcp add cocoindex-code -- cocoindex-code
+codex mcp add cocoindex-code -- ccc mcp
 ```
+
+
+
+OpenCode
-### OpenCode
 ```bash
 opencode mcp add
 ```
 Enter MCP server name: `cocoindex-code`
 Select MCP server type: `local`
-Enter command to run: `cocoindex-code`
+Enter command to run: `ccc mcp`
 Or use opencode.json:
-```
+```json
 {
   "$schema": "https://opencode.ai/config.json",
   "mcp": {
     "cocoindex-code": {
       "type": "local",
       "command": [
-        "cocoindex-code"
+        "ccc", "mcp"
       ]
     }
   }
 }
 ```
+
-### Build the Index
-
-For large codebases, we recommend running the indexer once before using the MCP so you can see the progress:
-
-```bash
-cocoindex-code index
-```
-
-This lets you monitor the indexing process and ensure everything is ready. After the initial build, the MCP server will automatically keep the index up-to-date in the background as files change.
+Once configured, the agent automatically decides when semantic code search is helpful — finding code by description, exploring unfamiliar codebases, fuzzy/conceptual matches, or locating implementations without knowing exact names.
-For small projects you can skip this step — the MCP server will build the index automatically on first use.
+> **Note:** The `cocoindex-code` command (without subcommand) still works as an MCP server for backward compatibility. It auto-creates settings from environment variables on first run.
-## When Is the MCP Triggered?
+
+MCP Tool Reference
-Once configured, your coding agent (Claude Code, Codex, Cursor, etc.) automatically decides when semantic code search is helpful — especially for finding code by description, exploring unfamiliar codebases, fuzzy/conceptual matches, or locating implementations without knowing exact names.
+When running as an MCP server (`ccc mcp`), the following tool is exposed:
-You can also nudge the agent explicitly, e.g. *"Use the cocoindex-code MCP to find how user sessions are managed."* For persistent instructions, add guidance to your project's `AGENTS.md` or `CLAUDE.md`:
+**`search`** — Search the codebase using semantic similarity.
 ```
-Use the cocoindex-code MCP server for semantic code search when:
-- Searching for code by meaning or description rather than exact text
-- Exploring unfamiliar parts of the codebase
-- Looking for implementations without knowing exact names
-- Finding similar code patterns or related functionality
+search(
+    query: str,                               # Natural language query or code snippet
+    limit: int = 5,                           # Maximum results (1-100)
+    offset: int = 0,                          # Pagination offset
+    refresh_index: bool = True,               # Refresh index before querying
+    languages: list[str] | None = None,       # Filter by language (e.g. ["python", "typescript"])
+    paths: list[str] | None = None,           # Filter by path glob (e.g. ["src/utils/*"])
+)
 ```
+Returns matching code chunks with file path, language, code content, line numbers, and similarity score.
+
+
 ## Features
 - **Semantic Code Search**: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.
-- **Ultra Performant to code changes**:⚡ Built on top of ultra performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex/edit/main/README.md). Only re-indexes changed files for fast updates.
-- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell
+- **Ultra Performant**: ⚡ Built on top of ultra performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex). Only re-indexes changed files for fast updates.
+- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.
 - **Embedded**: Portable and just works, no database setup required!
-- **Flexible Embeddings**: By default, no API key required with Local SentenceTransformers - totally free! You can customize 100+ cloud providers.
+- **Flexible Embeddings**: Local SentenceTransformers by default (free!) or 100+ cloud providers via LiteLLM.
+
+## CLI Reference
+| Command | Description |
+|---------|-------------|
+| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` |
+| `ccc index` | Build or update the index (auto-inits if needed). Shows streaming progress. |
+| `ccc search <query>` | Semantic search across the codebase |
+| `ccc status` | Show index stats (chunk count, file count, language breakdown) |
+| `ccc mcp` | Run as MCP server in stdio mode |
+| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |
+| `ccc daemon status` | Show daemon version, uptime, and loaded projects |
+| `ccc daemon restart` | Restart the background daemon |
+| `ccc daemon stop` | Stop the daemon |
+
+### Search Options
+
+```bash
+ccc search database schema                        # basic search
+ccc search --lang python --lang markdown schema   # filter by language
+ccc search --path 'src/utils/*' query handler     # filter by path
+ccc search --offset 10 --limit 5 database schema  # pagination
+ccc search --refresh database schema              # update index first, then search
+```
+
+By default, `ccc search` scopes results to your current working directory (relative to the project root). Use `--path` to override.
 ## Configuration
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `COCOINDEX_CODE_ROOT_PATH` | Root path of the codebase | Auto-discovered (see below) |
-| `COCOINDEX_CODE_EMBEDDING_MODEL` | Embedding model (see below) | `sbert/sentence-transformers/all-MiniLM-L6-v2` |
-| `COCOINDEX_CODE_EXTRA_EXTENSIONS` | Additional file extensions to index (comma-separated, e.g. `"inc:php,yaml,toml"` — use `ext:lang` to override language detection) | _(none)_ |
-| `COCOINDEX_CODE_EXCLUDED_PATTERNS` | Additional glob patterns to exclude from indexing as a JSON array (e.g. `'["**/migration.sql", "{**/*.md,**/*.txt}"]'`) | _(none)_ |
+Configuration lives in two YAML files, both created automatically by `ccc init`.
+### User Settings (`~/.cocoindex_code/global_settings.yml`)
-### Root Path Discovery
+Shared across all projects. Controls the embedding model and environment variables for the daemon.
+
+```yaml
+embedding:
+  provider: sentence-transformers      # or "litellm"
+  model: sentence-transformers/all-MiniLM-L6-v2
+  device: mps                          # optional: cpu, cuda, mps (auto-detected if omitted)
+
+envs:                                  # extra environment variables for the daemon
+  OPENAI_API_KEY: your-key             # only needed if not already in your shell environment
+```
-If `COCOINDEX_CODE_ROOT_PATH` is not set, the codebase root is discovered by:
+> **Note:** The daemon inherits your shell environment. If an API key (e.g. `OPENAI_API_KEY`) is already set as an environment variable, you don't need to duplicate it in `envs`. The `envs` field is only for values that aren't in your environment.
-1. Finding the nearest parent directory containing `.cocoindex_code/`
-2. Finding the nearest parent directory containing `.git/`
-3. Falling back to the current working directory
+### Project Settings (`/.cocoindex_code/settings.yml`)
+
+Per-project. Controls which files to index.
+
+```yaml
+include_patterns:
+  - "**/*.py"
+  - "**/*.js"
+  - "**/*.ts"
+  - "**/*.rs"
+  - "**/*.go"
+  # ... (sensible defaults for 28+ file types)
+
+exclude_patterns:
+  - "**/.*"            # hidden directories
+  - "**/__pycache__"
+  - "**/node_modules"
+  - "**/dist"
+  # ...
+
+language_overrides:
+  - ext: inc           # treat .inc files as PHP
+    lang: php
+```
-### Embedding model
-By default - this project use a local SentenceTransformers model (`sentence-transformers/all-MiniLM-L6-v2`). No API key required and completely free!
+> `.cocoindex_code/` is automatically added to `.gitignore` during init.
-Use a code specific embedding model can achieve better semantic understanding for your results, this project supports all models on Ollama and 100+ cloud providers.
+## Embedding Models
-Set `COCOINDEX_CODE_EMBEDDING_MODEL` to any [LiteLLM-supported model](https://docs.litellm.ai/docs/embedding/supported_embedding), along with the provider's API key:
+By default, a local SentenceTransformers model ([sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)) is used — no API key required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.
+
+> The `envs` entries below are only needed if the key isn't already in your shell environment — the daemon inherits your environment automatically.
 Ollama (Local)
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=ollama/nomic-embed-text \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: ollama/nomic-embed-text
 ```
-Set `OLLAMA_API_BASE` if your Ollama server is not at `http://localhost:11434`.
+Set `OLLAMA_API_BASE` in `envs:` if your Ollama server is not at `http://localhost:11434`.
 OpenAI
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=text-embedding-3-small \
-  -e OPENAI_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: text-embedding-3-small
+envs:
+  OPENAI_API_KEY: your-api-key
 ```
@@ -175,13 +265,13 @@ claude mcp add cocoindex-code \
 Azure OpenAI
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=azure/your-deployment-name \
-  -e AZURE_API_KEY=your-api-key \
-  -e AZURE_API_BASE=https://your-resource.openai.azure.com \
-  -e AZURE_API_VERSION=2024-06-01 \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: azure/your-deployment-name
+envs:
+  AZURE_API_KEY: your-api-key
+  AZURE_API_BASE: https://your-resource.openai.azure.com
+  AZURE_API_VERSION: "2024-06-01"
 ```
@@ -189,11 +279,11 @@ claude mcp add cocoindex-code \
 Gemini
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=gemini/text-embedding-004 \
-  -e GEMINI_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: gemini/gemini-embedding-001
+envs:
+  GEMINI_API_KEY: your-api-key
 ```
@@ -201,11 +291,11 @@ claude mcp add cocoindex-code \
 Mistral
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=mistral/mistral-embed \
-  -e MISTRAL_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: mistral/mistral-embed
+envs:
+  MISTRAL_API_KEY: your-api-key
 ```
@@ -213,11 +303,11 @@ claude mcp add cocoindex-code \
 Voyage (Code-Optimized)
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=voyage/voyage-code-3 \
-  -e VOYAGE_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: voyage/voyage-code-3
+envs:
+  VOYAGE_API_KEY: your-api-key
 ```
@@ -225,11 +315,11 @@ claude mcp add cocoindex-code \
 Cohere
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=cohere/embed-english-v3.0 \
-  -e COHERE_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: cohere/embed-v4.0
+envs:
+  COHERE_API_KEY: your-api-key
 ```
@@ -237,13 +327,13 @@ claude mcp add cocoindex-code \
 AWS Bedrock
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=bedrock/amazon.titan-embed-text-v2:0 \
-  -e AWS_ACCESS_KEY_ID=your-access-key \
-  -e AWS_SECRET_ACCESS_KEY=your-secret-key \
-  -e AWS_REGION_NAME=us-east-1 \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: bedrock/amazon.titan-embed-text-v2:0
+envs:
+  AWS_ACCESS_KEY_ID: your-access-key
+  AWS_SECRET_ACCESS_KEY: your-secret-key
+  AWS_REGION_NAME: us-east-1
 ```
@@ -251,68 +341,39 @@ claude mcp add cocoindex-code \
 Nebius
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=nebius/BAAI/bge-en-icl \
-  -e NEBIUS_API_KEY=your-api-key \
-  -- cocoindex-code
+```yaml
+embedding:
+  model: nebius/BAAI/bge-en-icl
+envs:
+  NEBIUS_API_KEY: your-api-key
 ```
-Any model supported by LiteLLM works — see the [full list of embedding providers](https://docs.litellm.ai/docs/embedding/supported_embedding).
+Any [LiteLLM-supported model](https://docs.litellm.ai/docs/embedding/supported_embedding) works. When using a LiteLLM model, set `provider: litellm` (or omit `provider` — LiteLLM is the default for non-`sentence-transformers` models).
-### Local SentenceTransformers models
+### Local SentenceTransformers Models
-Use the `sbert/` prefix to load any [SentenceTransformers](https://www.sbert.net/) model locally (no API key required).
+Set `provider: sentence-transformers` and use any [SentenceTransformers](https://www.sbert.net/) model (no API key required).
 **Example — general purpose text model:**
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=sbert/nomic-ai/nomic-embed-text-v1 \
-  -- cocoindex-code
+```yaml
+embedding:
+  provider: sentence-transformers
+  model: nomic-ai/nomic-embed-text-v1.5
 ```
 **GPU-optimised code retrieval:**
 [`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed) delivers significantly better code retrieval than the default model. It is 137M parameters, requires ~1 GB VRAM, and has an 8192-token context window.
-```bash
-claude mcp add cocoindex-code \
-  -e COCOINDEX_CODE_EMBEDDING_MODEL=sbert/nomic-ai/CodeRankEmbed \
-  -- cocoindex-code
-```
-
-**Note:** Switching models requires re-indexing your codebase (the vector dimensions differ).
-
-## MCP Tools
-
-### `search`
-
-Search the codebase using semantic similarity.
-
-```
-search(
-    query: str,               # Natural language query or code snippet
-    limit: int = 10,          # Maximum results (1-100)
-    offset: int = 0,          # Pagination offset
-    refresh_index: bool = True  # Refresh index before querying
-)
+```yaml
+embedding:
+  provider: sentence-transformers
+  model: nomic-ai/CodeRankEmbed
 ```
-The `refresh_index` parameter controls whether the index is refreshed before searching:
-
-- `True` (default): Refreshes the index to include any recent changes
-- `False`: Skip refresh for faster consecutive queries
-
-Returns matching code chunks with:
-
-- File path
-- Language
-- Code content
-- Line numbers (start/end)
-- Similarity score
-
+**Note:** Switching models requires re-indexing your codebase (`ccc reset && ccc index`) since the vector dimensions differ.
 ## Supported Languages
@@ -348,14 +409,6 @@ Returns matching code chunks with:
 | xml | | `.xml` |
 | yaml | | `.yaml`, `.yml` |
-Common generated directories are automatically excluded:
-
-- `__pycache__/`
-- `node_modules/`
-- `target/`
-- `dist/`
-- `vendor/` (Go vendored dependencies, matched by domain-based child paths)
-
 ## Troubleshooting
 ### `sqlite3.Connection object has no attribute enable_load_extension`
@@ -368,7 +421,7 @@ Some Python installations (e.g. the one pre-installed on macOS) ship with a SQLi
 brew install python3
 ```
-Then re-install cocoindex-code (see [Get Started](#get-started---zero-config-lets-go) for install options):
+Then re-install cocoindex-code (see [Get Started](#get-started--zero-config-lets-go) for install options):
 Using pipx:
 ```bash
@@ -381,10 +434,22 @@ Using uv (install or upgrade):
 uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
 ```
+## Legacy: Environment Variables
+
+If you previously configured `cocoindex-code` via environment variables, the `cocoindex-code` MCP command still reads them and auto-migrates to YAML settings on first run. We recommend switching to the YAML settings for new setups.
+
+| Environment Variable | YAML Equivalent |
+|---------------------|-----------------|
+| `COCOINDEX_CODE_EMBEDDING_MODEL` | `embedding.model` in `global_settings.yml` |
+| `COCOINDEX_CODE_DEVICE` | `embedding.device` in `global_settings.yml` |
+| `COCOINDEX_CODE_ROOT_PATH` | Run `ccc init` in your project root instead |
+| `COCOINDEX_CODE_EXCLUDED_PATTERNS` | `exclude_patterns` in project `settings.yml` |
+| `COCOINDEX_CODE_EXTRA_EXTENSIONS` | `include_patterns` + `language_overrides` in project `settings.yml` |
+
 ## Large codebase / Enterprise
-[CocoIndex](https://github.com/cocoindex-io/cocoindex) is an ultra effecient indexing engine that also works on large codebase at scale on XXX G for enterprises. In enterprise scenarios it is a lot more effecient to do index share with teammates when there are large repo or many repos. We also have advanced features like branch dedupe etc designed for enterprise users.
+[CocoIndex](https://github.com/cocoindex-io/cocoindex) is an ultra efficient indexing engine that also works on large codebases at scale for enterprises. In enterprise scenarios it is a lot more efficient to share indexes with teammates when there are large or many repos. We also have advanced features like branch dedupe etc designed for enterprise users.
-If you need help with remote setup, please email our maintainer linghua@cocoindex.io, happy to help!!
+If you need help with remote setup, please email our maintainer linghua@cocoindex.io, happy to help!
 ## License
diff --git a/skills/ccc/SKILL.md b/skills/ccc/SKILL.md
new file mode 100644
index 0000000..78f435f
--- /dev/null
+++ b/skills/ccc/SKILL.md
@@ -0,0 +1,85 @@
+---
+name: ccc
+description: "This skill should be used when code search is needed (whether explicitly requested or as part of completing a task), when indexing the codebase after changes, or when the user asks about ccc, cocoindex-code, or the codebase index. Trigger phrases include 'search the codebase', 'find code related to', 'update the index', 'ccc', 'cocoindex-code'."
+---
+
+# ccc - Semantic Code Search & Indexing
+
+`ccc` is the CLI for CocoIndex Code, providing semantic search over the current codebase and index management.
+
+## Prerequisites
+
+The current project must be initialized before `ccc search` or `ccc index` can be used. If either command fails with an error about missing initialization or the tool not being found, refer to [management.md](references/management.md) for installation and initialization instructions.
+
+## Searching the Codebase
+
+To perform a semantic search:
+
+```bash
+ccc search <query>
+```
+
+The query should describe the concept, functionality, or behavior to find, not exact code syntax. For example:
+
+```bash
+ccc search database connection pooling
+ccc search user authentication flow
+ccc search error handling retry logic
+```
+
+### Filtering Results
+
+- **By language** (`--lang`, repeatable): restrict results to specific languages.
+
+  ```bash
+  ccc search --lang python --lang markdown database schema
+  ```
+
+- **By path** (`--path`): restrict results to a glob pattern relative to project root. If omitted, defaults to the current working directory (only results under that subdirectory are returned).
+
+  ```bash
+  ccc search --path 'src/api/*' request validation
+  ```
+
+### Pagination
+
+Results default to the first page. To retrieve additional results:
+
+```bash
+ccc search --offset 5 --limit 5 database schema
+```
+
+If all returned results look relevant, use `--offset` to fetch the next page — there are likely more useful matches beyond the first page.
+
+### Refreshing the Index Before Search
+
+To ensure the index reflects the latest code changes before searching:
+
+```bash
+ccc search --refresh <query>
+```
+
+This is equivalent to running `ccc index` followed by `ccc search`.
+
+### Working with Search Results
+
+Search results include file paths and line ranges. To explore a result in more detail:
+
+- Use the editor's built-in file reading capabilities (e.g., the `Read` tool) to load the matched file and read lines around the returned range for full context.
+- When working in a terminal without a file-reading tool, use `sed -n '<start>,<end>p' <file>` to extract a specific line range.
+
+## Updating the Index
+
+After making code changes, update the index to keep search results current:
+
+```bash
+ccc index
+```
+
+This blocks until indexing completes, showing progress. If indexing is already in progress, it waits for completion.
+
+Run `ccc index` proactively after significant code changes (new files, refactors, renamed modules) to ensure subsequent searches return accurate results.
+
+## Management & Troubleshooting
+
+For installation, initialization, daemon management, troubleshooting, and cleanup commands, see [management.md](references/management.md).
diff --git a/skills/ccc/references/management.md b/skills/ccc/references/management.md
new file mode 100644
index 0000000..acf7691
--- /dev/null
+++ b/skills/ccc/references/management.md
@@ -0,0 +1,85 @@
+# ccc Management
+
+## Installation
+
+Install CocoIndex Code via pipx:
+
+```bash
+pipx install cocoindex-code
+```
+
+To upgrade to the latest version:
+
+```bash
+pipx upgrade cocoindex-code
+```
+
+After installation, the `ccc` command is available globally.
+
+## Project Initialization
+
+Run from the root directory of the project to index:
+
+```bash
+ccc init
+```
+
+This creates:
+- `~/.cocoindex_code/global_settings.yml` (user-level settings, e.g., model configuration) if it does not already exist.
+- `.cocoindex_code/settings.yml` (project-level settings, e.g., include/exclude file patterns).
+
+If `.git` exists in the directory, `.cocoindex_code/` is automatically added to `.gitignore`.
+
+Use `-f` to skip the confirmation prompt if `ccc init` detects a potential parent project root.
+
+After initialization, edit the settings files if needed, then run `ccc index` to build the initial index.
+
+## Troubleshooting
+
+### Checking Project Status
+
+To view the current project's index status:
+
+```bash
+ccc status
+```
+
+This shows whether indexing is ongoing and index statistics.
+
+### Daemon Management
+
+The daemon starts automatically on first use. To check its status:
+
+```bash
+ccc daemon status
+```
+
+This shows whether the daemon is running, its version, uptime, and loaded projects.
+
+To restart the daemon (useful if it gets into a bad state):
+
+```bash
+ccc daemon restart
+```
+
+To stop the daemon:
+
+```bash
+ccc daemon stop
+```
+
+## Cleanup
+
+To reset a project's index (removes databases, keeps settings):
+
+```bash
+ccc reset
+```
+
+To fully remove all CocoIndex Code data for a project (including settings):
+
+```bash
+ccc reset --all
+```
+
+Both commands prompt for confirmation. Use `-f` to skip.