A local MCP server that attaches to your codebase and gives Claude Code and Codex semantic search, symbol extraction, and auto-reindexing - no cloud, no API keys.
- Symbol extraction — parses every file with tree-sitter and indexes classes, methods, and functions
- Semantic search — embeds your codebase with a local Ollama model so you can search by meaning, not just text
- Regex search — fast ripgrep/grep fallback for exact pattern matching
- Auto-reindex — watches for file changes and updates the index automatically
- Works offline — everything runs locally via Ollama
| Language | Extensions |
|---|---|
| C# | .cs |
| JavaScript | .js, .jsx, .mjs, .cjs |
| TypeScript | .ts, .tsx |
| HTML | .html, .htm |
| Python | .py, .pyw |
| Dart | .dart |
- Python 3.11+
- uv (installed automatically by setup scripts)
- Ollama for semantic search
- git + a C compiler (`gcc` or `clang`) for the Dart grammar
# macOS / Linux
git clone https://github.com/zjs81/barnacle-search.git
cd barnacle-search
./setup.sh
# Windows (PowerShell)
git clone https://github.com/zjs81/barnacle-search.git
cd barnacle-search
.\setup.ps1

The setup script will:
- Install `uv` if not already present
- Install all Python dependencies
- Compile the Dart tree-sitter grammar from source
- Register `barnacle-search` as a global MCP server in Claude Code (`~/.claude.json`)
- Add a managed `barnacle-search` guidance block to Claude user memory (`~/.claude/CLAUDE.md`) so Claude Code knows when and how to use the server
- Add a Claude permission allow rule for the Barnacle MCP server (`~/.claude/settings.json`) so Claude can call its tools without prompting on first use
- Register `barnacle-search` as a global MCP server in Codex (`~/.codex/config.toml`)
- Add a managed `barnacle-search` guidance block to Codex global instructions (`~/.codex/AGENTS.md`) so exploratory codebase questions bias toward Barnacle first
The setup scripts can also uninstall the MCP registration. They detect whether barnacle-search is currently registered in Claude Code, Codex, or both, then let you choose which target to remove. When uninstalling, they remove only the managed barnacle-search guidance blocks from ~/.claude/CLAUDE.md and ~/.codex/AGENTS.md, plus the Claude permission rule from ~/.claude/settings.json.
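As a rough illustration of how a managed block can be removed without touching the rest of a user's file, here is a minimal sketch. The start and end marker strings are hypothetical; the text above does not say how the setup scripts delimit their block.

```python
import re

# Hypothetical markers: the real delimiters used by the setup scripts are not documented here.
START = "<!-- barnacle-search:start -->"
END = "<!-- barnacle-search:end -->"

# Match everything from the start marker through the end marker, plus one trailing newline.
_BLOCK = re.compile(re.escape(START) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def remove_managed_block(text: str) -> str:
    """Return text with the managed block removed; other content is left untouched."""
    return _BLOCK.sub("", text)
```

Text that contains no managed block passes through unchanged, so running the removal twice is harmless.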
Semantic search requires a running Ollama instance with the embedding model pulled:
# macOS
brew install ollama
ollama pull granite-embedding
# Windows
winget install Ollama.Ollama
ollama pull granite-embedding

Barnacle will auto-pull the model if Ollama is running but the model isn't downloaded yet. Structural search (symbols, regex) works fine without Ollama.
After setup, restart Claude Code and/or Codex. Then in any session:
set_project_path("/path/to/your/project")
build_deep_index()
build_deep_index() only needs to run once — the index updates automatically when files change.
If Codex shows barnacle-search as enabled but lists Tools: (none), the MCP process is usually failing before startup because uv cannot use its default cache directory inside the Codex sandbox. Ensure your ~/.codex/config.toml entry includes a writable cache override:
[mcp_servers."barnacle-search"]
command = "uv"
args = ["--directory", "/absolute/path/to/barnacle-search", "run", "barnacle-search"]
env = { UV_CACHE_DIR = "/tmp/barnacle-search-uv-cache" }

| Tool | Description |
|---|---|
| `set_project_path(path)` | Point barnacle at a project directory |
| `build_deep_index()` | Parse all files and generate embeddings |
| `semantic_search(query)` | Natural language search over your codebase |
| `find_files(pattern)` | Glob matching, e.g. `**/*Service*.cs` |
| `search_code(pattern)` | Regex search across files |
| `get_file_summary(path)` | Symbols, imports, line count for a file |
| `get_symbol_body(file, symbol)` | Read source of a specific method or class |
| `get_index_status()` | File count, language breakdown, embedding count |
To make Claude automatically use barnacle-search in a specific project, add this to your CLAUDE.md:
## Code Navigation
Use the `barnacle-search` MCP tools to explore this codebase.
### Setup (first time per session)
set_project_path("/absolute/path/to/project")
build_deep_index()
### Key tools
- `semantic_search(query="...")` — find by meaning
- `find_files(pattern="**/*.cs")` — find by name
- `search_code(pattern="...")` — find by regex
- `get_file_summary(path="...")` — symbols in a file
- `get_symbol_body(file="...", symbol="MethodName")` — read a method

To make Codex automatically use barnacle-search in a specific project, add this to your AGENTS.md:
## Code Navigation
Use the `barnacle-search` MCP tools to explore this codebase.
### Setup (first time per session)
set_project_path("/absolute/path/to/project")
build_deep_index()
### Key tools
- `semantic_search(query="...")` — find by meaning
- `find_files(pattern="**/*.cs")` — find by name
- `search_code(pattern="...")` — find by regex
- `get_file_summary(path="...")` — symbols in a file
- `get_symbol_body(file="...", symbol="MethodName")` — read a method

The setup scripts also add a global Codex guidance block under ~/.codex/AGENTS.md with the same intent, so exploratory codebase questions in any repo can prefer Barnacle first while exact lookups still use rg.
For Claude Code, the setup scripts also add a global guidance block under ~/.claude/CLAUDE.md. This matters because MCP registration makes the server available, but Claude memory is what tells Claude to prefer Barnacle for exploratory codebase work and to call set_project_path() before other Barnacle tools.
They also add mcp__barnacle-search to ~/.claude/settings.json under permissions.allow, which follows Claude Code's MCP permission syntax for allowing all tools from a specific MCP server.
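The permission change amounts to an idempotent append to a JSON list. A minimal sketch, assuming the settings file has already been loaded into a dict; the function name is hypothetical, and only the `permissions.allow` path and the rule string come from the text above.

```python
RULE = "mcp__barnacle-search"  # rule string quoted in the text above


def add_allow_rule(settings: dict, rule: str = RULE) -> dict:
    """Add the rule under permissions.allow, creating missing keys and skipping duplicates."""
    permissions = settings.setdefault("permissions", {})
    allow = permissions.setdefault("allow", [])
    if rule not in allow:
        allow.append(rule)
    return settings
```

Because duplicates are skipped, re-running setup would leave an existing settings file unchanged.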
Barnacle uses a two-tier index:
- Shallow index — a lightweight JSON file list with mtimes for fast file lookup without touching the database
- Deep index — a SQLite database with four tables:
  - `files` — path, language, line count, imports, exports
  - `symbols` — extracted classes/methods/functions with line ranges
  - `symbol_embeddings` — one vector per symbol (packed float32 BLOB, no numpy/chromadb needed)
  - `symbol_fts` — FTS5 full-text index over symbol names, signatures, and file paths
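To make the deep index layout concrete, here is a sketch of what such a schema could look like using Python's built-in `sqlite3`. Only the four table names and their general contents come from the list above; every column name, type, and constraint is an assumption, and the real schema may differ.

```python
import sqlite3

# Hypothetical schema: table names follow the list above, columns are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id         INTEGER PRIMARY KEY,
    path       TEXT UNIQUE NOT NULL,
    language   TEXT,
    line_count INTEGER,
    imports    TEXT,   -- assumed: JSON-encoded list
    exports    TEXT    -- assumed: JSON-encoded list
);
CREATE TABLE IF NOT EXISTS symbols (
    id         INTEGER PRIMARY KEY,
    file_id    INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    kind       TEXT NOT NULL,   -- 'class', 'method', or 'function'
    name       TEXT NOT NULL,
    signature  TEXT,
    start_line INTEGER,
    end_line   INTEGER
);
CREATE TABLE IF NOT EXISTS symbol_embeddings (
    symbol_id  INTEGER PRIMARY KEY REFERENCES symbols(id) ON DELETE CASCADE,
    vector     BLOB NOT NULL    -- packed float32 values
);
"""

FTS_SCHEMA = """
CREATE VIRTUAL TABLE IF NOT EXISTS symbol_fts
USING fts5(name, signature, path);
"""


def create_deep_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    try:
        conn.executescript(FTS_SCHEMA)
    except sqlite3.OperationalError:
        # FTS5 is a compile-time option of SQLite; skip the table if this build lacks it.
        pass
    return conn
```

Keeping vectors in an ordinary BLOB column means the whole index stays a single SQLite file with no extra services to run.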
Embeddings are generated per symbol (class, method, function), not per file. Each symbol is embedded with its full context:
path/to/File.cs [csharp] > ClassName > MethodName
signature: ClassName.MethodName(int userId, string name)
<up to 510 tokens of body>
This means semantic_search("password hashing") returns PasswordHasher.Hash() directly instead of a file that happens to contain it somewhere. semantic_search results include a matched_symbols list showing which specific symbols scored highest and their individual scores.
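The embed text format shown above can be assembled with a few lines of string handling. This is a sketch only: the function and parameter names are hypothetical, and whitespace splitting stands in for the embedding model's real tokenizer, which the text does not specify.

```python
MAX_BODY_TOKENS = 510  # body cap stated in the text


def build_embed_text(path, language, parents, name, signature, body,
                     max_tokens=MAX_BODY_TOKENS):
    """Build the per-symbol text: breadcrumb line, signature line, truncated body."""
    # Assumption: whitespace-separated words approximate model tokens.
    # Joining with single spaces also flattens the body onto one line.
    tokens = body.split()
    truncated = " ".join(tokens[:max_tokens])
    breadcrumb = " > ".join([f"{path} [{language}]", *parents, name])
    return f"{breadcrumb}\nsignature: {signature}\n{truncated}"
```

Passing an empty `parents` list covers top-level functions, where the breadcrumb is just the file and the symbol name.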
Not every symbol gets embedded — imports and trivial methods/functions (≤2 lines) are filtered out as noise. This typically cuts embedding count by 20-40% on real codebases while keeping all the symbols worth searching for.
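The filter described above can be sketched as a small predicate. The `Symbol` record and its field names are hypothetical, and exempting classes from the line-count rule is an assumption based on the text naming only methods and functions.

```python
from dataclasses import dataclass


@dataclass
class Symbol:  # hypothetical record, not the project's actual type
    kind: str        # 'import', 'class', 'method', or 'function'
    name: str
    start_line: int
    end_line: int


def should_embed(sym: Symbol, max_trivial_lines: int = 2) -> bool:
    """Skip imports and methods/functions spanning max_trivial_lines lines or fewer."""
    if sym.kind == "import":
        return False
    line_count = sym.end_line - sym.start_line + 1
    if sym.kind in ("method", "function") and line_count <= max_trivial_lines:
        return False
    return True
```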
Embed text is capped at 510 body tokens to prevent giant methods from bloating request size. Symbols are sent to Ollama in batches of 64 for throughput. A large codebase (~28k symbols after filtering) typically indexes in 5-6 minutes.
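Batching itself is simple slicing. The helper below is a sketch and makes no Ollama calls; the name `batched` is illustrative (Python 3.12 ships `itertools.batched`, but the project targets 3.11+, so a local helper is shown).

```python
from typing import Iterator, Sequence

BATCH_SIZE = 64  # batch size stated in the text


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items; the last slice may be shorter."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

At this batch size, 28,000 symbols work out to 438 requests, the last one carrying 32 symbols.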
semantic_search combines two signals to rank results:
- Cosine similarity (70%) — embedding distance between your query and each symbol
- BM25 keyword match (30%) — SQLite FTS5 full-text search over symbol names, signatures, and file paths
Both scores are normalized to 0–1 before blending, so neither dominates by magnitude. The result is that queries like "retry logic" surface symbols whose meaning is close even if the word "retry" doesn't appear, while exact-name queries like "RetryService" get a strong keyword boost that pulls the right symbol to the top.
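The 70/30 blend can be sketched as follows. Min-max scaling is an assumption, since the text says only that scores are normalized to 0–1. The sketch also assumes keyword scores arrive with higher meaning better; SQLite's FTS5 `bm25()` returns smaller values for better matches, so those would need to be negated first.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to the 0-1 range; if all values are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def blend(cosine: dict[str, float], keyword: dict[str, float],
          w_cos: float = 0.7, w_kw: float = 0.3) -> list[tuple[str, float]]:
    """Return (symbol, score) pairs sorted best-first using the weighted blend."""
    cos_n = min_max(cosine)
    kw_n = min_max(keyword)
    keys = set(cos_n) | set(kw_n)
    # A symbol missing from one signal contributes 0 for that signal.
    blended = {k: w_cos * cos_n.get(k, 0.0) + w_kw * kw_n.get(k, 0.0) for k in keys}
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)
```

For example, a symbol with slightly lower cosine similarity but a strong keyword hit on its name can overtake a symbol that matches only by meaning, which is the exact-name boost described above.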
Embeddings are stored in SQLite as packed float32 BLOBs. No vector database needed at this scale.
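Packing float32 vectors into a BLOB needs only the standard library. A minimal sketch, assuming little-endian byte order (the text does not state which is used); the function names are illustrative.

```python
import math
import struct


def pack_vector(vec: list[float]) -> bytes:
    """Pack floats as little-endian float32, 4 bytes per value."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob: bytes) -> list[float]:
    """Inverse of pack_vector: recover the float list from a BLOB."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A brute-force scan that unpacks each row and scores it against the query vector is one way to search without a vector database.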
The file watcher uses FSEventsObserver on macOS and inotify on Linux. It watches at the directory level with a 500ms debounce, so large repos with node_modules work fine. Changed files are re-parsed and their symbol embeddings are regenerated incrementally.
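A debounce of this kind can be sketched as a small coalescing buffer. This is not the project's watcher code: the class and method names are hypothetical, it assumes a trailing-edge debounce (flush after a quiet period), and timestamps are passed in explicitly so the logic can be tested without real file events.

```python
class Debouncer:
    """Collect changed paths and release them once no event has arrived for quiet_seconds."""

    def __init__(self, quiet_seconds: float = 0.5):
        self.quiet_seconds = quiet_seconds
        self._pending: set[str] = set()
        self._last_event: float | None = None

    def record(self, path: str, now: float) -> None:
        """Note a change; repeated events for the same path collapse into one entry."""
        self._pending.add(path)
        self._last_event = now

    def flush_if_quiet(self, now: float) -> list[str]:
        """Return pending paths (sorted) if the quiet period has elapsed, else an empty list."""
        if self._last_event is None or now - self._last_event < self.quiet_seconds:
            return []
        ready = sorted(self._pending)
        self._pending.clear()
        self._last_event = None
        return ready
```

Coalescing matters when one save or checkout triggers many events at once: each affected file is reindexed once per burst instead of once per event.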