🪸 barnacle-search

A local MCP server that attaches to your codebase and gives Claude Code and Codex semantic search, symbol extraction, and automatic reindexing, with no cloud services and no API keys.

What it does

Symbol extraction — parses every file with tree-sitter and indexes classes, methods, and functions
Semantic search — embeds your codebase with a local Ollama model so you can search by meaning, not just text
Regex search — fast ripgrep/grep fallback for exact pattern matching
Auto-reindex — watches for file changes and updates the index automatically
Works offline — everything runs locally via Ollama

Supported languages

Language	Extensions
C#	`.cs`
JavaScript	`.js`, `.jsx`, `.mjs`, `.cjs`
TypeScript	`.ts`, `.tsx`
HTML	`.html`, `.htm`
Python	`.py`, `.pyw`
Dart	`.dart`

Requirements

Python 3.11+
uv (installed automatically by setup scripts)
Ollama for semantic search
git + a C compiler (gcc or clang) for the Dart grammar

Setup

macOS / Linux

git clone https://github.com/zjs81/barnacle-search.git
cd barnacle-search
./setup.sh

Windows

git clone https://github.com/zjs81/barnacle-search.git
cd barnacle-search
.\setup.ps1

The setup script will:

Install uv if not already present
Install all Python dependencies
Compile the Dart tree-sitter grammar from source
Register barnacle-search as a global MCP server in Claude Code (~/.claude.json)
Add a managed barnacle-search guidance block to Claude user memory (~/.claude/CLAUDE.md) so Claude Code knows when and how to use the server
Add a Claude permission allow rule for the Barnacle MCP server (~/.claude/settings.json) so Claude can call its tools without prompting on first use
Register barnacle-search as a global MCP server in Codex (~/.codex/config.toml)
Add a managed barnacle-search guidance block to Codex global instructions (~/.codex/AGENTS.md) so exploratory codebase questions bias toward Barnacle first

The setup scripts can also uninstall the MCP registration. They detect whether barnacle-search is currently registered in Claude Code, Codex, or both, then let you choose which target to remove. When uninstalling, they remove only the managed barnacle-search guidance blocks from ~/.claude/CLAUDE.md and ~/.codex/AGENTS.md, plus the Claude permission rule from ~/.claude/settings.json.

Ollama (for semantic search)

Semantic search requires a running Ollama instance with the embedding model pulled:

# macOS
brew install ollama
ollama pull granite-embedding

# Windows
winget install Ollama.Ollama
ollama pull granite-embedding

Barnacle will auto-pull the model if Ollama is running but the model isn't downloaded yet. Structural search (symbols, regex) works fine without Ollama.

Usage in Claude Code or Codex

After setup, restart Claude Code and/or Codex. Then in any session:

set_project_path("/path/to/your/project")
build_deep_index()

build_deep_index() only needs to run once — the index updates automatically when files change.

If Codex shows barnacle-search as enabled but lists Tools: (none), the MCP process is usually failing before startup because uv cannot use its default cache directory inside the Codex sandbox. Ensure your ~/.codex/config.toml entry includes a writable cache override:

[mcp_servers."barnacle-search"]
command = "uv"
args = ["--directory", "/absolute/path/to/barnacle-search", "run", "barnacle-search"]
env = { UV_CACHE_DIR = "/tmp/barnacle-search-uv-cache" }

Available tools

Tool	Description
`set_project_path(path)`	Point barnacle at a project directory
`build_deep_index()`	Parse all files and generate embeddings
`semantic_search(query)`	Natural language search over your codebase
`find_files(pattern)`	Glob matching e.g. `*/Service*.cs`
`search_code(pattern)`	Regex search across files
`get_file_summary(path)`	Symbols, imports, line count for a file
`get_symbol_body(file, symbol)`	Read source of a specific method or class
`get_index_status()`	File count, language breakdown, embedding count

Adding to project instructions

To make Claude automatically use barnacle-search in a specific project, add this to your CLAUDE.md:

## Code Navigation

Use the `barnacle-search` MCP tools to explore this codebase.

### Setup (first time per session)
set_project_path("/absolute/path/to/project")
build_deep_index()

### Key tools
- `semantic_search(query="...")` — find by meaning
- `find_files(pattern="**/*.cs")` — find by name
- `search_code(pattern="...")` — find by regex
- `get_file_summary(path="...")` — symbols in a file
- `get_symbol_body(file="...", symbol="MethodName")` — read a method

To make Codex automatically use barnacle-search in a specific project, add this to your AGENTS.md:

## Code Navigation

Use the `barnacle-search` MCP tools to explore this codebase.

### Setup (first time per session)
set_project_path("/absolute/path/to/project")
build_deep_index()

### Key tools
- `semantic_search(query="...")` — find by meaning
- `find_files(pattern="**/*.cs")` — find by name
- `search_code(pattern="...")` — find by regex
- `get_file_summary(path="...")` — symbols in a file
- `get_symbol_body(file="...", symbol="MethodName")` — read a method

The setup scripts also add a global Codex guidance block under ~/.codex/AGENTS.md with the same intent, so exploratory codebase questions in any repo can prefer Barnacle first while exact lookups still use rg.

For Claude Code, the setup scripts also add a global guidance block under ~/.claude/CLAUDE.md. This matters because MCP registration makes the server available, but Claude memory is what tells Claude to prefer Barnacle for exploratory codebase work and to call set_project_path() before other Barnacle tools.

The setup scripts also add mcp__barnacle-search to ~/.claude/settings.json under permissions.allow. This follows Claude Code's MCP permission syntax for allowing all tools from a specific MCP server.
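
As a rough illustration of that permission change, the sketch below adds and removes the rule on an in-memory settings dictionary. The helper names are hypothetical and this is not the setup scripts' actual code; only the rule string and the permissions.allow location come from the description above.

```python
import json

# Rule string as described above; the helper names below are hypothetical.
RULE = "mcp__barnacle-search"


def add_allow_rule(settings: dict, rule: str = RULE) -> dict:
    """Add the rule under permissions.allow, leaving other settings untouched."""
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if rule not in allow:  # idempotent: running setup twice adds it only once
        allow.append(rule)
    return settings


def remove_allow_rule(settings: dict, rule: str = RULE) -> dict:
    """Remove only this rule, as an uninstall step would."""
    allow = settings.get("permissions", {}).get("allow", [])
    if rule in allow:
        allow.remove(rule)
    return settings


settings = add_allow_rule({"permissions": {"allow": ["Bash(ls:*)"]}})
print(json.dumps(settings, indent=2))
```

Keeping the edit idempotent and scoped to a single list entry means existing user rules survive both install and uninstall.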

How it works

Barnacle uses a two-tier index:

Shallow index — a lightweight JSON file list with mtimes for fast file lookup without touching the database
Deep index — a SQLite database with four tables:
- files — path, language, line count, imports, exports
- symbols — extracted classes/methods/functions with line ranges
- symbol_embeddings — one vector per symbol (packed float32 BLOB, no numpy/chromadb needed)
- symbol_fts — FTS5 full-text index over symbol names, signatures, and file paths
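
To make the shallow tier more concrete, here is a minimal sketch of a JSON file list keyed by mtime, plus a diff that finds changed files without opening the database. The function names and the JSON layout are assumptions for illustration, not Barnacle's actual format.

```python
import json
import os
from pathlib import Path


def scan_mtimes(root: str) -> dict[str, float]:
    """Map each file path (relative to root) to its modification time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtimes[os.path.relpath(full, root)] = os.path.getmtime(full)
    return mtimes


def changed_files(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Compare two snapshots; no database access is needed."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }


def save_snapshot(path: str, mtimes: dict[str, float]) -> None:
    """Persist the snapshot as plain JSON (hypothetical layout)."""
    Path(path).write_text(json.dumps(mtimes, indent=2))


def load_snapshot(path: str) -> dict[str, float]:
    """Load a previous snapshot, or an empty one on first run."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
```

Only the files reported as added or modified would need to be re-parsed into the deep index.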

Symbol-level embeddings

Embeddings are generated per symbol (class, method, function), not per file. Each symbol is embedded with its full context:

path/to/File.cs [csharp] > ClassName > MethodName
signature: ClassName.MethodName(int userId, string name)
<up to 510 tokens of body>

This means semantic_search("password hashing") returns PasswordHasher.Hash() directly instead of a file that happens to contain it somewhere. semantic_search results include a matched_symbols list showing which specific symbols scored highest and their individual scores.
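
The embed text layout shown above can be sketched as follows. This assumes whitespace-separated words as a stand-in for tokens, since the tokenizer Barnacle uses is not stated here; the function names are hypothetical.

```python
import re


def cap_body(body: str, max_tokens: int = 510) -> str:
    """Truncate after max_tokens whitespace-separated tokens, keeping original formatting.

    Assumption: whitespace tokens approximate the real tokenizer's count.
    """
    matches = list(re.finditer(r"\S+", body))
    if len(matches) <= max_tokens:
        return body
    return body[: matches[max_tokens - 1].end()]


def build_embed_text(path, language, parents, name, signature, body, max_body_tokens=510):
    """Assemble 'path [lang] > Parent > Name', the signature line, then the capped body."""
    header = " > ".join([f"{path} [{language}]", *parents, name])
    return f"{header}\nsignature: {signature}\n{cap_body(body, max_body_tokens)}"


print(build_embed_text(
    "path/to/File.cs", "csharp", ["ClassName"], "MethodName",
    "ClassName.MethodName(int userId, string name)",
    "{\n    return Lookup(userId, name);\n}",
))
```

Putting the path and enclosing class ahead of the body gives the embedding model context that the body alone would not carry.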

Not every symbol gets embedded — imports and trivial methods/functions (≤2 lines) are filtered out as noise. This typically cuts embedding count by 20-40% on real codebases while keeping all the symbols worth searching for.
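
A filter along these lines might look like the sketch below. The Symbol fields and kind names are hypothetical; only the rule itself (skip imports, skip methods and functions of two lines or fewer) comes from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str        # hypothetical kinds: "class", "method", "function", "import"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def worth_embedding(sym: Symbol, min_lines: int = 3) -> bool:
    """Return False for imports and for methods/functions shorter than min_lines."""
    if sym.kind == "import":
        return False
    line_count = sym.end_line - sym.start_line + 1
    if sym.kind in {"method", "function"} and line_count < min_lines:
        return False
    return True


symbols = [
    Symbol("System.Linq", "import", 1, 1),
    Symbol("get_Name", "method", 10, 11),   # 2 lines: filtered out
    Symbol("Hash", "method", 20, 35),
    Symbol("PasswordHasher", "class", 5, 40),
]
print([s.name for s in symbols if worth_embedding(s)])
```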

Embed text is capped at 510 body tokens to prevent giant methods from bloating request size. Symbols are sent to Ollama in batches of 64 for throughput. A large codebase (~28k symbols after filtering) typically indexes in 5-6 minutes.
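
The batching step can be sketched as below. At 64 symbols per request, roughly 28,000 symbols works out to about 440 requests. The embed_batch callable is a placeholder for whatever sends one batch to Ollama; its name and signature are assumptions, and the stand-in used here makes no network calls.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]


def batched(items: Sequence[str], size: int = 64) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[Vector]],
    batch_size: int = 64,
) -> list[Vector]:
    """Embed every text, one request per batch, preserving input order."""
    vectors: list[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> list[Vector]:
    """Stand-in embedder for illustration only: one-dimensional 'vectors'."""
    return [[float(len(text))] for text in batch]


vectors = embed_all([f"symbol {i}" for i in range(150)], fake_embed_batch)
print(len(vectors))
```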

Hybrid search

semantic_search combines two signals to rank results:

Cosine similarity (70%): how closely the embedding of your query matches the embedding of each symbol
BM25 keyword match (30%): SQLite FTS5 full-text search over symbol names, signatures, and file paths

Both scores are normalized to 0–1 before blending, so neither dominates by magnitude. The result is that queries like "retry logic" surface symbols whose meaning is close even if the word "retry" doesn't appear, while exact-name queries like "RetryService" get a strong keyword boost that pulls the right symbol to the top.
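
The 70/30 blend can be sketched as follows. This assumes min-max normalization and keyword scores where higher means a better match; the actual normalization Barnacle uses is not specified here, and the function names are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the 0-1 range (assumed normalization method)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie, so treat each as a full match.
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def blend(cosine: dict[str, float], keyword: dict[str, float],
          w_cos: float = 0.7, w_kw: float = 0.3) -> list[tuple[str, float]]:
    """Combine normalized cosine and keyword scores; a missing signal counts as 0."""
    cos_n, kw_n = min_max(cosine), min_max(keyword)
    ids = set(cos_n) | set(kw_n)
    scored = {i: w_cos * cos_n.get(i, 0.0) + w_kw * kw_n.get(i, 0.0) for i in ids}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


ranked = blend(
    cosine={"RetryPolicy.Execute": 0.9, "RetryService": 0.5, "Logger.Write": 0.1},
    keyword={"RetryService": 10.0, "Logger.Write": 2.0},
)
for symbol, score in ranked:
    print(f"{score:.2f}  {symbol}")
```

In this toy data, RetryService has a middling cosine score but the best keyword score, which lifts it to 0.65, close behind the top semantic match at 0.70.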

Embeddings are stored in SQLite as packed float32 BLOBs. No vector database needed at this scale.
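
A minimal sketch of that storage scheme using only the standard library is shown below. The column names and the brute-force scan are assumptions for illustration; only the table name and the packed float32 BLOB format come from the description above.

```python
import math
import sqlite3
import struct


def pack_vector(vec: list[float]) -> bytes:
    """Pack floats as little-endian float32, 4 bytes per dimension."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob: bytes) -> list[float]:
    """Inverse of pack_vector."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical columns; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbol_embeddings (symbol_id INTEGER PRIMARY KEY, vector BLOB NOT NULL)"
)
conn.executemany(
    "INSERT INTO symbol_embeddings VALUES (?, ?)",
    [(1, pack_vector([1.0, 0.0])), (2, pack_vector([0.0, 1.0]))],
)


def nearest(conn: sqlite3.Connection, query: list[float], k: int = 5):
    """Brute-force scan: score every stored vector and keep the top k."""
    rows = conn.execute("SELECT symbol_id, vector FROM symbol_embeddings")
    scored = [(sid, cosine(query, unpack_vector(blob))) for sid, blob in rows]
    return sorted(scored, key=lambda r: r[1], reverse=True)[:k]


print(nearest(conn, [0.9, 0.1], k=1))
```

A linear scan like this grows with the number of symbols, which is why it suits tens of thousands of vectors rather than millions.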

File watcher

The file watcher uses FSEventsObserver on macOS and inotify on Linux. Watching happens at the directory level with a 500 ms debounce, so large repos that contain node_modules work fine. Changed files are re-parsed and their symbol embeddings are regenerated incrementally.
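
The debounce idea can be illustrated with a small, clock-driven sketch. It assumes a per-path quiet period, which may differ from how Barnacle actually groups events; the class and method names are hypothetical.

```python
class Debouncer:
    """Collect change events and release a path only after a quiet period."""

    def __init__(self, delay: float = 0.5):
        self.delay = delay
        self._pending: dict[str, float] = {}  # path -> time of latest event

    def record(self, path: str, now: float) -> None:
        """Note a change; repeated events for one path just reset its timer."""
        self._pending[path] = now

    def due(self, now: float) -> list[str]:
        """Return and clear the paths that have been quiet for at least `delay`."""
        ready = sorted(p for p, t in self._pending.items() if now - t >= self.delay)
        for p in ready:
            del self._pending[p]
        return ready


d = Debouncer(delay=0.5)
d.record("src/app.py", now=0.0)
d.record("src/app.py", now=0.3)   # an editor saving twice counts as one change
print(d.due(now=0.6))             # still inside the quiet period
print(d.due(now=0.85))            # quiet for 0.55 s, ready to reindex
```

Passing the clock in explicitly keeps the logic easy to test; a real watcher would call record from its event handler and poll due on a timer.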
