Search Algorithms

Directory: src/lib/search/

All graph searches use the same hybrid algorithm combining BM25 keyword search and vector cosine similarity, fused via Reciprocal Rank Fusion (RRF).

Hybrid search algorithm

flowchart TD
    Q[Search query] --> E[Compute query embedding]
    E --> P{Score all nodes}
    P --> BM25[BM25 keyword score]
    P --> VEC[Vector cosine similarity]
    BM25 --> RRF[Reciprocal Rank Fusion]
    VEC --> RRF
    RRF --> SEED[Select top seeds]
    SEED --> BFS[BFS graph expansion]
    BFS --> MERGE[Merge & deduplicate]
    MERGE --> FILTER[Filter by minScore]
    FILTER --> RESULT[Return top maxResults]

Step 1: Score all nodes

Two scoring methods run in parallel:

Vector scoring: cosine similarity between query embedding and each node's embedding. Nodes with empty embeddings are skipped.

BM25 keyword scoring: BM25 algorithm scores nodes against the query text. The BM25 index is maintained incrementally (updated on every CRUD operation).

Step 2: Reciprocal Rank Fusion (RRF)

Results from both methods are fused using RRF:

score(d) = 1/(k + rank_vector) + 1/(k + rank_bm25)

Where k = 60 by default. This formula gives balanced weight to both ranking methods regardless of score scale differences.

Step 3: Seed selection

From the fused results, take the top topK candidates above minScore.

Step 4: BFS expansion

From each seed node, traverse outgoing and incoming edges up to bfsDepth hops:

Each hop multiplies the score by a decay factor
Code graph uses edge-specific decay: contains (0.95), extends/implements (0.85), imports (0.70). This reflects that a class→method relationship is much tighter than a cross-file import
Other graphs use a uniform bfsDecay (default 0.8)
Prune early if the best possible decay can't pass minScore
Neighboring nodes discovered via BFS inherit decayed scores
Code graph: incoming imports edges are excluded from BFS to avoid noise from popular utility files being pulled in as neighbors of every search result
When bfsDecay is explicitly passed as a parameter, it overrides edge-specific decay (uniform behavior)

Step 5: Merge and output

De-duplicate: keep highest score per node across all seeds and BFS paths
Discard below minScore
Sort descending by score
Cap at maxResults

Search modes

The searchMode parameter controls which scoring methods are used:

Mode	Description
`hybrid`	BM25 + vector (default)
`vector`	Embedding cosine similarity only
`keyword`	BM25 keyword search only

Default parameters

Parameter	Docs/Code/Knowledge/Tasks/Skills	File-level search
`topK`	5	10
`bfsDepth`	1	— (no BFS)
`maxResults`	5	10
`minScore`	0.5	0.3
`bfsDecay`	0.8	—

skills_recall uses minScore: 0.3 for higher recall in task contexts.

BM25 index

File: src/lib/search/bm25.ts

BM25Index class

Maintained incrementally by each graph manager:

add(id, text) — on node creation
remove(id) — on node deletion
update(id, text) — on node update

Tokenizer

Splits text into tokens by:

Whitespace
Punctuation boundaries
CamelCase boundaries: getUserById → [get, user, by, id]

All tokens are lowercased. A minimal stop-word list filters common words that never carry meaning in code search: articles (a, an, the), conjunctions (and, or, but), prepositions (of, with, by, from, in, to, etc.), pronouns (it, he, she, we, they), and modals (would, could, should). Programming-significant words like for, do, if, not, is, has, can are preserved.

Text extraction

Each graph defines what text to extract for BM25 indexing:

Graph	BM25 text
DocGraph	`title + content`
CodeGraph	`name + signature + docComment + body` (body truncated to 2000 chars)
KnowledgeGraph	`title + content`
TaskGraph	`title + description`
SkillGraph	`title + description + triggers` (triggers included!)
FileIndexGraph	`filePath` (file path string)

File-level search

code_search_files, docs_search_files, and files_search use file-level embeddings stored on root nodes:

Code: file path + exported symbol names + import summary
Docs: file path + h1 title
FileIndex: file path

File searches use the same hybrid approach (BM25 + vector, fused via RRF) as node-level search. This means exact filename queries (e.g. "embedder.ts") find files by keyword match, while semantic queries (e.g. "authentication helpers") work via vector similarity. No BFS expansion. Default minScore: 0.3, topK: 10.

File paths are normalized for embedding (slashes/dots → spaces) so that path segments like src, lib, docs_search are treated as separate tokens by the embedding model.

Search modules

Module	File	Used by
Docs search	`src/lib/search/docs.ts`	`docs_search` MCP tool
Code search	`src/lib/search/code.ts`	`code_search` MCP tool
Knowledge search	`src/lib/search/knowledge.ts`	`notes_search` MCP tool
Task search	`src/lib/search/tasks.ts`	`tasks_search` MCP tool
Skill search	`src/lib/search/skills.ts`	`skills_search`, `skills_recall` MCP tools
File-level search	`src/lib/search/files.ts`	`code_search_files`, `docs_search_files` MCP tools
File index search	`src/lib/search/file-index.ts`	`files_search` MCP tool

Proxy node exclusion

Knowledge, Task, and Skill searches automatically exclude proxy nodes (cross-graph phantom nodes). Proxies have empty embeddings and their IDs start with @.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search Algorithms

Hybrid search algorithm

Step 1: Score all nodes

Step 2: Reciprocal Rank Fusion (RRF)

Step 3: Seed selection

Step 4: BFS expansion

Step 5: Merge and output

Search modes

Default parameters

BM25 index

BM25Index class

Tokenizer

Text extraction

File-level search

Search modules

Proxy node exclusion

FilesExpand file tree

search.md

Latest commit

History

search.md

File metadata and controls

Search Algorithms

Hybrid search algorithm

Step 1: Score all nodes

Step 2: Reciprocal Rank Fusion (RRF)

Step 3: Seed selection

Step 4: BFS expansion

Step 5: Merge and output

Search modes

Default parameters

BM25 index

BM25Index class

Tokenizer

Text extraction

File-level search

Search modules

Proxy node exclusion