The indexer (src/cli/indexer.ts) walks the project directory and dispatches files to three serial queues for parsing and embedding. During initial indexing, the queues run sequentially by phase to reduce peak memory usage; after that, the file watcher dispatches to all queues in parallel.
flowchart TD
FILE[File detected] --> MATCH{Pattern matching}
MATCH -->|docs patterns| DQ[Docs Queue]
MATCH -->|code patterns| CQ[Code Queue]
MATCH -->|all files| FQ[File Index Queue]
DQ --> DP[parseFile + embedBatch]
CQ --> CP[parseCodeFile + embedBatch]
FQ --> FP[fs.stat + embed]
DP --> DG[Update DocGraph]
CP --> CG[Update CodeGraph]
FP --> FG[Update FileIndexGraph]
During initial indexing, the three queues run sequentially by phase rather than concurrently. This ensures only one embedding model is loaded at a time, reducing peak memory:
Phase 1: docs → scan(docs) + drain(docs) — triggers bge-m3 lazy load
Phase 2: files → scan(files) + drain(files) — reuses bge-m3 (already loaded)
Phase 3: code → scan(code) + drain(code) — triggers jina-code lazy load
Finalize: rebuildDirectoryStats, resolvePendingLinks, scanMirrorDirs (K/T/S)
The IndexPhase type defines the three phases: "docs" | "files" | "code". ProjectManager.startIndexingPhase(phase) runs scan(phase) + drain(phase) for a single phase, and ProjectManager.finalizeIndexing() runs the post-indexing steps.
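The phase ordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual ProjectManager code: IndexPhase matches the type named in the text, while PhasedIndexer and runInitialIndexing are hypothetical names introduced for the example.

```typescript
// Sketch of the phase ordering during initial indexing.
// PhasedIndexer and runInitialIndexing are hypothetical names.
type IndexPhase = "docs" | "files" | "code";

interface PhasedIndexer {
  scan(phase?: IndexPhase): void | Promise<void>;
  drain(phase?: IndexPhase): Promise<void>;
  finalize(): Promise<void>;
}

async function runInitialIndexing(indexer: PhasedIndexer): Promise<void> {
  // Order follows the text: docs and files share one model, code loads another.
  const phases: IndexPhase[] = ["docs", "files", "code"];
  for (const phase of phases) {
    await indexer.scan(phase); // enqueue only this phase's files
    await indexer.drain(phase); // wait for that queue before moving on
  }
  await indexer.finalize(); // directory stats, pending links, mirror dirs
}
```

Awaiting drain(phase) before the next scan(phase) is what keeps the work of two phases from overlapping.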
After initial indexing completes, the chokidar watcher dispatches to all three queues concurrently.
Each queue is a Promise chain — queue = queue.then(fn).catch(log). Errors are logged to stderr but don't stop the queue.
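The chaining pattern above can be sketched as a small class. This is a minimal sketch under assumptions, not the indexer's own code: SerialQueue, enqueue, and drain are hypothetical names.

```typescript
// Minimal sketch of a serial queue built on a Promise chain.
// SerialQueue, enqueue, and drain are hypothetical names for illustration.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Append a task; it starts only after every earlier task has settled.
  enqueue(task: () => Promise<void> | void): void {
    this.tail = this.tail
      .then(task)
      // Log and swallow the error so later tasks still run.
      .catch((err: unknown) => console.error("[queue] task failed:", err));
  }

  // Resolves once all tasks enqueued so far have settled.
  drain(): Promise<void> {
    return this.tail;
  }
}
```

Because the catch handler is attached on every link, a rejected task leaves the chain in a resolved state and the next task still runs.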
Docs queue:
- parseFile(): parses markdown into Chunk[] (heading-based sections + code blocks)
- embedBatch(): embeds all chunks in one forward pass
- updateFile(): replaces nodes and edges in DocGraph

Code queue:
- parseCodeFile(): extracts AST symbols via tree-sitter
- embedBatch(): embeds all symbols in one forward pass
- updateCodeFile(): replaces nodes and edges in CodeGraph

File index queue:
- fs.stat(): reads file size, mtime
- embed(): embeds the file path
- updateFileEntry(): adds/updates node in FileIndexGraph
When a file is detected:
- Check against exclude: if it matches, skip the file entirely
- Check against graphs.docs.include: if it matches, enqueue to the docs queue
- Check against graphs.code.include: if it matches, enqueue to the code queue
- All non-excluded files are always enqueued to the file index queue
- Check whether the graph has enabled: false; disabled graphs skip their queue
A single file can be dispatched to multiple queues (e.g. a .ts file goes to both code and file index queues).
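The dispatch rules above can be sketched as a pure function. This is an illustration under assumptions: QueueName, GraphRule, DispatchConfig, and targetQueues are hypothetical names, per-graph exclude patterns are left out, and matchers are plain predicates so that no particular glob library is assumed.

```typescript
// Sketch of the dispatch decision for one detected file.
// All type and function names here are hypothetical.
type QueueName = "docs" | "code" | "files";
type Matcher = (relPath: string) => boolean;

interface GraphRule {
  enabled: boolean;
  include: Matcher;
}

interface DispatchConfig {
  exclude: Matcher;
  docs: GraphRule;
  code: GraphRule;
  fileIndex: { enabled: boolean };
}

function targetQueues(relPath: string, cfg: DispatchConfig): QueueName[] {
  // Excluded files are skipped entirely.
  if (cfg.exclude(relPath)) return [];

  const targets: QueueName[] = [];
  if (cfg.docs.enabled && cfg.docs.include(relPath)) targets.push("docs");
  if (cfg.code.enabled && cfg.code.include(relPath)) targets.push("code");
  // Every non-excluded file goes to the file index queue (assuming it is enabled).
  if (cfg.fileIndex.enabled) targets.push("files");
  return targets;
}

// Example configuration with stand-in predicates instead of glob patterns.
const cfg: DispatchConfig = {
  exclude: (p) => p.split("/").includes("coverage"),
  docs: { enabled: true, include: (p) => p.endsWith(".md") },
  code: { enabled: true, include: (p) => /\.(js|ts|jsx|tsx|mjs|mts|cjs|cts)$/.test(p) },
  fileIndex: { enabled: true },
};

console.log(targetQueues("src/cli/indexer.ts", cfg)); // code and file index queues
console.log(targetQueues("coverage/index.html", cfg)); // no queues: excluded
```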
scan() walks projectDir recursively with fs.readdirSync. For each entry, it:
- Skips dotfiles/dotdirs (names starting with .)
- Skips ALWAYS_IGNORED directories (node_modules, dist, build, etc.) at any nesting level
- Prunes directories matching the exclude pattern (not descended into)
- Dispatches matching files to relevant queues
When called with an IndexPhase argument ("docs", "files", or "code"), only dispatches files to the queue for that phase. When called without arguments, dispatches to all queues (used by the watcher).
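The walk described above might look roughly like this. It is a sketch, not the indexer's implementation: ALWAYS_IGNORED is named in the text, but only the three entries mentioned there are listed here, and walk, isExcluded, and onFile are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Only the entries the text mentions; the real list may be longer.
const ALWAYS_IGNORED = new Set(["node_modules", "dist", "build"]);

// Recursively visit files under root, pruning ignored and excluded directories.
// isExcluded stands in for the exclude-pattern check; onFile for queue dispatch.
function walk(
  root: string,
  isExcluded: (relPath: string) => boolean,
  onFile: (relPath: string) => void,
  dir: string = root,
): void {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    // Skip dotfiles and dot-directories.
    if (entry.name.startsWith(".")) continue;

    const abs = path.join(dir, entry.name);
    // Normalise to forward slashes so patterns behave the same on Windows.
    const rel = path.relative(root, abs).split(path.sep).join("/");

    if (entry.isDirectory()) {
      // Prune: ignored or excluded directories are never descended into.
      if (ALWAYS_IGNORED.has(entry.name) || isExcluded(rel)) continue;
      walk(root, isExcluded, onFile, abs);
    } else if (entry.isFile() && !isExcluded(rel)) {
      onFile(rel);
    }
  }
}
```

Pruning at the directory level avoids reading the contents of large trees such as node_modules at all.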
Starts a chokidar watcher on projectDir. Events:
- add/change → dispatched to queues (same logic as scan)
- unlink → enqueued removal of file's nodes from relevant graphs (serialized with adds to prevent races)
See Watcher for details.
When called with an IndexPhase argument, drain() waits for only the specified queue to complete. When called without arguments, it waits for all three queues:
// drain all queues
await Promise.all([docsQueue, codeQueue, fileQueue]);
// drain single phase
await docsQueue; // drain("docs")

During initial indexing, drain(phase) is called after each scan(phase) to ensure one phase finishes before the next begins. The post-drain steps (rebuild directory stats, resolve pending links, scan mirror dirs) are handled separately by ProjectManager.finalizeIndexing().
Files are skipped if their mtime matches what's already stored in the graph node. This means:
- First indexing processes all files
- Subsequent starts only process changed files
- The --reindex flag forces re-processing of everything
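The skip check above amounts to a small comparison. A minimal sketch, assuming the stored node exposes its mtime in milliseconds; StoredNode and needsIndexing are hypothetical names, and the real graph node shape is not shown in this document.

```typescript
// Sketch of the mtime-based skip check (hypothetical names and node shape).
interface StoredNode {
  mtimeMs: number;
}

function needsIndexing(
  currentMtimeMs: number,
  stored: StoredNode | undefined,
  reindex: boolean = false,
): boolean {
  if (reindex) return true; // --reindex forces re-processing
  if (stored === undefined) return true; // never indexed before
  return stored.mtimeMs !== currentMtimeMs; // process only if changed
}
```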
Docs and code queues use embedBatch() to embed all chunks/symbols per file in a single forward pass through the embedding model. This is more efficient than embedding one at a time.
The file index queue uses embed() for single items (one file path per call).
updateCodeFile() skips cross-file edges (e.g. imports) whose target node is not yet indexed. When the target file is later indexed, those edges are not automatically restored — the source file must be re-indexed (or a full rescan run) to pick them up.
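The consequence of skipping unresolved edges can be illustrated with a toy model. This is not the CodeGraph implementation: Edge, keepResolvable, and the use of file names as node IDs are assumptions made for the example.

```typescript
// Toy model of dropping cross-file edges whose target is not indexed yet.
// Edge and keepResolvable are hypothetical names.
interface Edge {
  source: string;
  target: string;
  kind: "imports" | "contains";
}

// Keep only edges whose target node already exists in the graph.
function keepResolvable(edges: Edge[], indexedNodes: ReadonlySet<string>): Edge[] {
  return edges.filter((edge) => indexedNodes.has(edge.target));
}

const indexed = new Set<string>(["a.ts"]);
const edgesFromA: Edge[] = [{ source: "a.ts", target: "b.ts", kind: "imports" }];

// b.ts is not indexed yet, so the import edge is dropped when a.ts is stored.
const storedForA = keepResolvable(edgesFromA, indexed);

// Indexing b.ts later adds its node but does not revisit a.ts's stored edges.
indexed.add("b.ts");
console.log(storedForA.length); // 0: the edge stays missing
console.log(keepResolvable(edgesFromA, indexed).length); // 1: restored once a.ts is re-indexed
```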
When a file is removed (unlink event):
- Remove file's nodes from DocGraph and/or CodeGraph
- Remove file's node from FileIndexGraph
- cleanupProxies(): remove orphaned cross-graph proxy nodes in KnowledgeGraph, TaskGraph, and SkillGraph that pointed to the removed file's nodes
Each graph can have its own include and exclude patterns:
projects:
  my-app:
    projectDir: "/path/to/my-app"
    # Server default exclude (**/node_modules/**, **/dist/**) always applies.
    # Project-level exclude adds to server defaults:
    exclude: "**/coverage/**"
    graphs:
      docs:
        include: "**/*.md" # default
        exclude: "**/drafts/**" # overrides project-level exclude
      code:
        include: "**/*.{js,ts,jsx,tsx,mjs,mts,cjs,cts}" # default

The graph-level exclude overrides the project-level one (not merged).
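One way to read these precedence rules is sketched below. It rests on an assumption: the server defaults keep applying even when a graph-level exclude is set, which is how the "always applies" comment in the config reads. effectiveExclude is a hypothetical helper, not part of the actual configuration loader.

```typescript
// Server defaults as listed in the config comment above.
const SERVER_DEFAULT_EXCLUDE = ["**/node_modules/**", "**/dist/**"];

// Hypothetical helper: resolve the exclude patterns that apply to one graph.
function effectiveExclude(projectExclude?: string, graphExclude?: string): string[] {
  // A graph-level exclude replaces the project-level one; they are not merged.
  const override = graphExclude ?? projectExclude;
  return override === undefined
    ? [...SERVER_DEFAULT_EXCLUDE]
    : [...SERVER_DEFAULT_EXCLUDE, override];
}

// docs graph: the graph-level pattern wins over "**/coverage/**".
console.log(effectiveExclude("**/coverage/**", "**/drafts/**"));
// code graph: no graph-level exclude, so the project-level pattern applies.
console.log(effectiveExclude("**/coverage/**"));
```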