DocGraph indexes Markdown files and CodeGraph indexes source code, but a real project has dozens of other file types: configs, images, scripts, data files, lock files, Dockerfiles, CI configs. The FileIndexGraph gives the LLM a complete map of the project filesystem.
This answers questions like:
- "What files are in the `src/lib/` directory?"
- "Is there a Dockerfile in this project?"
- "What TypeScript files exist in the project?"
- "How big is the `dist/` directory?"
Everything. Every file and directory in projectDir that doesn't match the exclude pattern gets a node in the FileIndexGraph. This includes:
- Source files (`.ts`, `.js`, `.py`, `.go`, etc.)
- Config files (`.json`, `.yaml`, `.toml`, `.env`)
- Documentation (`.md`, `.txt`, `.rst`)
- Images (`.png`, `.jpg`, `.svg`)
- Scripts (`.sh`, `.bat`)
- Data files (`.csv`, `.sql`)
- Build artifacts (if not excluded)
- Lock files (`package-lock.json`)
- Anything else
The key difference from DocGraph/CodeGraph: those graphs only index files matching their patterns. FileIndexGraph indexes all files.
Each file gets a node with rich metadata:
| Field | Description | Example |
|---|---|---|
| `filePath` | Relative path (= node ID) | `src/lib/embedder.ts` |
| `fileName` | Basename | `embedder.ts` |
| `directory` | Parent directory | `src/lib` |
| `extension` | File extension | `.ts` |
| `language` | Detected programming language | `typescript` |
| `mimeType` | IANA MIME type | `text/typescript` |
| `size` | File size in bytes | `4096` |
| `mtime` | Last modification time | `1710547200000` |
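As a rough illustration of how the path-derived fields in the table relate to one another, here is a minimal sketch. The `FileNodeMeta` interface and `describeFile` helper are hypothetical names chosen for this example, not the project's actual types; it assumes relative POSIX-style paths and a `stat`-like object supplying size and modification time.

```typescript
import * as path from "node:path";

// Hypothetical shape mirroring the metadata table; not the project's real type.
interface FileNodeMeta {
  filePath: string; // relative path, doubles as the node ID
  fileName: string;
  directory: string;
  extension: string;
  language: string | null;
  mimeType: string | null;
  size: number; // bytes
  mtime: number; // milliseconds since the Unix epoch
}

// Derives the path-based fields; size and mtime would come from fs.stat in practice.
function describeFile(
  filePath: string,
  stat: { size: number; mtimeMs: number },
  language: string | null = null,
  mimeType: string | null = null,
): FileNodeMeta {
  return {
    filePath,
    fileName: path.posix.basename(filePath),
    directory: path.posix.dirname(filePath), // "." for files at the project root
    extension: path.posix.extname(filePath), // "" when there is no extension
    language,
    mimeType,
    size: stat.size,
    mtime: Math.round(stat.mtimeMs),
  };
}

const embedderNode = describeFile(
  "src/lib/embedder.ts",
  { size: 4096, mtimeMs: 1710547200000 },
  "typescript",
  "text/typescript",
);
console.log(embedderNode);
```

Using the `posix` variants keeps node IDs stable across operating systems, since relative paths in the graph use forward slashes.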
Language detection is an extension-based lookup supporting ~80 file types:
.ts → typescript, .py → python, .rs → rust, .go → go, .java → java, .rb → ruby, .md → markdown, .json → json, .yaml → yaml, .sh → shell, .sql → sql, .html → html, .css → css, etc.
Unknown extensions → null.
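An extension-based lookup of this kind can be sketched as follows. The table below contains only the mappings listed above, and `detectLanguage` is a hypothetical name used for illustration; the real lookup reportedly covers about 80 types and may handle casing or dotfiles differently.

```typescript
// Partial table for illustration: only the mappings named in the text.
const LANGUAGE_BY_EXTENSION = new Map<string, string>([
  [".ts", "typescript"],
  [".py", "python"],
  [".rs", "rust"],
  [".go", "go"],
  [".java", "java"],
  [".rb", "ruby"],
  [".md", "markdown"],
  [".json", "json"],
  [".yaml", "yaml"],
  [".sh", "shell"],
  [".sql", "sql"],
  [".html", "html"],
  [".css", "css"],
]);

// Returns the language for a path, or null when the extension is unknown.
function detectLanguage(filePath: string): string | null {
  // Look only at the basename so dots in directory names are ignored.
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  // Assumption: matching is case-insensitive.
  const ext = base.slice(dot).toLowerCase();
  return LANGUAGE_BY_EXTENSION.get(ext) ?? null;
}

console.log(detectLanguage("src/lib/embedder.ts"));
console.log(detectLanguage("assets/logo.xyz"));
```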
MIME type detection uses the mime npm library (IANA-complete database):
.ts → text/typescript, .png → image/png, .json → application/json, etc.
Directories also get nodes in the graph:
| Field | Value |
|---|---|
| `kind` | `directory` |
| `filePath` | `src/lib` |
| `size` | Sum of direct children file sizes |
| `fileCount` | Count of direct children files |
| `embedding` | `[]` (empty; directories are not searchable) |
When a file is indexed, the system automatically creates nodes for every directory up to the root:
src/lib/parsers/code.ts →
creates "src/lib/parsers" (directory)
creates "src/lib" (directory)
creates "src" (directory)
creates "." (root directory)
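The directory chain above, and the parent-to-child links it implies, can be derived from the path string alone. The following is a minimal sketch under that assumption; `ancestorDirectories`, `containsEdges`, and the `ContainsEdge` type are hypothetical names, and the actual indexer may build the chain differently.

```typescript
type ContainsEdge = { from: string; to: string; kind: "contains" };

// Splits a relative POSIX path into segments, dropping empty and "." parts.
function pathSegments(filePath: string): string[] {
  return filePath.split("/").filter((p) => p !== "" && p !== ".");
}

// Ancestor directories, nearest parent first, always ending at the root ".".
function ancestorDirectories(filePath: string): string[] {
  const parts = pathSegments(filePath);
  const dirs: string[] = [];
  for (let i = parts.length - 1; i >= 1; i--) {
    dirs.push(parts.slice(0, i).join("/"));
  }
  dirs.push(".");
  return dirs;
}

// One parent-to-child edge per level, from the root down to the file itself.
function containsEdges(filePath: string): ContainsEdge[] {
  const chain = [...ancestorDirectories(filePath).reverse(), pathSegments(filePath).join("/")];
  const edges: ContainsEdge[] = [];
  chain.slice(1).forEach((child, i) => {
    const parent = chain[i];
    if (parent !== undefined) edges.push({ from: parent, to: child, kind: "contains" });
  });
  return edges;
}

console.log(ancestorDirectories("src/lib/parsers/code.ts"));
console.log(containsEdges("src/lib/parsers/code.ts"));
```

Because directory nodes are keyed by path, indexing a second file in the same directory would reuse the existing nodes rather than create duplicates, provided the graph deduplicates by ID.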
Each directory → child relationship gets a contains edge:
"." → [contains] → "src"
"src" → [contains] → "src/lib"
"src/lib" → [contains] → "src/lib/parsers"
"src/lib/parsers" → [contains] → "src/lib/parsers/code.ts"
After the indexer finishes scanning all files, rebuildDirectoryStats() walks the tree bottom-up and computes:
- `size`: total bytes of direct children files
- `fileCount`: count of direct children files
This lets you answer questions like "how big is the src/ directory?" without summing file sizes manually.
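To make the "direct children" semantics concrete, here is a sketch of one way such stats could be computed. It is not the actual `rebuildDirectoryStats()` implementation: the `TreeNode` shape and `recomputeDirectoryStats` name are hypothetical, and because only direct children count, a single pass over file nodes is enough here, whereas the real function is described as walking the tree bottom-up.

```typescript
// Hypothetical minimal node shape for this example.
interface TreeNode {
  kind: "file" | "directory";
  filePath: string;
  size: number;
  fileCount?: number;
}

// Parent directory of a relative POSIX path; root-level entries belong to ".".
function parentOf(filePath: string): string {
  const slash = filePath.lastIndexOf("/");
  return slash === -1 ? "." : filePath.slice(0, slash);
}

// Resets every directory, then adds each file's size to its direct parent only.
function recomputeDirectoryStats(nodes: Map<string, TreeNode>): void {
  for (const node of nodes.values()) {
    if (node.kind === "directory") {
      node.size = 0;
      node.fileCount = 0;
    }
  }
  for (const node of nodes.values()) {
    if (node.kind !== "file") continue;
    const parent = nodes.get(parentOf(node.filePath));
    if (parent !== undefined && parent.kind === "directory") {
      parent.size += node.size;
      parent.fileCount = (parent.fileCount ?? 0) + 1;
    }
  }
}

const demoTree = new Map<string, TreeNode>();
for (const n of [
  { kind: "directory", filePath: ".", size: 0 },
  { kind: "directory", filePath: "src", size: 0 },
  { kind: "file", filePath: "src/index.ts", size: 300 },
  { kind: "file", filePath: "src/util.ts", size: 200 },
] as TreeNode[]) {
  demoTree.set(n.filePath, n);
}
recomputeDirectoryStats(demoTree);
console.log(demoTree.get("src"));
```

Under this reading, files inside nested subdirectories do not contribute to a parent's `size`; a recursive total would require summing the subdirectory nodes as well.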
File paths are embedded — the path string itself is converted into a vector. This enables semantic search:
files_search({ query: "authentication configuration" })
→ finds src/lib/auth.ts, src/config/auth.yaml, etc.
The embeddings capture semantic meaning in file names and directory structure, so "auth" finds "authentication" and related concepts.
Only file nodes have embeddings — directory nodes have empty embeddings and are excluded from search results.
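The ranking step behind a search like this can be sketched as below. The similarity metric is not stated here, so the example assumes cosine similarity; the `SearchableNode` shape and `searchByPathEmbedding` name are hypothetical, and the two-dimensional vectors are toy values standing in for real embedding model output.

```typescript
// Hypothetical node shape; real embeddings would have hundreds of dimensions.
interface SearchableNode {
  kind: "file" | "directory";
  filePath: string;
  embedding: number[];
}

// Cosine similarity; returns 0 for empty, mismatched, or zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Ranks file nodes by similarity; directories (empty embeddings) are skipped.
function searchByPathEmbedding(
  queryEmbedding: number[],
  nodes: SearchableNode[],
  topK = 5,
): { filePath: string; score: number }[] {
  return nodes
    .filter((n) => n.kind === "file" && n.embedding.length > 0)
    .map((n) => ({ filePath: n.filePath, score: cosineSimilarity(queryEmbedding, n.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const demoNodes: SearchableNode[] = [
  { kind: "directory", filePath: "src/lib", embedding: [] },
  { kind: "file", filePath: "src/lib/auth.ts", embedding: [1, 0] },
  { kind: "file", filePath: "src/db/schema.sql", embedding: [0, 1] },
];
console.log(searchByPathEmbedding([0.9, 0.1], demoNodes));
```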
An LLM starting a new conversation can quickly understand the project layout:
files_list({ directory: "src/", limit: 50 })
→ complete listing of source files
"What configuration files does this project have?"
files_list({ extension: ".yaml" })
files_list({ extension: ".json" })
files_list({ language: "yaml" })
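The listing filters used in these calls amount to simple predicates over node metadata. The sketch below shows one plausible reading; `listFiles`, `ListedFile`, and `ListFilter` are hypothetical names, the sample data is made up, and the assumption that a `directory` filter also covers nested directories may not match the actual tool.

```typescript
// Hypothetical subset of the file-node fields needed for filtering.
interface ListedFile {
  filePath: string;
  directory: string;
  extension: string;
  language: string | null;
}

interface ListFilter {
  directory?: string;
  extension?: string;
  language?: string;
  limit?: number;
}

function listFiles(nodes: ListedFile[], filter: ListFilter = {}): ListedFile[] {
  // Normalize "src/" to "src" so it compares cleanly with the directory field.
  const dir = filter.directory?.replace(/\/+$/, "");
  const matches = nodes.filter((n) => {
    if (dir !== undefined && dir !== "" && dir !== ".") {
      // Assumption: the filter includes nested directories, not only direct children.
      if (n.directory !== dir && !n.directory.startsWith(dir + "/")) return false;
    }
    if (filter.extension !== undefined && n.extension !== filter.extension) return false;
    if (filter.language !== undefined && n.language !== filter.language) return false;
    return true;
  });
  return filter.limit === undefined ? matches : matches.slice(0, filter.limit);
}

// Made-up sample data for illustration only.
const sampleFiles: ListedFile[] = [
  { filePath: "src/lib/auth.ts", directory: "src/lib", extension: ".ts", language: "typescript" },
  { filePath: "src/config/auth.yaml", directory: "src/config", extension: ".yaml", language: "yaml" },
  { filePath: "ci.yml", directory: ".", extension: ".yml", language: "yaml" },
  { filePath: "package.json", directory: ".", extension: ".json", language: "json" },
];
console.log(listFiles(sampleFiles, { language: "yaml" }).map((f) => f.filePath));
```

Filtering by `language` rather than `extension` is the more forgiving option when a format has several common extensions.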
"Find files related to database migrations"
files_search({ query: "database migration" })
→ src/db/migrations/, src/scripts/migrate.ts, etc.
"How big is this file? When was it last modified?"
files_get_info({ filePath: "src/lib/embedder.ts" })
→ { size: 4096, mtime: ..., language: "typescript", mimeType: "text/typescript" }
Notes, tasks, and skills can link to specific files:
notes_create_link({
  fromId: "deployment-config-note",
  toId: "docker-compose.yaml",
  targetGraph: "files",
  kind: "documents"
})
This connects knowledge about configuration to the actual file, regardless of whether that file is a doc or source code.
| Graph | What it indexes | How | Purpose |
|---|---|---|---|
| DocGraph | Markdown files matching docs pattern | Parses into heading chunks, extracts code blocks | Semantic search over documentation content |
| CodeGraph | Source files matching code pattern | Parses AST, extracts symbols | Semantic search over code symbols |
| FileIndexGraph | ALL files | Stores metadata + path embedding | File discovery, project structure, metadata |
A single file can exist in more than one graph at the same time:
- `docs/api.md` → DocGraph (chunks) + FileIndexGraph (metadata)
- `src/auth.ts` → CodeGraph (symbols) + FileIndexGraph (metadata)
- `Dockerfile` → FileIndexGraph only (no docs/code pattern match)
The FileIndexGraph is enabled by default and has no separate include setting. It indexes everything that passes the project's exclude pattern:
projects:
  my-app:
    projectDir: "/path/to/my-app"
    # Server default exclude (**/node_modules/**, **/dist/**) applies automatically.
    # Add project-specific excludes if needed:
    exclude: "**/.git/**"
    graphs:
      files:
        enabled: true # can be disabled if not needed