CodeCortex

Persistent codebase knowledge layer for AI agents. Pre-builds architecture, dependency, coupling, and risk knowledge so agents skip the cold start and go straight to the right files.

Stack

TypeScript, ESM ("type": "module")
tree-sitter (native N-API) + 27 language grammar packages
@modelcontextprotocol/sdk - MCP server (stdio transport)
commander - CLI (init, serve, update, inject, status, symbols, search, modules, hotspots, hook, upgrade)
simple-git - git integration + temporal analysis
zod - schema validation for LLM analysis results
yaml - cortex.yaml manifest
glob - file discovery

Architecture

Three-tier knowledge storage in .codecortex/ flat files:

HOT (always loaded): cortex.yaml, constitution.md, overview.md, graph.json, symbols.json, temporal.json
WARM (per-module): modules/*.md
COLD (on-demand): decisions/*.md, sessions/*.md, patterns.md
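As a rough illustration of the tiering above, the sketch below classifies a path (relative to .codecortex/) into a tier. The `Tier` type and the `tierOf` helper are hypothetical names introduced here; they are not taken from the CodeCortex source.

```typescript
type Tier = "hot" | "warm" | "cold";

// HOT files as listed above: always loaded.
const HOT_FILES = new Set([
  "cortex.yaml",
  "constitution.md",
  "overview.md",
  "graph.json",
  "symbols.json",
  "temporal.json",
]);

// Hypothetical helper: map a path relative to .codecortex/ to its tier.
// Returns undefined for paths that match none of the documented patterns.
function tierOf(relPath: string): Tier | undefined {
  if (HOT_FILES.has(relPath)) return "hot";
  if (/^modules\/[^/]+\.md$/.test(relPath)) return "warm";
  if (relPath === "patterns.md") return "cold";
  if (/^(decisions|sessions)\/[^/]+\.md$/.test(relPath)) return "cold";
  return undefined;
}
```

A loader built on this idea could read every HOT file at startup and defer WARM and COLD files until a module or decision is actually requested.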

Hybrid extraction:

Tree-sitter native N-API → symbols (name, kind, signature, startLine, endLine, exported), imports, exports, call edges
Host LLM → module summaries, decisions, patterns, session diffs
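To make the symbol fields above concrete, here is a minimal sketch of one symbol record and a name filter over a symbol index. The field names come from the list above; the field types, the example `kind` values, and the `filterSymbols` helper are assumptions for illustration only.

```typescript
// Field names follow the extraction list above; the types are assumptions.
interface SymbolRecord {
  name: string;
  kind: string; // e.g. "function" or "class" (illustrative values)
  signature: string;
  startLine: number;
  endLine: number;
  exported: boolean;
}

// Hypothetical helper, not the project's API:
// case-insensitive substring match on the symbol name.
function filterSymbols(
  symbols: SymbolRecord[],
  query: string,
  exportedOnly = false,
): SymbolRecord[] {
  const needle = query.toLowerCase();
  return symbols.filter(
    (s) => s.name.toLowerCase().includes(needle) && (!exportedOnly || s.exported),
  );
}
```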

Six Knowledge Layers

Structural (graph.json + symbols.json) - modules, deps, entry points, symbol index
Semantic (modules/*.md) - what each module DOES
Temporal (temporal.json) - git co-change coupling, hotspots, bug archaeology
Decisions (decisions/*.md) - WHY things are built this way
Patterns (patterns.md) - HOW code is written here
Sessions (sessions/*.md) - what CHANGED between sessions

Scripts

npm run dev - watch mode development
npm run build - tsup build to dist/
npm run test - vitest
npm run lint - tsc --noEmit

CLI

codecortex init - discover + extract + temporal analysis → write .codecortex/
codecortex serve - start MCP server
codecortex update - re-extract changed files → update modules
codecortex status - knowledge freshness, stale modules, symbol counts
codecortex symbols [query] - browse and filter the symbol index
codecortex search <query> - search across all knowledge files
codecortex modules [name] - list modules or deep-dive into one
codecortex inject - regenerate inline context in CLAUDE.md and agent config files
codecortex hotspots - files ranked by risk (churn + coupling + bugs)
codecortex hook install|uninstall|status - manage git hooks for auto-update
codecortex upgrade - check for and install latest version

MCP Tools (5)

get_project_overview, get_dependency_graph, lookup_symbol, get_change_coupling, get_edit_briefing

MCP Resources (3)

codecortex://project/overview — constitution (architecture, risk map)
codecortex://project/hotspots — risk-ranked files
codecortex://module/{name} — module documentation (template)

MCP Prompts (2)

start_session — constitution + latest session for context
before_editing — risk assessment for files you plan to edit

All tools include _freshness metadata (status, lastAnalyzed, filesChangedSince, changedFiles, message). All tools return context-safe responses (<10K chars) via truncation utilities in src/utils/truncate.ts.
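The response contract above can be sketched as follows. The `_freshness` field names are from the text; their types, the `toToolResponse` helper, and the character-based cut are assumptions. The real utilities in src/utils/truncate.ts are not shown in this document and may work differently.

```typescript
// Field names from the text above; the types are assumptions.
interface Freshness {
  status: string;
  lastAnalyzed: string;
  filesChangedSince: number;
  changedFiles: string[];
  message: string;
}

interface ToolResponse {
  content: { type: "text"; text: string }[];
}

const MAX_CHARS = 10_000; // the "<10K chars" budget
const TRUNCATION_MARKER = "...[truncated]";

// Hypothetical helper: attach _freshness, serialize, and keep the text under budget.
function toToolResponse(data: Record<string, unknown>, freshness: Freshness): ToolResponse {
  let text = JSON.stringify({ ...data, _freshness: freshness });
  if (text.length >= MAX_CHARS) {
    // Naive character cut: the result is no longer valid JSON once truncated.
    text = text.slice(0, MAX_CHARS - 1 - TRUNCATION_MARKER.length) + TRUNCATION_MARKER;
  }
  return { content: [{ type: "text", text }] };
}
```

A character cut is the simplest way to honor the size limit, but it breaks the JSON payload. Trimming long arrays before serializing would keep the response parseable, at the cost of more code.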

Pre-Publish Checklist

Run ALL of these before npm publish. Do not skip any step.

npx tsc --noEmit — must be clean
npm run build — must succeed
npm test — all tests must pass (grammar smoke test loads every language)
node dist/cli/index.js --version — verify version matches package.json
node dist/cli/index.js --help — verify grouped help renders correctly
node dist/cli/index.js hook --help — verify subcommand help is flat (not grouped)
npm pack --dry-run — verify tarball contents (no stale files, no secrets)
Verify version is bumped in BOTH package.json AND src/mcp/server.ts
If adding/removing a language: update count in README, CLAUDE.md, site/

What the tests catch

Grammar smoke test (parser.test.ts): Loads every language in LANGUAGE_LOADERS via parseSource(). Catches missing packages, broken native builds, wrong require paths. This is what would have caught the tree-sitter-liquid issue.
Version-check tests: Update notification, cache lifecycle, PM detection, upgrade commands.
Hook tests: Git hook install/uninstall/status integration tests.
MCP tests: All 5 tools, resources, prompts, simulation tests.

Known limitations

tree-sitter native bindings don't compile on Node 24 yet (upstream issue)
Some grammar packages need --legacy-peer-deps due to peer dep mismatches with tree-sitter@0.25
Grammar smoke test skips NODE_MODULE_VERSION and "Invalid language object" errors (native binding issues, not code bugs)
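The skip rule in the last limitation could look roughly like the predicate below. The two substrings are the ones named above; the `isNativeBindingError` helper and the way the test would call it are assumptions, not the actual code in parser.test.ts.

```typescript
// Substrings named in the limitation above.
const SKIPPABLE_MARKERS = ["NODE_MODULE_VERSION", "Invalid language object"];

// Hypothetical helper: true when an error looks like a native binding problem
// (ABI mismatch or unusable grammar object) rather than a bug in project code.
function isNativeBindingError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return SKIPPABLE_MARKERS.some((marker) => message.includes(marker));
}
```

A smoke test could wrap each grammar load in try/catch, skip the case when this predicate returns true, and rethrow anything else so that missing packages and wrong require paths still fail the run.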

Key Patterns

All MCP tool handlers return { content: [{ type: 'text', text: JSON.stringify(...) }] }
Use stderr for logging (stdout reserved for JSON-RPC in stdio mode)
All file paths in .codecortex/ are relative to project root
Zod schemas validate LLM analysis input before persisting
Discovery respects .gitignore via git ls-files

Directory Structure

src/
  cli/           - commander CLI (init, serve, update, status)
  mcp/           - MCP server + tools
  core/          - knowledge store (graph, modules, decisions, sessions, patterns, constitution, search, agent-instructions, context-injection, freshness)
  extraction/    - tree-sitter native N-API (parser, symbols, imports, calls)
  git/           - git diff, history, temporal analysis
  types/         - TypeScript types + Zod schemas
  utils/         - file I/O, YAML, markdown helpers, truncation

Temporal Analysis

Change coupling: file pairs that co-change (hidden deps not in import graph)
Hotspots: files ranked by change frequency (high churn = risky)
Bug archaeology: fix/bug commit messages → learned lessons per module
Stability signals: days since last change, change velocity per file
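As a minimal sketch of the first two signals, the functions below count co-changing file pairs and per-file change frequency from a list of commits. The `Commit` shape, the function names, and the `minCount` threshold are assumptions for illustration; the project's actual scoring is not described here.

```typescript
type Commit = string[]; // file paths touched by one commit (assumed input shape)

interface Coupling {
  a: string;
  b: string;
  count: number; // number of commits that touched both files
}

// Count how often each unordered file pair changes in the same commit.
function changeCoupling(commits: Commit[], minCount = 2): Coupling[] {
  const pairs = new Map<string, Coupling>();
  for (const files of commits) {
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = unique[i] + "\u0000" + unique[j];
        const entry = pairs.get(key) ?? { a: unique[i], b: unique[j], count: 0 };
        entry.count += 1;
        pairs.set(key, entry);
      }
    }
  }
  return Array.from(pairs.values())
    .filter((p) => p.count >= minCount)
    .sort((x, y) => y.count - x.count);
}

// Rank files by how many commits touched them (higher churn first).
function hotspots(commits: Commit[]): { file: string; changes: number }[] {
  const counts = new Map<string, number>();
  for (const files of commits) {
    for (const file of new Set(files)) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }
  return Array.from(counts, ([file, changes]) => ({ file, changes }))
    .sort((x, y) => y.changes - x.changes);
}
```

Pairs that score high here but share no edge in the import graph are the "hidden deps" mentioned above.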
