This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
git-ai is a local code understanding tool that builds a semantic layer for codebases using advanced RAG techniques. It combines vector search (LanceDB) with graph-based analysis (CozoDB) to enable AI Agents to deeply understand code structure and relationships beyond simple text search.
Key Design Principle: Indices travel with code in Git repos—checkout, branch, or tag any version and the semantic index is immediately available without rebuilding.
```bash
# Build
npm run build            # Compile TypeScript to dist/

# Development run
npm run start -- --help  # Run directly with ts-node

# Testing
npm test                 # Full test suite (build + E2E)
npm run test:cli         # CLI-specific tests
npm run test:parser      # Parser verification

# Global install for local testing
npm i -g .
```

Important: After building, test with the compiled CLI to verify packaging:

```bash
node dist/bin/git-ai.js --help
```

```
CLI Layer (src/cli/)
        ↓
Core Layer (src/core/)
        ↓
Data Layer (LanceDB + CozoDB)
```
CLI Layer (src/cli/):
- Commands: Commander.js command definitions in `cli/commands/`
- Handlers: Business logic in `cli/handlers/` (one per command type)
- Schemas: Zod validation schemas in `cli/schemas/`
- Types: CLI-specific types and the `executeHandler` wrapper in `cli/types.ts`
Core Layer (src/core/):
- indexer.ts / indexerIncremental.ts: Parallel indexing with worker pools
- lancedb.ts: Vector database (SQ8-quantized embeddings)
- cozo.ts / astGraph.ts: Graph database for AST relationships
- parser.ts: Tree-sitter based multi-language parsing
- embedding.ts: ONNX-based semantic embeddings
- search.ts: Multi-strategy retrieval (vector + graph + hybrid)
- repoMap.ts: PageRank-based importance scoring
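The only description of `repoMap.ts` here is "PageRank-based importance scoring". The sketch below illustrates that technique with power-iteration PageRank over a reference graph. It assumes that edges point from the referencing file to the referenced file, so heavily referenced code ranks higher. The graph shape, the damping factor of 0.85, and the `pageRank` name are assumptions; this is not git-ai's implementation.

```typescript
// Minimal PageRank sketch (power iteration) over a reference graph.
// Assumption: edges[a] lists the nodes that `a` references, so importance
// flows from referencing code to referenced code.
type Graph = Record<string, string[]>;

function pageRank(
  edges: Graph,
  damping = 0.85,
  iterations = 50,
  tolerance = 1e-9,
): Map<string, number> {
  // Collect every node, including ones that only appear as targets.
  const nodes = new Set<string>(Object.keys(edges));
  for (const targets of Object.values(edges)) {
    for (const t of targets) nodes.add(t);
  }
  const ids = Array.from(nodes);
  const n = ids.length;
  let rank = new Map<string, number>(ids.map((id) => [id, 1 / n]));

  for (let iter = 0; iter < iterations; iter++) {
    // Every node starts with the teleport share (1 - d) / n.
    const next = new Map<string, number>(ids.map((id) => [id, (1 - damping) / n]));
    // Rank held by dangling nodes (no outgoing edges) is spread evenly.
    let danglingMass = 0;
    for (const id of ids) {
      const outs = edges[id] ?? [];
      const r = rank.get(id)!;
      if (outs.length === 0) {
        danglingMass += r;
        continue;
      }
      const share = (damping * r) / outs.length;
      for (const t of outs) next.set(t, next.get(t)! + share);
    }
    for (const id of ids) next.set(id, next.get(id)! + (damping * danglingMass) / n);

    // Stop early once the L1 change between iterations is negligible.
    let delta = 0;
    for (const id of ids) delta += Math.abs(next.get(id)! - rank.get(id)!);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}
```

The returned scores always sum to 1. Dangling nodes redistribute their rank evenly, so no probability mass leaks out of the graph.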
Indexing: Source files → Tree-sitter AST → Embeddings + Symbol extraction → LanceDB (chunks) + CozoDB (refs)
Search: Query → Classification → Multi-strategy retrieval → Reranking → Results
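The search flow above classifies a query and then merges the results of several retrieval strategies. The code below is an illustrative sketch of that flow. It uses a hypothetical heuristic classifier: a single identifier-like token is treated as a symbol lookup, and anything else as natural language. It then merges the ranked lists with weighted reciprocal rank fusion (RRF, k = 60). RRF is a common fusion technique used here only for illustration. The names `classifyQuery`, `reciprocalRankFusion`, and `hybridSearch`, as well as the weights, are assumptions. git-ai's real routing and reranking live in `src/core/search.ts` and may differ.

```typescript
// Hypothetical query classes; the real routing in src/core/search.ts may differ.
type QueryKind = "symbol" | "semantic";

// Heuristic: one identifier-like token (optionally dotted, e.g. "a.b")
// is a symbol lookup; anything else is treated as natural language.
function classifyQuery(query: string): QueryKind {
  const q = query.trim();
  return /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(q) ? "symbol" : "semantic";
}

// Weighted reciprocal rank fusion: each list adds weight / (k + rank) per item,
// with rank starting at 1. Items returned by several strategies accumulate score.
function reciprocalRankFusion(
  lists: { ids: string[]; weight: number }[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Route by boosting whichever strategy matches the query class (weights are assumptions).
function hybridSearch(query: string, vectorHits: string[], symbolHits: string[]): string[] {
  const kind = classifyQuery(query);
  return reciprocalRankFusion([
    { ids: vectorHits, weight: kind === "semantic" ? 1.0 : 0.5 },
    { ids: symbolHits, weight: kind === "symbol" ? 1.0 : 0.5 },
  ]);
}
```

RRF depends only on ranks, not raw scores, so it can merge vector distances and graph or symbol matches without calibrating their scales against each other.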
All CLI commands output JSON for agent readability:
Success:

```json
{
  "ok": true,
  "command": "semantic",
  "repoRoot": "/path/to/repo",
  "timestamp": "2024-01-01T00:00:00Z",
  "duration_ms": 123,
  "data": { ... }
}
```

Error:

```json
{
  "ok": false,
  "reason": "index_not_found",
  "message": "No semantic index found",
  "command": "semantic",
  "hint": "Run 'git-ai ai index --overwrite' to create an index"
}
```

See `src/cli/types.ts` for `CLIResult`, `CLIError`, `ErrorReasons`, and `ErrorHints`.
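The two JSON shapes can be modeled in TypeScript as a union discriminated on `ok`. The sketch below is reconstructed from the examples alone. The field names come from the JSON; everything else is an assumption, including the type names, the `CommandError` class, the `runCommand` wrapper, and the fallback reason `internal_error`. The real `CLIResult`/`CLIError` types and the `executeHandler` wrapper may differ.

```typescript
// Shapes mirror the JSON examples; type names are illustrative.
interface SuccessEnvelope<T> {
  ok: true;
  command: string;
  repoRoot: string;
  timestamp: string; // ISO 8601
  duration_ms: number;
  data: T;
}

interface ErrorEnvelope {
  ok: false;
  reason: string; // e.g. "index_not_found"
  message: string;
  command: string;
  hint?: string; // optional recovery suggestion
}

type Envelope<T> = SuccessEnvelope<T> | ErrorEnvelope;

// Hypothetical error a handler can throw to control `reason` and `hint`.
class CommandError extends Error {
  readonly reason: string;
  readonly hint?: string;
  constructor(reason: string, message: string, hint?: string) {
    super(message);
    this.reason = reason;
    this.hint = hint;
  }
}

// Hypothetical wrapper: times the handler and converts its outcome into one envelope.
async function runCommand<T>(
  command: string,
  repoRoot: string,
  handler: () => Promise<T>,
): Promise<Envelope<T>> {
  const started = Date.now();
  try {
    const data = await handler();
    return {
      ok: true,
      command,
      repoRoot,
      timestamp: new Date().toISOString(),
      duration_ms: Date.now() - started,
      data,
    };
  } catch (err) {
    if (err instanceof CommandError) {
      return { ok: false, reason: err.reason, message: err.message, command, hint: err.hint };
    }
    // Unexpected failures get a generic reason code (an assumed name).
    return { ok: false, reason: "internal_error", message: String(err), command };
  }
}
```

Because every result is one of these two shapes, an agent only needs to check `ok` before it reads either `data` or `reason`/`hint`.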
- `bin/git-ai.ts`: Main CLI; proxies to git for non-AI commands and registers the `ai` command
- `src/commands/ai.ts`: AI command registry (all `git-ai ai *` subcommands)

- `src/core/indexer.ts`: Parallel indexing with HNSW vector index
- `src/core/indexerIncremental.ts`: Smart rebuild strategies
- `src/core/parser.ts`: Multi-language Tree-sitter adapters
- `src/core/embedding.ts`: ONNX runtime for local embeddings
- `src/core/lancedb.ts`: LanceDB management (chunks table)
- `src/core/sq8.ts`: Vector quantization for storage efficiency

- `src/core/search.ts`: Query classification and multi-strategy routing
- `src/core/symbolSearch.ts`: Symbol-based search functionality
- `src/core/astGraphQuery.ts`: Graph-based call relationship queries

- `src/core/cozo.ts`: CozoDB interface (refs table)
- `src/core/astGraph.ts`: AST graph construction

- `src/core/git.ts`: Git repository handling
- `src/core/workspace.ts`: Workspace path resolution
- `src/core/manifest.ts`: Index versioning and compatibility checking
- `src/core/indexCheck.ts`: Index validation

- `src/core/archive.ts`: Pack/unpack index archives (`.git-ai/lancedb.tar.gz`)
- `src/core/lfs.ts`: Git LFS integration for index storage

- `src/mcp/server.ts`: MCP server implementation (stdio + HTTP modes)
- `src/mcp/handlers/`: MCP tool implementations
- `src/mcp/tools/`: MCP tool registry
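`src/core/sq8.ts` and the SQ8-quantized embeddings in `lancedb.ts` refer to 8-bit scalar quantization. The sketch below shows the general technique using a per-vector min/max range. Each float is mapped linearly onto the 256 levels 0..255 and stored as one byte, so storage drops to roughly a quarter of float32. It is an assumption whether git-ai uses a per-vector or a per-dimension range. The names `quantizeSQ8`/`dequantizeSQ8` are also hypothetical.

```typescript
// 8-bit scalar quantization with a per-vector [min, max] range (an assumption;
// a per-dimension range is another common choice).
interface SQ8Vector {
  codes: Uint8Array; // one byte per dimension
  min: number;
  scale: number; // (max - min) / 255, or 0 for a constant vector
}

function quantizeSQ8(vector: Float32Array): SQ8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (const v of vector) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const scale = max > min ? (max - min) / 255 : 0;
  const codes = new Uint8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    // Round to the nearest level; clamp to guard against float drift past 255.
    codes[i] = scale === 0 ? 0 : Math.min(255, Math.round((vector[i] - min) / scale));
  }
  return { codes, min, scale };
}

function dequantizeSQ8({ codes, min, scale }: SQ8Vector): Float32Array {
  const out = new Float32Array(codes.length);
  for (let i = 0; i < codes.length; i++) out[i] = min + codes[i] * scale;
  return out;
}
```

Because values are rounded to the nearest level, each reconstructed component lies within `scale / 2` of the original. This loss of precision is usually acceptable for approximate nearest-neighbor search.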
The MCP server lets AI agents query git-ai indices. Every MCP tool requires a `path` parameter that names the target repository. There is no implicit repository selection, so each call is self-contained and atomic.
Two modes:
- stdio mode (default): Single-agent connection
- HTTP mode (`--http`): Multiple concurrent agents with session management
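Because every MCP tool takes an explicit `path`, a tool handler can reject calls that omit it instead of falling back to a working directory. The sketch below shows a hypothetical argument guard. The `ToolArgs` shape, the `requireRepoPath` name, and the error text are illustrative assumptions, not git-ai's MCP handler code.

```typescript
// Hypothetical argument shape for a git-ai MCP tool call.
interface ToolArgs {
  path?: unknown;
  [key: string]: unknown;
}

type PathCheck = { ok: true; repoPath: string } | { ok: false; error: string };

// Reject calls without an explicit repository path; never default to process.cwd().
function requireRepoPath(args: ToolArgs): PathCheck {
  const { path } = args;
  if (typeof path !== "string" || path.trim() === "") {
    return { ok: false, error: "Missing required 'path' argument (target repository)" };
  }
  return { ok: true, repoPath: path };
}
```

Failing fast here means that two concurrent HTTP-mode sessions never accidentally share a repository through hidden server state.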
Supported languages are defined in `src/core/parser.ts`:
- TypeScript/JavaScript (`.ts`, `.tsx`, `.js`, `.jsx`)
- Java (`.java`)
- Python (`.py`)
- Go (`.go`)
- Rust (`.rs`)
- C (`.c`, `.h`)
- Markdown (`.md`, `.mdx`)
- YAML (`.yml`, `.yaml`)
Each language has a separate LanceDB table with its own HNSW index.
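The extension list and the one-table-per-language layout can be summarized as a lookup. In this sketch, the language keys and the `chunks_<language>` table naming are hypothetical. The sketch also assumes that TypeScript and JavaScript share one entry, because the list groups them together. The real mapping lives in `src/core/parser.ts` and `src/core/lancedb.ts`.

```typescript
// Extension-to-language map taken from the supported-language list.
// Language keys are illustrative; TS/JS share one key as the list groups them.
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ".ts": "typescript", ".tsx": "typescript", ".js": "typescript", ".jsx": "typescript",
  ".java": "java",
  ".py": "python",
  ".go": "go",
  ".rs": "rust",
  ".c": "c", ".h": "c",
  ".md": "markdown", ".mdx": "markdown",
  ".yml": "yaml", ".yaml": "yaml",
};

function languageForFile(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf(".");
  const slash = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  if (dot <= slash) return undefined; // no extension in the final path segment
  return LANGUAGE_BY_EXTENSION[filePath.slice(dot).toLowerCase()];
}

// Hypothetical per-language LanceDB table name.
function tableForFile(filePath: string): string | undefined {
  const lang = languageForFile(filePath);
  return lang === undefined ? undefined : `chunks_${lang}`;
}
```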
Indexing respects three filter mechanisms, in priority order:

1. `.aiignore`: Highest priority, explicit exclusions
2. `.git-ai/include.txt`: Force-include overrides for `.gitignore`
3. `.gitignore`: Standard Git ignore patterns

Pattern syntax: `**` (any directories), `*` (any characters), `directory/` (entire directory)
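The three filters and the pattern syntax can be sketched as follows. The sketch reads "priority order" this way: an `.aiignore` match always excludes a file; an `include.txt` match re-includes a file that `.gitignore` would skip; otherwise `.gitignore` decides. It also assumes Git conventions for details the list leaves open: `*` does not cross `/`, and a pattern without an inner `/` matches at any depth. The names `globToRegExp` and `shouldIndex` are hypothetical. Negation (`!`) and the other gitignore rules are omitted.

```typescript
// Translate the listed glob syntax into a RegExp over repo-relative POSIX paths.
function globToRegExp(pattern: string): RegExp {
  let p = pattern.trim();
  const dirOnly = p.endsWith("/");
  if (dirOnly) p = p.slice(0, -1);
  // Like Git, a pattern with no inner "/" may match at any depth.
  const anyDepth = !p.includes("/");
  if (p.startsWith("/")) p = p.slice(1);

  let re = "";
  for (let i = 0; i < p.length; i++) {
    const c = p[i];
    if (c === "*" && p[i + 1] === "*") {
      // "**/" matches zero or more directories; a bare "**" matches anything.
      if (p[i + 2] === "/") { re += "(?:.*/)?"; i += 2; }
      else { re += ".*"; i += 1; }
    } else if (c === "*") {
      re += "[^/]*"; // any characters within one path segment
    } else {
      re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  const prefix = anyDepth ? "(?:^|.*/)" : "^";
  // A "directory/" pattern covers everything beneath it; other patterns match
  // the path itself or anything under a matching directory.
  const suffix = dirOnly ? "/.*$" : "(?:/.*)?$";
  return new RegExp(prefix + re + suffix);
}

interface FilterRules {
  aiignore: string[]; // patterns from .aiignore
  include: string[]; // patterns from .git-ai/include.txt
  gitignore: string[]; // patterns from .gitignore
}

function matchesAny(path: string, patterns: string[]): boolean {
  return patterns
    .filter((p) => p.trim() !== "" && !p.startsWith("#")) // skip blanks and comments
    .some((p) => globToRegExp(p).test(path));
}

// Assumed priority: .aiignore > include.txt > .gitignore.
function shouldIndex(path: string, rules: FilterRules): boolean {
  if (matchesAny(path, rules.aiignore)) return false;
  if (matchesAny(path, rules.include)) return true;
  return !matchesAny(path, rules.gitignore);
}
```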
Tests are located in test/ with multiple formats (.test.mjs, .test.ts, .test.js).
Run a single test file with Node's native test runner:

```bash
node --test test/cliCommands.test.js
```

This project uses native modules that may need build tools:
- `@lancedb/lancedb` - Vector database (platform-specific prebuilt binaries)
- `cozo-node` - Graph database
- `onnxruntime-node` - ONNX runtime
- `tree-sitter-*` - Language parsers
If native builds fail, ensure:
- Node.js >= 18
- Build tools installed (Windows: Visual Studio Build Tools, Linux: build-essential)
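The Node.js requirement can be checked up front. The hypothetical helper below parses a version string like the one in `process.versions.node` and compares its major version with 18. It does not check for build tools, which must still be installed separately.

```typescript
// Hypothetical preflight: does a Node.js version string meet the minimum major version?
function meetsMinimumNode(version: string, minMajor = 18): boolean {
  // Accept both "18.19.0" and "v18.19.0"; unparseable input fails the check.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}
```

In a setup script, calling `meetsMinimumNode(process.versions.node)` before `npm install` surfaces an outdated runtime early, before native module builds fail with less obvious errors.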
Add a new CLI command:
- Create a handler in `src/cli/handlers/yourHandler.ts`
- Create a Zod schema in `src/cli/schemas/` (optional)
- Register it in `src/cli/registry.ts`
- Add a Commander command in `src/cli/commands/yourCommand.ts`
- Register it in `src/commands/ai.ts`
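To illustrate the handler/registry split in those steps, the sketch below shows a hypothetical registry keyed by command name, in which input is validated before the handler runs. The following are assumptions: `CommandHandler`, `registry`, `dispatch`, the placeholder `yourCommand`, and the plain-TypeScript check that stands in for a Zod schema. Check `src/cli/registry.ts` and `src/cli/types.ts` for the real signatures.

```typescript
// Hypothetical handler contract: parse/validate raw options, then run.
interface CommandHandler<I, O> {
  parse(raw: unknown): I; // a Zod schema's .parse() would normally fill this role
  run(input: I): Promise<O>;
}

interface YourCommandInput {
  path: string;
}

// Example handler for the placeholder `yourCommand`.
const yourHandler: CommandHandler<YourCommandInput, { repoRoot: string }> = {
  parse(raw) {
    const options = (raw ?? {}) as Record<string, unknown>;
    if (typeof options.path !== "string" || options.path === "") {
      throw new Error("yourCommand: --path must be a non-empty string");
    }
    return { path: options.path };
  },
  async run(input) {
    // Real work (indexing, search, ...) would happen here.
    return { repoRoot: input.path };
  },
};

// Hypothetical registry; `any` erases each handler's input type so
// different handlers fit in one map.
const registry = new Map<string, CommandHandler<any, unknown>>([
  ["yourCommand", yourHandler],
]);

async function dispatch(name: string, raw: unknown): Promise<unknown> {
  const handler = registry.get(name);
  if (!handler) throw new Error(`Unknown command: ${name}`);
  return handler.run(handler.parse(raw));
}
```

Keeping validation in `parse` means that a Commander command only has to forward its raw options. Bad input is rejected before any business logic runs.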
Add language support:
- Add the Tree-sitter grammar to `package.json` dependencies
- Extend `src/core/parser.ts` with a new language adapter
- Test with `npm run test:parser`
Add MCP tool:
- Create a handler in `src/mcp/handlers/`
- Register it in `src/mcp/tools/`
- Export it from `src/mcp/server.ts`