Merged
4 changes: 2 additions & 2 deletions .git-ai/lancedb.tar.gz
Git LFS file not shown
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -19,7 +19,7 @@ jobs:
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: "20"
node-version: "22"
cache: "npm"
cache-dependency-path: package-lock.json

77 changes: 77 additions & 0 deletions AGENTS.md
@@ -0,0 +1,77 @@
# PROJECT KNOWLEDGE BASE

**Generated:** 2026-01-31 23:03
**Commit:** 680e8f2
**Branch:** copilot/add-index-commit-id-feature

## OVERVIEW
git-ai CLI + MCP server. TypeScript implementation for AI-powered Git operations with semantic search, DSR (Deterministic Semantic Record), and graph-based code analysis. Indices stored in `.git-ai/`.

## STRUCTURE
```
git-ai-cli-v2/
├── src/
│ ├── commands/ # CLI subcommands (ai, graph, query, etc.)
│ ├── core/ # Indexing, DSR, graph, storage, parsers
│ └── mcp/ # MCP server implementation
├── test/ # Node test runner tests
├── dist/ # Build output
└── .git-ai/ # Indices (LanceDB + DSR)
```

## WHERE TO LOOK
| Task | Location |
|------|----------|
| CLI commands | `src/commands/*.ts` |
| Indexing logic | `src/core/indexer.ts`, `src/core/indexerIncremental.ts` |
| DSR (commit records) | `src/core/dsr/`, `src/core/dsr.ts` |
| Graph queries | `src/core/cozo.ts`, `src/core/astGraph.ts` |
| Semantic search | `src/core/semantic.ts`, `src/core/sq8.ts` |
| MCP tools | `src/mcp/`, `src/core/graph.ts` |
| Language parsers | `src/core/parser/*.ts` |

## CODE MAP
| Symbol | Type | Location | Role |
|--------|------|----------|------|
| `indexer` | fn | `core/indexer.ts` | Full repository indexing |
| `incrementalIndexer` | fn | `core/indexerIncremental.ts` | Incremental updates |
| `GitAiService` | class | `mcp/index.ts` | MCP entry point |
| `runDsr` | fn | `commands/dsr.ts` | DSR CLI command |
| `cozoQuery` | fn | `core/cozo.ts` | Graph DB queries |
| `semanticSearch` | fn | `core/semantic.ts` | Vector similarity |
| `resolveGitRoot` | fn | `core/git.ts` | Repo boundary detection |

## CONVENTIONS
- **strict: true** TypeScript - no implicit any
- **Imports**: Node built-ins → external deps → internal modules
- **Formatting**: 2 spaces, single quotes, trailing commas
- **Errors**: Structured JSON logging via `createLogger`
- **CLI output**: JSON on stdout, logs on stderr
- **External inputs**: Use `unknown`, narrow early
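The last convention can be illustrated with a short sketch. `parseQueryArgs` and its fields are hypothetical names for illustration, not actual git-ai code:

```typescript
// Hypothetical example of the "use `unknown`, narrow early" convention.
// External input arrives as `unknown` and is narrowed to a typed shape
// at the boundary, throwing Error objects (never raw strings) on bad input.
interface QueryArgs {
  path: string;
  limit: number;
}

function parseQueryArgs(input: unknown): QueryArgs {
  if (typeof input !== 'object' || input === null) {
    throw new Error('query args must be an object');
  }
  const obj = input as Record<string, unknown>;
  if (typeof obj.path !== 'string') {
    throw new Error('path must be a string');
  }
  // Default the limit when the caller omits it.
  const limit = typeof obj.limit === 'number' ? obj.limit : 20;
  return { path: obj.path, limit };
}
```

Narrowing at the boundary keeps `any` out of the rest of the call chain, in line with the strict TypeScript setting above.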

## ANTI-PATTERNS (THIS PROJECT)
- Never suppress type errors (`as any`, `@ts-ignore`)
- Never throw raw strings - throw `Error` objects
- Never commit without explicit request
- No empty catch blocks

## UNIQUE STYLES
- `.git-ai/` directory for all index data (not config files)
- MCP tools require explicit `path` argument
- DSR files per commit for reproducible queries
- Multi-language parser architecture (TS, Go, Rust, Python, C, Markdown, YAML)

## COMMANDS
```bash
npm i # Install dependencies
npm run build # Build to dist/
npm run start # Dev run (e.g., --help)
npm test # Build + node --test
node dist/bin/git-ai.js --help # Validate packaged output
```

## NOTES
- Indices auto-update on git operations
- `checkIndex` gates symbol/semantic/graph queries
- DSR commit hash mismatch with HEAD triggers warning
- MCP server exposes git-ai tools for external IDEs
1 change: 1 addition & 0 deletions docs/README.md
@@ -38,6 +38,7 @@ This collects all documentation for `git-ai`.
- [Advanced: Index Archiving & LFS](./zh-CN/advanced.md) (Chinese)
- [Architecture Design](./zh-CN/design.md) (Chinese)
- [Development Rules](./zh-CN/rules.md) (Chinese)
- [Cross-Encoder Reranking](./cross-encoder.md) (English)

## Agent Integration
- [MCP Skill & Rule Templates](./zh-CN/mcp.md#agent-skills--rules) (Chinese)
157 changes: 157 additions & 0 deletions docs/cross-encoder.md
@@ -0,0 +1,157 @@
# Cross-Encoder Reranking & ONNX Runtime

## Overview

git-ai v2.2+ includes an optional **Cross-Encoder Reranking** feature that uses ONNX Runtime for high-quality result re-ranking, improving search result quality whenever a model is available.

## Architecture

```
Query → [Vector Search] → [Graph Search] → [DSR Search] → [Cross-Encoder Rerank] → Results
```

The cross-encoder takes query-candidate pairs and scores their relevance, providing higher quality re-ranking than simple score fusion.

## Configuration

### Model Path

The cross-encoder uses a configurable model path. By default, it looks for:
1. `<modelName>` (as absolute or relative path)
2. `<modelName>/model.onnx`
3. `<modelName>/onnx/model.onnx`

The default model name is `non-existent-model.onnx`, which means the system will use hash-based fallback by default.
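The lookup order above can be sketched as follows; `resolveModelPath` is a hypothetical helper, and the real resolver may differ in details:

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Resolve an ONNX model path using the documented lookup order.
// Returns null when nothing is found, signalling hash-based fallback.
function resolveModelPath(modelName: string): string | null {
  const candidates = [
    modelName,                             // 1. direct (absolute or relative) path
    join(modelName, 'model.onnx'),         // 2. <modelName>/model.onnx
    join(modelName, 'onnx', 'model.onnx'), // 3. <modelName>/onnx/model.onnx
  ];
  for (const candidate of candidates) {
    if (existsSync(candidate)) return candidate;
  }
  return null;
}
```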

```typescript
// Reranker configuration
interface RerankerConfig {
modelName: string; // Path to ONNX model
device: 'cpu' | 'gpu'; // Execution device
batchSize: number; // Batch processing size
topK: number; // Max candidates to re-rank
scoreWeights: {
original: number; // Weight for original retrieval score
crossEncoder: number; // Weight for cross-encoder score
};
}
```

### Default Behavior

When no model is found, the system automatically falls back to **hash-based scoring**:
- Uses `hashEmbedding` to create query-content vectors
- Computes similarity via sigmoid(sum)
- No external dependencies required

This ensures the system works even without ONNX models.
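A minimal sketch of the fallback scoring idea follows. git-ai's actual `hashEmbedding` is internal and not reproduced here; this toy version only illustrates the sigmoid-of-sum scoring described above:

```typescript
// Toy hash embedding: deterministic and dependency-free. Illustrative only.
function hashEmbedding(text: string, dim = 8): number[] {
  const vec = new Array<number>(dim).fill(0);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Spread character contributions across dimensions, roughly centered on 0.
    vec[code % dim] += ((code * 2654435761) % 1000) / 1000 - 0.5;
  }
  return vec;
}

// Fallback score: sigmoid over the elementwise-product sum of the
// query and content vectors, yielding a value in (0, 1).
function fallbackScore(query: string, content: string): number {
  const q = hashEmbedding(query);
  const c = hashEmbedding(content);
  const sum = q.reduce((acc, v, i) => acc + v * c[i], 0);
  return 1 / (1 + Math.exp(-sum)); // sigmoid
}
```

Because the embedding is a pure function of the text, the same query/content pair always scores identically — which is what makes the fallback reproducible.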

## Installing ONNX Models

To enable cross-encoder reranking, download a compatible model (e.g., MiniLM, CodeBERT) and configure the path:

```bash
# Example: Download a cross-encoder model
mkdir -p models/cross-encoder
cd models/cross-encoder
# Download your ONNX model (e.g., from HuggingFace, ONNX Model Zoo)
# Place model.onnx in this directory
```

## Performance Considerations

### Memory
- ONNX Runtime loads models into memory
- GPU memory required for GPU inference
- CPU inference works on any modern CPU

### Batch Processing
- Configure `batchSize` based on available memory
- Larger batches = better throughput but more memory

### Supported Backends
- **CPU**: All platforms, no additional setup
- **GPU**: CUDA-enabled systems (optional CUDA execution provider)

## API Usage

### CLI (Not yet exposed)

Cross-encoder is currently used internally by the retrieval pipeline.

### Programmatic

```typescript
import { CrossEncoderReranker } from 'git-ai';

const reranker = new CrossEncoderReranker({
modelName: './models/cross-encoder',
device: 'cpu',
batchSize: 32,
topK: 100,
scoreWeights: {
original: 0.3,
crossEncoder: 0.7,
},
});

const results = await reranker.rerank('authentication logic', candidates);
```

## Fallback Mechanism

The system handles missing models gracefully:

1. **Model file missing** → Log `cross_encoder_model_missing` and use hash fallback
2. **ONNX load failed** → Log `cross_encoder_fallback` and use hash fallback
3. **Inference error** → Log error and continue with fallback

No crashes or service interruption occur when the model is unavailable.
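The degradation chain above follows a standard try/fallback pattern, sketched here with hypothetical names (the logger callback and `Scorer` type are illustrative, not git-ai's API):

```typescript
// Sketch of the fallback chain: attempt ONNX scoring, log a structured
// event on failure, and fall back to hash-based scoring.
type Scorer = (query: string, content: string) => Promise<number>;

function withFallback(
  primary: Scorer,
  fallback: Scorer,
  log: (event: string, err: unknown) => void,
): Scorer {
  return async (query, content) => {
    try {
      return await primary(query, content);
    } catch (err) {
      log('cross_encoder_fallback', err); // structured warn, per project logging
      return fallback(query, content);
    }
  };
}
```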

## Comparison: Hash vs ONNX

| Aspect | Hash Fallback | ONNX Cross-Encoder |
|--------|---------------|-------------------|
| Quality | Good for exact matches | Excellent for semantic matching |
| Speed | <1ms | 10-100ms (depending on model) |
| Dependencies | None | onnxruntime-node |
| Memory | <1MB | 50-500MB (model size) |
| GPU Required | No | Optional |

## Troubleshooting

### Model Load Failed

```
{"level":"warn","msg":"cross_encoder_fallback","err":"..."}
```

Causes:
- Model file doesn't exist
- Corrupted model file
- Incompatible ONNX opset version

Solution:
1. Verify model path is correct
2. Check model file is valid ONNX
3. Ensure onnxruntime-node is installed

### Out of Memory

Reduce `batchSize` in configuration or use CPU backend.

### Slow Inference

- Use smaller models (MiniLM instead of large BERT)
- Enable batching for multiple queries
- Consider GPU for large-scale usage

## Dependencies

```json
{
"onnxruntime-node": "^1.19.2"
}
```

Needed only for cross-encoder inference; the system is fully functional without it and falls back to hash-based scoring.
14 changes: 5 additions & 9 deletions docs/zh-CN/rules.md
@@ -1,11 +1,7 @@
# Rules

## 1. Constraints
- Indexing targets only the current HEAD worktree; versions/branches are not explicitly modeled in the query interface.
- Index data may be managed by Git; never write any sensitive information.

## 2. Code Standards
- TypeScript strict.
- No external embedding service dependencies (local deterministic embeddings by default).
- Prefer JSON for all CLI output to ease experimentation and automation.

## Rules
- Prefer the git-ai MCP; check `index_status` first when available; use traditional commands only when it is not.
- Indexing targets only the current HEAD; versions/branches are not explicitly modeled in the query interface.
- Index data may be managed by Git; never write sensitive information.
- TypeScript strict; no external embeddings; prefer JSON for CLI output.
23 changes: 23 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

5 changes: 3 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "git-ai",
"version": "2.1.0",
"version": "2.2.0",
"main": "dist/index.js",
"bin": {
"git-ai": "dist/bin/git-ai.js"
@@ -11,7 +11,7 @@
"scripts": {
"build": "tsc",
"start": "ts-node bin/git-ai.ts",
"test": "npm run build && node --test",
"test": "npm run build && node --test test/*.test.mjs test/*.test.ts",
"test:parser": "ts-node test/verify_parsing.ts"
},
"files": [
@@ -45,6 +45,7 @@
"commander": "^14.0.2",
"fs-extra": "^11.3.3",
"glob": "^13.0.0",
"onnxruntime-node": "^1.19.2",
"simple-git": "^3.30.0",
"tar": "^7.5.3",
"tree-sitter": "^0.21.1",