# νgrεp
Search code by meaning, not just keywords. 100% offline. Zero cloud dependencies.
```sh
curl -fsSL https://vgrep.dev/install.sh | sh
```

Or with wget:

```sh
wget -qO- https://vgrep.dev/install.sh | sh
```

After installation, initialize vgrep:

```sh
vgrep init
vgrep models download
```

νgrεp is a semantic code search tool that uses local LLM embeddings to find code by intent rather than exact text matches. Unlike traditional grep, which searches for literal strings, νgrεp understands the meaning behind your query and finds semantically related code across your entire codebase.
Quick Start:
```sh
vgrep init && vgrep serve
```

then

```sh
vgrep "where is authentication handled?"
```
- Semantic Search: Find code by intent - search "error handling" to find try/catch blocks, Result types, and exception handlers
- 100% Local: All processing happens on your machine using llama.cpp - no API keys, no cloud, your code stays private
- Server Mode: Keep models loaded in memory for instant sub-100ms searches
- File Watcher: Automatically re-index files as they change
- Cross-Platform: Native binaries for Windows, Linux, and macOS
- GPU Acceleration: Optional CUDA, Metal, and Vulkan support for faster embeddings
νgrεp uses a client-server architecture optimized for fast repeated searches:

```
┌─────────────────────────────────────────────────────────┐
│                      USER QUERIES                       │
│  "where is auth handled?"                               │
│  "database connection logic"                            │
│  "error handling patterns"                              │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      νgrεp CLIENT                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │  Search  │  │  Index   │  │  Watch   │  │  Config  │ │
│  │ Command  │  │ Command  │  │ Command  │  │  Editor  │ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
└────────────────────────────┬────────────────────────────┘
                             │ HTTP API
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      νgrεp SERVER                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │           Embedding Engine (llama.cpp)            │  │
│  │    Qwen3-Embedding-0.6B • Always Loaded • Fast    │  │
│  └───────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────┐  │
│  │              SQLite Vector Database               │  │
│  │ File Hashes • Code Chunks • Embeddings • Metadata │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Source  │──▶│  Chunk   │──▶│  Embed   │──▶│  Store   │──▶│  Search  │
│  Files   │   │  (512b)  │   │  (LLM)   │   │ (SQLite) │   │ (Cosine) │
└──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
      │              │              │              │              │
      ▼              ▼              ▼              ▼              ▼
  .rs .py       Split into     Generate      Vector DB     Similarity
  .js .ts       overlapping    768-dim       with fast     ranking +
  .go .c        text chunks    vectors       retrieval     results
```
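The five stages can be sketched end to end. This toy version swaps the real Qwen3 embedder for a letter-frequency vector (an illustrative stand-in, not the actual model) so the flow — chunk, embed, store, rank by cosine — is runnable anywhere:

```python
# Toy sketch of the index/search pipeline. The real tool embeds with
# Qwen3-Embedding via llama.cpp; a bag-of-letters vector stands in here.
import math

def chunk(text, size=512, overlap=64):
    """Split text into overlapping chunks, as in the Chunk stage."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Stand-in embedder: unit-normalized 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query, store, top_k=3):
    """Rank stored chunks by cosine similarity (dot product of unit vectors)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in store]
    return sorted(scored, reverse=True)[:top_k]

docs = ["fn authenticate(user, pass)", "fn open_db_connection(url)"]
store = [(c, embed(c)) for d in docs for c in chunk(d)]  # the Store stage
for score, text in search("authentication", store, top_k=1):
    print(round(score, 2), text)
```

Even with this crude embedder, the authentication snippet ranks above the database one for the query "authentication" — the real model does the same with far richer semantics.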
```sh
# Prerequisites: Rust 1.75+, LLVM/Clang, CMake
git clone https://github.com/CortexLM/vgrep.git
cd vgrep
cargo build --release
# Binary at target/release/vgrep
```

With optional GPU acceleration:

```sh
cargo build --release --features cuda   # NVIDIA GPUs
cargo build --release --features metal  # Apple Silicon
cargo build --release --features vulkan # Cross-platform GPU
```

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 2 GB | 4+ GB |
| Disk | 1 GB (models) | 2+ GB |
| CPU | 4 cores | 8+ cores |
| GPU | Optional | CUDA/Metal for 10x speedup |
```sh
# Download models and create config (~1GB download)
vgrep init
vgrep models download
```

```sh
# Keep this running - loads model once for fast searches
vgrep serve
```

Output:
```
>>> vgrep server
Server: http://127.0.0.1:7777
Loading embedding model...
Model loaded successfully!
Endpoints:
  • GET  /health - Health check
  • GET  /status - Index status
  • POST /search - Semantic search
  • POST /embed  - Generate embeddings
Press Ctrl+C to stop
```
```sh
# In another terminal - index and auto-update on changes
vgrep watch
```

Output:

```
>>> vgrep watcher
Path: /home/user/myproject
Mode: server
Ctrl+C to stop
──────────────────────────────────────────────────
>> Initial indexing...
Phase 1: Reading files...
  Read 45 files, 312 chunks
Phase 2: Generating embeddings via server...
  Generated 312 embeddings
Phase 3: Storing in database...
  Stored 45 files
Indexing complete!
  Files: 45 indexed, 12 skipped
  Chunks: 312
──────────────────────────────────────────────────
[~] Watching for changes...
[+] indexed auth.rs
[+] indexed db.rs
```
```sh
# Semantic search - finds by meaning
vgrep "where is authentication handled?"
vgrep "database connection pooling"
vgrep "error handling for network requests"
```

Output:

```
Searching for: where is authentication handled?

1. ./src/auth/middleware.rs   (87.3%)
2. ./src/handlers/login.rs    (82.1%)
3. ./src/utils/jwt.rs         (76.8%)
4. ./src/config/security.rs   (71.2%)

Found 4 results in 45ms
```
| Command | Description |
|---|---|
| `vgrep "query"` | Quick semantic search |
| `vgrep search "query" -m 20` | Search with max 20 results |
| `vgrep search "query" -c` | Show code snippets in results |
| `vgrep search "query" --sync` | Re-index before searching |
| Command | Description |
|---|---|
| `vgrep serve` | Start server (keeps model loaded) |
| `vgrep serve -p 8080` | Custom port |
| `vgrep index` | Manual one-time index |
| `vgrep index --force` | Force re-index all files |
| `vgrep watch` | Watch and auto-index on changes |
| `vgrep status` | Show index statistics |
| Command | Description |
|---|---|
| `vgrep config` | Interactive configuration editor |
| `vgrep config show` | Display all settings |
| `vgrep config set mode local` | Set a config value |
| `vgrep config reset` | Reset to defaults |
| Command | Description |
|---|---|
| `vgrep init` | Initialize vgrep |
| `vgrep models download` | Download embedding models |
| `vgrep models list` | Show configured models |
νgrεp supports assisted installation for popular coding agents:

```sh
vgrep install <agent>    # Install integration
vgrep uninstall <agent>  # Remove integration
```

| Agent | Command |
|---|---|
| Claude Code | `vgrep install claude-code` |
| OpenCode | `vgrep install opencode` |
| Codex | `vgrep install codex` |
| Factory Droid | `vgrep install droid` |

```sh
vgrep install claude-code
vgrep serve  # Start server
vgrep watch  # Index your project
# Claude Code can now use vgrep for semantic search
```

```sh
vgrep install droid
# vgrep auto-starts when you begin a Droid session
```

To uninstall: `vgrep uninstall <agent>` (e.g., `vgrep uninstall droid`).
νgrεp converts code into high-dimensional vectors that capture semantic meaning:

```
Input:  "fn authenticate(user: &str, pass: &str) -> Result<Token>"
                          │
        Tokenize → Qwen3-Embedding → Normalize
                          ▼
Output: [0.023, -0.156, 0.891, ..., 0.045]   (768 dimensions)
```
Queries are embedded and compared using cosine similarity:

$$\text{similarity}(q, d) = \frac{q \cdot d}{\|q\| \, \|d\|}$$

Where:

- $q$ = query embedding vector
- $d$ = document (code chunk) embedding vector
- Result in range $[-1, 1]$; higher = more similar
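As a quick sketch, the formula above in plain Python (the `cosine` helper name is illustrative, not part of νgrεp's API):

```python
import math

def cosine(q, d):
    """Cosine similarity: (q · d) / (||q|| * ||d||), in [-1, 1]."""
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd)

print(cosine([1.0, 0.0], [1.0, 0.0]))   # identical direction -> 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite -> -1.0
```

Because embeddings are normalized to unit length, the denominator is 1 and the comparison reduces to a plain dot product.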
Files are split into overlapping chunks for granular search:

```
┌─────────────────────────────────────────────────────┐
│                     Source File                     │
├─────────────────────────────────────────────────────┤
│ Chunk 1 (512 chars)                                 │
│              ├── Overlap (64 chars) ──┤             │
│              Chunk 2 (512 chars)                    │
│                           ├── Overlap ──┤           │
│                           Chunk 3 ...               │
└─────────────────────────────────────────────────────┘
```
- Chunk Size: 512 characters (configurable)
- Overlap: 64 characters to preserve context at boundaries
- Deduplication: Results grouped by file, best chunk shown
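The scheme above — fixed-size windows that advance by `size - overlap` so consecutive chunks share a boundary region — can be sketched as follows (the `chunk_file` name is illustrative, not νgrεp's actual function):

```python
def chunk_file(text, size=512, overlap=64):
    """Split text into overlapping (start_offset, chunk) windows.

    Each window is `size` chars; the step is size - overlap, so 64 chars
    are shared across each boundary to preserve context.
    """
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + size]))
        if start + size >= len(text):  # last window reached end of file
            break
    return chunks

chunks = chunk_file("x" * 1200)
print(len(chunks))        # -> 3 windows cover 1200 chars
print(len(chunks[1][1]))  # -> 512 (a full middle chunk)
```

With the defaults, a 1200-character file yields three chunks starting at offsets 0, 448, and 896, and the last 64 characters of each chunk reappear at the start of the next.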
| Platform | Path |
|---|---|
| Linux | ~/.vgrep/config.json |
| macOS | ~/.vgrep/config.json |
| Windows | C:\Users\<user>\.vgrep\config.json |
| Setting | Default | Description |
|---|---|---|
| `mode` | `server` | `server` (recommended) or `local` |
| `server_host` | `127.0.0.1` | Server bind address |
| `server_port` | `7777` | Server port |
| `max_results` | `10` | Default number of search results |
| `max_file_size` | `524288` | Max file size to index (512 KB) |
| `chunk_size` | `512` | Characters per chunk |
| `chunk_overlap` | `64` | Overlap between chunks |
| `n_threads` | `0` | CPU threads (0 = auto) |
| `use_reranker` | `true` | Enable result reranking |
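Put together, a `config.json` using the defaults above would look roughly like this (field names taken from the settings table; the exact layout of the file written by `vgrep init` may differ):

```json
{
  "mode": "server",
  "server_host": "127.0.0.1",
  "server_port": 7777,
  "max_results": 10,
  "max_file_size": 524288,
  "chunk_size": 512,
  "chunk_overlap": 64,
  "n_threads": 0,
  "use_reranker": true
}
```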
All settings can be overridden via environment variables:

```sh
VGREP_HOST=0.0.0.0     # Bind to all interfaces
VGREP_PORT=8080        # Custom port
VGREP_MAX_RESULTS=20   # More results
VGREP_CONTENT=true     # Always show snippets
```

The νgrεp server exposes a REST API for programmatic access:
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/status` | GET | Index statistics |
| `/search` | POST | Semantic search |
| `/embed` | POST | Generate single embedding |
| `/embed_batch` | POST | Batch embeddings |
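Any HTTP client can drive these endpoints. A minimal Python sketch for `/search`, assuming a server on the default `127.0.0.1:7777` (the `build_request` and `vgrep_search` helper names are illustrative, using only the standard library):

```python
import json
import urllib.request

def build_request(query, max_results=5, host="127.0.0.1", port=7777):
    """Build a POST request for the /search endpoint with a JSON body."""
    body = json.dumps({"query": query, "max_results": max_results}).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/search",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def vgrep_search(query, max_results=5):
    """Run a search against a running vgrep server; returns the decoded JSON."""
    with urllib.request.urlopen(build_request(query, max_results)) as resp:
        return json.load(resp)

# Example (requires `vgrep serve` to be running):
# for hit in vgrep_search("authentication middleware")["results"]:
#     print(hit["score_percent"], hit["path"])
```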
```sh
curl -X POST http://127.0.0.1:7777/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication middleware",
    "max_results": 5
  }'
```

Response:

```json
{
  "results": [
    {
      "path": "/project/src/auth/middleware.rs",
      "score": 0.873,
      "score_percent": "87.3%",
      "preview": "pub async fn auth_middleware...",
      "start_line": 15,
      "end_line": 45
    }
  ],
  "query": "authentication middleware",
  "total": 1
}
```

```
vgrep/
├── src/
│   ├── cli/               # Command-line interface
│   │   ├── commands.rs    # CLI argument handling
│   │   └── interactive.rs # Config editor
│   ├── core/              # Core functionality
│   │   ├── db.rs          # SQLite vector storage
│   │   ├── embeddings.rs  # llama.cpp integration
│   │   ├── indexer.rs     # File chunking & indexing
│   │   └── search.rs      # Similarity search
│   ├── server/            # HTTP server
│   │   ├── api.rs         # Axum endpoints
│   │   └── client.rs      # HTTP client
│   ├── ui/                # User interface
│   │   ├── console.rs     # Colored output
│   │   └── search_tui.rs  # Interactive TUI
│   ├── config.rs          # Configuration
│   ├── watcher.rs         # File system watcher
│   ├── lib.rs             # Library root
│   └── main.rs            # Entry point
├── tests/                 # Integration tests
├── .github/
│   ├── workflows/         # CI/CD (test, build, release)
│   └── hooks/             # Git hooks (pre-commit, pre-push)
└── scripts/               # Development utilities
```
νgrεp uses quantized models from HuggingFace for efficient local inference:

| Model | Size | Purpose |
|---|---|---|
| Qwen3-Embedding-0.6B-Q8_0 | ~600 MB | Text → vector embeddings |
| Qwen3-Reranker-0.6B-Q4_K_M | ~400 MB | Result reranking (optional) |

Models are downloaded to `~/.cache/huggingface/` and cached automatically.
- Use Server Mode: 10-50x faster than local mode for repeated searches
- Enable GPU: CUDA/Metal provides 5-10x speedup for embedding generation
- Watch Mode: Auto-indexes only changed files, not entire codebase
- Tune Chunk Size: Larger chunks = fewer embeddings but less granular results
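To make the chunk-size tradeoff concrete, here is a back-of-envelope chunk count for roughly 1 MB of source at two settings (a sketch only; real counts depend on per-file boundaries):

```python
def n_chunks(total_chars, size, overlap):
    """Chunks needed to cover total_chars with overlapping windows.

    Windows advance by (size - overlap), so the count is
    ceil((total_chars - overlap) / (size - overlap)).
    """
    step = size - overlap
    return max(1, -(-(total_chars - overlap) // step))  # ceil division

corpus = 1_000_000  # ~1 MB of source text
print(n_chunks(corpus, 512, 64))    # default settings  -> 2232
print(n_chunks(corpus, 1024, 128))  # larger chunks     -> 1116
```

Doubling the chunk size roughly halves the number of embeddings to generate and store, at the cost of coarser result granularity.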
```sh
# Clone with submodules (llama.cpp)
git clone --recurse-submodules https://github.com/CortexLM/vgrep.git
cd vgrep

# Setup git hooks
./scripts/setup-hooks.sh   # Unix
./scripts/setup-hooks.ps1  # Windows

# Build
cargo build

# Test
cargo test

# Lint
cargo clippy --all-targets --all-features
cargo fmt --check
```

Pre-commit and pre-push hooks ensure code quality:

| Hook | Checks |
|---|---|
| `pre-commit` | Format, Clippy, Tests |
| `pre-push` | Full test suite, release build |

Enable with: `git config core.hooksPath .github/hooks`
| Feature | grep/ripgrep | νgrεp |
|---|---|---|
| Search type | Exact text / regex | Semantic meaning |
| "auth" finds "authentication" | ❌ | ✅ |
| "error handling" finds try/catch | ❌ | ✅ |
| Speed | Instant | 30-100ms |
| Setup | None | Model download |

| Feature | Cloud Tools | νgrεp |
|---|---|---|
| Privacy | Code sent to servers | 100% local |
| Cost | API fees | Free |
| Offline | ❌ | ✅ |
| Latency | 200-500ms | 30-100ms |
| Rate limits | Yes | None |
```sh
# Check if port is in use
netstat -an | grep 7777

# Try different port
vgrep serve -p 8080
```

```sh
# Use server mode for batch embeddings
vgrep serve &
vgrep index
```

```sh
# Manual download
vgrep models download --force

# Check disk space
df -h ~/.cache/huggingface
```

```sh
# Reduce threads
vgrep config set n-threads 2

# Use quantized model (default)
```

See CONTRIBUTING.md for development guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Run tests (`cargo test`)
- Run lints (`cargo fmt && cargo clippy`)
- Submit a Pull Request
Apache 2.0 - see LICENSE
νgrεp - Search code by meaning

Built with 🦀 Rust and powered by llama.cpp