νgrεp

██╗   ██╗ ██████╗ ██████╗ ███████╗██████╗
██║   ██║██╔════╝ ██╔══██╗██╔════╝██╔══██╗
██║   ██║██║  ███╗██████╔╝█████╗  ██████╔╝
╚██╗ ██╔╝██║   ██║██╔══██╗██╔══╝  ██╔═══╝
 ╚████╔╝ ╚██████╔╝██║  ██║███████╗██║
  ╚═══╝   ╚═════╝ ╚═╝  ╚═╝╚══════╝╚═╝

Search code by meaning, not just keywords. 100% offline. Zero cloud dependencies.



Installation

curl -fsSL https://vgrep.dev/install.sh | sh

Or with wget:

wget -qO- https://vgrep.dev/install.sh | sh

After installation, initialize vgrep:

vgrep init
vgrep models download

Introduction

νgrεp is a semantic code search tool that uses local LLM embeddings to find code by intent rather than exact text matches. Unlike traditional grep, which searches for literal strings, νgrεp understands the meaning behind your query and finds semantically related code across your entire codebase.

Quick Start: vgrep init && vgrep serve, then vgrep "where is authentication handled?"

Key Features

  • Semantic Search: Find code by intent - search "error handling" to find try/catch blocks, Result types, and exception handlers
  • 100% Local: All processing happens on your machine using llama.cpp - no API keys, no cloud, your code stays private
  • Server Mode: Keep models loaded in memory for instant sub-100ms searches
  • File Watcher: Automatically re-index files as they change
  • Cross-Platform: Native binaries for Windows, Linux, and macOS
  • GPU Acceleration: Optional CUDA, Metal, and Vulkan support for faster embeddings

System Overview

νgrεp uses a client-server architecture optimized for fast repeated searches:

┌──────────────────────────────────────────────────────────────────────────────┐
│                                 USER QUERIES                                 │
│                        "where is auth handled?"                              │
│                        "database connection logic"                           │
│                        "error handling patterns"                             │
└──────────────────────────────────┬───────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 νgrεp CLIENT                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Search    │  │    Index    │  │    Watch    │  │   Config    │          │
│  │  Command    │  │   Command   │  │   Command   │  │   Editor    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘          │
└──────────────────────────────────┬───────────────────────────────────────────┘
                                   │ HTTP API
                                   ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 νgrεp SERVER                                 │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                      Embedding Engine (llama.cpp)                      │  │
│  │              Qwen3-Embedding-0.6B • Always Loaded • Fast               │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         SQLite Vector Database                         │  │
│  │           File Hashes • Code Chunks • Embeddings • Metadata            │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘

Processing Pipeline

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Source  │───▶│  Chunk   │───▶│  Embed   │───▶│  Store   │───▶│  Search  │
│  Files   │    │  (512b)  │    │  (LLM)   │    │ (SQLite) │    │ (Cosine) │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
     │               │               │               │               │
     ▼               ▼               ▼               ▼               ▼
  .rs .py        Split into      Generate       Vector DB       Similarity
  .js .ts        overlapping     768-dim        with fast       ranking +
  .go .c         text chunks     vectors        retrieval       results

Installation

From Source

# Prerequisites: Rust 1.75+, LLVM/Clang, CMake
git clone https://github.com/CortexLM/vgrep.git
cd vgrep
cargo build --release

# Binary at target/release/vgrep

GPU Acceleration

cargo build --release --features cuda    # NVIDIA GPUs
cargo build --release --features metal   # Apple Silicon
cargo build --release --features vulkan  # Cross-platform GPU

System Requirements

Component  Minimum        Recommended
RAM        2 GB           4+ GB
Disk       1 GB (models)  2+ GB
CPU        4 cores        8+ cores
GPU        Optional       CUDA/Metal for 10x speedup

Quick Start

1. Initialize

# Download models and create config (~1GB download)
vgrep init
vgrep models download

2. Start Server

# Keep this running - loads model once for fast searches
vgrep serve

Output:

  >>> vgrep server
  Server: http://127.0.0.1:7777
  
  Loading embedding model...
  Model loaded successfully!
  
  Endpoints:
    • GET  /health   - Health check
    • GET  /status   - Index status
    • POST /search   - Semantic search
    • POST /embed    - Generate embeddings

  → Press Ctrl+C to stop

3. Index & Watch

# In another terminal - index and auto-update on changes
vgrep watch

Output:

  >>> vgrep watcher
  Path: /home/user/myproject
  Mode: server

  Ctrl+C to stop

──────────────────────────────────────────────────

  >> Initial indexing...
  Phase 1: Reading files...
    Read 45 files, 312 chunks
  Phase 2: Generating embeddings via server...
    Generated 312 embeddings
  Phase 3: Storing in database...
    Stored 45 files

  Indexing complete!
    Files: 45 indexed, 12 skipped
    Chunks: 312

──────────────────────────────────────────────────

  [~] Watching for changes...

  [+] indexed auth.rs
  [+] indexed db.rs

4. Search

# Semantic search - finds by meaning
vgrep "where is authentication handled?"
vgrep "database connection pooling"
vgrep "error handling for network requests"

Output:

  Searching for: where is authentication handled?

  1. ./src/auth/middleware.rs (87.3%)
  2. ./src/handlers/login.rs (82.1%)
  3. ./src/utils/jwt.rs (76.8%)
  4. ./src/config/security.rs (71.2%)

  → Found 4 results in 45ms

Commands

Search

Command                      Description
vgrep "query"                Quick semantic search
vgrep search "query" -m 20   Search with max 20 results
vgrep search "query" -c      Show code snippets in results
vgrep search "query" --sync  Re-index before searching

Server & Indexing

Command              Description
vgrep serve          Start server (keeps model loaded)
vgrep serve -p 8080  Custom port
vgrep index          Manual one-time index
vgrep index --force  Force re-index all files
vgrep watch          Watch and auto-index on changes
vgrep status         Show index statistics

Configuration

Command                      Description
vgrep config                 Interactive configuration editor
vgrep config show            Display all settings
vgrep config set mode local  Set config value
vgrep config reset           Reset to defaults

Models

Command                Description
vgrep init             Initialize vgrep
vgrep models download  Download embedding models
vgrep models list      Show configured models

Agent Integrations

νgrεp supports assisted installation for popular coding agents:

vgrep install <agent>     # Install integration
vgrep uninstall <agent>   # Remove integration

Agent          Command
Claude Code    vgrep install claude-code
OpenCode       vgrep install opencode
Codex          vgrep install codex
Factory Droid  vgrep install droid

Usage with Claude Code

vgrep install claude-code
vgrep serve   # Start server
vgrep watch   # Index your project
# Claude Code can now use vgrep for semantic search

Usage with Factory Droid

vgrep install droid
# vgrep auto-starts when you begin a Droid session

To uninstall: vgrep uninstall <agent> (e.g., vgrep uninstall droid).


How It Works

Embedding Generation

νgrεp converts code into high-dimensional vectors that capture semantic meaning:

Input:  "fn authenticate(user: &str, pass: &str) -> Result<Token>"
        ↓
        Tokenize → Qwen3-Embedding → Normalize
        ↓
Output: [0.023, -0.156, 0.891, ..., 0.045]  (768 dimensions)

Similarity Search

Queries are embedded and compared using cosine similarity:

$$\text{similarity}(q, d) = \frac{q \cdot d}{|q| |d|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \sqrt{\sum_{i=1}^{n} d_i^2}}$$

Where:

  • $q$ = query embedding vector
  • $d$ = document (code chunk) embedding vector
  • Result in range $[-1, 1]$, higher = more similar

Chunking Strategy

Files are split into overlapping chunks for granular search:

┌─────────────────────────────────────────────────────┐
│                    Source File                      │
├─────────────────────────────────────────────────────┤
│ Chunk 1 (512 chars)                                 │
│          ├── Overlap (64 chars) ───                 │
│                    Chunk 2 (512 chars)              │
│                             ├── Overlap ───         │
│                                      Chunk 3 ...    │
└─────────────────────────────────────────────────────┘
  • Chunk Size: 512 characters (configurable)
  • Overlap: 64 characters to preserve context at boundaries
  • Deduplication: Results grouped by file, best chunk shown

Configuration

Config Location

Platform  Path
Linux     ~/.vgrep/config.json
macOS     ~/.vgrep/config.json
Windows   C:\Users\<user>\.vgrep\config.json

Settings

Setting        Default    Description
mode           server     server (recommended) or local
server_host    127.0.0.1  Server bind address
server_port    7777       Server port
max_results    10         Default search results
max_file_size  524288     Max file size to index (512 KB)
chunk_size     512        Characters per chunk
chunk_overlap  64         Overlap between chunks
n_threads      0          CPU threads (0 = auto)
use_reranker   true       Enable result reranking
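
Assembled from the defaults above, ~/.vgrep/config.json looks roughly like this (illustrative; the exact field layout may differ between versions):

{
  "mode": "server",
  "server_host": "127.0.0.1",
  "server_port": 7777,
  "max_results": 10,
  "max_file_size": 524288,
  "chunk_size": 512,
  "chunk_overlap": 64,
  "n_threads": 0,
  "use_reranker": true
}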

Environment Variables

All settings can be overridden via environment variables:

VGREP_HOST=0.0.0.0      # Bind to all interfaces
VGREP_PORT=8080         # Custom port
VGREP_MAX_RESULTS=20    # More results
VGREP_CONTENT=true      # Always show snippets
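
Overrides apply per invocation; for example, VGREP_PORT=8080 vgrep serve starts the server on port 8080 without modifying config.json.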

Server API

The νgrεp server exposes a REST API for programmatic access:

Endpoints

Endpoint      Method  Description
/health       GET     Health check
/status       GET     Index statistics
/search       POST    Semantic search
/embed        POST    Generate single embedding
/embed_batch  POST    Batch embeddings

Search Example

curl -X POST http://127.0.0.1:7777/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication middleware",
    "max_results": 5
  }'

Response:

{
  "results": [
    {
      "path": "/project/src/auth/middleware.rs",
      "score": 0.873,
      "score_percent": "87.3%",
      "preview": "pub async fn auth_middleware...",
      "start_line": 15,
      "end_line": 45
    }
  ],
  "query": "authentication middleware",
  "total": 1
}
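
The same call from Rust, as a minimal sketch (assumes the reqwest crate with its blocking and json features, plus serde_json; neither ships with νgrεp):

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // POST a semantic query to a locally running vgrep server.
    let client = reqwest::blocking::Client::new();
    let body: serde_json::Value = client
        .post("http://127.0.0.1:7777/search")
        .json(&json!({ "query": "authentication middleware", "max_results": 5 }))
        .send()?
        .json()?;

    // Walk the documented response shape: results[].path / score_percent.
    if let Some(results) = body["results"].as_array() {
        for hit in results {
            let path = hit["path"].as_str().unwrap_or("?");
            let score = hit["score_percent"].as_str().unwrap_or("?");
            println!("{path} ({score})");
        }
    }
    Ok(())
}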

Project Structure

vgrep/
├── src/
│   ├── cli/              # Command-line interface
│   │   ├── commands.rs   # CLI argument handling
│   │   └── interactive.rs # Config editor
│   ├── core/             # Core functionality
│   │   ├── db.rs         # SQLite vector storage
│   │   ├── embeddings.rs # llama.cpp integration
│   │   ├── indexer.rs    # File chunking & indexing
│   │   └── search.rs     # Similarity search
│   ├── server/           # HTTP server
│   │   ├── api.rs        # Axum endpoints
│   │   └── client.rs     # HTTP client
│   ├── ui/               # User interface
│   │   ├── console.rs    # Colored output
│   │   └── search_tui.rs # Interactive TUI
│   ├── config.rs         # Configuration
│   ├── watcher.rs        # File system watcher
│   ├── lib.rs            # Library root
│   └── main.rs           # Entry point
├── tests/                # Integration tests
├── .github/
│   ├── workflows/        # CI/CD (test, build, release)
│   └── hooks/            # Git hooks (pre-commit, pre-push)
└── scripts/              # Development utilities

Models

νgrεp uses quantized models from HuggingFace for efficient local inference:

Model                       Size     Purpose
Qwen3-Embedding-0.6B-Q8_0   ~600 MB  Text → Vector embeddings
Qwen3-Reranker-0.6B-Q4_K_M  ~400 MB  Result reranking (optional)

Models are downloaded to ~/.cache/huggingface/ and cached automatically.


Performance

Optimization Tips

  1. Use Server Mode: 10-50x faster than local mode for repeated searches
  2. Enable GPU: CUDA/Metal provides 5-10x speedup for embedding generation
  3. Watch Mode: Auto-indexes only changed files, not entire codebase
  4. Tune Chunk Size: Larger chunks = fewer embeddings but less granular results (see the example below)
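
For example, vgrep config set chunk_size 1024 roughly halves the number of chunks (and embeddings) per file relative to the default 512, at the cost of coarser matches.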

Development

Setup

# Clone with submodules (llama.cpp)
git clone --recurse-submodules https://github.com/CortexLM/vgrep.git
cd vgrep

# Setup git hooks
./scripts/setup-hooks.sh   # Unix
./scripts/setup-hooks.ps1  # Windows

# Build
cargo build

# Test
cargo test

# Lint
cargo clippy --all-targets --all-features
cargo fmt --check

Git Hooks

Pre-commit and pre-push hooks ensure code quality:

Hook        Checks
pre-commit  Format, Clippy, Tests
pre-push    Full test suite, Release build

Enable with: git config core.hooksPath .github/hooks


Comparison

vs Traditional Grep

Feature                           grep/ripgrep        νgrεp
Search type                       Exact text / regex  Semantic meaning
"auth" finds "authentication"     ❌                  ✅
"error handling" finds try/catch  ❌                  ✅
Speed                             Instant             30-100ms
Setup                             None                Model download

vs Cloud Semantic Search (mgrep, etc.)

Feature      Cloud Tools           νgrεp
Privacy      Code sent to servers  100% local
Cost         API fees              Free
Offline      ❌                    ✅
Latency      200-500ms             30-100ms
Rate limits  Yes                   None

Troubleshooting

Server won't start

# Check if port is in use
netstat -an | grep 7777

# Try different port
vgrep serve -p 8080
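
# Confirm the server is reachable (health endpoint documented above)
curl http://127.0.0.1:7777/health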

Slow indexing

# Use server mode for batch embeddings
vgrep serve &
vgrep index

Model download fails

# Manual download
vgrep models download --force

# Check disk space
df -h ~/.cache/huggingface

Out of memory

# Reduce threads
vgrep config set n_threads 2

# Use quantized model (default)

Contributing

See CONTRIBUTING.md for development guidelines.

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Run tests (cargo test)
  4. Run lints (cargo fmt && cargo clippy)
  5. Submit Pull Request

License

Apache 2.0 - see LICENSE


νgrεp - Search code by meaning

Built with 🦀 Rust and powered by llama.cpp

Report Bug · Request Feature
