CLI-first semantic code search with MCP integration
⚠️ Alpha Release (v0.12.7): This is an early-stage project under active development. Expect breaking changes and rough edges. Feedback and contributions are welcome!
A modern, fast, and intelligent code search tool that understands your codebase through semantic analysis and AST parsing. Built with Python, powered by ChromaDB, and designed for developer productivity.
- Semantic Search: Find code by meaning, not just keywords
- AST-Aware Parsing: Understands code structure (functions, classes, methods)
- Multi-Language Support: 8 languages - Python, JavaScript, TypeScript, Dart/Flutter, PHP, Ruby, HTML, and Markdown/Text (with extensible architecture)
- Real-time Indexing: File watching with automatic index updates
- Automatic Version Tracking: Smart reindexing on tool upgrades
- Local-First: Complete privacy with on-device processing
- Zero Configuration: Auto-detects project structure and languages
- CLI-First Design: Simple commands for immediate productivity
- Rich Output: Syntax highlighting, similarity scores, context
- Fast Performance: Sub-second search responses, efficient indexing
- Modern Architecture: Async-first, type-safe, modular design
- Semi-Automatic Reindexing: Multiple strategies without daemon processes
- Vector Database: ChromaDB with connection pooling for 13.6% performance boost
- Embedding Models: Configurable sentence transformers
- Smart Reindexing: Search-triggered, Git hooks, scheduled tasks, and manual options
- Extensible Parsers: Plugin architecture for new languages
- Configuration Management: Project-specific settings
- Production Ready: Connection pooling, auto-indexing, comprehensive error handling
# Install from PyPI (recommended)
pip install mcp-vector-search
# Or with UV (faster)
uv pip install mcp-vector-search
# Or install from source
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search
uv sync && uv pip install -e .
Verify Installation:
# Check that all dependencies are installed correctly
mcp-vector-search doctor
# Should show all ✓ marks
# If you see missing dependencies, try:
pip install --upgrade mcp-vector-search
The fastest way to get started is completely hands-off and takes just one command:
# Smart zero-config setup (recommended)
mcp-vector-search setup
What `setup` does automatically:
- ✅ Detects your project's languages and file types
- ✅ Initializes semantic search with optimal settings
- ✅ Indexes your entire codebase
- ✅ Configures ALL installed MCP platforms (Claude Code, Cursor, etc.)
- ✅ Uses native Claude CLI integration (`claude mcp add`) when available
- ✅ Falls back to `.mcp.json` if the Claude CLI is not available
- ✅ Sets up file watching for auto-reindex
- ✅ Zero user input required!
Behind the scenes:
- Server name: `mcp` (for consistency with other MCP projects)
- Command: `uv run python -m mcp_vector_search.mcp.server {PROJECT_ROOT}`
- File watching: Enabled via `MCP_ENABLE_FILE_WATCHING=true`
- Integration method: Native `claude mcp add` (or `.mcp.json` fallback)
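When the Claude CLI is unavailable, these settings end up in `.mcp.json`. As a rough sketch, such an entry could be assembled as shown below. The `mcpServers` wrapper and the `command`/`args`/`env` layout follow the format commonly used by MCP clients and are an assumption here, and `build_mcp_entry` is a hypothetical helper, so treat the file that `setup` actually writes as authoritative.

```python
import json
from pathlib import Path


def build_mcp_entry(project_root: str) -> dict:
    """Hypothetical helper: assemble a .mcp.json payload from the settings above."""
    # Layout assumed from the common MCP client config format
    return {
        "mcpServers": {
            # Server name used by setup
            "mcp": {
                "command": "uv",
                "args": [
                    "run",
                    "python",
                    "-m",
                    "mcp_vector_search.mcp.server",
                    project_root,
                ],
                # File watching is switched on through an environment variable
                "env": {"MCP_ENABLE_FILE_WATCHING": "true"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_entry(str(Path.cwd())), indent=2))
```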
Example output:
Smart Setup for mcp-vector-search
Detecting project...
✅ Found 3 language(s): Python, JavaScript, TypeScript
✅ Detected 8 file type(s)
✅ Found 2 platform(s): claude-code, cursor
⚙️ Configuring...
✅ Embedding model: sentence-transformers/all-MiniLM-L6-v2
Initializing...
✅ Vector database created
✅ Configuration saved
Indexing codebase...
✅ Indexing completed in 12.3s
Configuring MCP integrations...
✅ Using Claude CLI for automatic setup
✅ Registered with Claude CLI
✅ Configured 2 platform(s)
Setup Complete!
Options:
# Force re-setup
mcp-vector-search setup --force
# Verbose output for debugging (shows Claude CLI commands)
mcp-vector-search setup --verbose
For more control over the installation process:
# Manual setup with MCP integration
mcp-vector-search install --with-mcp
# Custom file extensions
mcp-vector-search install --extensions .py,.js,.ts,.dart
# Skip automatic indexing
mcp-vector-search install --no-auto-index
# Just initialize (no indexing or MCP)
mcp-vector-search init
Automatic (Recommended):
# One command sets up all detected platforms
mcp-vector-search setup
Manual Platform Installation:
# Add Claude Code integration (project-scoped)
mcp-vector-search install claude-code
# Add Cursor IDE integration (global)
mcp-vector-search install cursor
# See all available platforms
mcp-vector-search install list
Note: The `setup` command uses native `claude mcp add` when the Claude CLI is available, providing better integration than manual `.mcp.json` creation.
# Remove specific platform
mcp-vector-search uninstall claude-code
# Remove all integrations
mcp-vector-search uninstall --all
# List configured integrations
mcp-vector-search uninstall list
# Search your code
mcp-vector-search search "authentication logic"
mcp-vector-search search "database connection setup"
mcp-vector-search search "error handling patterns"
# Index your codebase (if not done during setup)
mcp-vector-search index
# Check project status
mcp-vector-search status
# Start file watching (auto-update index)
mcp-vector-search watch
The CLI includes intelligent command suggestions for typos:
# Typos are detected and the closest matching command is suggested
$ mcp-vector-search serach "auth"
No such command 'serach'. Did you mean 'search'?
$ mcp-vector-search indx
No such command 'indx'. Did you mean 'index'?
See docs/guides/cli-usage.md for more details.
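Suggestions like these are typically produced by fuzzy string matching. The sketch below reproduces both messages with Python's standard `difflib.get_close_matches`; the command list and the 0.6 cutoff are illustrative assumptions, not the CLI's actual implementation.

```python
import difflib

# Illustrative subset of top-level commands (assumption, not the full CLI)
COMMANDS = ["setup", "install", "uninstall", "init", "index", "search",
            "status", "watch", "config", "auto-index"]


def suggest(name: str, commands: list[str] = COMMANDS) -> str:
    """Build an error message that names the closest known command, if any."""
    matches = difflib.get_close_matches(name, commands, n=1, cutoff=0.6)
    message = f"No such command '{name}'."
    if matches:
        message += f" Did you mean '{matches[0]}'?"
    return message


print(suggest("serach"))  # No such command 'serach'. Did you mean 'search'?
print(suggest("indx"))    # No such command 'indx'. Did you mean 'index'?
```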
This project uses semantic versioning with an automated release workflow.
- `make version-show` - Display current version
- `make release-patch` - Create patch release
- `make publish` - Publish to PyPI
See docs/development/versioning.md for complete documentation.
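Under semantic versioning, a patch release increments only the last component (for example, 0.12.7 becomes 0.12.8). The hypothetical `bump` helper below only illustrates that rule; it is not the project's release tooling and ignores pre-release and build suffixes.

```python
def bump(version: str, part: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH string following semantic versioning rules."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("0.12.7"))           # 0.12.8
print(bump("0.12.7", "minor"))  # 0.13.0
```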
# One command to do everything (recommended)
mcp-vector-search setup
# What it does automatically:
# - Detects project languages and file types
# - Initializes semantic search
# - Indexes entire codebase
# - Configures all detected MCP platforms
# - Sets up file watching
# - Zero configuration needed!
# Force re-setup
mcp-vector-search setup --force
# Verbose output for debugging
mcp-vector-search setup --verbose
Key Features:
- Zero Configuration: No user input required
- Smart Detection: Automatically discovers languages and platforms
- Comprehensive: Handles init + index + MCP setup in one command
- Idempotent: Safe to run multiple times
- Fast: Timeout-protected scanning (won't hang on large projects)
- Team-Friendly: Commit `.mcp.json` to share configuration
When to use:
- ✅ First-time project setup
- ✅ Team onboarding
- ✅ Quick testing in new codebases
- ✅ Setting up multiple MCP platforms at once
# Manual setup with more control
mcp-vector-search install
# Install with all MCP integrations
mcp-vector-search install --with-mcp
# Custom file extensions
mcp-vector-search install --extensions .py,.js,.ts
# Skip automatic indexing
mcp-vector-search install --no-auto-index
# Platform-specific MCP integration
mcp-vector-search install claude-code # Project-scoped
mcp-vector-search install cursor # Global
mcp-vector-search install windsurf # Global
mcp-vector-search install vscode # Global
# List available platforms
mcp-vector-search install list
When to use:
- Use `install` when you need fine-grained control over extensions, models, or MCP platforms
- Use `setup` for quick, zero-config onboarding (recommended)
# Remove specific platform
mcp-vector-search uninstall claude-code
# Remove all integrations
mcp-vector-search uninstall --all
# List configured integrations
mcp-vector-search uninstall list
# Skip backup creation
mcp-vector-search uninstall claude-code --no-backup
# Alias (same as uninstall)
mcp-vector-search remove claude-code
# Basic initialization (no indexing or MCP)
mcp-vector-search init
# Custom configuration
mcp-vector-search init --extensions .py,.js,.ts --embedding-model sentence-transformers/all-MiniLM-L6-v2
# Force re-initialization
mcp-vector-search init --force
Note: For most users, use `setup` instead of `init`. The `init` command is for advanced users who want manual control.
# Index all files
mcp-vector-search index
# Index specific directory
mcp-vector-search index /path/to/code
# Force re-indexing
mcp-vector-search index --force
# Reindex entire project
mcp-vector-search index reindex
# Reindex entire project (explicit)
mcp-vector-search index reindex --all
# Reindex entire project without confirmation
mcp-vector-search index reindex --force
# Reindex specific file
mcp-vector-search index reindex path/to/file.py
# Basic search
mcp-vector-search search "function that handles user authentication"
# Adjust similarity threshold
mcp-vector-search search "database queries" --threshold 0.7
# Limit results
mcp-vector-search search "error handling" --limit 10
# Find code similar to a specific location (file:line)
mcp-vector-search search similar "path/to/function.py:25"
# Setup all auto-indexing strategies
mcp-vector-search auto-index setup --method all
# Setup specific strategies
mcp-vector-search auto-index setup --method git-hooks
mcp-vector-search auto-index setup --method scheduled --interval 60
# Check for stale files and auto-reindex
mcp-vector-search auto-index check --auto-reindex --max-files 10
# View auto-indexing status
mcp-vector-search auto-index status
# Remove auto-indexing setup
mcp-vector-search auto-index teardown --method all
# Start watching for changes
mcp-vector-search watch
# Check watch status
mcp-vector-search watch status
# Enable/disable watching
mcp-vector-search watch enable
mcp-vector-search watch disable
# Basic status
mcp-vector-search status
# Detailed information
mcp-vector-search status --verbose
# View configuration
mcp-vector-search config show
# Update settings
mcp-vector-search config set similarity_threshold 0.8
mcp-vector-search config set embedding_model microsoft/codebert-base
# Configure indexing behavior
mcp-vector-search config set skip_dotfiles true # Skip dotfiles (default)
mcp-vector-search config set respect_gitignore true # Respect .gitignore (default)
# Get specific setting
mcp-vector-search config get skip_dotfiles
mcp-vector-search config get respect_gitignore
# List available models
mcp-vector-search config models
# List all configuration keys
mcp-vector-search config list-keys
Automatic connection pooling provides a 13.6% performance improvement with zero configuration:
# Automatically enabled for high-throughput scenarios
from mcp_vector_search.core.database import PooledChromaVectorDatabase

database = PooledChromaVectorDatabase(
    max_connections=10,  # Pool size
    min_connections=2,  # Warm connections
    max_idle_time=300.0,  # 5 minutes
)
Multiple strategies to keep your index up to date without daemon processes:
- Search-Triggered: Automatically checks for stale files during searches
- Git Hooks: Triggers reindexing after commits, merges, checkouts
- Scheduled Tasks: System-level cron jobs or Windows tasks
- Manual Checks: On-demand via CLI commands
- Periodic Checker: In-process periodic checks for long-running apps
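The search-triggered, manual, and periodic checks all come down to finding files that changed after they were last indexed. A minimal sketch, assuming the index records a modification time per file (the `indexed_mtimes` mapping and `find_stale_files` function are hypothetical); the `max_files` cap mirrors the `--max-files` option of `auto-index check`.

```python
import os
from pathlib import Path


def find_stale_files(
    root: Path,
    indexed_mtimes: dict[str, float],
    extensions: set[str],
    max_files: int = 10,
) -> list[Path]:
    """Return up to max_files source files that are new or modified since indexing."""
    stale: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        key = str(path.relative_to(root))
        last_indexed = indexed_mtimes.get(key)
        # Never-indexed files and files modified after indexing are both stale
        if last_indexed is None or os.path.getmtime(path) > last_indexed:
            stale.append(path)
            if len(stale) >= max_files:
                break
    return stale
```

Capping the number of files keeps a search-triggered check cheap; anything beyond the cap is picked up on a later check or by a full `index reindex`.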
# Setup all strategies
mcp-vector-search auto-index setup --method all
# Check status
mcp-vector-search auto-index status
Projects are configured via `.mcp-vector-search/config.json`:
{
"project_root": "/path/to/project",
"file_extensions": [".py", ".js", ".ts"],
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"similarity_threshold": 0.75,
"languages": ["python", "javascript", "typescript"],
"watch_files": true,
"cache_embeddings": true,
"skip_dotfiles": true,
"respect_gitignore": true
}
`skip_dotfiles` (default: `true`)
- Controls whether files and directories starting with "." are skipped during indexing
- Whitelisted directories are always indexed regardless of this setting:
  - `.github/` - GitHub workflows and actions
  - `.gitlab-ci/` - GitLab CI configuration
  - `.circleci/` - CircleCI configuration
- When `false`: All dotfiles are indexed (subject to gitignore rules if `respect_gitignore` is `true`)
`respect_gitignore` (default: `true`)
- Controls whether `.gitignore` patterns are respected during indexing
- When `false`: Files in `.gitignore` are indexed (subject to `skip_dotfiles` if enabled)
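Taken together, the two settings act as independent filters. The sketch below encodes the rules described above. The function name is hypothetical, gitignore matching is left to the caller through `is_gitignored`, and the assumption that the dot-directory whitelist does not override `.gitignore` is ours.

```python
from pathlib import PurePosixPath

# Dot-directories that are indexed even when skip_dotfiles is true
WHITELISTED_DOTDIRS = {".github", ".gitlab-ci", ".circleci"}


def should_index(
    rel_path: str,
    is_gitignored: bool,
    skip_dotfiles: bool = True,
    respect_gitignore: bool = True,
) -> bool:
    """Hypothetical filter: decide whether a project-relative path is indexed."""
    if skip_dotfiles:
        for part in PurePosixPath(rel_path).parts:
            # Any non-whitelisted component starting with "." excludes the path
            if part.startswith(".") and part not in WHITELISTED_DOTDIRS:
                return False
    # Assumption: .gitignore applies independently of the whitelist
    if respect_gitignore and is_gitignored:
        return False
    return True
```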
Default Behavior (Recommended for most projects):
# Skip dotfiles AND respect .gitignore
mcp-vector-search config set skip_dotfiles true
mcp-vector-search config set respect_gitignore true
Index Everything (useful for deep code analysis):
# Index all files including dotfiles and gitignored files
mcp-vector-search config set skip_dotfiles false
mcp-vector-search config set respect_gitignore false
Index Dotfiles but Respect .gitignore:
# Index configuration files but skip build artifacts
mcp-vector-search config set skip_dotfiles false
mcp-vector-search config set respect_gitignore true
Skip Dotfiles but Ignore .gitignore:
# Useful when you want to index files in .gitignore but skip hidden config files
mcp-vector-search config set skip_dotfiles true
mcp-vector-search config set respect_gitignore false
- Parser Registry: Extensible system for language-specific parsing
- Semantic Indexer: Efficient code chunking and embedding generation
- Vector Database: ChromaDB integration for similarity search
- File Watcher: Real-time monitoring and incremental updates
- CLI Interface: Rich, user-friendly command-line experience
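Conceptually, the indexer stores an embedding vector per code chunk, and a search ranks chunks by their similarity to the query's embedding before applying `--threshold` and `--limit`. The toy sketch below uses hand-written 2-D vectors and plain cosine similarity; ChromaDB's configured distance metric, and how the tool turns distances into the similarity scores it displays, may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: list[float],
    chunks: dict[str, list[float]],
    threshold: float = 0.75,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Score every chunk, keep those at or above threshold, return the top `limit`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:limit]


toy_chunks = {
    "auth.login": [1.0, 0.0],
    "auth.logout": [0.9, 0.1],
    "db.connect": [0.0, 1.0],
}
for name, score in search([1.0, 0.0], toy_chunks, threshold=0.7):
    print(f"{score:.2f}  {name}")
```

Raising the threshold trades recall for precision: fewer, more closely related chunks pass the filter.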
MCP Vector Search supports 8 languages (six programming languages plus HTML and Text/Markdown) with semantic search capabilities:
| Language | Extensions | Status | Features |
|---|---|---|---|
| Python | .py, .pyw | ✅ Full | Functions, classes, methods, docstrings |
| JavaScript | .js, .jsx, .mjs | ✅ Full | Functions, classes, JSDoc, ES6+ syntax |
| TypeScript | .ts, .tsx | ✅ Full | Interfaces, types, generics, decorators |
| Dart | .dart | ✅ Full | Functions, classes, widgets, async, dartdoc |
| PHP | .php, .phtml | ✅ Full | Classes, methods, traits, PHPDoc, Laravel patterns |
| Ruby | .rb, .rake, .gemspec | ✅ Full | Modules, classes, methods, RDoc, Rails patterns |
| HTML | .html, .htm | ✅ Full | Semantic content extraction, heading hierarchy, text chunking |
| Text/Markdown | .txt, .md, .markdown | ✅ Basic | Semantic chunking for documentation |
Planned Languages:
| Language | Status | Features |
|---|---|---|
| Java | Planned | Classes, methods, annotations |
| Go | Planned | Functions, structs, interfaces |
| Rust | Planned | Functions, structs, traits |
HTML Support (Unreleased):
- Semantic Extraction: Content from h1-h6, p, section, article, main, aside, nav, header, footer
- Intelligent Chunking: Based on heading hierarchy (h1-h6)
- Context Preservation: Maintains class and id attributes for searchability
- Script/Style Filtering: Ignores non-content elements
- Use Cases: Static sites, documentation, web templates, HTML fragments
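Heading-based chunking of this kind can be sketched with Python's standard `html.parser` module. This illustrates the idea under simple assumptions (text is attached to the most recent h1-h6 heading, and `class`/`id` attributes are not kept); it is not the project's HTML parser.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
IGNORED = {"script", "style"}  # non-content elements


class HeadingChunker(HTMLParser):
    """Split HTML text into (heading, body) chunks at each h1-h6 element."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[tuple[str, str]] = []
        self._heading: list[str] = []
        self._body: list[str] = []
        self._in_heading = False
        self._ignore_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED:
            self._ignore_depth += 1
        elif tag in HEADINGS:
            self._flush()  # a new heading closes the previous chunk
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in IGNORED and self._ignore_depth:
            self._ignore_depth -= 1
        elif tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if self._ignore_depth or not text:
            return  # skip script/style content and whitespace
        (self._heading if self._in_heading else self._body).append(text)

    def _flush(self) -> None:
        if self._heading or self._body:
            self.chunks.append((" ".join(self._heading), " ".join(self._body)))
        self._heading, self._body = [], []

    def close(self) -> None:
        super().close()
        self._flush()  # emit the final chunk


def chunk_html(html: str) -> list[tuple[str, str]]:
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = "<h1>Intro</h1><p>Hello</p><script>var x = 1;</script><h2>Usage</h2><p>Run it</p>"
print(chunk_html(sample))  # [('Intro', 'Hello'), ('Usage', 'Run it')]
```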
Dart/Flutter Support (v0.4.15):
- Widget Detection: StatelessWidget, StatefulWidget recognition
- State Classes: Automatic parsing of `_WidgetNameState` patterns
- Async Support: Future and async function handling
- Dartdoc: Triple-slash comment extraction
- Tree-sitter AST: Fast, accurate parsing with regex fallback
PHP Support (v0.5.0):
- Class Detection: Classes, interfaces, traits
- Method Extraction: Public, private, protected, static methods
- Magic Methods: __construct, __get, __set, __call, etc.
- PHPDoc: Full comment extraction
- Laravel Patterns: Controllers, Models, Eloquent support
- Tree-sitter AST: Fast parsing with regex fallback
Ruby Support (v0.5.0):
- Module/Class Detection: Full namespace support (::)
- Method Extraction: Instance and class methods
- Special Syntax: Method names ending in `?` or `!`
- Attribute Macros: attr_accessor, attr_reader, attr_writer
- RDoc: Comment extraction (# and =begin...=end)
- Rails Patterns: ActiveRecord, Controllers support
- Tree-sitter AST: Fast parsing with regex fallback
We welcome contributions! Please see our Contributing Guide for details.
# Clone the repository
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search
# Install development environment (includes dependencies + editable install)
make dev
# Test CLI from source (recommended during development)
./dev-mcp version # Shows [DEV] indicator
./dev-mcp search "test" # No reinstall needed after code changes
# Run tests and quality checks
make test-unit # Run unit tests
make quality # Run linting and type checking
make fix # Auto-fix formatting issues
# View all available targets
make help
For the detailed development workflow and `dev-mcp` usage, see the Development section below.
- Create a new parser in `src/mcp_vector_search/parsers/`
- Extend the `BaseParser` class
- Register the parser in `parsers/registry.py`
- Add tests and documentation
- Indexing Speed: ~1000 files/minute (typical Python project)
- Search Latency: <100ms for most queries
- Memory Usage: ~50MB baseline + ~1MB per 1000 code chunks
- Storage: ~1KB per code chunk (compressed embeddings)
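As a back-of-the-envelope example, the rough rates above translate directly into a project footprint. The helper below simply applies them (using 1 MB = 1024 KB); actual usage depends on the embedding model and how the code is chunked.

```python
def estimate_footprint(num_chunks: int) -> tuple[float, float]:
    """Return rough (memory_mb, storage_mb) using the figures listed above."""
    memory_mb = 50 + num_chunks / 1000  # ~50 MB baseline + ~1 MB per 1000 chunks
    storage_mb = num_chunks / 1024      # ~1 KB per chunk, 1024 KB per MB
    return memory_mb, storage_mb


mem, disk = estimate_footprint(20_000)
print(f"~{mem:.0f} MB RAM, ~{disk:.1f} MB on disk")  # ~70 MB RAM, ~19.5 MB on disk
```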
- Tree-sitter Integration: Currently using regex fallback parsing (Tree-sitter setup needs improvement)
- Search Relevance: Embedding model may need tuning for code-specific queries
- Error Handling: Some edge cases may not be gracefully handled
- Documentation: API documentation is minimal
- Testing: Limited test coverage, needs real-world validation
We're actively seeking feedback on:
- Search Quality: How relevant are the search results for your codebase?
- Performance: How does indexing and search speed feel in practice?
- Usability: Is the CLI interface intuitive and helpful?
- Language Support: Which languages would you like to see added next?
- Features: What functionality is missing for your workflow?
Please open an issue or start a discussion to share your experience!
- Core CLI interface
- Python/JS/TS parsing
- ChromaDB integration
- File watching
- Basic search functionality
- Real-world testing and feedback
- Bug fixes and stability improvements
- Performance optimizations
- Advanced search modes (contextual, similar code)
- Additional language support (Java, Go, Rust)
- Configuration improvements
- Comprehensive testing suite
- Documentation improvements
- MCP server implementation
- IDE extensions (VS Code, JetBrains)
- Git integration
- Team collaboration features
- Production-ready performance
Stage A: Local Development & Testing
# Setup development environment
make dev
# Run development tests
make test-unit
# Run CLI from source (recommended during development)
./dev-mcp version # Visual [DEV] indicator
./dev-mcp status # Any command works
./dev-mcp search "auth" # Immediate feedback on changes
# Run quality checks
make quality
# Alternative: use uv run directly
uv run mcp-vector-search version
The `./dev-mcp` script provides a streamlined way to run the CLI from source code during development, eliminating the need for repeated installations.
Key Features:
- Visual [DEV] Indicator: Shows a `[DEV]` prefix to distinguish it from the installed version
- No Reinstall Required: Reflects code changes immediately
- Complete Argument Forwarding: Works with all CLI commands and options
- Verbose Mode: Debug output with the `--verbose` flag
- Built-in Help: Script usage with `--help`
Usage Examples:
# Basic commands (note the [DEV] prefix in output)
./dev-mcp version
./dev-mcp status
./dev-mcp index
./dev-mcp search "authentication logic"
# With CLI options
./dev-mcp search "error handling" --limit 10
./dev-mcp index --force
# Script verbose mode (shows Python interpreter, paths)
./dev-mcp --verbose search "database"
# Script help (shows dev-mcp usage, not CLI help)
./dev-mcp --help
# CLI command help (forwards --help to the CLI)
./dev-mcp search --help
./dev-mcp index --help
When to Use:
- `./dev-mcp` → Development workflow (runs from source code)
- `mcp-vector-search` → Production usage (runs installed version via pipx/pip)
Benefits:
- Instant Feedback: Changes to source code are reflected immediately
- No Build Step: Skip the reinstall cycle during active development
- Clear Context: Visual `[DEV]` indicator prevents confusion about which version is running
- Error Handling: Built-in checks for uv installation and project structure
Requirements:
- Must have `uv` installed (`pip install uv`)
- Must run from the project root directory
- Requires `pyproject.toml` in the current directory
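The requirements above amount to two preflight checks. The Python sketch below illustrates equivalent checks; it is not the `dev-mcp` source, and the `preflight` function name is hypothetical.

```python
import shutil
from pathlib import Path


def preflight(cwd: Path) -> list[str]:
    """Return the problems that would stop a run-from-source wrapper."""
    problems = []
    # uv must be discoverable on PATH
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH (install with: pip install uv)")
    # The wrapper must be run from the project root
    if not (cwd / "pyproject.toml").is_file():
        problems.append(f"no pyproject.toml in {cwd}; run from the project root")
    return problems


for issue in preflight(Path.cwd()):
    print(f"error: {issue}")
```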
Stage B: Local Deployment Testing
# Build and test clean deployment
./scripts/deploy-test.sh
# Test on other projects
cd ~/other-project
mcp-vector-search init && mcp-vector-search index
Stage C: PyPI Publication
# Publish to PyPI
./scripts/publish.sh
# Verify published version
pip install mcp-vector-search --upgrade
./scripts/workflow.sh  # Show workflow overview
See DEVELOPMENT.md for detailed development instructions.
For comprehensive documentation, see docs/index.md - the complete documentation hub.
- Installation Guide - Complete installation instructions
- First Steps - Quick start tutorial
- Configuration - Basic configuration
- Searching Guide - Master semantic code search
- Indexing Guide - Indexing strategies and optimization
- CLI Usage - Advanced CLI features
- MCP Integration - AI tool integration
- File Watching - Real-time index updates
- CLI Commands - Complete command reference
- Configuration Options - All configuration settings
- Features - Feature overview
- Architecture - System architecture
- Contributing - How to contribute
- Testing - Testing guide
- Code Quality - Linting and formatting
- API Reference - Internal API docs
- Deployment - Release and deployment guide
- Troubleshooting - Common issues and solutions
- Performance - Performance optimization
- Extending - Adding new features
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
MIT License - see LICENSE file for details.
- ChromaDB for vector database
- Tree-sitter for parsing infrastructure
- Sentence Transformers for embeddings
- Typer for CLI framework
- Rich for beautiful terminal output
Built with ❤️ for developers who love efficient code search