A semantic code indexing service that augments AI coding assistants with intelligent, contextually-relevant codebase information through Model Context Protocol (MCP) integration.
- Semantic Code Parsing: Tree-sitter integration for accurate AST-based code analysis
- Vector Embeddings: Advanced semantic search using sentence transformers
- MCP Integration: Direct integration with Claude Code and other AI assistants
- Multi-Language Support: Python, JavaScript, TypeScript, Go, Rust, Java
- Incremental Indexing: Efficient updates based on file change detection
- Performance Optimized: Quantized embeddings, batch processing, <200ms search latency
# Clone the repository
git clone <repository-url>
cd snipr
# Install dependencies
uv sync
# Install development dependencies (optional)
uv sync --extra dev

# Run the MCP server
uv run python -m src.main

Add to your Claude Code MCP configuration:
{
"mcpServers": {
"code-indexer": {
"command": "uv",
"args": ["run", "python", "-m", "src.main"],
"cwd": "/path/to/snipr",
"env": {
"INDEX_CACHE_DIR": ".index_cache",
"ENABLE_QUANTIZATION": "true"
}
}
}
}

Use the `index_codebase` tool in Claude Code:
# Index entire codebase
await index_codebase("/path/to/your/project")
# Index specific languages
await index_codebase("/path/to/your/project", languages="python,javascript")
# Exclude patterns
await index_codebase("/path/to/your/project", exclude_patterns="**/node_modules/**,**/.git/**")

# Semantic search
await search_code("authentication logic", language="python")
# Search by code type
await search_by_type("function_definition", language="python")
# Search within specific file
await search_in_file("/path/to/file.py", "error handling")
# Get indexing statistics
await get_search_stats()

- IndexingService: Tree-sitter parsing and code chunk extraction
- SearchService: Vector embeddings and semantic search
- MCP Tools: FastMCP-based tool implementations
- Configuration: Environment-based settings management
- CodeChunk: Represents indexed code segments with metadata
- SearchRequest/Response: API contracts for search operations
- IndexingRequest/Response: API contracts for indexing operations
| Language | Extension | Tree-sitter Support |
|---|---|---|
| Python | .py | ✅ Full support |
| JavaScript | .js | ✅ Full support |
| TypeScript | .ts | 🔄 Auto-detected |
| Go | .go | 🔄 Auto-detected |
| Rust | .rs | 🔄 Auto-detected |
| Java | .java | 🔄 Auto-detected |
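The indexing service walks each file's Tree-sitter AST to extract function- and class-level chunks. As a self-contained illustration of the same idea, here is a sketch using Python's stdlib `ast` module in place of Tree-sitter (the real service uses Tree-sitter grammars per language; this stand-in and its field names are assumptions):

```python
import ast


def extract_chunks(source: str, file_path: str) -> list[dict]:
    """Extract function/class chunks from Python source, analogous to
    the AST-based extraction the indexing service performs."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            semantic_type = (
                "class_definition" if isinstance(node, ast.ClassDef)
                else "function_definition"
            )
            chunks.append({
                "file_path": file_path,
                "semantic_type": semantic_type,
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    return chunks
```

Chunking at AST node boundaries (rather than fixed line windows) keeps each embedded unit semantically coherent, which is what makes type-filtered queries like `search_by_type("function_definition")` possible.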
# Cache directory for index storage
INDEX_CACHE_DIR=".index_cache"
# Enable/disable embedding generation
EMBEDDING_ENABLED="true"
# Enable quantized embeddings (8x memory reduction)
ENABLE_QUANTIZATION="true"
# Maximum file size for indexing (MB)
MAX_FILE_SIZE_MB="5"
# Embedding model for semantic search
EMBEDDING_MODEL="all-MiniLM-L6-v2"
# Batch size for embedding generation
EMBEDDING_BATCH_SIZE="32"
# Device for model inference (cpu or cuda)
# Default is cpu for better compatibility
DEVICE="cpu"

- Quantization: Reduces memory usage by 8x with minimal accuracy loss
- Batch Processing: Configurable batch sizes for optimal throughput
- File Filtering: Automatic exclusion of binary files and large files
- Incremental Updates: Only re-index changed files
# Run all tests
uv run python -m pytest src/ -v
# Run with coverage
uv run python -m pytest src/ -v --cov=src --cov-report=term-missing
# Run specific test file
uv run python -m pytest src/services/tests/test_indexing_service.py -v

# Lint code
uv run ruff check src/ --fix
# Type checking
uv run mypy src/
# Format code
uv run black src/

src/
├── models/              # Pydantic data models
│   └── indexing_models.py
├── services/            # Core business logic
│   ├── indexing_service.py
│   └── search_service.py
├── tools/               # MCP tool implementations
│   ├── index_codebase.py
│   └── search_code.py
├── config.py            # Configuration management
└── main.py              # FastMCP server entry point
Index a codebase for semantic search.
Parameters:
- codebase_path (string): Absolute path to codebase root
- languages (string, optional): Comma-separated languages to index
- exclude_patterns (string, optional): Comma-separated glob patterns to exclude
Search for semantically similar code chunks.
Parameters:
- query (string): Natural language or code query
- language (string, optional): Filter by programming language
- max_results (number): Maximum results (1-100)
- similarity_threshold (number): Minimum similarity score (0.0-1.0)
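The `similarity_threshold` and `max_results` parameters interact in a standard way: score every candidate chunk against the query embedding, drop scores below the threshold, then keep the top results. A minimal sketch with cosine similarity (function names here are illustrative, not the service's API):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_results(query_vec, chunk_vecs, max_results=10, similarity_threshold=0.0):
    """Score, filter by threshold, and return the top (score, index) pairs."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored = [(s, i) for s, i in scored if s >= similarity_threshold]
    scored.sort(reverse=True)
    return scored[:max_results]
```

Raising `similarity_threshold` trades recall for precision: weakly related chunks disappear before `max_results` truncation ever applies.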
Search for specific code constructs (functions, classes, etc.).
Parameters:
- semantic_type (string): Code construct type
- language (string, optional): Filter by programming language
- max_results (number): Maximum results
Search within a specific file.
Parameters:
- file_path (string): Absolute path to file
- query (string): Search query
- max_results (number): Maximum results
Get current indexing status for a codebase.
Parameters:
codebase_path(string): Absolute path to codebase root
Clear all indexing data and start fresh.
Get comprehensive indexing and search statistics.
- Search Latency: <200ms for typical queries
- Memory Usage: 8x reduction with quantization enabled
- Index Size: ~5MB per 10k lines of code
- Supported Files: Up to 5MB per file (configurable)
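For intuition on the quantization memory reduction cited above, here is a sketch of symmetric int8 quantization, one common scheme for compressing embedding vectors (float32 to int8 alone is a 4x reduction; the 8x figure presumably combines quantization with other savings; this code is an illustration, not the service's implementation):

```python
def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: one shared scale per vector."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid zero scale
    return [round(x / scale) for x in vec], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]
```

Because cosine similarity is fairly robust to small per-dimension errors, search quality typically degrades only slightly while each stored vector shrinks from 4 bytes to 1 byte per dimension.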
If you encounter Tree-sitter import errors:
# Install specific language parsers
uv add tree-sitter-python tree-sitter-javascript

Enable quantization for large codebases:
export ENABLE_QUANTIZATION="true"

Reduce batch size for lower memory usage:
export EMBEDDING_BATCH_SIZE="16"

The tool defaults to CPU mode for better compatibility. To use GPU:
export DEVICE="cuda"

To force CPU mode (default):
export DEVICE="cpu"

- Python 3.11+
- uv package manager
- Tree-sitter language parsers
- sentence-transformers (optional, for semantic search)
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Run tests:
uv run python -m pytest src/ -v
- Submit a pull request
Built with ❤️ for AI-powered development workflows.