YARP (Yet Another RAG Pipeline) is a lightweight, high-performance Python library focused on in-memory vector database operations with Approximate Nearest Neighbor (ANN) search. It is built for fast document similarity search and retrieval-augmented generation (RAG) applications.
- Fast In-Memory Vector Search: Uses Annoy (Spotify's ANN library) for lightning-fast similarity search
- Hybrid Scoring: Combines semantic similarity (via sentence transformers) with lexical similarity (Levenshtein distance)
- Easy Document Management: Add, delete, and update documents dynamically
- Persistence: Save and load your vector indices to/from disk
- Lightweight: Minimal dependencies, maximum performance
- Configurable: Adjustable similarity metrics, tree counts, and scoring weights
- Type Safe: Built with Pydantic models for reliable data handling
The default installation does not install sentence-transformers automatically. Install python-yarp[cpu] or python-yarp[gpu], depending on the acceleration type you need.
uv add python-yarp
Or with pip:
pip install python-yarp
To enable GPU acceleration and install GPU-specific dependencies (PyTorch and sentence-transformers):
uv add 'python-yarp[gpu]'
Or with pip:
pip install 'python-yarp[gpu]'
For a leaner installation that installs the CPU-only PyTorch wheel without NVIDIA CUDA dependencies:
uv add 'python-yarp[cpu]'
Or with pip:
pip install 'python-yarp[cpu]'
This option is ideal for:
- CPU-only environments
- Docker containers without GPU support
- Systems where you want to minimize package size
- Development environments that don't require GPU acceleration
git clone https://github.com/regmibijay/yarp.git
cd yarp
uv sync --dev
from yarp import LocalMemoryIndex
# Initialize with your documents
documents = [
    "The cat sat on the mat",
    "Python programming language",
    "Machine learning with transformers",
    "Natural language processing",
    "Vector similarity search",
]
# Create and build the index
index = LocalMemoryIndex(documents, model_name="all-MiniLM-L6-v2")
index.process()
# Search for similar documents
results = index.query("programming languages", top_k=3)
# Access results
for result in results:
    print(f"Document: {result.document}")
    print(f"Score: {result.matching_score:.2f}%")
    print("---")
from yarp import LocalMemoryIndex
# Initialize index
index = LocalMemoryIndex(documents)
index.process(num_trees=256, metrics_type="angular")
# Query with custom weights
results = index.query(
    "machine learning algorithms",
    top_k=5,
    weight_semantic=0.7,     # 70% semantic similarity
    weight_levenshtein=0.3,  # 30% lexical similarity
    search_k=100,            # Search more candidates for better accuracy
)
# Invert results (lowest to highest scores)
inverted_results = results.invert(inplace=False)
# Add new documents
index.add("New document about artificial intelligence")
index.add(["Multiple", "documents", "at once"])
# Delete documents
index.delete("The cat sat on the mat")
# Query updated index
results = index.query("AI and machine learning")
# Save index to disk
index.backup("/path/to/backup/directory")
# Load index from disk
loaded_index = LocalMemoryIndex.load("/path/to/backup/directory")
# Continue using loaded index
results = loaded_index.query("your query here")
The main class for creating and managing vector indices.
LocalMemoryIndex(documents: List[str], model_name: str = "all-MiniLM-L6-v2")
- documents: List of text documents to index
- model_name: SentenceTransformer model name for embeddings
process: Build the vector index with the specified parameters.
- num_trees: Number of trees in Annoy index (more trees = better accuracy, slower build)
- metrics_type: Distance metric ("angular", "euclidean", "manhattan", "hamming", "dot")
query(q: str, top_k: int = 5, weight_semantic: float = 0.5, weight_levenshtein: float = 0.5, search_k: int = 50)
Search for similar documents.
- q: Query string
- top_k: Number of results to return
- weight_semantic: Weight for semantic similarity (0.0-1.0)
- weight_levenshtein: Weight for lexical similarity (0.0-1.0)
- search_k: Number of candidates to search (higher = better accuracy)
Returns LocalMemorySearchResult object.
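To make the two weights concrete, here is a minimal sketch of how a hybrid score could be computed. It is not YARP's actual implementation: the normalization of the edit distance, the rescaling by the weight sum, and all function names are assumptions chosen for illustration, and the edit distance is written in pure Python as a stand-in for the Levenshtein package.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / longer length, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def hybrid_score(semantic: float, lexical: float,
                 weight_semantic: float = 0.5,
                 weight_levenshtein: float = 0.5) -> float:
    """Weighted average of two similarities, returned as a percentage."""
    total = weight_semantic + weight_levenshtein
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = (weight_semantic * semantic + weight_levenshtein * lexical) / total
    return 100.0 * combined
```

In this sketch, raising weight_semantic favors documents whose embeddings are close to the query, while raising weight_levenshtein favors documents that share surface spelling with it.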
add: Add new documents to the index. Automatically rebuilds the index.
delete: Remove a document from the index. Automatically rebuilds the index.
backup: Save the index and metadata to disk.
load: Class method to load an index from disk.
Container for search results with built-in iteration and sorting capabilities.
class LocalMemorySearchResult(BaseModel):
    results: List[LocalMemorySearchResultEntry]

    def __iter__(self):
        """Iterate over results"""

    def invert(self, inplace: bool = True):
        """Reverse sort order of results"""
Individual search result entry.
class LocalMemorySearchResultEntry(BaseModel):
    document: str  # The matched document
    matching_score: float  # Similarity score (0-100%)
- Document Similarity Search: Find similar documents in large collections
- RAG Applications: Retrieve relevant context for language model prompts
- Content Recommendation: Recommend similar articles, products, or content
- Semantic Search: Search beyond exact keyword matching
- Duplicate Detection: Find near-duplicate documents with hybrid scoring
- Question Answering: Retrieve relevant passages for Q&A systems
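For the RAG use case, retrieved passages typically end up in a prompt. The sketch below shows one way that step might look. The build_prompt helper, its min_score and max_passages parameters, and the prompt wording are all hypothetical and not part of YARP.

```python
from typing import Iterable, Tuple


def build_prompt(question: str,
                 hits: Iterable[Tuple[str, float]],
                 min_score: float = 50.0,
                 max_passages: int = 3) -> str:
    """Assemble a prompt from (document, matching_score) pairs.

    min_score and max_passages are illustrative knobs, not YARP parameters.
    """
    # Keep only sufficiently relevant passages, best first.
    kept = sorted((h for h in hits if h[1] >= min_score),
                  key=lambda h: h[1], reverse=True)[:max_passages]
    context = "\n".join(f"- {doc}" for doc, _ in kept)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

With a built index, the hits could be produced as `[(r.document, r.matching_score) for r in index.query(question)]`, using the result attributes shown in the quick-start example.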
YARP is optimized for speed and memory efficiency:
- Fast Indexing: Efficient embedding generation and Annoy index building
- Quick Queries: Search times under 10 ms for up to 1K documents in the benchmarks below
- Memory Efficient: Stores embeddings in optimized Annoy format
- Scalable: Tested with thousands of documents
| Operation | Small (10 docs) | Medium (100 docs) | Large (1K docs) |
|---|---|---|---|
| Index Build | <1s | ~3s | ~15s |
| Query Time | <1ms | <5ms | <10ms |
| Memory Usage | ~10MB | ~50MB | ~200MB |
Benchmarks were run on a standard laptop with the all-MiniLM-L6-v2 model.
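Timings like these depend heavily on hardware, so it can be worth measuring on your own machine. The helper below is a generic sketch built only on the standard library. The name median_ms and the choice of the median as the summary statistic are assumptions, and this is not the script that produced the table above.

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], repeats: int = 20) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `median_ms(lambda: index.query("programming languages", top_k=3))` would time a query against an index that has already been built with process().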
Choose from various SentenceTransformer models based on your needs:
# Lightweight and fast
index = LocalMemoryIndex(docs, model_name="all-MiniLM-L6-v2")
# Better accuracy, slower
index = LocalMemoryIndex(docs, model_name="all-mpnet-base-v2")
# Multilingual support
index = LocalMemoryIndex(docs, model_name="paraphrase-multilingual-MiniLM-L12-v2")
- angular: Cosine similarity (default, good for text)
- euclidean: L2 distance
- manhattan: L1 distance
- dot: Dot product similarity
- num_trees: Higher values increase accuracy but slow down indexing
- search_k: Higher values increase query accuracy but slow down search
- weight_semantic/weight_levenshtein: Balance between semantic and lexical matching
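The distance metrics listed above can be written out in plain Python to show how they differ. This is an illustrative sketch, not code from YARP or Annoy. The angular formula follows Annoy's documented definition, sqrt(2 * (1 - cos)), and treating a zero vector as having cosine 0 is an arbitrary choice made here.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product; larger values mean more similar."""
    return sum(x * y for x, y in zip(u, v))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    """L1 distance."""
    return sum(abs(x - y) for x, y in zip(u, v))


def angular(u: Sequence[float], v: Sequence[float]) -> float:
    """Annoy-style angular distance: sqrt(2 * (1 - cosine similarity))."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    cos = dot(u, v) / norm if norm else 0.0  # zero-vector fallback is arbitrary
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))
```

Angular distance ignores vector length, which is one reason it suits text embeddings: two vectors pointing the same way have distance 0 regardless of magnitude.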
YARP provides specific exception types for different error conditions:
from yarp.exceptions import (
    LocalMemoryTreeNotBuildException,
    LocalMemoryBadRequestException,
)

try:
    results = index.query("test query")
except LocalMemoryTreeNotBuildException:
    print("Index not built yet - call process() first")
except LocalMemoryBadRequestException as e:
    print(f"Invalid request: {e}")
Run the test suite:
# Run all tests
pytest
# Run with coverage
pytest --cov=yarp
# Run only fast tests (skip integration)
pytest -m "not slow"
# Run integration tests
pytest -m integration
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Clone the repository
git clone https://github.com/regmibijay/yarp.git
cd yarp
# Install in development mode with dev dependencies
uv sync --dev
# For CPU-only development environments (optional)
# uv sync --dev --extra cpu
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
This project is licensed under the MIT License - see the LICENSE file for details.
- Annoy - Spotify's approximate nearest neighbor library
- Sentence Transformers - State-of-the-art sentence embeddings
- Levenshtein - Fast string distance calculations
- Support for more embedding models (OpenAI, Cohere, etc.)
- Batch query operations
- Distributed index support
- Integration with popular vector databases
- Web API interface
- Advanced filtering capabilities
- Documentation: YARP Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- My Blog: Blog
Made with ❤️ for the Python community