YARP - Yet Another RAG Pipeline

YARP (Yet Another RAG Pipeline) is a lightweight, high-performance Python library focused on in-memory vector database operations with Approximate Nearest Neighbor (ANN) search. Built for fast document similarity search and retrieval-augmented generation (RAG) applications.

🚀 Key Features

Fast In-Memory Vector Search: Uses Annoy (Spotify's ANN library) for lightning-fast similarity search
Hybrid Scoring: Combines semantic similarity (via sentence transformers) with lexical similarity (Levenshtein distance)
Easy Document Management: Add, delete, and update documents dynamically
Persistence: Save and load your vector indices to/from disk
Lightweight: Minimal dependencies, maximum performance
Configurable: Adjustable similarity metrics, tree counts, and scoring weights
Type Safe: Built with Pydantic models for reliable data handling

📦 Installation

Standard Installation

Default installation does not automatically install sentence-transformers. Please install python-yarp[cpu] or python-yarp[gpu] depending on acceleration type.

uv add python-yarp

Or with pip:

pip install python-yarp

GPU Support

To enable GPU acceleration and install GPU-specific dependencies (PyTorch and sentence-transformers):

uv add python-yarp[gpu]

Or with pip:

pip install 'python-yarp[gpu]'

CPU-Only Installation (Recommended for systems without GPU)

For a leaner installation that installs PyTorch CPU-only wheel without NVIDIA CUDA dependencies:

uv add python-yarp[cpu]

Or with pip:

pip install 'python-yarp[cpu]'

This option is ideal for:

CPU-only environments
Docker containers without GPU support
Systems where you want to minimize package size
Development environments that don't require GPU acceleration

Development Installation

git clone https://github.com/regmibijay/yarp.git
cd yarp
uv sync --dev

🔧 Quick Start

Basic Usage

from yarp import LocalMemoryIndex

# Initialize with your documents
documents = [
    "The cat sat on the mat",
    "Python programming language", 
    "Machine learning with transformers",
    "Natural language processing",
    "Vector similarity search"
]

# Create and build the index
index = LocalMemoryIndex(documents, model_name="all-MiniLM-L6-v2")
index.process()

# Search for similar documents
results = index.query("programming languages", top_k=3)

# Access results
for result in results:
    print(f"Document: {result.document}")
    print(f"Score: {result.matching_score:.2f}%")
    print("---")

Advanced Usage with Hybrid Scoring

from yarp import LocalMemoryIndex

# Initialize index
index = LocalMemoryIndex(documents)
index.process(num_trees=256, metrics_type="angular")

# Query with custom weights
results = index.query(
    "machine learning algorithms",
    top_k=5,
    weight_semantic=0.7,      # 70% semantic similarity
    weight_levenshtein=0.3,   # 30% lexical similarity
    search_k=100             # Search more candidates for better accuracy
)

# Invert results (lowest to highest scores)
inverted_results = results.invert(inplace=False)

Document Management

# Add new documents
index.add("New document about artificial intelligence")
index.add(["Multiple", "documents", "at once"])

# Delete documents  
index.delete("The cat sat on the mat")

# Query updated index
results = index.query("AI and machine learning")

Persistence

# Save index to disk
index.backup("/path/to/backup/directory")

# Load index from disk
loaded_index = LocalMemoryIndex.load("/path/to/backup/directory")

# Continue using loaded index
results = loaded_index.query("your query here")

📖 API Reference

LocalMemoryIndex

The main class for creating and managing vector indices.

Constructor

LocalMemoryIndex(documents: List[str], model_name: str = "all-MiniLM-L6-v2")

documents: List of text documents to index
model_name: SentenceTransformer model name for embeddings

Methods

`process(num_trees: int = 128, metrics_type: str = "angular")`

Build the vector index with specified parameters.

num_trees: Number of trees in Annoy index (more trees = better accuracy, slower build)
metrics_type: Distance metric ("angular", "euclidean", "manhattan", "hamming", "dot")

`query(q: str, top_k: int = 5, weight_semantic: float = 0.5, weight_levenshtein: float = 0.5, search_k: int = 50)`

Search for similar documents.

q: Query string
top_k: Number of results to return
weight_semantic: Weight for semantic similarity (0.0-1.0)
weight_levenshtein: Weight for lexical similarity (0.0-1.0)
search_k: Number of candidates to search (higher = better accuracy)

Returns LocalMemorySearchResult object.

`add(documents: str | List[str])`

Add new documents to the index. Automatically rebuilds the index.

`delete(document: str)`

Remove a document from the index. Automatically rebuilds the index.

`backup(path: str)`

Save the index and metadata to disk.

`load(path: str, model_name: str = "all-MiniLM-L6-v2")`

Class method to load an index from disk.

Data Models

LocalMemorySearchResult

Container for search results with built-in iteration and sorting capabilities.

class LocalMemorySearchResult(BaseModel):
    results: List[LocalMemorySearchResultEntry]
    
    def __iter__(self):
        """Iterate over results"""
        
    def invert(self, inplace: bool = True):
        """Reverse sort order of results"""

LocalMemorySearchResultEntry

Individual search result entry.

class LocalMemorySearchResultEntry(BaseModel):
    document: str           # The matched document
    matching_score: float   # Similarity score (0-100%)

🎯 Use Cases

Document Similarity Search: Find similar documents in large collections
RAG Applications: Retrieve relevant context for language model prompts
Content Recommendation: Recommend similar articles, products, or content
Semantic Search: Search beyond exact keyword matching
Duplicate Detection: Find near-duplicate documents with hybrid scoring
Question Answering: Retrieve relevant passages for Q&A systems

⚡ Performance

YARP is optimized for speed and memory efficiency:

Fast Indexing: Efficient embedding generation and Annoy index building
Quick Queries: Sub-millisecond search times for most datasets
Memory Efficient: Stores embeddings in optimized Annoy format
Scalable: Tested with thousands of documents

Benchmarks

Operation	Small (10 docs)	Medium (100 docs)	Large (1K docs)
Index Build	<1s	~3s	~15s
Query Time	<1ms	<5ms	<10ms
Memory Usage	~10MB	~50MB	~200MB

Benchmarks run on standard laptop with all-MiniLM-L6-v2 model

🛠️ Configuration

Model Selection

Choose from various SentenceTransformer models based on your needs:

# Lightweight and fast
index = LocalMemoryIndex(docs, model_name="all-MiniLM-L6-v2")

# Better accuracy, slower
index = LocalMemoryIndex(docs, model_name="all-mpnet-base-v2")

# Multilingual support
index = LocalMemoryIndex(docs, model_name="paraphrase-multilingual-MiniLM-L12-v2")

Distance Metrics

angular: Cosine similarity (default, good for text)
euclidean: L2 distance
manhattan: L1 distance
dot: Dot product similarity

Tuning Parameters

num_trees: Higher values increase accuracy but slow down indexing
search_k: Higher values increase query accuracy but slow down search
weight_semantic/weight_levenshtein: Balance between semantic and lexical matching

🚦 Error Handling

YARP provides specific exception types for different error conditions:

from yarp.exceptions import (
    LocalMemoryTreeNotBuildException,
    LocalMemoryBadRequestException
)

try:
    results = index.query("test query")
except LocalMemoryTreeNotBuildException:
    print("Index not built yet - call process() first")
except LocalMemoryBadRequestException as e:
    print(f"Invalid request: {e}")

🧪 Testing

Run the test suite:

# Run all tests
pytest

# Run with coverage
pytest --cov=yarp

# Run only fast tests (skip integration)
pytest -m "not slow"

# Run integration tests
pytest -m integration

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Development Setup

# Clone the repository
git clone https://github.com/regmibijay/yarp.git
cd yarp

# Install in development mode with dev dependencies
uv sync --dev

# For CPU-only development environments (optional)
# uv sync --dev --extra cpu

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Annoy - Spotify's approximate nearest neighbor library
Sentence Transformers - State-of-the-art sentence embeddings
Levenshtein - Fast string distance calculations

📈 Roadmap

Support for more embedding models (OpenAI, Cohere, etc.)
Batch query operations
Distributed index support
Integration with popular vector databases
Web API interface
Advanced filtering capabilities

📞 Support

Documentation: YARP Documentation
Issues: GitHub Issues
Discussions: GitHub Discussions
My Blog: Blog

Made with ❤️ for the Python community

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.github/workflows		.github/workflows
.vscode		.vscode
docs		docs
tests		tests
yarp		yarp
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
build_docs.py		build_docs.py
pyproject.toml		pyproject.toml
update_version.py		update_version.py
uv.lock		uv.lock

License

regmibijay/yarp

Folders and files

Latest commit

History

Repository files navigation

YARP - Yet Another RAG Pipeline

🚀 Key Features

📦 Installation

Standard Installation

GPU Support

CPU-Only Installation (Recommended for systems without GPU)

Development Installation

🔧 Quick Start

Basic Usage

Advanced Usage with Hybrid Scoring

Document Management

Persistence

📖 API Reference

LocalMemoryIndex

Constructor

Methods

process(num_trees: int = 128, metrics_type: str = "angular")

query(q: str, top_k: int = 5, weight_semantic: float = 0.5, weight_levenshtein: float = 0.5, search_k: int = 50)

add(documents: str | List[str])

delete(document: str)

backup(path: str)

load(path: str, model_name: str = "all-MiniLM-L6-v2")

Data Models

LocalMemorySearchResult

LocalMemorySearchResultEntry

🎯 Use Cases

⚡ Performance

Benchmarks

🛠️ Configuration

Model Selection

Distance Metrics

Tuning Parameters

🚦 Error Handling

🧪 Testing

🤝 Contributing

Quick Development Setup

📄 License

🙏 Acknowledgments

📈 Roadmap

📞 Support

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Languages

`process(num_trees: int = 128, metrics_type: str = "angular")`

`query(q: str, top_k: int = 5, weight_semantic: float = 0.5, weight_levenshtein: float = 0.5, search_k: int = 50)`

`add(documents: str | List[str])`

`delete(document: str)`

`backup(path: str)`

`load(path: str, model_name: str = "all-MiniLM-L6-v2")`

Packages