tiny-rag

A lightweight RAG (Retrieval-Augmented Generation) package that runs entirely on CPU without requiring GPUs or cloud services.

Overview

tiny-rag is designed to make RAG accessible to everyone by removing the typical hardware and service dependencies. While traditional RAG systems rely on GPU-accelerated embedding and reranker models, or on cloud APIs such as OpenAI's text-embedding services, tiny-rag provides a fully functional RAG implementation that runs efficiently in CPU-only environments.

tiny-rag leverages static-embedding-japanese, an ultra-fast embedding model that achieves 126x faster CPU inference compared to traditional transformer models while maintaining strong performance on Japanese text tasks.

Key Features

  • CPU-Only Execution: No GPU required - runs on any standard computer
  • No Cloud Dependencies: Fully offline operation without external API calls
  • Lightweight: Minimal resource footprint for embedding and reranking
  • Easy to Use: Simple API for quick integration into your projects
  • Cost-Effective: No cloud service fees or expensive hardware requirements
  • Japanese Language Support: Currently optimized for Japanese text processing

Language Support

⚠️ Important: tiny-rag currently supports Japanese language only. Support for additional languages is planned for future releases.

Installation

pip install tiny-rag

Quick Start

from tiny_rag import TinyRAG

# Initialize tiny-rag
rag = TinyRAG()

# Add documents (Japanese text)
rag.add_documents([
    "ドキュメント1の内容...",
    "ドキュメント2の内容...",
])

# Query (in Japanese)
results = rag.query("あなたの質問をここに入力")
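tiny-rag's internals aren't shown above, but a query() call in a retrieve-style API like this typically embeds the documents once, embeds the query, and returns the nearest documents by cosine similarity. Below is a minimal, self-contained sketch of that flow, using a toy character-frequency embedding as a stand-in for static-embedding-japanese (ToyRAG, embed, and the 26-dimensional vectors are illustrative, not tiny-rag's actual implementation):

```python
import numpy as np

def embed(texts):
    # Toy embedding: normalized character-frequency vector over a-z.
    # A real system would call the embedding model here instead.
    vecs = []
    for t in texts:
        v = np.zeros(26)
        for ch in t.lower():
            if "a" <= ch <= "z":
                v[ord(ch) - 97] += 1.0
        n = np.linalg.norm(v)
        vecs.append(v / n if n else v)
    return np.stack(vecs)

class ToyRAG:
    def __init__(self):
        self.docs = []
        self.vecs = None

    def add_documents(self, docs):
        # Re-embed the full corpus; fine for a sketch, incremental in practice.
        self.docs += docs
        self.vecs = embed(self.docs)

    def query(self, q, k=2):
        # Vectors are unit-length, so the dot product is cosine similarity.
        sims = self.vecs @ embed([q])[0]
        top = np.argsort(-sims)[:k]
        return [self.docs[i] for i in top]
```

In a full pipeline, the top-k retrieved documents would then be passed to the reranker before being handed to a language model.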

Use Cases

  • Local Development: Test RAG pipelines without cloud costs
  • Edge Computing: Deploy RAG on resource-constrained devices
  • Privacy-Sensitive Applications: Keep all data processing local
  • Educational Projects: Learn RAG concepts without infrastructure overhead
  • Japanese Text Processing: Optimized for Japanese language applications

Technical Details

Embedding Model

tiny-rag uses static-embedding-japanese, which provides:

  • Ultra-fast CPU Performance: 126x faster inference than comparable transformer models
  • 1024-dimensional embeddings: Can be reduced to 32-512 dimensions for efficiency
  • Simple Architecture: Uses token embedding averaging instead of complex attention mechanisms
  • Strong Benchmark Performance: JMTEB score of 67.17 (micro-average)
  • Matryoshka Representation Learning: Enables efficient dimension reduction without retraining
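Matryoshka-trained embeddings can be shortened by simply keeping the leading components and re-normalizing; no retraining is involved. A minimal sketch of that reduction (truncate_embedding is a hypothetical helper for illustration, not part of tiny-rag's API):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Matryoshka-style reduction: keep the first `dims` components, re-normalize."""
    v = np.asarray(vec, dtype=float)[:dims]
    n = np.linalg.norm(v)
    return v / n if n else v
```

For example, a 1024-dimensional vector cut to 256 dimensions keeps roughly the same ranking quality at a quarter of the storage and similarity-computation cost, which is what the 32-512 dimension options above trade on.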

Reranker Model

For improved retrieval accuracy, tiny-rag employs japanese-reranker-tiny-v2:

  • Ultra-Compact: Only 29.4M parameters with 3 layers
  • Blazing Fast: 50-65% faster query processing than larger models
  • Good Performance: Average score of 0.8138 on Japanese benchmarks
  • CPU-Optimized: Specifically designed for CPU and Apple Silicon
  • Modern Architecture: Based on ModernBERT with 256 hidden dimensions
  • Strong Benchmark Results: JaCWIR (0.9287), JSQuAD (0.9608), MIRACL (0.7201)
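A cross-encoder reranker scores each (query, document) pair jointly and reorders the candidate list by that score. The sketch below shows only the reranking step, with a toy token-overlap scorer standing in for japanese-reranker-tiny-v2 (rerank and score_fn are illustrative names, not tiny-rag's API):

```python
def rerank(query, docs, top_k=3, score_fn=None):
    """Rerank candidate docs by a per-(query, doc) relevance score."""
    if score_fn is None:
        # Toy stand-in for a real reranker model:
        # fraction of query tokens that appear in the document.
        q_tokens = set(query.split())
        score_fn = lambda d: len(q_tokens & set(d.split())) / max(len(q_tokens), 1)
    return sorted(docs, key=score_fn, reverse=True)[:top_k]
```

Because the reranker only sees the small candidate set returned by the embedding search, even a 29.4M-parameter model adds little latency per query.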

Both models are specifically chosen for their exceptional CPU performance while maintaining high-quality results for Japanese text processing.

Benchmark Performance

tiny-rag has been evaluated on standard Japanese retrieval benchmarks to demonstrate its effectiveness:

Datasets

  • JQaRA (Japanese Question Answering with Retrieval Augmentation): 144,372 documents
  • JaCWIR (Japanese Casual Web IR): 513,107 documents

Results

Dataset   NDCG@10   MRR@10   MAP@10   Hits@10   Avg Query Time
JQaRA     0.8553    0.8796   -        -         0.771 sec
JaCWIR    -         -        0.8368   0.8646    0.925 sec
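For reference, MRR@10 (reported for JQaRA) is the mean over queries of the reciprocal rank of the first relevant result within the top 10. A small sketch of the standard definition (mrr_at_k is an illustrative helper, not tiny-rag's bench code):

```python
def mrr_at_k(rankings, k=10):
    """Mean Reciprocal Rank: `rankings` is a list of per-query result lists,
    each a sequence of relevance flags (1 = relevant) in ranked order."""
    total = 0.0
    for ranked in rankings:
        for i, rel in enumerate(ranked[:k]):
            if rel:
                total += 1.0 / (i + 1)
                break  # only the first relevant hit counts
    return total / len(rankings)
```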

Running Benchmarks

# Quick test with 5 queries per dataset
make bench

# Full benchmark evaluation
make bench-full

# Custom benchmark
python -m bench.benchmark --dimensions 512 --max-queries 10

Benchmark Options

  • --dimensions: Embedding dimensions (32, 64, 128, 256, 512, 1024) - Default: 1024
  • --max-queries: Maximum queries per dataset (for testing)

Performance Insights

  • High Accuracy: Scores of 0.84-0.88 across all reported retrieval metrics (NDCG, MRR, MAP, Hits)
  • Practical Speed: Query processing under 1 second even for large datasets
  • Scalable: Performance scales reasonably with dataset size
  • CPU-Friendly: All processing runs efficiently on standard hardware

Requirements

  • Python 3.13+
  • No GPU required
  • Minimal RAM requirements

Development

Setup

# Clone the repository
git clone https://github.com/sonesuke/tiny-rag.git
cd tiny-rag

# Install development dependencies
uv sync

# Run tests
uv run pytest --cov=src

# Run benchmarks
make bench       # Quick test (5 queries)
make bench-full  # Full evaluation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Acknowledgments

Built with a focus on accessibility and efficiency, making RAG technology available to everyone regardless of hardware limitations.