A curated list of reranking models, libraries, and resources for RAG applications.
Rerankers take a query and a set of retrieved documents and reorder them by relevance. Most use cross-encoders, which jointly encode query-document pairs; this is slower than vector search but more accurate. Typical pipeline: retrieve 50-100 candidates with vector search, then rerank down to the top 3-5.
## Contents

- What are Rerankers?
- Top Models Comparison
- Quick Start
- Open Source Models
- Commercial APIs
- Libraries & Frameworks
- RAG Framework Integrations
- Datasets & Benchmarks
- Evaluation Metrics
- Research Papers
- Tutorials & Resources
- Tools & Utilities
- Reranker Leaderboard
- Related Awesome Lists
## What are Rerankers?

Rerankers refine search results by re-scoring query-document pairs. Key differences from vector search:

**Vector search (bi-encoders):**
- Encodes query and documents separately
- Fast (pre-computed embeddings)
- Returns 50-100 candidates

**Reranking (cross-encoders):**
- Jointly encodes query + document
- Slower but more accurate
- Refines to top 3-5 results

Types: pointwise (score each document independently), pairwise (compare document pairs), listwise (score the entire list at once).
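The three types can be sketched with a toy scorer. Word overlap stands in for a real cross-encoder here, and `score`, `pointwise`, `pairwise`, and `listwise` are illustrative names, not a library API; only the control flow differs between the three styles:

```python
from itertools import permutations

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words found in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def pointwise(query, docs):
    # Score each document independently, then sort: n scorer calls.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

def pairwise(query, docs):
    # Compare documents two at a time and count wins: O(n^2) comparisons.
    wins = {d: sum(score(query, d) > score(query, o) for o in docs) for d in docs}
    return sorted(docs, key=lambda d: wins[d], reverse=True)

def listwise(query, docs):
    # Score whole orderings at once (brute force; feasible only for tiny n).
    best = max(
        permutations(docs),
        key=lambda perm: sum(score(query, d) / (i + 1) for i, d in enumerate(perm)),
    )
    return list(best)

docs = ["cats sleep a lot", "deep learning uses neural networks", "learning to rank"]
print(pointwise("what is deep learning", docs)[0])
# → "deep learning uses neural networks"
```

Real listwise rerankers do not enumerate permutations; they score or generate an ordering directly, but the input (the whole candidate list) is the same.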
## Top Models Comparison

| Model | Type | Multilingual | Deployment | Best For |
|---|---|---|---|---|
| Cohere Rerank | API | 100+ languages | Cloud | Production, easy start |
| Voyage Rerank 2.5 | API | English-focused | Cloud | Highest accuracy |
| Jina Reranker v2 | API/OSS | 100+ languages | Cloud/Self-host | Balance cost/quality |
| BGE-Reranker-v2-m3 | Open Source | 100+ languages | Self-host | Free, multilingual |
| mxbai-rerank-large-v2 | Open Source | English | Self-host | Best OSS accuracy |
| FlashRank | Open Source | Limited | Self-host | Lightweight, CPU-only |
→ View Full Benchmarks & Leaderboard - Live comparison of rerankers on production benchmarks including NDCG@10, latency, and cost metrics. Updated regularly with new models and real-world performance data.
## Quick Start

5-Minute Setup:

```python
# Option 1: Cohere API (easiest)
from cohere import Client

client = Client("your-api-key")
results = client.rerank(
    query="What is deep learning?",
    documents=["Doc 1...", "Doc 2..."],
    model="rerank-v3.5",
    top_n=3,
)
```

```python
# Option 2: Self-hosted (free)
from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-v2-m3")
scores = model.predict([
    ("What is deep learning?", "Doc 1..."),
    ("What is deep learning?", "Doc 2..."),
])
```

**Choosing a Reranker:** For help selecting a reranker for your use case, see Best Reranker for RAG: We tested the top models, which breaks down consistency, accuracy, and performance across the leading models.
## Open Source Models

### Cross-Encoder Models

Cross-encoders jointly encode query and document pairs for accurate relevance scoring.
BGE-Reranker (GitHub)
- bge-reranker-base - 278M params, fast
- bge-reranker-large - 560M params, high accuracy
- bge-reranker-v2-m3 - 568M params, multilingual (100+ languages)
- bge-reranker-v2-gemma - Gemma architecture
Jina Reranker v2 (HuggingFace)
- 1024 token context, 100+ languages, code search support
Mixedbread AI
- mxbai-rerank-base-v2 - 0.5B params (Qwen-2.5)
- mxbai-rerank-large-v2 - 1.5B params, top BEIR scores
MS MARCO Models
- ms-marco-MiniLM-L-12-v2 - Efficient
- ms-marco-TinyBERT-L-6 - Ultra-lightweight
### T5-Based Rerankers

Sequence-to-sequence models leveraging the T5 architecture for text ranking.
- MonoT5 - Pointwise T5-base reranker fine-tuned on MS MARCO, scores documents independently.
- DuoT5 - Pairwise T5-3B reranker for comparing document pairs with O(n²) complexity.
- RankT5 - T5 variant fine-tuned with specialized ranking losses for improved performance.
- PyTerrier T5 - T5-based reranking models integrated with PyTerrier IR platform.
### LLM-Based Rerankers

Large language models adapted for reranking tasks with zero-shot or few-shot capabilities.
- RankLLM - Unified framework supporting RankVicuna, RankZephyr, and RankGPT with vLLM/SGLang/TensorRT-LLM integration.
- RankGPT - Zero-shot listwise reranking using GPT-3.5/GPT-4 with permutation generation.
- LiT5 - Listwise reranking model based on T5 architecture.
- RankVicuna - Vicuna LLM fine-tuned for ranking tasks.
- RankZephyr - Zephyr-based model optimized for reranking.
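The listwise prompting idea behind RankGPT-style rerankers can be sketched in plain Python: number the passages in the prompt, ask the model for a permutation of identifiers, then parse that permutation back into a document order. The prompt wording and the `parse_permutation` helper below are illustrative, not the exact implementation from any of the projects above:

```python
import re

def build_listwise_prompt(query: str, docs: list[str]) -> str:
    """Build a RankGPT-style listwise prompt (illustrative format)."""
    lines = ["Rank the following passages by relevance to the query.",
             f"Query: {query}", ""]
    for i, doc in enumerate(docs, start=1):
        lines.append(f"[{i}] {doc}")
    lines += ["", "Answer with identifiers only, e.g. [2] > [1] > [3]."]
    return "\n".join(lines)

def parse_permutation(response: str, docs: list[str]) -> list[str]:
    """Turn a response like '[2] > [3] > [1]' back into a document order.

    Drops invalid or duplicate identifiers and appends any passages the
    model omitted, so the output is always a full permutation of docs.
    """
    ids = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", response)]
    seen, order = set(), []
    for i in ids:
        if 0 <= i < len(docs) and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(len(docs)) if i not in seen]
    return [docs[i] for i in order]

docs = ["passage A", "passage B", "passage C"]
print(parse_permutation("[2] > [3] > [1]", docs))
# → ['passage B', 'passage C', 'passage A']
```

Defensive parsing matters in practice: LLMs occasionally repeat or skip identifiers, and a reranker must still return every candidate.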
## Commercial APIs

Production-ready reranking services with enterprise support and scalability.
- Cohere Rerank - Leading reranking API with multilingual support (100+ languages) and "Nimble" variant for low latency.
- Voyage AI Rerank - Instruction-following rerankers (rerank-2.5/rerank-2.5-lite) with 200M free tokens.
- Jina AI Reranker API - Cloud-hosted Jina reranker models with pay-as-you-go pricing.
- Pinecone Rerank - Integrated reranking service within Pinecone's vector database platform.
- Mixedbread AI Reranker API - API access to mxbai-rerank models with competitive pricing.
- NVIDIA NeMo Retriever - Enterprise-grade reranking optimized for NVIDIA hardware.
## Libraries & Frameworks

- rerankers - Lightweight Python library providing unified API for all major reranking models (FlashRank, Cohere, RankGPT, cross-encoders).
- FlashRank - Ultra-lite (~4MB) reranking library with zero torch/transformers dependencies, supports CPU inference.
- Sentence-Transformers - Popular library for training and using cross-encoder reranking models.
- rank-llm - Python package for listwise and pairwise reranking with LLMs.
- FlagEmbedding - BAAI's comprehensive toolkit for embeddings and reranking, includes BGE models and training code.
- PyTerrier - Information retrieval platform with extensive reranking support and experimentation tools.
## RAG Framework Integrations

### LangChain

Node postprocessors and document transformers for reranking in LangChain pipelines.
- Cohere Reranker - Official Cohere integration using ContextualCompressionRetriever.
- FlashRank Reranker - Lightweight reranking without heavy dependencies.
- RankLLM Reranker - LLM-based listwise reranking for LangChain.
- Cross Encoder Reranker - Hugging Face cross-encoder models integration.
- Pinecone Rerank - Native Pinecone reranking support.
- VoyageAI Reranker - Voyage AI models for document reranking.
### LlamaIndex

Postprocessor modules for enhancing retrieval in LlamaIndex query engines.
- CohereRerank - Top-N reranking using Cohere's API.
- SentenceTransformerRerank - Cross-encoder reranking from sentence-transformers.
- LLMRerank - Uses LLMs to score and reorder retrieved nodes.
- JinaRerank - Jina AI reranker integration.
- RankLLM Rerank - RankLLM models as postprocessors.
- NVIDIA Rerank - NVIDIA NeMo Retriever integration.
### Haystack

Ranker components for deepset's Haystack framework.
- CohereRanker - Semantic reranking with Cohere models.
- SentenceTransformersRanker - Cross-encoder based reranking.
- JinaRanker - Jina reranker models for Haystack pipelines.
- MixedbreadAIRanker - Mixedbread AI reranker integration.
- LostInTheMiddleRanker - Optimizes document ordering to combat the "lost in the middle" phenomenon.
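The "lost in the middle" reordering idea is simple to sketch: given documents sorted best-first by the reranker, alternate placements so the strongest documents sit at the edges of the context window and the weakest land in the middle, where LLMs attend least. This is a simplified sketch of the idea, not Haystack's exact component:

```python
def lost_in_the_middle_order(docs_by_relevance: list[str]) -> list[str]:
    """Reorder docs (best first on input) so top results sit at both ends."""
    result = [None] * len(docs_by_relevance)
    left, right = 0, len(docs_by_relevance) - 1
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            result[left] = doc   # even ranks fill from the front
            left += 1
        else:
            result[right] = doc  # odd ranks fill from the back
            right -= 1
    return result

print(lost_in_the_middle_order(["1st", "2nd", "3rd", "4th", "5th"]))
# → ['1st', '3rd', '5th', '4th', '2nd']
```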
## Datasets & Benchmarks

- MS MARCO - Large-scale passage and document ranking datasets with real Bing queries.
- MS MARCO Passage Ranking - 8.8M passages with 500k+ training queries for passage retrieval.
- MS MARCO Document Ranking - 3.2M documents for full document ranking tasks.
- BEIR - Heterogeneous benchmark with 18 diverse datasets for zero-shot evaluation.
- TREC Deep Learning Track - High-quality test collections (TREC-DL-2019, TREC-DL-2020) for passage/document ranking.
- TREC-DL-2019 - 200 queries with dense relevance judgments.
- TREC-DL-2020 - 200 queries with expanded corpus coverage.
- Natural Questions - Google's dataset of real user questions for QA and retrieval.
- SciRerankBench - Specialized benchmark for scientific document reranking.
- BEIR Benchmark - Zero-shot evaluation across 18 retrieval tasks (NQ, HotpotQA, FiQA, ArguAna, etc.).
- MTEB Reranking - Massive Text Embedding Benchmark including reranking tasks.
## Evaluation Metrics

Key metrics for assessing reranker performance:
- NDCG (Normalized Discounted Cumulative Gain) - Standard metric emphasizing top results, commonly reported as NDCG@10.
- MRR (Mean Reciprocal Rank) - Measures average inverse rank of first relevant result, used by MS MARCO (MRR@10).
- MAP (Mean Average Precision) - Average precision across all relevant documents.
- Recall@K - Fraction of all relevant documents that appear in the top-K results.
- Precision@K - Proportion of the top-K results that are relevant.
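These metrics are straightforward to implement directly; a minimal pure-Python sketch, taking relevance labels in rank order (function names here are illustrative; libraries like ranx or ir-measures cover the same ground robustly):

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(relevances: list[int]) -> float:
    """1/rank of the first relevant result (0 if none); MRR averages this over queries."""
    for i, rel in enumerate(relevances):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0

def recall_at_k(relevances: list[int], k: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    return sum(1 for r in relevances[:k] if r > 0) / total_relevant

def precision_at_k(relevances: list[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in relevances[:k] if r > 0) / k

ranking = [1, 0, 1, 0]  # relevance of each returned doc, in rank order
print(ndcg_at_k(ranking, 4))
print(reciprocal_rank([0, 0, 1]))                  # 1/3
print(recall_at_k(ranking, 2, total_relevant=2))   # 0.5
```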
## Research Papers

- Document Ranking with a Pretrained Sequence-to-Sequence Model (2020) - Introduces MonoT5 and DuoT5 for text ranking.
- BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models (2021) - Establishes BEIR benchmark suite.
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents (2023) - Introduces RankGPT for zero-shot LLM reranking.
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs (2024) - Unified framework for context ranking and answer generation.
- A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE (March 2024) - Comprehensive evaluation on TREC-DL and BEIR showing traditional cross-encoders remain competitive against GPT-4 while being more efficient.
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders (April 2024, ECIR 2025) - Novel cross-encoder architecture with inter-passage attention for efficient listwise reranking, achieving state-of-the-art results while maintaining permutation invariance.
- Don't Forget to Connect! Improving RAG with Graph-based Reranking (May 2024) - Introduces G-RAG, a GNN-based reranker that leverages document connections and semantic graphs, outperforming state-of-the-art approaches with smaller computational footprint.
- CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks (September 2024) - Novel joint ranking approach achieving 4x lower latency than standard cross-encoders while maintaining state-of-the-art accuracy through Ranking Probability Loss.
- Efficient Re-ranking with Cross-encoders via Early Exit (2024, SIGIR 2025) - Introduces early exit mechanisms for cross-encoders to improve inference efficiency without sacrificing accuracy.
- FIRST: Faster Improved Listwise Reranking with Single Token Decoding (June 2024) - Accelerates LLM reranking inference by 50% using output logits of first generated identifier while maintaining robust performance across BEIR benchmark.
- InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking (June 2025) - Demonstrates consistent gains by injecting BM25 scores into zero-shot listwise prompts across Gemini, GPT-4, and Deepseek models.
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking (October 2024) - Agentic reranker using Chain-of-Thought reasoning with query analysis, document analysis, and relevance judgment steps, excelling on BRIGHT benchmark.
- Do Large Language Models Favor Recent Content? A Study on Recency Bias in LLM-Based Reranking (September 2024, SIGIR-AP 2025) - Reveals significant recency bias across GPT and LLaMA models, with fresh passages promoted by up to 95 ranks and date injection reversing 25% of preferences.
- HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse (April 2025) - Achieves 2-3x throughput improvement for decoder-only rerankers through KV-cache reuse while maintaining high generation quality.
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation (May 2025, NeurIPS 2025) - RL-based agent that dynamically adjusts both order and number of retrieved documents, achieving state-of-the-art results across seven knowledge-intensive datasets.
- SciRerankBench: Benchmarking Rerankers Towards Scientific RAG-LLMs (August 2025) - Specialized benchmark for scientific document reranking with emphasis on effectiveness-efficiency tradeoffs.
- Rank1: Test-Time Compute for Reranking in Information Retrieval (February 2025, CoLM 2025) - First reranking model leveraging test-time compute with reasoning traces, distilled from R1/o1 models with 600K+ examples, achieving state-of-the-art on reasoning tasks.
- How Good are LLM-based Rerankers? An Empirical Analysis (August 2025) - Comprehensive empirical evaluation comparing state-of-the-art LLM reranking approaches across multiple benchmarks and dimensions.
- The Evolution of Reranking Models in Information Retrieval: From Heuristic Methods to Large Language Models (December 2024) - Comprehensive survey tracing reranking evolution from cross-encoders to LLM-based approaches, covering architectures and training objectives.
- C-Pack: Packaged Resources To Advance General Chinese Embedding (2023) - Introduces BGE reranking model family and training methodologies.
- Pretrained Transformers for Text Ranking: BERT and Beyond (2020) - Survey of neural ranking models.
- Neural Models for Information Retrieval (2017) - Foundational survey of neural IR approaches.
## Tutorials & Resources

- Top 7 Rerankers for RAG - Analytics Vidhya's comparison of leading reranking models.
- Comprehensive Guide on Reranker for RAG - In-depth tutorial on implementing rerankers in RAG systems.
- Improving RAG Accuracy with Rerankers - Practical guide with implementation examples.
- Mastering RAG: How to Select A Reranking Model - Selection criteria and comparison framework.
- Boosting RAG: Picking the Best Embedding & Reranker Models - LlamaIndex guide with benchmarks.
- Advanced RAG: Evaluating Reranker Models using LlamaIndex - Step-by-step evaluation tutorial.
- Enhancing Advanced RAG Systems Using Reranking with LangChain - LangChain implementation patterns.
- Training and Finetuning Reranker Models with Sentence Transformers v4 - Official Hugging Face training guide.
- Fine-Tuning Re-Ranking Models for LLM-Based Search - Domain-specific fine-tuning techniques.
- Implementing Rerankers in Your AI Workflows - n8n's practical workflow tutorial.
- Cohere Rerank on LangChain Integration Guide - Official Cohere tutorial.
- Rerankers in RAG - Conceptual overview of reranking in RAG pipelines.
- Sentence Embeddings: Cross-encoders and Re-ranking - Technical deep-dive into cross-encoder architectures.
- The Four Types of Passage Reranker in RAG - Classification and comparison of reranker types.
- RAG in 2025: From Quick Fix to Core Architecture - Industry trends and best practices.
- Boosting Your Search and RAG with Voyage's Rerankers - Voyage AI's technical blog.
## Tools & Utilities

- ranx - Fast IR evaluation library supporting NDCG, MAP, MRR, and more.
- ir-measures - Comprehensive IR metrics library with TREC integration.
- MTEB - Massive Text Embedding Benchmark for systematic evaluation.
- Haystack Studio - Visual pipeline builder with reranking components.
- LangSmith - Debugging and monitoring for LangChain pipelines including rerankers.
- AutoRAG - Automated RAG optimization including reranker selection.
- Text Embeddings Visualization - TensorFlow's embedding projector for understanding model behavior.
- Phoenix - LLM observability platform with retrieval tracing.
## Reranker Leaderboard

📊 View Live Leaderboard - Compare rerankers using ELO scoring, nDCG@10, latency, and cost.
Models ranked by head-to-head GPT-5 comparisons across financial, scientific, and essay datasets.
Current Leaders (as of Nov 2025):
- Zerank 2 - Wins most head-to-head matchups, highest consistency across domains
- Cohere Rerank 4 Pro - Second best win rate, strong performance on complex queries
- Voyage AI Rerank 2.5 - Balanced accuracy and response times
Rankings update as new models are evaluated.
## Related Awesome Lists

- Awesome RAG - Comprehensive RAG resources and frameworks.
- Awesome LLM - Large Language Models resources and tools.
- Awesome Information Retrieval - IR papers, datasets, and tools.
- Awesome Embedding Models - Vector embeddings and similarity search.
- Awesome Neural Search - Neural search and dense retrieval resources.
- Awesome Vector Search - Vector databases and search engines.
## Contributing

Contributions are welcome! Please read the contribution guidelines first.
To add a new item:
- Search previous suggestions before making a new one
- Make an individual pull request for each suggestion
- Use the following format:
  `**[Name](link)** - Description.`
- New categories or improvements to the existing categorization are welcome
- Keep descriptions concise and informative
- Check your spelling and grammar
- Make sure your text editor is set to remove trailing whitespace
## License

To the extent possible under law, the contributors have waived all copyright and related rights to this work.