A comprehensive, asynchronous, and framework-agnostic library for evaluating Retrieval-Augmented Generation (RAG) pipelines.
RAGscope provides a complete, end-to-end workflow for RAG evaluation—from automatically synthesizing a high-quality test set from your own documents to running a suite of sophisticated, AI-powered diagnostic metrics.
- Comprehensive "RAG Triad" Metrics: Evaluate Context Relevance, Faithfulness, and Answer Relevance.
- Advanced Diagnostics: Includes Answer Completeness to pinpoint why an answer is strong or weak.
- Automated Test Case Generation: Use the built-in Data Synthesizer to create test cases from any document (see the sketch after this list).
- High-Performance Async Pipeline: Powered by `asyncio` for fast, parallel execution.
- Framework-Agnostic: Works with any RAG system, including LangChain, LlamaIndex, and plain Python.
- Flexible Judge Model: Supports GPT-4o, Claude 3 Opus, or local models (e.g., via Ollama).
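The snippet below is a minimal sketch of what test-set synthesis might look like. The `DataSynthesizer` class name, its `agenerate_cases` method, and its parameters are illustrative assumptions, not confirmed RAGscope interfaces; check the package's API reference for the actual names.

```python
import asyncio

# HYPOTHETICAL API: the class and method below are assumptions for
# illustration, not documented RAGscope interfaces.
from ragscope import DataSynthesizer

async def build_test_set():
    # Assumed constructor: the same judge-model string used by RAGEvaluator.
    synthesizer = DataSynthesizer(judge_model="ollama/llama3")
    # Assumed method: synthesize EvaluationCase objects from raw documents.
    cases = await synthesizer.agenerate_cases(
        documents=["docs/stadium_guide.txt"],
        cases_per_document=5,
    )
    return cases

if __name__ == "__main__":
    asyncio.run(build_test_set())
```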
Install from PyPI:

```bash
pip install ragscope
```

Create a Python script using the `RAGEvaluator`:
```python
import asyncio

from ragscope import RAGEvaluator
from ragscope.data_models import EvaluationCase, RAGResult

# 1. Instantiate the evaluator (uses a local Ollama model by default)
evaluator = RAGEvaluator(judge_model="ollama/llama3")

# 2. Define the test case and the RAG system's output
test_case = EvaluationCase(
    question="What are the notable features of the stadium's pitch and roof?",
    ground_truth_context=["The stadium features a fully retractable roof... and a hybrid pitch..."],
    ground_truth_answer="The stadium has a fully retractable roof and a hybrid pitch.",
)
rag_result = RAGResult(
    retrieved_context=["The stadium features a fully retractable roof... and a hybrid pitch..."],
    final_answer="The stadium has a fully retractable roof.",  # Intentionally incomplete
)

# 3. Run the evaluation
async def main():
    evaluation_result = await evaluator.aevaluate(test_case, rag_result)
    print(evaluation_result.scores)

if __name__ == "__main__":
    asyncio.run(main())
```

RAGscope makes it easy to visualize results. The output from our evaluation provides detailed, actionable insights for every metric:
| Category | Metric | Score | Justification |
|---|---|---|---|
| Retrieval | Hit Rate | True | N/A |
| Retrieval | MRR | 1.00 | N/A |
| Retrieval | Context Relevance | 0.90 | The context is highly relevant as it directly addresses the question topic... |
| Generation | Faithfulness | 1.00 | The answer is fully supported by the context... |
| Generation | Relevance | 0.70 | The answer partially addresses the question... but does not mention the pitch. |
| Generation | Answer Completeness | 0.50 | The answer... omits other notable features... such as the hybrid pitch. |
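To act on these scores programmatically rather than just print them, something like the sketch below works. It assumes `evaluation_result.scores` behaves like a dict mapping metric names to scores; the exact shape of the result object is an assumption here, so verify it against the actual API.

```python
# ASSUMPTION: `scores` is a dict of metric name -> score, where retrieval
# metrics like Hit Rate may be booleans rather than floats.
FAILURE_THRESHOLD = 0.6

def flag_weak_metrics(scores: dict) -> list:
    """Return the names of metrics whose score falls below the threshold."""
    return [
        metric
        for metric, score in scores.items()
        # Skip non-numeric entries such as the boolean Hit Rate.
        if isinstance(score, (int, float))
        and not isinstance(score, bool)
        and score < FAILURE_THRESHOLD
    ]

# Example using the scores from the table above:
scores = {"Hit Rate": True, "MRR": 1.00, "Context Relevance": 0.90,
          "Faithfulness": 1.00, "Relevance": 0.70, "Answer Completeness": 0.50}
print(flag_weak_metrics(scores))  # ['Answer Completeness']
```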
To use other judge models (e.g., OpenAI's GPT-4o), set the provider's API key via an environment variable and pass the model name:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

evaluator = RAGEvaluator(judge_model="gpt-4o")
```
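Because `aevaluate` is a coroutine, you can score a whole test set concurrently with standard `asyncio` tooling. Here is a minimal sketch, assuming you have parallel lists of `EvaluationCase` and `RAGResult` objects; the `test_cases` and `rag_results` names are placeholders:

```python
import asyncio

async def evaluate_batch(evaluator, test_cases, rag_results):
    # Fan out one aevaluate() call per (case, result) pair and await them
    # all in parallel; asyncio.gather preserves input order.
    coros = [
        evaluator.aevaluate(case, result)
        for case, result in zip(test_cases, rag_results)
    ]
    return await asyncio.gather(*coros)

# Usage (assuming test_cases and rag_results are already populated):
# results = asyncio.run(evaluate_batch(evaluator, test_cases, rag_results))
```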
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.