The Nós RAG Evaluation Tool provides a framework to evaluate retrieval-augmented generation (RAG) systems, with a particular focus on the retrieval and reranking stages. It integrates multiple components to process queries, retrieve relevant contexts, and generate responses using metadata-rich datasets.
- Dataset and Index Management: Create evaluation datasets and manage Elasticsearch indices.
- Retrieval Evaluation: Assess retrieval and reranking modules in RAG systems.
- Evaluation Metrics:
  - Traditional IR Metrics: Precision, Recall, and Mean Reciprocal Rank (MRR); a reference sketch follows this list.
  - LLM-as-a-Judge: uses the `AtlaAI/Selene-1-Mini-Llama-3.1-8B` model to compute Context Precision and Context Recall.
- Visualization Tools: Edit and visualize datasets for manual inspection.
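As a reference for the traditional IR metrics listed above, the sketch below shows how Precision@k, Recall@k, and MRR are typically computed over lists of retrieved passage IDs; the function names and data layout are illustrative, not the tool's actual API.

```python
# Minimal sketch of the traditional IR metrics; names are illustrative.

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved passages that are relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant passages found among the top-k results."""
    if not relevant:
        return 0.0
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(relevant)

def mrr(all_retrieved: list[list[str]], all_relevant: list[set[str]]) -> float:
    """Mean Reciprocal Rank over a set of queries: the average of 1/rank of
    the first relevant passage per query (0 if none is retrieved)."""
    reciprocal_ranks = []
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        rr = 0.0
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```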
The repository is organized as follows:

- datasets/: Contains the datasets used for evaluation.
  - News/: Directory for news datasets.
  - Questions/: Directory for question datasets.
  - Visualization_Tools/: Tools for editing and visualizing datasets during manual revision.
- elasticsearch/: Scripts for creating and managing Elasticsearch indices, including index configuration examples.
- ir-metrics/: Implements traditional IR metrics for evaluation.
- llm-as-judge/: Evaluation scripts using an LLM as a judge.
- rag_retriever/: Implements the RAG system, including context retrieval and reranking logic. Stores experiment configurations.
- results/: Stores evaluation outputs.
- utils/: Utility functions for loading and processing datasets.
Each directory includes scripts and configuration files with examples to facilitate reproducibility.
The main workflow of the Nós RAG Evaluation Tool proceeds from dataset and index creation, through experiment configuration and retrieval with reranking, to evaluation with traditional IR metrics and the LLM-as-a-Judge, and finally to aggregation of the results.
To use the tool you will need:

- Python 3.9+
- Elasticsearch running locally or remotely.
- The required Python dependencies (see `requirements1.txt` and `requirements2.txt`), e.g. `pip install -r requirements1.txt -r requirements2.txt`.
Make sure Elasticsearch is running, then create the index:

```sh
sh launch_es_index_creation.sh
```
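For orientation, here is a minimal sketch of what an index-creation step looks like with the official `elasticsearch` Python client; the host, index name, and mapping fields below are illustrative, and the actual configuration examples live in `elasticsearch/`.

```python
# Sketch of index creation with the official elasticsearch Python client.
# Host, index name, and mapping are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/credentials as needed

es.indices.create(
    index="rag_eval_news",  # hypothetical index name
    mappings={
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "date": {"type": "date"},  # metadata field for metadata-rich datasets
        }
    },
)
```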
Next, choose the configuration for your experiment, selecting the index, retrieval model, and reranker. Example configurations are available in `rag_retriever/configs/experiments/`.
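The exact configuration schema is defined by the files in that directory; purely as an illustration, an experiment configuration could specify something like the following (all keys and values here are hypothetical):

```python
# Hypothetical experiment configuration; the real schema is defined by the
# files in rag_retriever/configs/experiments/ and may differ.
experiment_config = {
    "index": "rag_eval_news",     # Elasticsearch index to query (hypothetical name)
    "retriever": "bm25",          # first-stage retrieval model
    "reranker": "cross-encoder",  # reranking model applied to the candidates
    "top_k": 10,                  # number of passages to retrieve per query
}
```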
Launch the retrieval process with the chosen configuration:

```sh
sh launch_retrieval.sh
```
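Under the hood, this stage combines first-stage retrieval with reranking. The following is a sketch of a generic retrieve-then-rerank pipeline, not the tool's actual implementation (which lives in `rag_retriever/`); the index name, document field, and reranker model are assumptions.

```python
# Illustrative retrieve-then-rerank pipeline; names are assumptions.
from elasticsearch import Elasticsearch
from sentence_transformers import CrossEncoder

es = Elasticsearch("http://localhost:9200")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker

def retrieve_and_rerank(query: str, index: str = "rag_eval_news", k: int = 10):
    # First stage: lexical retrieval from Elasticsearch.
    hits = es.search(index=index, query={"match": {"body": query}}, size=k)
    passages = [hit["_source"]["body"] for hit in hits["hits"]["hits"]]
    # Second stage: rescore the candidates with a cross-encoder and sort.
    scores = reranker.predict([(query, passage) for passage in passages])
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```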
Evaluate the retrieved passages using traditional IR metrics:

```sh
sh launch_evaluate_ir_traditional.sh
```

or the LLM-as-a-Judge:

```sh
sh launch_llm_judge.sh
```

Finally, summarize all evaluation results into a single report:

```sh
sh launch_aggregate_metrics.sh
```
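As a rough illustration of the LLM-as-a-Judge stage, the sketch below prompts the `AtlaAI/Selene-1-Mini-Llama-3.1-8B` model named above to score a retrieved context; the prompt wording and score parsing are simplified assumptions, not the tool's actual judging protocol.

```python
# Rough sketch of an LLM-as-a-Judge call; the prompt and score parsing are
# simplified assumptions, not the tool's actual judging protocol.
from transformers import pipeline

judge = pipeline("text-generation", model="AtlaAI/Selene-1-Mini-Llama-3.1-8B")

def judge_context(question: str, context: str, answer: str) -> str:
    prompt = (
        "Evaluate whether the retrieved context is relevant to the question "
        "and sufficient to support the answer. Reply with a score from 1 to 5.\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}\nScore:"
    )
    # Greedy decoding; strip the prompt from the output to isolate the score.
    output = judge(prompt, max_new_tokens=8, do_sample=False)
    return output[0]["generated_text"][len(prompt):].strip()
```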