arkanto99/nos-rag-eval

Overview

The Nós RAG Evaluation Tool provides a framework to evaluate retrieval-augmented generation (RAG) systems, with a particular focus on the retrieval and reranking stages. It integrates multiple components to process queries, retrieve relevant contexts, and generate responses using metadata-rich datasets.

Key Features

  • Dataset and Index Management: Create evaluation datasets and manage Elasticsearch indices.
  • Retrieval Evaluation: Assess retrieval and reranking modules in RAG systems.
  • Evaluation Metrics:
    • Traditional IR Metrics: Precision, Recall, and Mean Reciprocal Rank (MRR).
    • LLM-as-a-Judge: Uses the AtlaAI/Selene-1-Mini-Llama-3.1-8B model to compute Context Precision and Context Recall.
  • Visualization Tools: Edit and visualize datasets for manual inspection.
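
The traditional IR metrics listed above can be sketched in a few lines of Python. The function names below are illustrative, not the tool's actual API in ir-metrics/:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant document (0 if none appears)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Example: one query, three relevant documents, four retrieved.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(precision_at_k(retrieved, relevant, 4))  # 0.5
print(mrr(retrieved, relevant))                # 0.5 (first hit at rank 2)
```

In a full evaluation these per-query values are averaged over the whole question set.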

Project Structure

  • datasets/: Contains datasets used for evaluation.
    • News/: Directory for news datasets.
    • Questions/: Directory for question datasets.
    • Visualization_Tools/: Tools for editing and visualizing datasets during manual revision.
  • elasticsearch/: Scripts for creating and managing Elasticsearch indices, including index configuration examples.
  • ir-metrics/: Implements traditional IR metrics for evaluation.
  • llm-as-judge/: Evaluation scripts using an LLM as a judge.
  • rag_retriever/: Implements the RAG system, including context retrieval and reranking logic. Stores experiment configurations.
  • results/: Stores evaluation outputs.
  • utils/: Utility functions for loading and processing datasets.

Each directory includes scripts and configuration files with examples to facilitate reproducibility.

System Flow

The following diagram illustrates the main workflow of the Nós RAG Evaluation Tool:


Usage

Prerequisites

  • Python 3.9+
  • Elasticsearch running locally or remotely
  • Required Python dependencies installed (see requirements1.txt and requirements2.txt)

1. Create Elasticsearch Index

Make sure Elasticsearch is running, then create the index:

sh launch_es_index_creation.sh
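Before running the script, it can help to confirm that Elasticsearch is actually reachable. A minimal stdlib-only check against the standard `_cluster/health` endpoint (the URL and helper names are illustrative; the index itself is created by launch_es_index_creation.sh):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def cluster_is_usable(health: dict) -> bool:
    """'green' or 'yellow' cluster status is enough for local experiments."""
    return health.get("status") in ("green", "yellow")

def check_elasticsearch(url: str = "http://localhost:9200/_cluster/health") -> bool:
    """Return True if the cluster responds and reports a usable status."""
    try:
        with urlopen(url, timeout=5) as resp:
            health = json.load(resp)
    except (URLError, OSError):
        return False
    return cluster_is_usable(health)
```

If the check fails, start Elasticsearch (or fix the host/port) before creating the index.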

2. Configure the Experiment

Choose the configuration for your experiment, selecting the index, retrieval model, and reranker.
Example configurations are available in:

rag_retriever/configs/experiments/

3. Run Retrieval

Launch the retrieval process with the chosen configuration:

sh launch_retrieval.sh
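Conceptually, this stage first pulls candidate contexts from the index and then lets the reranker reorder them by relevance. A minimal sketch of the reranking step, where the toy word-overlap scorer stands in for the actual reranker model configured in rag_retriever/:

```python
def rerank(query, contexts, score_fn, top_n=5):
    """Reorder candidate contexts by descending relevance score."""
    scored = [(score_fn(query, ctx), ctx) for ctx in contexts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ctx for _, ctx in scored[:top_n]]

def overlap_score(query, ctx):
    """Toy scorer: shared words between query and context.
    A real system would use a cross-encoder reranker instead."""
    q, c = set(query.lower().split()), set(ctx.lower().split())
    return len(q & c)

candidates = ["sports results", "elections in galician news", "weather"]
print(rerank("galician news elections", candidates, overlap_score, top_n=1))
# ['elections in galician news']
```

Keeping the scorer as a parameter makes it easy to swap rerankers between experiments.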

4. Evaluate Results

Evaluate the retrieved passages using:

Traditional IR metrics

sh launch_evaluate_ir_traditional.sh

LLM-as-a-Judge

sh launch_llm_judge.sh
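One common way to turn the judge's per-context relevance verdicts into a Context Precision score is a rank-weighted average of precision at each relevant position. The sketch below follows that common formulation; it is not necessarily the exact computation used in llm-as-judge/:

```python
def context_precision(verdicts):
    """verdicts[i] is True if the judge deemed the context at rank i+1
    relevant. Returns the mean of precision@k over the ranks where the
    context was judged relevant (rank-weighted, so early hits count more)."""
    if not any(verdicts):
        return 0.0
    precisions = []
    hits = 0
    for k, relevant in enumerate(verdicts, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

print(context_precision([True, False, True]))  # (1/1 + 2/3) / 2 ≈ 0.833
```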

5. Aggregate Metrics

Summarize all evaluation results into a single report:

sh launch_aggregate_metrics.sh
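Conceptually, aggregation averages the per-query metrics into one report. A stdlib-only sketch of that step (the metric names and result layout here are hypothetical, not the script's actual output format):

```python
from collections import defaultdict
from statistics import mean

def aggregate(per_query_results):
    """Average each metric across queries.
    per_query_results: list of dicts like {"precision@5": 0.4, "mrr": 0.5}."""
    buckets = defaultdict(list)
    for result in per_query_results:
        for metric, value in result.items():
            buckets[metric].append(value)
    return {metric: mean(values) for metric, values in buckets.items()}

report = aggregate([
    {"precision@5": 0.4, "mrr": 0.5},
    {"precision@5": 0.6, "mrr": 1.0},
])
print(report)  # {'precision@5': 0.5, 'mrr': 0.75}
```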
