
MAO-ARAG: Multi-Agent Orchestration for Adaptive Retrieval-Augmented Generation

MAO-ARAG is a multi-agent orchestration framework for adaptive Retrieval-Augmented Generation (RAG) in question-answering systems. It dynamically selects and integrates different RAG modules based on query complexity, balancing answer quality, cost, and latency.


🧠 Overview

Traditional RAG systems struggle to serve all types of queries efficiently, as they target either low-complexity or high-complexity questions. MAO-ARAG introduces a multi-turn, agent-based architecture equipped with a planner agent and multiple executor agents (e.g., reformulators, retrievers, generators). The planner learns to construct optimal workflows per query via reinforcement learning, maximizing answer quality while minimizing cost.


📋 Table of Contents

  1. Computational Resource Requirements
  2. Download and Process Data
  3. Deploy Retriever
  4. Training

⚙️ Computational Resource Requirements

We used 6×A800 GPUs to train MAO-ARAG, plus one additional A800 to serve the retriever and accelerate retrieval.


📦 Download and Process Data

MAO-ARAG supports multiple QA datasets from the Hugging Face Hub. All datasets can be loaded using the datasets library and processed into a unified format for downstream tasks.

Supported Datasets

The following datasets are used in our framework:

Dataset Name   Hugging Face Identifier
NQ             google-research-datasets/nq_open
PopQA          akariasai/PopQA
AmbigQA        sewon/ambig_qa
HotpotQA       hotpotqa/hotpot_qa
2Wiki          voidful/2WikiMultihopQA
Musique        bdsaglam/musique
Bamboogle      chiayewken/bamboogle

To download a dataset, use the following code snippet:

from datasets import load_dataset

data_source = "<dataset_identifier>"  # e.g., "google-research-datasets/nq_open"
dataset = load_dataset(data_source)
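
Some datasets on the Hub also require a configuration name as a second argument. For example (configuration names may vary by dataset version):

from datasets import load_dataset

# HotpotQA ships multiple configurations; "distractor" is a common choice.
hotpot = load_dataset("hotpotqa/hotpot_qa", "distractor")

# AmbigQA likewise needs a configuration, e.g. "light".
ambig = load_dataset("sewon/ambig_qa", "light")
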
Extract Questions and Answers

After downloading the raw dataset, process it into a list of dictionaries. Each dictionary should have the format:

{
    "question": "<question text>",
    "answer": "<answer text>"
}

Save this list to the following path:

data/{dataset_name}/{dataset_name}__train_questions_and_answers.json
data/{dataset_name}/{dataset_name}__test_questions_and_answers.json
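
As a minimal sketch of this step, assuming the nq_open schema (a question string plus a list of acceptable answers) and mapping its validation split to our test file:

import json
import os

from datasets import load_dataset

dataset_name = "nq"
dataset = load_dataset("google-research-datasets/nq_open")
os.makedirs(f"data/{dataset_name}", exist_ok=True)

# Assumption: nq_open's "validation" split serves as our test set.
for split, suffix in [("train", "train"), ("validation", "test")]:
    records = [
        # nq_open stores a list of acceptable answers; keep the first.
        {"question": ex["question"], "answer": ex["answer"][0]}
        for ex in dataset[split]
    ]
    path = f"data/{dataset_name}/{dataset_name}__{suffix}_questions_and_answers.json"
    with open(path, "w") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
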
Generate Parquet-formatted Data

Once the JSON files are created, run the corresponding dataset processing script to generate the final .parquet files:

python data/{dataset_name}.py
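
Each processing script is dataset-specific; the core JSON-to-Parquet conversion can be sketched with pandas (the output filename here is illustrative, and the real scripts may add extra columns):

import pandas as pd

# Read the unified JSON list of {"question", "answer"} records.
df = pd.read_json("data/nq/nq__train_questions_and_answers.json")

# Writing Parquet requires pyarrow or fastparquet to be installed.
df.to_parquet("data/nq/train.parquet", index=False)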

🔍 Deploy Retriever

First, build an index for the corpus. You need a corpus and a dense retrieval model; then run index.py in ./retriever:

CUDA_VISIBLE_DEVICES=0 python index.py
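
index.py contains the repository's actual indexing logic; conceptually, building a dense index looks like the following sketch (the encoder name and corpus here are placeholders, not the repository's choices):

import faiss
from sentence_transformers import SentenceTransformer

# Placeholder encoder; substitute the dense retrieval model you use.
model = SentenceTransformer("intfloat/e5-base-v2")

corpus = ["passage one ...", "passage two ..."]  # your corpus texts
embeddings = model.encode(corpus, normalize_embeddings=True)

# Inner product over L2-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "corpus.index")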

Then, run run_server.sh in ./qa_manager to deploy the retriever:

bash run_server.sh
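
Once the server is running, executor agents can query it over HTTP. The actual route, port, and payload are defined in qa_manager; the call below is purely hypothetical:

import requests

# Hypothetical endpoint and fields; check run_server.sh and qa_manager for the real ones.
resp = requests.post(
    "http://localhost:8000/retrieve",
    json={"query": "Who wrote The Left Hand of Darkness?", "top_k": 5},
)
print(resp.json())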

🏋️ Training

Run run_ppo.sh to start the training loop of MAO-ARAG:

bash run_ppo.sh
