
Implement RRPO (Reranking Preference Optimization) pipeline (2026) #329

@vkehfdl1

Description

Paper

  • Title: Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
  • Authors: Yuhang Wu, Xiangqing Shen, Fanfan Wang, Cangqi Zhou, Zhen Wu, Xinyu Dai, Rui Xia
  • Link: https://arxiv.org/abs/2604.02091v1
  • Source: arXiv

Summary

RRPO introduces a reinforcement learning framework that aligns reranking directly with downstream LLM generation quality. It formulates reranking as a sequential decision-making process and optimizes for context utility using LLM feedback. This addresses the mismatch between static, relevance-based reranking and the documents that actually help the LLM generate a correct answer. It uses a reference-anchored deterministic baseline to stabilize training and is reported to outperform strong baselines, including RankZephyr, on knowledge-intensive benchmarks.
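One way to read "sequential decision-making" plus a "deterministic baseline" is as a Plackett-Luce selection policy trained with a REINFORCE-style estimator. This is a sketch under assumptions, not the paper's stated objective: the scorer $s_\theta$, the answer-quality metric $\operatorname{score}(\cdot)$, the gold answer $y^\ast$, and the reference scorer $s_{\mathrm{ref}}$ used to build the deterministic ranking $\hat{\sigma}$ are notation introduced here.

```latex
\pi_\theta(\sigma \mid q, \mathcal{D})
  = \prod_{t=1}^{k}
    \frac{\exp\big(s_\theta(q, d_{\sigma_t})\big)}
         {\sum_{d \in \mathcal{D} \setminus \{d_{\sigma_1}, \dots, d_{\sigma_{t-1}}\}} \exp\big(s_\theta(q, d)\big)}

R(\sigma) = \operatorname{score}\big(\mathrm{LLM}(q;\, d_{\sigma_1}, \dots, d_{\sigma_k}),\; y^\ast\big)

\nabla_\theta J(\theta) \approx \big(R(\sigma) - R(\hat{\sigma})\big)\, \nabla_\theta \log \pi_\theta(\sigma \mid q, \mathcal{D}),
\qquad \sigma \sim \pi_\theta,\quad \hat{\sigma} = \text{greedy top-}k \text{ ranking under } s_{\mathrm{ref}}
```

Because $\hat{\sigma}$ is computed without looking at the sampled $\sigma$, subtracting $R(\hat{\sigma})$ reduces the variance of the gradient estimate without biasing it.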

Relevance to AutoRAG-Research

Directly relevant as a retrieval pipeline enhancement. AutoRAG-Research currently lacks a dedicated reranking optimization module: the existing retrieval pipelines (bm25, vector_search, hyde, hybrid) retrieve and rank documents using static relevance signals, without considering downstream generation quality. RRPO fills this gap by training a reranker to optimize for what actually helps the LLM generate correct answers. It can also be combined orthogonally with query expansion modules (like HyDE) and is reported to generalize across different reader LLMs.

Implementation Notes

  • Pipeline type: retrieval (reranking stage)
  • Key components:
    • Reranking formulated as sequential decision-making (selecting documents one by one)
    • RL optimization with LLM generation quality as reward signal
    • Reference-anchored deterministic baseline for stable training
    • Compatible with various reader LLMs (tested with GPT-4o)
    • Orthogonal integration with query expansion (e.g., Query2Doc/HyDE)
  • Dependencies: PyTorch, RL training utilities (PPO-style), LangChain LLM integration for reward computation
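The key components above (one-by-one document selection, a reward from LLM feedback, and a deterministic baseline) can be sketched in plain Python for a single query. This is a toy illustration under assumptions, not the paper's algorithm. Per-document free scores stand in for a neural reranker, and `toy_reward` is a hypothetical stand-in for an LLM-judged answer-quality signal. The baseline is the reward of the greedy top-k ranking, which is only one possible reading of "reference-anchored deterministic baseline". The sketch also swaps in plain REINFORCE rather than the PPO-style update listed under Dependencies, and a real implementation would likely use PyTorch autograd instead of the hand-written gradient.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Toy sketch: each candidate document has a free score (logit) for one query.
# A real reranker would produce these scores with a neural model.


def _log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_ranking(scores: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Select k documents one at a time, without replacement (Plackett-Luce)."""
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    for _ in range(k):
        lse = _log_sum_exp([scores[i] for i in remaining])
        probs = [math.exp(scores[i] - lse) for i in remaining]
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def greedy_ranking(scores: Sequence[float], k: int) -> List[int]:
    """Deterministic top-k ranking (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def log_prob(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """log pi(ranking) under the sequential selection process above."""
    remaining = list(range(len(scores)))
    total = 0.0
    for pick in ranking:
        total += scores[pick] - _log_sum_exp([scores[i] for i in remaining])
        remaining.remove(pick)
    return total


def log_prob_grad(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Analytic gradient of log_prob with respect to every document score."""
    grad = [0.0] * len(scores)
    remaining = list(range(len(scores)))
    for pick in ranking:
        lse = _log_sum_exp([scores[i] for i in remaining])
        for i in remaining:
            grad[i] -= math.exp(scores[i] - lse)  # from -log sum exp over remaining docs
        grad[pick] += 1.0  # from the chosen document's own score
        remaining.remove(pick)
    return grad


def reinforce_step(
    scores: Sequence[float],
    k: int,
    reward_fn: Callable[[List[int]], float],
    lr: float,
    rng: random.Random,
    ref_scores: Optional[Sequence[float]] = None,
) -> List[float]:
    """One policy-gradient update using a deterministic (greedy) baseline rollout."""
    sampled = sample_ranking(scores, k, rng)
    anchor = greedy_ranking(ref_scores if ref_scores is not None else scores, k)
    advantage = reward_fn(sampled) - reward_fn(anchor)
    grad = log_prob_grad(scores, sampled)
    return [s + lr * advantage * g for s, g in zip(scores, grad)]


# Hypothetical reward: the answer counts as correct only if doc 3 is in the context.
def toy_reward(ranking: List[int]) -> float:
    return 1.0 if 3 in ranking else 0.0


rng = random.Random(0)
scores = [2.0, 1.5, 1.0, 0.5, 0.0]
for _ in range(200):
    scores = reinforce_step(scores, k=2, reward_fn=toy_reward, lr=0.5, rng=rng)
print(greedy_ranking(scores, 2))
```

Passing `ref_scores` lets the baseline come from a frozen reference ranking (for example the initial reranker) instead of the current policy; which variant the paper uses is not stated in this issue.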

Acceptance Criteria

  • Pipeline class implementing BaseRetrievalPipeline
  • Config dataclass with get_pipeline_class() and get_pipeline_kwargs()
  • YAML configuration file
  • Unit tests following existing test patterns
  • Passes existing test suite without regression
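The config/pipeline split in the acceptance criteria could look roughly like the sketch below. Everything here is hypothetical. `BaseRetrievalPipeline` is a local stand-in: the real AutoRAG-Research base class, its method names, and its signatures are not shown in this issue. `RRPORerankPipeline`, `RRPOPipelineConfig`, their fields (`first_stage`, `reranker_checkpoint`, `candidate_k`), and the checkpoint path are illustrative names only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Type


class BaseRetrievalPipeline(ABC):
    """Hypothetical stand-in; the real AutoRAG-Research base class may differ."""

    @abstractmethod
    def retrieve(self, query: str, top_k: int) -> List[str]:
        """Return the ids of the top_k documents for a query."""


class RRPORerankPipeline(BaseRetrievalPipeline):
    """Hypothetical: first-stage retrieval, then reranking by an RRPO-trained policy."""

    def __init__(self, first_stage: str, reranker_checkpoint: str, candidate_k: int) -> None:
        self.first_stage = first_stage  # e.g. "bm25" or "vector_search"
        self.reranker_checkpoint = reranker_checkpoint  # hypothetical trained weights
        self.candidate_k = candidate_k  # number of candidates handed to the reranker

    def retrieve(self, query: str, top_k: int) -> List[str]:
        # Placeholder: fetch candidate_k documents from first_stage, then pick
        # top_k of them greedily with the trained reranker.
        raise NotImplementedError


@dataclass
class RRPOPipelineConfig:
    """Hypothetical config dataclass mirroring the acceptance criteria."""

    first_stage: str = "bm25"
    reranker_checkpoint: str = ""
    candidate_k: int = 50

    def get_pipeline_class(self) -> Type[BaseRetrievalPipeline]:
        return RRPORerankPipeline

    def get_pipeline_kwargs(self) -> Dict[str, Any]:
        return {
            "first_stage": self.first_stage,
            "reranker_checkpoint": self.reranker_checkpoint,
            "candidate_k": self.candidate_k,
        }


config = RRPOPipelineConfig(reranker_checkpoint="path/to/rrpo_checkpoint")
pipeline = config.get_pipeline_class()(**config.get_pipeline_kwargs())
```

Keeping the constructor arguments in `get_pipeline_kwargs()` means a YAML file only has to populate the dataclass fields; how the existing configs are actually loaded should be checked against the other pipelines in the repo.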

Metadata

Labels

  • New Pipeline: Adding new pipeline support
  • paper-scout: Issues created by daily paper scout
