feat: Add REFRAG preprocessing adapter for next-gen RAG context compression #162

Open

AnasAmchaar wants to merge 3 commits into tavily-ai:master from AnasAmchaar:feat/refrag-preprocessing-adapter

Conversation

@AnasAmchaar AnasAmchaar commented Mar 22, 2026

Motivation

REFRAG (REpresentation For RAG) is a recent research framework from Meta Superintelligence Labs that fundamentally rethinks how RAG contexts are fed to decoder LLMs. Instead of passing all retrieved passage tokens directly into the decoder, REFRAG:

  1. Compresses passages into compact chunk embeddings via a lightweight encoder (e.g., RoBERTa)
  2. Senses which chunks are important via an RL-trained expansion policy
  3. Expands only the critical chunks back to full tokens

The results are striking: 30.85x TTFT acceleration and 3.75x improvement over prior SOTA (CEPE) with zero loss in perplexity. It also extends the effective context window by 16x, enabling LLMs to consume far more retrieved passages within the same latency budget.

As RAG becomes the dominant pattern for grounding LLMs in real-world knowledge, REFRAG represents the next evolution in how retrieval results are consumed at inference time. This PR positions tavily-python as REFRAG-ready by providing the retrieval-to-decoder preprocessing pipeline.

What this PR adds

A new `TavilyRefragClient` that bridges Tavily search results and REFRAG-compatible decoders. It follows the same architectural pattern as `TavilyHybridClient` -- wrapping an internal `TavilyClient` and accepting pluggable functions -- so users bring their own trained REFRAG models.

New files

| File | Description |
| --- | --- |
| `tavily/refrag/models.py` | `RefragChunk` and `RefragContext` dataclasses |
| `tavily/refrag/refrag.py` | `TavilyRefragClient` with composable pipeline |
| `tavily/refrag/__init__.py` | Package exports |
| `tests/test_refrag.py` | 23 unit tests using existing interceptor framework |
| `examples/refrag.py` | Full usage example with mock encoder and policy |

Modified files

| File | Change |
| --- | --- |
| `tavily/__init__.py` | Export `TavilyRefragClient`, `RefragChunk`, `RefragContext` |

Pipeline architecture

```
User Query
  --> TavilyClient.search()        # Retrieve passages from the web
  --> chunk_passages()             # Split into k-token chunks (k=8,16,32)
  --> encode_chunks()              # Pluggable encoder -> chunk embeddings
  --> apply_expansion_policy()     # Pluggable RL policy -> expand/compress decisions
  --> RefragContext                # Ready for external REFRAG decoder
```
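The chunking step in this pipeline can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the adapter's actual code: the helper name `chunk_tokens` is hypothetical, and in the real pipeline the token IDs would come from the configured `tokenizer_function` (tiktoken by default).

```python
def chunk_tokens(tokens: list[int], chunk_size: int = 16) -> list[list[int]]:
    """Split a token-ID sequence into consecutive chunks of at most chunk_size
    tokens, as in the k-token chunking step (k=8, 16, or 32)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# With k=8, a 20-token passage yields chunks of sizes 8, 8, 4.
sizes = [len(c) for c in chunk_tokens(list(range(20)), chunk_size=8)]
print(sizes)  # [8, 8, 4]
```

The last chunk is allowed to be shorter than k, which is why downstream code should not assume uniform chunk length.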

Each step is callable independently or combined via `prepare_context()`.

Pluggable interfaces (no heavy dependencies)

| Interface | Signature | Default |
| --- | --- | --- |
| `tokenizer_function` | `(str) -> list[int]` | tiktoken (already a dependency) |
| `encoder_function` | `(list[list[int]]) -> list[list[float]]` | None (user-provided) |
| `expansion_policy` | `(chunk_embs, query_emb) -> list[bool]` | Compress all (full compression) |

This means zero new dependencies -- the core only uses `tiktoken`, which is already in `setup.py`. Users plug in their own encoder (RoBERTa, etc.) and RL policy when they have trained REFRAG models.
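These three interfaces can be exercised with toy stand-ins before a trained REFRAG model is available. The function bodies below are purely illustrative assumptions (not part of the PR): a word-level fake tokenizer, a fixed 4-dim encoder, and a length-based policy in place of the real RoBERTa encoder and RL policy.

```python
# Hypothetical stand-ins matching the three pluggable signatures above.

def toy_tokenizer(text: str) -> list[int]:
    # Stand-in for the tiktoken default: one deterministic fake ID per word.
    return [sum(ord(ch) for ch in w) % 50000 for w in text.split()]

def toy_encoder(chunks: list[list[int]]) -> list[list[float]]:
    # Stand-in encoder: maps each chunk to a fixed 4-dim embedding.
    return [[float(len(c)), float(sum(c) % 97), 0.0, 1.0] for c in chunks]

def toy_policy(chunk_embs: list[list[float]], query_emb: list[float]) -> list[bool]:
    # Stand-in policy: expand only the longest chunk(s), compress the rest.
    longest = max(e[0] for e in chunk_embs)
    return [e[0] >= longest for e in chunk_embs]

embs = toy_encoder([[1, 2, 3], [4, 5]])
flags = toy_policy(embs, query_emb=[0.0] * 4)
print(flags)  # [True, False] -- the 3-token chunk is expanded
```

Because the signatures are plain callables, swapping the toys for real models is a constructor-argument change, not a code change.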

Usage

```python
from tavily import TavilyRefragClient

client = TavilyRefragClient(
    api_key='tvly-...',
    chunk_size=16,                       # k=16 as in the paper
    encoder_function=my_roberta_encoder,
    expansion_policy=my_rl_policy,
)

# Full pipeline in one call
ctx = client.prepare_context('What are the latest advances in RAG?', max_results=10)

print(len(ctx.compressed_chunks))   # Feed as embeddings to decoder
print(len(ctx.expanded_chunks))     # Feed as full tokens to decoder

# Or step-by-step for more control
chunks = client.chunk_passages(tavily_results, chunk_size=8)
chunks = client.encode_chunks(chunks)
chunks = client.apply_expansion_policy(chunks, query='...')
```

Design decisions

  • Preprocessing only, not full inference: This is a search SDK, not an ML framework. The client handles retrieval + chunking + encoding and outputs a clean `RefragContext` dataclass. Actual LLM decoding happens externally.
  • Follows existing patterns: Mirrors `TavilyHybridClient`'s approach of wrapping an internal `TavilyClient` and accepting pluggable callables.
  • Composable pipeline: Each step works independently, so users can mix and match (e.g., use their own retrieval but Tavily's chunking, or chunk externally but use the policy step).
  • Future-proof: When Meta releases the official REFRAG code at github.com/facebookresearch/refrag, users can plug those models directly into the encoder and policy interfaces.

Test plan

  • 23 new unit tests covering all code paths (constructor, chunking, encoding, policy, end-to-end pipeline, edge cases)
  • All tests use the existing `request_intercept.py` framework -- no real API calls
  • Full existing test suite (71 tests) still passes -- zero regressions
  • 94/94 tests pass total

Note

Low Risk
Additive feature introducing a new TavilyRefragClient wrapper and datamodels without modifying existing request/response behavior; primary risk is new public API surface and tokenization/encoding assumptions in the new adapter.

Overview
Adds a new REFRAG preprocessing flow via TavilyRefragClient, including passage tokenization/chunking, optional embedding via a user-supplied encoder_function, and optional chunk selection via a pluggable expansion_policy, returning a RefragContext with compressed_chunks/expanded_chunks.

Exposes the new client and RefragChunk/RefragContext from tavily.__init__, and adds an end-to-end example (examples/refrag.py) plus a new tests/test_refrag.py covering constructor validation, chunking/encoding/policy behavior, and prepare_context() integration with forwarded search kwargs.

Written by Cursor Bugbot for commit 41b4b9a.

This update introduces the TavilyRefragClient, RefragChunk, and RefragContext classes to the tavily module, enhancing its functionality for handling refrag operations.
Updated the chunk size assignment to use a conditional expression, ensuring that the default chunk size is used only when chunk_size is explicitly set to None.

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


```python
if encode and self.encoder_function is not None:
    self.encode_chunks(chunks)
if apply_policy:
    self.apply_expansion_policy(chunks, query=query)
```

Silent skip of encoding contradicts docstring and encode_chunks

Medium Severity

When encode=True (the default) but encoder_function is None, prepare_context silently skips both encoding and the expansion policy. This contradicts the docstring which states encode "Requires encoder_function to be set" and is inconsistent with encode_chunks, which raises a RuntimeError when called without an encoder. A user who explicitly passes encode=True signals intent to encode — silently producing a RefragContext with no embeddings and no expansion flags creates a subtle failure mode where downstream REFRAG decoders receive incomplete data without any error being surfaced.


