feat: Add REFRAG preprocessing adapter for next-gen RAG context compression #162
AnasAmchaar wants to merge 3 commits into tavily-ai:master
Conversation
This PR introduces the `TavilyRefragClient`, `RefragChunk`, and `RefragContext` classes to the `tavily` module, adding support for REFRAG preprocessing operations.
Updated the chunk size assignment to use a conditional expression, ensuring that the default chunk size is used only when chunk_size is explicitly set to None.
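The distinction matters because a truthiness check (`chunk_size or DEFAULT`) would also override legitimate falsy values, while an explicit `None` test substitutes the default only when the caller omits the argument. A minimal sketch of the pattern (names and default value hypothetical, not taken from the PR):

```python
DEFAULT_CHUNK_SIZE = 16  # hypothetical default; k=16 as in the paper

def resolve_chunk_size(chunk_size=None):
    """Fall back to the default only when chunk_size is explicitly None."""
    return DEFAULT_CHUNK_SIZE if chunk_size is None else chunk_size

print(resolve_chunk_size())    # -> 16, default applied
print(resolve_chunk_size(8))   # -> 8, caller's value wins
print(resolve_chunk_size(0))   # -> 0, falsy value preserved (unlike `or`)
```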
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
if encode and self.encoder_function is not None:
    self.encode_chunks(chunks)
if apply_policy:
    self.apply_expansion_policy(chunks, query=query)
```
Silent skip of encoding contradicts docstring and encode_chunks
Medium Severity
When encode=True (the default) but encoder_function is None, prepare_context silently skips both encoding and the expansion policy. This contradicts the docstring which states encode "Requires encoder_function to be set" and is inconsistent with encode_chunks, which raises a RuntimeError when called without an encoder. A user who explicitly passes encode=True signals intent to encode — silently producing a RefragContext with no embeddings and no expansion flags creates a subtle failure mode where downstream REFRAG decoders receive incomplete data without any error being surfaced.
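One way to surface the failure, consistent with the `encode_chunks` behavior described above, is to fail fast in `prepare_context` when `encode=True` but no encoder is configured. The class below is an illustrative stand-in, not the PR's actual implementation:

```python
class RefragClientSketch:
    """Illustrative stand-in for TavilyRefragClient's encoding guard."""

    def __init__(self, encoder_function=None):
        self.encoder_function = encoder_function

    def prepare_context(self, chunks, encode=True):
        if encode:
            if self.encoder_function is None:
                # Fail fast instead of silently returning unencoded chunks,
                # mirroring the RuntimeError raised by encode_chunks().
                raise RuntimeError("encode=True requires encoder_function to be set")
            chunks = [self.encoder_function(c) for c in chunks]
        return chunks

client = RefragClientSketch()  # no encoder configured
try:
    client.prepare_context(["chunk a", "chunk b"])
except RuntimeError as e:
    print(e)  # error surfaced instead of a silently incomplete context
```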


Motivation
REFRAG (REpresentation For RAG) is a recent research framework from Meta Superintelligence Labs that fundamentally rethinks how RAG contexts are fed to decoder LLMs. Instead of passing all retrieved passage tokens directly into the decoder, REFRAG:
- splits retrieved passages into fixed-size k-token chunks,
- encodes each chunk into a single compressed embedding, and
- uses a lightweight RL policy to decide which chunks to expand back into full tokens.
The reported results are striking: a 30.85x time-to-first-token (TTFT) acceleration and a 3.75x improvement over the prior SOTA (CEPE) with no loss in perplexity. REFRAG also extends the effective context window by 16x, letting LLMs consume far more retrieved passages within the same latency budget.
As RAG becomes the dominant pattern for grounding LLMs in real-world knowledge, REFRAG represents the next evolution in how retrieval results are consumed at inference time. This PR positions tavily-python as REFRAG-ready by providing the retrieval-to-decoder preprocessing pipeline.
What this PR adds
A new `TavilyRefragClient` that bridges Tavily search results and REFRAG-compatible decoders. It follows the same architectural pattern as `TavilyHybridClient` -- wrapping an internal `TavilyClient` and accepting pluggable functions -- so users bring their own trained REFRAG models.
New files
Modified files
Pipeline architecture
```
User Query
  --> TavilyClient.search()        # Retrieve passages from the web
  --> chunk_passages()             # Split into k-token chunks (k=8,16,32)
  --> encode_chunks()              # Pluggable encoder -> chunk embeddings
  --> apply_expansion_policy()     # Pluggable RL policy -> expand/compress decisions
  --> RefragContext                # Ready for external REFRAG decoder
```
Each step is callable independently or combined via `prepare_context()`.
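As a rough sketch of the data carried between steps, the shapes below illustrate the two model classes; field names beyond the `compressed_chunks`/`expanded_chunks` mentioned in this PR are assumptions, not the actual definitions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RefragChunkSketch:
    """Illustrative shape of one k-token chunk (hypothetical fields)."""
    text: str
    tokens: List[int] = field(default_factory=list)
    embedding: Optional[List[float]] = None  # filled in by encode_chunks()
    expand: bool = False                     # set by apply_expansion_policy()

@dataclass
class RefragContextSketch:
    """Illustrative container handed to an external REFRAG decoder."""
    compressed_chunks: List[RefragChunkSketch] = field(default_factory=list)
    expanded_chunks: List[RefragChunkSketch] = field(default_factory=list)

ctx = RefragContextSketch()
ctx.compressed_chunks.append(RefragChunkSketch(text="a compressed chunk"))
print(len(ctx.compressed_chunks), len(ctx.expanded_chunks))  # 1 0
```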
Pluggable interfaces (no heavy dependencies)
This means zero new dependencies -- the core only uses `tiktoken`, which is already in `setup.py`. Users plug in their own encoder (RoBERTa, etc.) and RL policy when they have trained REFRAG models.
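For illustration, the pluggable callables can be as simple as plain functions: an encoder mapping chunk texts to one vector per chunk, and a policy mapping a query plus chunks to per-chunk expand decisions. The toy implementations and exact signatures below are assumptions for sketching, not the interfaces defined in this PR:

```python
import hashlib

def toy_encoder(chunk_texts):
    """Stand-in encoder: one fixed-size pseudo-embedding per chunk.
    A real setup would use a trained RoBERTa-style encoder instead."""
    vectors = []
    for text in chunk_texts:
        digest = hashlib.sha256(text.encode()).digest()
        vectors.append([b / 255.0 for b in digest[:8]])  # 8-dim vector
    return vectors

def toy_expansion_policy(query, chunk_texts):
    """Stand-in RL policy: expand a chunk iff it shares a word with the query."""
    query_words = set(query.lower().split())
    return [bool(query_words & set(t.lower().split())) for t in chunk_texts]

chunks = ["retrieval augmented generation", "weather in Paris"]
embeddings = toy_encoder(chunks)
decisions = toy_expansion_policy("latest retrieval advances", chunks)
print(decisions)  # [True, False]
```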
Usage
\\python
from tavily import TavilyRefragClient
client = TavilyRefragClient(
api_key='tvly-...',
chunk_size=16, # k=16 as in the paper
encoder_function=my_roberta_encoder,
expansion_policy=my_rl_policy,
)
Full pipeline in one call
ctx = client.prepare_context('What are the latest advances in RAG?', max_results=10)
print(len(ctx.compressed_chunks)) # Feed as embeddings to decoder
print(len(ctx.expanded_chunks)) # Feed as full tokens to decoder
Or step-by-step for more control
chunks = client.chunk_passages(tavily_results, chunk_size=8)
chunks = client.encode_chunks(chunks)
chunks = client.apply_expansion_policy(chunks, query='...')
\\
Design decisions
Test plan
Tests use the existing `request_intercept.py` framework -- no real API calls.
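A minimal sketch of the no-network testing approach, stubbing the search call directly rather than using the repo's actual intercept helpers (all names below are hypothetical, and the word-based chunker stands in for the real token-based one):

```python
class FakeTavilyClient:
    """Stub that returns canned results instead of calling the API."""
    def search(self, query, **kwargs):
        return {"results": [{"content": "REFRAG compresses chunk embeddings."}]}

def chunk_passages(results, chunk_size=4):
    """Toy word-based chunker standing in for the token-based one."""
    chunks = []
    for r in results["results"]:
        words = r["content"].split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

results = FakeTavilyClient().search("what is REFRAG?")
chunks = chunk_passages(results, chunk_size=3)
print(chunks)  # ['REFRAG compresses chunk', 'embeddings.']
```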
References
Note
Low Risk
Additive feature introducing a new `TavilyRefragClient` wrapper and data models without modifying existing request/response behavior; the primary risk is the new public API surface and the tokenization/encoding assumptions in the new adapter.
Overview
Adds a new REFRAG preprocessing flow via `TavilyRefragClient`, including passage tokenization/chunking, optional embedding via a user-supplied `encoder_function`, and optional chunk selection via a pluggable `expansion_policy`, returning a `RefragContext` with `compressed_chunks`/`expanded_chunks`.
Exposes the new client and `RefragChunk`/`RefragContext` from `tavily.__init__`, and adds an end-to-end example (`examples/refrag.py`) plus a new `tests/test_refrag.py` covering constructor validation, chunking/encoding/policy behavior, and `prepare_context()` integration with forwarded search kwargs.
Written by Cursor Bugbot for commit 41b4b9a. This will update automatically on new commits.