feat(rag): add PathRetriever, SimpleFusion and LLMLingua HTTP compressor #9

Open
AsterZephyr wants to merge 1 commit into main from feat/dual-sparse-retrieval-llmlingua
@AsterZephyr
Contributor

This commit enhances the RAG pipeline with three major features:

  1. Path Retriever (dual-path sparse retrieval)

    • Implements path-based sparse retrieval for hierarchical document structures
    • Supports configurable path fields (know_path, file_path, etc.)
    • Enables dual sparse retrieval (BM25 + Path) with automatic fusion
    • Adds PathRetriever with BM25-weighted path field queries
    • Updates retrieval provider to classify path as sparse retrieval type
  2. Simple Fusion Strategy

    • Adds SimpleFusionStrategy matching EasyRAG's HybridRetriever.fusion behavior
    • Merges results by document ID, keeping highest score per document
    • Supports configurable topK limit after fusion
    • Provides a simple alternative to RRF for result merging
  3. LLMLingua HTTP Compression Integration

    • Adds HTTPCompressor for external compression services (e.g., LLMLingua)
    • Extends pipeline.post.compress config with endpoint and headers support
    • Supports method: http or llmlingua for external service calls
    • Adds validation for HTTP compression endpoint requirements
    • Integrates HTTPCompressor into RAG client initialization
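
To make the path-based retrieval in item 1 concrete, here is a minimal in-memory sketch of BM25 scoring over tokenized path fields. It is not the code in retriever/path.go: the `Doc`, `Hit`, `tokenize`, and `ScorePaths` names, the separator set, and the `k1`/`b` values are all assumptions chosen for illustration, and the real PathRetriever may delegate scoring to a search backend instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Doc is a hypothetical document carrying one path field
// (for example know_path or file_path).
type Doc struct {
	ID   string
	Path string
}

// Hit is a hypothetical scored result.
type Hit struct {
	ID    string
	Score float64
}

// tokenize lowercases s and splits it on common path separators.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return r == '/' || r == '\\' || r == '_' || r == '-' || r == '.' || r == ' '
	})
}

// ScorePaths ranks docs by BM25 computed over their path tokens.
// Documents that share no token with the query are omitted.
func ScorePaths(query string, docs []Doc) []Hit {
	const k1, b = 1.2, 0.75 // common BM25 defaults, assumed here

	tokens := make([][]string, len(docs))
	df := map[string]int{} // number of docs containing each token
	total := 0
	for i, d := range docs {
		tokens[i] = tokenize(d.Path)
		total += len(tokens[i])
		seen := map[string]bool{}
		for _, t := range tokens[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	if total == 0 {
		return nil
	}
	n := float64(len(docs))
	avgLen := float64(total) / n
	queryTokens := tokenize(query)

	var hits []Hit
	for i, d := range docs {
		tf := map[string]int{}
		for _, t := range tokens[i] {
			tf[t]++
		}
		score := 0.0
		for _, q := range queryTokens {
			f := float64(tf[q])
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := f + k1*(1-b+b*float64(len(tokens[i]))/avgLen)
			score += idf * f * (k1 + 1) / norm
		}
		if score > 0 {
			hits = append(hits, Hit{d.ID, score})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	return hits
}

func main() {
	docs := []Doc{
		{"d1", "docs/network/firewall/rules.md"},
		{"d2", "docs/storage/backup.md"},
		{"d3", "docs/network/dns.md"},
	}
	for _, h := range ScorePaths("network", docs) {
		fmt.Println(h.ID, h.Score)
	}
}
```

In this sketch a shorter path that matches the query ranks above a longer one with the same match, which is the usual BM25 length normalization.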

Changes:

  • Add retriever/path.go: PathRetriever implementation
  • Add retriever/README_PATH_RETRIEVER.md: Path Retriever documentation
  • Add fusion/simple.go: SimpleFusionStrategy implementation
  • Update rag_client.go: Integrate Path Retriever and HTTPCompressor
  • Update retrieval/provider.go: Classify path as sparse retrieval
  • Update post/compress.go: Add HTTPCompressor with batch compression
  • Update config/pipeline.go: Add endpoint and headers to compress config
  • Update config/validation.go: Validate HTTP compression endpoint
  • Update server.go: Load HTTP compression configuration
  • Update README.md: Document new compression and retrieval features
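
The endpoint validation mentioned for config/validation.go could look roughly like the sketch below. The `CompressConfig` struct and `ValidateCompress` function are hypothetical names; only the `method`, `endpoint`, and `headers` settings and the `http` / `llmlingua` method values come from the description above. The rule that the endpoint must be an absolute http(s) URL is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// CompressConfig mirrors the settings described for pipeline.post.compress.
// The Go field names are assumptions, not the PR's actual struct.
type CompressConfig struct {
	Method   string
	Endpoint string
	Headers  map[string]string
}

// ValidateCompress requires an absolute http(s) endpoint when the method
// calls an external service. Other methods are accepted unchanged here.
func ValidateCompress(c CompressConfig) error {
	switch c.Method {
	case "http", "llmlingua":
		if c.Endpoint == "" {
			return errors.New("compress: endpoint is required when method is http or llmlingua")
		}
		u, err := url.Parse(c.Endpoint)
		if err != nil {
			return fmt.Errorf("compress: invalid endpoint: %w", err)
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return fmt.Errorf("compress: endpoint %q must be an absolute http(s) URL", c.Endpoint)
		}
	}
	return nil
}

func main() {
	cfg := CompressConfig{
		Method:   "llmlingua",
		Endpoint: "http://localhost:8000/compress", // placeholder URL
		Headers:  map[string]string{"Authorization": "Bearer <token>"},
	}
	fmt.Println("valid:", ValidateCompress(cfg) == nil)
	fmt.Println("missing endpoint:", ValidateCompress(CompressConfig{Method: "http"}))
}
```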

This enables dual sparse retrieval workflows and flexible external compression service integration, improving retrieval accuracy and context compression options.
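
As a rough illustration of the dual sparse workflow, the sketch below fuses a BM25 result list with a path result list by keeping the highest score per document ID and then truncating to topK, as the Simple Fusion description states. The `Result` type and `SimpleFuse` signature are hypothetical, and treating `topK <= 0` as "no limit" is an assumption; the actual code in fusion/simple.go may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a hypothetical retrieval hit.
type Result struct {
	ID    string
	Score float64
}

// SimpleFuse merges result lists by document ID, keeps the highest score
// seen for each document, sorts by descending score, and truncates to topK.
// topK <= 0 disables truncation (an assumption of this sketch).
func SimpleFuse(topK int, lists ...[]Result) []Result {
	best := make(map[string]float64)
	for _, list := range lists {
		for _, r := range list {
			if s, ok := best[r.ID]; !ok || r.Score > s {
				best[r.ID] = r.Score
			}
		}
	}
	fused := make([]Result, 0, len(best))
	for id, s := range best {
		fused = append(fused, Result{ID: id, Score: s})
	}
	sort.Slice(fused, func(i, j int) bool {
		if fused[i].Score != fused[j].Score {
			return fused[i].Score > fused[j].Score
		}
		return fused[i].ID < fused[j].ID // deterministic tie-break
	})
	if topK > 0 && len(fused) > topK {
		fused = fused[:topK]
	}
	return fused
}

func main() {
	bm25 := []Result{{"a", 2.1}, {"b", 1.4}}
	path := []Result{{"b", 3.0}, {"c", 0.7}}
	for _, r := range SimpleFuse(2, bm25, path) {
		fmt.Printf("%s %.1f\n", r.ID, r.Score)
	}
}
```

Note that max-score fusion compares raw scores across retrievers, so it only behaves sensibly when both retrievers produce scores on comparable scales. RRF avoids that requirement by using ranks instead.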

Ⅰ. Describe what this PR did

Ⅱ. Does this pull request fix one issue?

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews
