feat(rag): add PathRetriever, SimpleFusion and LLMLingua HTTP compressor #9
Open
AsterZephyr wants to merge 1 commit into main
…ession

This commit enhances the RAG pipeline with three major features:

1. **Path Retriever (dual-path sparse retrieval)**
   - Implements path-based sparse retrieval for hierarchical document structures
   - Supports configurable path fields (know_path, file_path, etc.)
   - Enables dual sparse retrieval (BM25 + Path) with automatic fusion
   - Adds PathRetriever with BM25-weighted path field queries
   - Updates the retrieval provider to classify path as a sparse retrieval type

2. **Simple Fusion Strategy**
   - Adds SimpleFusionStrategy, matching EasyRAG's HybridRetriever.fusion behavior
   - Merges results by document ID, keeping the highest score per document
   - Supports a configurable topK limit applied after fusion
   - Provides a simple alternative to RRF for result merging

3. **LLMLingua HTTP Compression Integration**
   - Adds HTTPCompressor for external compression services (e.g., LLMLingua)
   - Extends the pipeline.post.compress config with endpoint and headers support
   - Supports `method: http` or `method: llmlingua` for external service calls
   - Adds validation of the endpoint required for HTTP compression
   - Integrates HTTPCompressor into RAG client initialization

Changes:
- Add retriever/path.go: PathRetriever implementation
- Add retriever/README_PATH_RETRIEVER.md: Path Retriever documentation
- Add fusion/simple.go: SimpleFusionStrategy implementation
- Update rag_client.go: Integrate Path Retriever and HTTPCompressor
- Update retrieval/provider.go: Classify path as sparse retrieval
- Update post/compress.go: Add HTTPCompressor with batch compression
- Update config/pipeline.go: Add endpoint and headers to compress config
- Update config/validation.go: Validate HTTP compression endpoint
- Update server.go: Load HTTP compression configuration
- Update README.md: Document new compression and retrieval features

This enables dual sparse retrieval workflows and flexible integration with external compression services. It is intended to improve retrieval accuracy and broaden the available context compression options.
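For reviewers who want a concrete picture of the Simple Fusion Strategy described above (merge by document ID, keep the highest score per document, then apply topK), here is a minimal sketch in Go. It is not the code in fusion/simple.go: the `ScoredDoc` type and the `FuseMaxScore` function are hypothetical names introduced for illustration. Treating `topK <= 0` as "no limit" and breaking score ties by ID are assumptions of this sketch.

```go
package main

import "sort"

// ScoredDoc is a hypothetical result type used only in this sketch;
// the PR's actual result type is not shown in this description.
type ScoredDoc struct {
	ID    string
	Score float64
}

// FuseMaxScore merges any number of result lists by document ID, keeps the
// highest score seen for each ID, sorts by descending score, and truncates
// the output to topK entries. A topK <= 0 means "no limit" (an assumption).
func FuseMaxScore(topK int, lists ...[]ScoredDoc) []ScoredDoc {
	// best maps each document ID to the highest score observed so far.
	best := make(map[string]float64)
	for _, list := range lists {
		for _, d := range list {
			if s, ok := best[d.ID]; !ok || d.Score > s {
				best[d.ID] = d.Score
			}
		}
	}

	fused := make([]ScoredDoc, 0, len(best))
	for id, s := range best {
		fused = append(fused, ScoredDoc{ID: id, Score: s})
	}

	// Sort by score, highest first; fall back to ID so the order is
	// deterministic when two documents share a score.
	sort.Slice(fused, func(i, j int) bool {
		if fused[i].Score != fused[j].Score {
			return fused[i].Score > fused[j].Score
		}
		return fused[i].ID < fused[j].ID
	})

	if topK > 0 && len(fused) > topK {
		fused = fused[:topK]
	}
	return fused
}
```

Given a BM25 list `[{a 0.9} {b 0.4}]` and a path list `[{b 0.7} {c 0.2}]`, `FuseMaxScore(2, bm25, path)` returns `a` with score 0.9 followed by `b` with score 0.7. Unlike RRF, this approach compares raw scores directly, so it presumably works best when the fused retrievers produce scores on comparable scales.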
Ⅰ. Describe what this PR did
Ⅱ. Does this pull request fix one issue?
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews