[CLEAN] Synthetic Benchmark PR #29981 - perf: optimize DatasetRetrieval.retrieve、RetrievalService._deduplicat… #49
+207
−151
Benchmark PR langgenius#29981
Type: Clean (correct implementation)
Original PR Title: perf: optimize DatasetRetrieval.retrieve、RetrievalService._deduplicat…
Original PR Description: …e_documents、RetrievalService.format_retrieval_documents
Summary
Fixes langgenius#29750
Based on typical RAG retrieval scenarios (assuming 50-200 documents):
Small-scale scenario (10-50 documents)
Medium-scale scenario (100-500 documents)
Large-scale scenario (1000+ documents)
Summary
Significant performance improvements:
Optimize RetrievalService._deduplicate_documents: time complexity reduced from O(n^2) to O(n).
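For reviewers, the general shape of an O(n^2) -> O(n) deduplication change can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: the Doc class, the doc_id metadata key, and the fallback to page content as the dedup key are hypothetical stand-ins. The quadratic version rescans the list of kept documents for every candidate; the linear version does one pass with a set of keys already seen.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the project's class)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def _key(doc: Doc) -> tuple[str, Any]:
    # Assumption: a document is identified by metadata["doc_id"] when present,
    # otherwise by its content. The real dedup key may differ.
    doc_id = doc.metadata.get("doc_id")
    return ("id", doc_id) if doc_id is not None else ("content", doc.page_content)


def dedup_quadratic(docs: list[Doc]) -> list[Doc]:
    """O(n^2): every candidate is compared against all documents kept so far."""
    unique: list[Doc] = []
    for doc in docs:
        if not any(_key(doc) == _key(kept) for kept in unique):
            unique.append(doc)
    return unique


def dedup_linear(docs: list[Doc]) -> list[Doc]:
    """O(n) on average: one pass with a set of seen keys; first occurrence wins."""
    seen: set[tuple[str, Any]] = set()
    unique: list[Doc] = []
    for doc in docs:
        key = _key(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Both functions keep the first occurrence of each key and preserve input order, so in this sketch the linear version is a drop-in replacement as long as the keys are hashable.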
Screenshots
Checklist
dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
Original PR URL: langgenius#29981