Build a stable retrieval foundation for agent runtime use:
- high recall for semantically similar questions
- low irrelevant-hit rate
- predictable latency under embedding/rerank degradation
- Hybrid retrieval: Milvus vector + PostgreSQL BM25 + RRF fusion.
- Query-term overlap rerank and configurable relevance threshold.
- Lexical fallback when BM25 under-retrieves.
- Query analysis is a first-class stage (token expansion, synonym/fine-grained keywords).
- Retrieval uses a large candidate pool, then reranks and applies similarity_threshold.
- Threshold is applied after rerank, not before.
- Retrieval output includes stable chunk metadata for downstream prompt assembly and citation.
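The recall-then-rerank-then-gate order above can be sketched as follows. This is a minimal illustration, not the actual implementation; `candidate_pool_size`, `recall_fn`, and `rerank_fn` are assumed names.

```python
def retrieve(query, recall_fn, rerank_fn, candidate_pool_size=100,
             top_k=5, similarity_threshold=0.2):
    """Recall a large pool, rerank it, and only then apply the threshold."""
    candidates = recall_fn(query, limit=candidate_pool_size)
    reranked = sorted(
        ((chunk, rerank_fn(query, chunk)) for chunk in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Threshold applied after rerank, not before: weak first-stage scores
    # never prune candidates the reranker might promote.
    return [(c, s) for c, s in reranked[:top_k] if s >= similarity_threshold]
```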
Reference:
examples-of-reference/ragflow/rag/nlp/search.py
examples-of-reference/ragflow/rag/nlp/query.py
- Query analysis
- Extract normalized terms + language-aware tokens.
- Optional query rewrite and synonym expansion.
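A minimal sketch of the query-analysis stage: normalized terms plus synonym expansion. The synonym table here is a hypothetical stand-in for a real dictionary.

```python
import re

# Hypothetical synonym table; a real system would load this from config.
SYNONYMS = {"db": ["database"], "k8s": ["kubernetes"]}

def analyze_query(query: str) -> dict:
    """Extract normalized terms, then expand each term with its synonyms."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return {"terms": terms, "expanded": expanded}
```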
- Multi-channel recall
- Vector recall (Milvus).
- Full-text/BM25 recall (PostgreSQL).
- Keyword fallback recall (for timeout/degraded cases).
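A sketch of the keyword-fallback path: if the BM25 channel times out or under-retrieves, top up from a plain keyword scan. `min_hits`, the channel callables, and the use of `TimeoutError` are illustrative assumptions.

```python
def recall_with_fallback(query, bm25_fn, keyword_fn, min_hits=3):
    """Return (doc, channel) pairs, falling back to keyword recall if needed."""
    try:
        hits = [(doc, "bm25") for doc in bm25_fn(query)]
    except TimeoutError:
        hits = []  # degraded case: treat the channel as empty
    if len(hits) < min_hits:
        seen = {doc for doc, _ in hits}
        # Tag fallback results so downstream guards can treat them as noisy.
        hits += [(doc, "keyword_fallback") for doc in keyword_fn(query)
                 if doc not in seen]
    return hits
```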
- Candidate fusion
- RRF for mixed channels.
- Keep source score and retrieval method in metadata.
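The fusion step can be sketched with standard reciprocal rank fusion over per-channel rankings; k=60 is the conventional constant. Each channel yields an ordered list of (chunk_id, score).

```python
def rrf_fuse(channels: dict, k: int = 60) -> list:
    """channels: {method: [(chunk_id, score), ...]} ordered best-first."""
    fused = {}
    for method, ranking in channels.items():
        for rank, (chunk_id, score) in enumerate(ranking):
            entry = fused.setdefault(chunk_id, {"rrf": 0.0, "sources": {}})
            entry["rrf"] += 1.0 / (k + rank + 1)
            # Keep source score and retrieval method in metadata.
            entry["sources"][method] = score
    return sorted(fused.items(), key=lambda kv: kv[1]["rrf"], reverse=True)
```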
- Rerank
- Prefer model-based rerank when configured.
- Fallback to lexical+vector heuristic rerank when reranker unavailable.
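The heuristic fallback can be sketched as a blend of query-term overlap and the original vector score; the 0.7/0.3 weights are illustrative assumptions, not tuned values.

```python
def heuristic_rerank(query_terms, candidates, lex_weight=0.7):
    """candidates: list of (text, vector_score) with vector_score in [0, 1]."""
    qset = set(query_terms)
    scored = []
    for text, vec_score in candidates:
        terms = set(text.lower().split())
        overlap = len(qset & terms) / len(qset) if qset else 0.0
        # Blend lexical overlap with the vector score from first-stage recall.
        scored.append((text, lex_weight * overlap
                       + (1 - lex_weight) * vec_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```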
- Relevance gate
- Apply configurable min_relevance_score.
- Keep channel-specific guards for noisy fallback channels.
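A minimal sketch of the gate: a global min_relevance_score plus a stricter per-channel floor for noisy fallback channels. The floor values are assumptions for illustration.

```python
# Hypothetical per-channel floors; fallback channels get a stricter gate.
CHANNEL_FLOORS = {"keyword_fallback": 0.5}

def relevance_gate(hits, min_relevance_score=0.2):
    """hits: list of (chunk, score, channel); keep hits above their floor."""
    kept = []
    for chunk, score, channel in hits:
        floor = max(min_relevance_score, CHANNEL_FLOORS.get(channel, 0.0))
        if score >= floor:
            kept.append((chunk, score, channel))
    return kept
```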
- Agent context assembly
- Return top-k chunks + citations + retrieval traces.
- Support token-budget-aware context packing.
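Token-budget-aware packing can be sketched as a greedy pass over relevance-sorted chunks; the whitespace token count is a crude stand-in for a real tokenizer.

```python
def pack_context(chunks, token_budget=1024):
    """chunks: list of (text, score), already sorted by relevance."""
    packed, used = [], 0
    for text, score in chunks:
        cost = len(text.split())  # stand-in for real tokenization
        if used + cost > token_budget:
            continue  # skip oversized chunks, keep trying smaller ones
        packed.append({"text": text, "score": score})
        used += cost
    return packed, used
```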
Expose a retrieval contract suitable for agent framework calls:
- input: query, scope filters, top_k, min_score, latency budget
- output: chunks, scores, document metadata, retrieval traces
- behavior: graceful degradation when embeddings/reranker timeout
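The contract above could be expressed as dataclasses; the field names mirror the input/output lists but are otherwise assumptions, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalRequest:
    query: str
    scope_filters: dict = field(default_factory=dict)
    top_k: int = 5
    min_score: float = 0.2
    latency_budget_ms: int = 800

@dataclass
class RetrievedChunk:
    text: str
    score: float
    document_metadata: dict
    retrieval_trace: dict  # channel, raw score, rank before fusion

@dataclass
class RetrievalResponse:
    chunks: list
    degraded: bool = False  # True when embeddings/reranker timed out
```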
- Add model-based rerank execution for knowledge_base.search.rerank_*.
- Add retrieval traces in API response for observability.
- Add offline eval set and NDCG/Recall@k regression checks.
- Integrate retrieval contract into agent knowledge toolchain.