Production RAG system for enterprise document intelligence.
A battle-tested Retrieval-Augmented Generation (RAG) system designed for enterprise environments. Ingests multi-source documents, retrieves relevant context with hybrid search, and generates grounded responses with confidence scores and source attribution.
Problem • Solution • Architecture • Quick Start • Features • Performance
Standard RAG implementations struggle at enterprise scale:
- Hallucination Risk: LLMs generate plausible-sounding but factually incorrect information
- Source Loss: Generated responses lack citation and traceability
- Retrieval Brittleness: Simple embeddings miss relevant documents
- Format Chaos: Enterprise documents come in PDF, Word, HTML, databases—integration nightmare
- Performance Degradation: Retrieval quality decreases with document collection size
- No Quality Metrics: Impossible to measure answer faithfulness and accuracy
Enterprise RAG Engine provides:
✅ Hybrid Retrieval - BM25 keyword search + semantic embeddings with Reciprocal Rank Fusion
✅ Multi-Source Ingestion - PDF, Word, HTML, structured databases, APIs
✅ Semantic Chunking - Intelligent document splitting based on content coherence
✅ Hallucination Guards - Factual grounding checks and confidence scoring
✅ Citation Engine - Automatic source attribution with exact quotes
✅ Quality Evaluation - RAGAS framework metrics (faithfulness, relevance, precision)
graph LR
A["PDF Files"] --> B["Document Ingestion"]
C["Word Docs"] --> B
D["Web Content"] --> B
E["Databases"] --> B
B -->|Extract Text| F["Text Preprocessing"]
F -->|Clean & Normalize| G["Semantic Chunking"]
G -->|Generate Embeddings| H["Vector Store<br/>Pinecone/Chroma"]
G -->|Index Keywords| I["BM25 Index"]
J["User Query"] --> K["Query Processing"]
K -->|Semantic Search| H
K -->|Keyword Search| I
H -->|Top-k Results| L["Hybrid Fusion<br/>RRF"]
I -->|Top-k Results| L
L -->|Rerank Results| M["Cross-Encoder<br/>Reranker"]
M -->|Top Documents| N["Hallucination Guard<br/>Factual Check"]
N -->|Grounded Context| O["LLM Generation"]
O -->|Generated Text| P["Citation Engine"]
P -->|Final Response<br/>+ Sources| Q["User"]
O -->|Generated Content| R["RAGAS Evaluator<br/>Quality Metrics"]
R -->|Scores| S["Feedback Loop"]
S -->|Improve| G
| Feature | Description |
|---|---|
| BM25 + Semantic Hybrid Search | Combines keyword and semantic similarity for robust retrieval |
| Reciprocal Rank Fusion | Intelligently merges BM25 and embedding results |
| Cross-Encoder Reranking | Fine-tuned models rerank candidates based on query relevance |
| Semantic Chunking | Splits documents at logical boundaries using sentence embeddings |
| Hallucination Detection | Flags responses unsupported by source documents |
| Citation Engine | Extracts and includes exact quotes with source attribution |
| Multi-Format Support | Handles PDF, DOCX, HTML, plain text, CSV, JSON |
| Vector Store Agnostic | Works with Pinecone, Chroma, Weaviate, Milvus |
| Quality Metrics | RAGAS evaluation framework integration |
| Metric | Single Embedding | Hybrid Search | With Reranking | With Guards |
|---|---|---|---|---|
| Retrieval Precision@10 | 0.72 | 0.89 | 0.94 | 0.94 |
| Retrieval Recall@10 | 0.68 | 0.85 | 0.88 | 0.88 |
| Answer Faithfulness | 0.74 | 0.81 | 0.88 | 0.95 |
| Avg Latency (ms) | 180 | 250 | 380 | 420 |
| Hallucination Rate | 22% | 18% | 12% | 3% |
pip install enterprise-rag-enginefrom rag_engine.ingestion import PDFParser
from rag_engine.chunking import SemanticChunker
from rag_engine.retrieval import HybridRetriever
from rag_engine.generation import HallucinationGuard
# 1. Ingest documents
parser = PDFParser()
documents = parser.parse_directory("./docs/")
# 2. Chunk documents semantically
chunker = SemanticChunker(chunk_size=512, overlap=50)
chunks = chunker.chunk_documents(documents)
# 3. Create hybrid retriever
retriever = HybridRetriever(
embedding_model="sentence-transformers/all-MiniLM-L6-v2",
vector_store="pinecone"
)
retriever.index_chunks(chunks)
# 4. Query with hallucination guards
query = "What are the key benefits of our product?"
results = retriever.retrieve(query, top_k=5)
guard = HallucinationGuard()
grounded_results = guard.filter(results, query)
# 5. Generate with citations
from rag_engine.generation import CitationEngine
citations = CitationEngine()
response = citations.generate_with_sources(
query=query,
context=grounded_results
)
print(response.answer)
print(response.sources) # Exact quotes with document references- PDF extraction (text + tables)
- DOCX, PPTX, HTML parsing
- CSV/JSON structured data
- Web scraping and API integration
- Sentence-level boundary detection
- Overlap for context preservation
- Adaptive chunk sizing
- Metadata preservation
- BM25 sparse retrieval
- Dense vector search
- Reciprocal Rank Fusion fusion
- Per-document ranking
- Entailment checking
- Answer relevance scoring
- Confidence calibration
- Contradiction detection
- Automatic quote extraction
- Source attribution
- Confidence scoring
- Multi-source synthesis
from rag_engine.evaluation import RAGASEvaluator
evaluator = RAGASEvaluator()
metrics = evaluator.evaluate(
questions=test_questions,
answers=generated_answers,
documents=source_documents
)
print(f"Faithfulness: {metrics.faithfulness:.3f}")
print(f"Answer Relevance: {metrics.answer_relevance:.3f}")
print(f"Context Precision: {metrics.context_precision:.3f}")We welcome contributions! Please see CONTRIBUTING.md.
MIT License - see LICENSE for details.
Built by Sainath Pattipati
Enterprise-grade document intelligence at your fingertips.