This document outlines the complete implementation of a Retrieval-Augmented Generation (RAG) system for HEAL using Genkit for Python. The system enables intelligent chatbot functionality by creating a knowledge base from uploaded insurance documents.
Document Upload → Text Extraction → Chunking → Embedding → Vector Storage → RAG Retrieval → Chatbot Response
- Document Ingestion Pipeline
- Genkit Embedder Integration
- Local Vector Store
- RAG Retrieval System
- Chatbot with Context
- Debug Dashboard
```sql
-- Store uploaded documents
CREATE TABLE documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    file_path TEXT NOT NULL,
    original_name TEXT NOT NULL,
    file_size INTEGER,
    mime_type TEXT,
    upload_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    document_type TEXT,                        -- 'image' or 'pdf'
    extracted_text TEXT,
    processing_status TEXT DEFAULT 'pending',  -- 'pending', 'processed', 'failed'
    chunk_count INTEGER DEFAULT 0
);

-- Store document chunks with embeddings
CREATE TABLE document_chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL,
    chunk_text TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type TEXT DEFAULT 'paragraph',       -- 'paragraph', 'section', 'table'
    embedding BLOB,                            -- Serialized embedding vector
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE
);

-- Store chat sessions
CREATE TABLE chat_sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT UNIQUE NOT NULL,
    user_id TEXT,                              -- Future user management
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_activity DATETIME DEFAULT CURRENT_TIMESTAMP,
    document_context TEXT                      -- JSON array of document IDs in context
);

-- Store chat messages with RAG context
CREATE TABLE chat_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    message_type TEXT NOT NULL,                -- 'user' or 'assistant'
    content TEXT NOT NULL,
    relevant_chunks TEXT,                      -- JSON array of chunk IDs used for context
    confidence_score REAL,
    model_used TEXT,
    tokens_used INTEGER,
    processing_time_ms INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (session_id) REFERENCES chat_sessions(session_id)
);

-- Store RAG debug information
CREATE TABLE rag_queries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_text TEXT NOT NULL,
    embedding BLOB,
    top_chunks TEXT,                           -- JSON array of retrieved chunks with scores
    execution_time_ms INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

- File upload with storage to disk
- Text extraction from images (OCR) and PDFs
- Document chunking strategies
- Metadata extraction and storage
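The chunking step above can be sketched as a simple paragraph-aligned splitter. This is an illustrative strategy only, assuming character-based limits; the real pipeline may also split on sections and tables, and `chunk_text` is a hypothetical helper name:

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split extracted text into paragraph-aligned chunks.

    Paragraphs are kept together while they fit under max_chars;
    oversized paragraphs are hard-split at the character limit.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(current) + len(para) + 2 <= max_chars:
            # Paragraph still fits in the current chunk
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            # Hard-split paragraphs longer than the limit
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks paragraph-aligned matters for retrieval quality: embeddings of coherent passages separate better in vector space than embeddings of arbitrary fixed-width slices.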
```
POST   /upload                      # Enhanced with RAG ingestion
GET    /documents                   # List all documents
GET    /documents/{id}              # Get document details
DELETE /documents/{id}              # Delete document and chunks
POST   /documents/{id}/reprocess    # Reprocess document chunks
```

- Genkit embedder integration with Gemini
- Local vector store setup
- Chunk embedding generation
- Similarity search implementation
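The local vector store can serialize each vector into the `embedding BLOB` column and brute-force a cosine-similarity search over the stored rows. A minimal standard-library sketch, where the `(chunk_id, blob)` row shape and function names are assumptions:

```python
import math
import struct

def embedding_to_blob(vector: list[float]) -> bytes:
    """Pack a float vector into bytes for the `embedding BLOB` column."""
    return struct.pack(f"{len(vector)}f", *vector)

def blob_to_embedding(blob: bytes) -> list[float]:
    """Unpack a BLOB back into a float vector (4 bytes per float32)."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_similar_chunks(query_vec, stored, top_k=5):
    """Brute-force search. `stored` is [(chunk_id, blob), ...] rows from
    document_chunks; returns (chunk_id, score) pairs, best first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, blob_to_embedding(blob)))
        for chunk_id, blob in stored
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

A linear scan is fine at this document volume; if the corpus grows, the same BLOB format can be loaded into an ANN index without a schema change.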
```
POST /embeddings/generate    # Generate embeddings for text
POST /embeddings/search      # Search similar chunks
GET  /embeddings/stats       # Embedding statistics
```

- Query embedding and similarity search
- Context ranking and filtering
- Multi-document context aggregation
- Source attribution
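Context ranking, filtering, and multi-document aggregation can be sketched as a score-threshold filter that also caps how much any single document contributes. The threshold, cap, and `ScoredChunk` fields are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    chunk_id: int
    document_id: int
    text: str
    score: float  # similarity to the query, higher is better

def rank_and_filter(chunks: list[ScoredChunk],
                    min_score: float = 0.5,
                    max_per_document: int = 2) -> list[ScoredChunk]:
    """Drop low-similarity chunks, then cap per-document contributions
    so the assembled context can span multiple source documents."""
    kept: list[ScoredChunk] = []
    per_doc: dict[int, int] = {}
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            continue
        if per_doc.get(chunk.document_id, 0) >= max_per_document:
            continue
        per_doc[chunk.document_id] = per_doc.get(chunk.document_id, 0) + 1
        kept.append(chunk)
    return kept
```

Because `document_id` survives filtering, source attribution falls out for free: each surviving chunk can be traced back to its row in `documents`.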
```
POST /rag/query              # Test RAG retrieval
GET  /rag/context/{query}    # Get context for query
POST /rag/evaluate           # Evaluate retrieval quality
```

- Session management
- Conversation history
- Context-aware responses
- Multi-turn conversations
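Context-aware responses hinge on how retrieved chunks are folded into the prompt. One plausible shape for the `build_rag_prompt` helper used in the flows below; the wording and the `(source, text)` pair format are assumptions, not the final design:

```python
def build_rag_prompt(message: str, chunks) -> str:
    """Assemble a grounded prompt: numbered source excerpts first,
    then the user question, with an instruction to cite sources.

    `chunks` is assumed to be an iterable of (source, text) pairs.
    """
    sources = "\n\n".join(
        f"[Source {i}] ({source})\n{text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the insurance document excerpts below. "
        "Cite excerpts as [Source N]; if the answer is not present, say so.\n\n"
        f"{sources}\n\nQuestion: {message}"
    )
```

Numbering the excerpts lets the model emit `[Source N]` markers that the UI can map back to `relevant_chunks` for attribution.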
```
POST   /chat/sessions                  # Create new chat session
GET    /chat/sessions/{id}             # Get session details
POST   /chat/sessions/{id}/messages    # Send message
GET    /chat/sessions/{id}/history     # Get conversation history
DELETE /chat/sessions/{id}             # Delete session
```

- Document inspection
- Chunk visualization
- Embedding similarity testing
- Chat conversation debugging
- Performance metrics
```
GET  /debug/documents              # List documents with stats
GET  /debug/chunks/{doc_id}        # View document chunks
POST /debug/similarity             # Test similarity search
GET  /debug/chat/{session_id}      # Debug chat session
GET  /debug/embeddings/{chunk_id}  # Inspect embeddings
GET  /debug/performance            # System performance metrics
```

```python
import time

from genkit import genkit
from genkit.plugins.google_genai import GoogleAI

# Initialize Genkit with Google AI
ai = genkit(
    plugins=[GoogleAI()],
    model='gemini-2.5-pro'
)

# Configure embedder
embedder = ai.embedder('text-embedding-004')  # Latest Gemini embedding model
```

```python
@ai.flow()
async def process_document(file_path: str, document_id: int) -> ProcessingResult:
    """Complete document processing pipeline"""
    start_time = time.time()

    # 1. Extract text
    text = await extract_text_from_file(file_path)

    # 2. Chunk text
    chunks = chunk_text_intelligently(text)

    # 3. Generate embeddings
    embeddings = []
    for chunk in chunks:
        embedding = await embedder.embed(chunk.text)
        embeddings.append(embedding)

    # 4. Store in database
    store_chunks_with_embeddings(document_id, chunks, embeddings)

    return ProcessingResult(
        chunks_created=len(chunks),
        embeddings_generated=len(embeddings),
        processing_time=time.time() - start_time
    )
```

```python
@ai.flow()
async def rag_retrieve(query: str, top_k: int = 5) -> RetrievalResult:
    """Retrieve relevant chunks for a query"""
    # 1. Generate query embedding
    query_embedding = await embedder.embed(query)

    # 2. Search similar chunks
    similar_chunks = search_similar_chunks(query_embedding, top_k)

    # 3. Rank and filter results
    ranked_chunks = rank_chunks_by_relevance(similar_chunks, query)

    return RetrievalResult(
        query=query,
        chunks=ranked_chunks,
        total_found=len(similar_chunks)
    )
```

```python
@ai.flow()
async def chat_with_rag(
    message: str,
    session_id: str,
    context_limit: int = 3
) -> ChatResponse:
    """Chat with RAG context"""
    # 1. Retrieve relevant context
    context = await rag_retrieve(message, top_k=context_limit)

    # 2. Build contextual prompt
    prompt = build_rag_prompt(message, context.chunks)

    # 3. Generate response
    response = await ai.generate(
        prompt=prompt,
        model='gemini-2.5-pro'
    )

    # 4. Save conversation
    save_chat_message(session_id, message, response.text, context.chunks)

    return ChatResponse(
        message=response.text,
        sources=[chunk.source for chunk in context.chunks],
        confidence=calculate_confidence(context.chunks),
        session_id=session_id
    )
```

- Visual flow debugging
- Embedding inspection
- Performance profiling
- Error tracing
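Performance profiling presupposes timing hooks around each stage. A minimal sketch of a decorator that could feed the `execution_time_ms` columns; the `record` callback wiring is an assumption:

```python
import functools
import time

def timed(record):
    """Measure a function's wall-clock time in milliseconds and hand
    it to a `record` callback (e.g. a hypothetical writer that updates
    rag_queries.execution_time_ms or chat_messages.processing_time_ms)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = int((time.perf_counter() - start) * 1000)
                record(fn.__name__, elapsed_ms)
        return wrapper
    return decorator
```

Recording in a `finally` block ensures failed queries are timed too, which is exactly the data error tracing needs.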
```
# Document debugging
GET /debug/documents/{id}/chunks      # View all chunks
GET /debug/documents/{id}/embeddings  # View embeddings

# Search debugging
POST /debug/search/test               # Test similarity search
GET  /debug/search/history            # Search history

# Chat debugging
GET /debug/chat/{session}/trace       # Full conversation trace
GET /debug/chat/{session}/context     # Context used in responses
```

Metrics tracked:
- Document processing time
- Embedding generation speed
- Search query latency
- Chat response time
- Token usage and costs
- Memory usage
- Database query performance

```jsx
// Document upload and management
<DocumentManager>
  <UploadZone />
  <DocumentList />
  <ProcessingStatus />
</DocumentManager>
```

```jsx
// RAG-powered chatbot
<ChatInterface>
  <MessageHistory />
  <ContextSources />
  <InputArea />
  <TypingIndicator />
</ChatInterface>
```

```jsx
// Development debugging tools
<DebugDashboard>
  <DocumentInspector />
  <EmbeddingVisualizer />
  <ChatTracer />
  <PerformanceMetrics />
</DebugDashboard>
```

- Real-time chat with typing indicators
- Source attribution for responses
- Document context highlighting
- Conversation history
- Debug mode toggle
- Performance metrics display
- Database schema updates
- Document storage system
- Basic text extraction
- Genkit embedder setup
- Chunking algorithms
- Embedding generation
- Vector similarity search
- Basic retrieval system
- Chat session management
- RAG-powered responses
- Conversation history
- Context management
- Chat UI components
- Document management UI
- Debug dashboard
- Performance optimization
- Document processing speed: < 30 seconds per document
- Chat response time: < 5 seconds
- Retrieval accuracy: > 80% relevant results
- System uptime: > 99%
- Chat response relevance
- Source attribution accuracy
- Conversation flow quality
- Debug tool usability
- Secure file storage
- Embedding data encryption
- Session management security
- API rate limiting
- Document access controls
- Chat history privacy
- Embedding anonymization
- Audit logging
This implementation provides a complete, debuggable RAG system that enhances HEAL's capabilities while maintaining the existing document analysis functionality.