This document outlines the complete implementation of a Retrieval-Augmented Generation (RAG) system for HEAL using Genkit for Python. The system enables intelligent chatbot functionality by creating a knowledge base from uploaded insurance documents.
Document Upload → Text Extraction → Chunking → Embedding → Vector Storage → RAG Retrieval → Chatbot Response
- Document Ingestion Pipeline
- Genkit Embedder Integration
- Local Vector Store
- RAG Retrieval System
- Chatbot with Context
- Debug Dashboard
```sql
-- Store uploaded documents
CREATE TABLE documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    file_path TEXT NOT NULL,
    original_name TEXT NOT NULL,
    file_size INTEGER,
    mime_type TEXT,
    upload_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    document_type TEXT,                        -- 'image' or 'pdf'
    extracted_text TEXT,
    processing_status TEXT DEFAULT 'pending',  -- 'pending', 'processed', 'failed'
    chunk_count INTEGER DEFAULT 0
);

-- Store document chunks with embeddings
CREATE TABLE document_chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL,
    chunk_text TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type TEXT DEFAULT 'paragraph',       -- 'paragraph', 'section', 'table'
    embedding BLOB,                            -- Serialized embedding vector
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE
);

-- Store chat sessions
CREATE TABLE chat_sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT UNIQUE NOT NULL,
    user_id TEXT,                              -- Future user management
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_activity DATETIME DEFAULT CURRENT_TIMESTAMP,
    document_context TEXT                      -- JSON array of document IDs in context
);

-- Store chat messages with RAG context
CREATE TABLE chat_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    message_type TEXT NOT NULL,                -- 'user' or 'assistant'
    content TEXT NOT NULL,
    relevant_chunks TEXT,                      -- JSON array of chunk IDs used for context
    confidence_score REAL,
    model_used TEXT,
    tokens_used INTEGER,
    processing_time_ms INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (session_id) REFERENCES chat_sessions(session_id)
);

-- Store RAG debug information
CREATE TABLE rag_queries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_text TEXT NOT NULL,
    embedding BLOB,
    top_chunks TEXT,                           -- JSON array of retrieved chunks with scores
    execution_time_ms INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

- File upload with storage to disk
- Text extraction from images (OCR) and PDFs
- Document chunking strategies
- Metadata extraction and storage
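The chunking step above can be sketched as a simple paragraph-aligned splitter. This is an illustrative strategy only, assuming character-based limits; the real pipeline may also split on sections and tables, and `chunk_text` is a hypothetical helper name:

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split extracted text into paragraph-aligned chunks.

    Paragraphs are kept together while they fit under max_chars;
    oversized paragraphs are hard-split at the character limit.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(current) + len(para) + 2 <= max_chars:
            # Paragraph still fits in the current chunk
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            # Hard-split paragraphs longer than the limit
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks paragraph-aligned matters for retrieval quality: embeddings of coherent passages separate better in vector space than embeddings of arbitrary fixed-width slices.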
```
POST   /upload                      # Enhanced with RAG ingestion
GET    /documents                   # List all documents
GET    /documents/{id}              # Get document details
DELETE /documents/{id}              # Delete document and chunks
POST   /documents/{id}/reprocess    # Reprocess document chunks
```

- Genkit embedder integration with Gemini
- Local vector store setup
- Chunk embedding generation
- Similarity search implementation
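The local vector store can serialize each vector into the `embedding BLOB` column and brute-force a cosine-similarity search over the stored rows. A minimal standard-library sketch, where the `(chunk_id, blob)` row shape and function names are assumptions:

```python
import math
import struct

def embedding_to_blob(vector: list[float]) -> bytes:
    """Pack a float vector into bytes for the `embedding BLOB` column."""
    return struct.pack(f"{len(vector)}f", *vector)

def blob_to_embedding(blob: bytes) -> list[float]:
    """Unpack a BLOB back into a float vector (4 bytes per float32)."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_similar_chunks(query_vec, stored, top_k=5):
    """Brute-force search. `stored` is [(chunk_id, blob), ...] rows from
    document_chunks; returns (chunk_id, score) pairs, best first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, blob_to_embedding(blob)))
        for chunk_id, blob in stored
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

A linear scan is fine at this document volume; if the corpus grows, the same BLOB format can be loaded into an ANN index without a schema change.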
```
POST /embeddings/generate    # Generate embeddings for text
POST /embeddings/search      # Search similar chunks
GET  /embeddings/stats       # Embedding statistics
```

- Query embedding and similarity search
- Context ranking and filtering
- Multi-document context aggregation
- Source attribution
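Context ranking, filtering, and multi-document aggregation can be sketched as a score-threshold filter that also caps how much any single document contributes. The threshold, cap, and `ScoredChunk` fields are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    chunk_id: int
    document_id: int
    text: str
    score: float  # similarity to the query, higher is better

def rank_and_filter(chunks: list[ScoredChunk],
                    min_score: float = 0.5,
                    max_per_document: int = 2) -> list[ScoredChunk]:
    """Drop low-similarity chunks, then cap per-document contributions
    so the assembled context can span multiple source documents."""
    kept: list[ScoredChunk] = []
    per_doc: dict[int, int] = {}
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            continue
        if per_doc.get(chunk.document_id, 0) >= max_per_document:
            continue
        per_doc[chunk.document_id] = per_doc.get(chunk.document_id, 0) + 1
        kept.append(chunk)
    return kept
```

Because `document_id` survives filtering, source attribution falls out for free: each surviving chunk can be traced back to its row in `documents`.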
```
POST /rag/query              # Test RAG retrieval
GET  /rag/context/{query}    # Get context for query
POST /rag/evaluate           # Evaluate retrieval quality
```

- Session management
- Conversation history
- Context-aware responses
- Multi-turn conversations
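Context-aware responses hinge on how retrieved chunks are folded into the prompt. One plausible shape for the `build_rag_prompt` helper used in the flows below; the wording and the `(source, text)` pair format are assumptions, not the final design:

```python
def build_rag_prompt(message: str, chunks) -> str:
    """Assemble a grounded prompt: numbered source excerpts first,
    then the user question, with an instruction to cite sources.

    `chunks` is assumed to be an iterable of (source, text) pairs.
    """
    sources = "\n\n".join(
        f"[Source {i}] ({source})\n{text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the insurance document excerpts below. "
        "Cite excerpts as [Source N]; if the answer is not present, say so.\n\n"
        f"{sources}\n\nQuestion: {message}"
    )
```

Numbering the excerpts lets the model emit `[Source N]` markers that the UI can map back to `relevant_chunks` for attribution.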
```
POST   /chat/sessions                  # Create new chat session
GET    /chat/sessions/{id}             # Get session details
POST   /chat/sessions/{id}/messages    # Send message
GET    /chat/sessions/{id}/history     # Get conversation history
DELETE /chat/sessions/{id}             # Delete session
```

- Document inspection
- Chunk visualization
- Embedding similarity testing
- Chat conversation debugging
- Performance metrics
```
GET  /debug/documents              # List documents with stats
GET  /debug/chunks/{doc_id}        # View document chunks
POST /debug/similarity             # Test similarity search
GET  /debug/chat/{session_id}      # Debug chat session
GET  /debug/embeddings/{chunk_id}  # Inspect embeddings
GET  /debug/performance            # System performance metrics
```

```python
import time

from genkit import genkit
from genkit.plugins.google_genai import GoogleAI

# Initialize Genkit with Google AI
ai = genkit(
    plugins=[GoogleAI()],
    model='gemini-2.5-pro'
)

# Configure embedder
embedder = ai.embedder('text-embedding-004')  # Latest Gemini embedding model
```

```python
@ai.flow()
async def process_document(file_path: str, document_id: int) -> ProcessingResult:
    """Complete document processing pipeline"""
    start_time = time.time()

    # 1. Extract text
    text = await extract_text_from_file(file_path)

    # 2. Chunk text
    chunks = chunk_text_intelligently(text)

    # 3. Generate embeddings
    embeddings = []
    for chunk in chunks:
        embedding = await embedder.embed(chunk.text)
        embeddings.append(embedding)

    # 4. Store in database
    store_chunks_with_embeddings(document_id, chunks, embeddings)

    return ProcessingResult(
        chunks_created=len(chunks),
        embeddings_generated=len(embeddings),
        processing_time=time.time() - start_time
    )
```

```python
@ai.flow()
async def rag_retrieve(query: str, top_k: int = 5) -> RetrievalResult:
    """Retrieve relevant chunks for a query"""
    # 1. Generate query embedding
    query_embedding = await embedder.embed(query)

    # 2. Search similar chunks
    similar_chunks = search_similar_chunks(query_embedding, top_k)

    # 3. Rank and filter results
    ranked_chunks = rank_chunks_by_relevance(similar_chunks, query)

    return RetrievalResult(
        query=query,
        chunks=ranked_chunks,
        total_found=len(similar_chunks)
    )
```

```python
@ai.flow()
async def chat_with_rag(
    message: str,
    session_id: str,
    context_limit: int = 3
) -> ChatResponse:
    """Chat with RAG context"""
    # 1. Retrieve relevant context
    context = await rag_retrieve(message, top_k=context_limit)

    # 2. Build contextual prompt
    prompt = build_rag_prompt(message, context.chunks)

    # 3. Generate response
    response = await ai.generate(
        prompt=prompt,
        model='gemini-2.5-pro'
    )

    # 4. Save conversation
    save_chat_message(session_id, message, response.text, context.chunks)

    return ChatResponse(
        message=response.text,
        sources=[chunk.source for chunk in context.chunks],
        confidence=calculate_confidence(context.chunks),
        session_id=session_id
    )
```

- Visual flow debugging
- Embedding inspection
- Performance profiling
- Error tracing
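Performance profiling presupposes timing hooks around each stage. A minimal sketch of a decorator that could feed the `execution_time_ms` columns; the `record` callback wiring is an assumption:

```python
import functools
import time

def timed(record):
    """Measure a function's wall-clock time in milliseconds and hand
    it to a `record` callback (e.g. a hypothetical writer that updates
    rag_queries.execution_time_ms or chat_messages.processing_time_ms)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = int((time.perf_counter() - start) * 1000)
                record(fn.__name__, elapsed_ms)
        return wrapper
    return decorator
```

Recording in a `finally` block ensures failed queries are timed too, which is exactly the data error tracing needs.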
```
# Document debugging
GET /debug/documents/{id}/chunks      # View all chunks
GET /debug/documents/{id}/embeddings  # View embeddings

# Search debugging
POST /debug/search/test               # Test similarity search
GET  /debug/search/history            # Search history

# Chat debugging
GET /debug/chat/{session}/trace       # Full conversation trace
GET /debug/chat/{session}/context     # Context used in responses
```

Metrics tracked:
- Document processing time
- Embedding generation speed
- Search query latency
- Chat response time
- Token usage and costs
- Memory usage
- Database query performance

```jsx
// Document upload and management
<DocumentManager>
  <UploadZone />
  <DocumentList />
  <ProcessingStatus />
</DocumentManager>
```

```jsx
// RAG-powered chatbot
<ChatInterface>
  <MessageHistory />
  <ContextSources />
  <InputArea />
  <TypingIndicator />
</ChatInterface>
```

```jsx
// Development debugging tools
<DebugDashboard>
  <DocumentInspector />
  <EmbeddingVisualizer />
  <ChatTracer />
  <PerformanceMetrics />
</DebugDashboard>
```

- Real-time chat with typing indicators
- Source attribution for responses
- Document context highlighting
- Conversation history
- Debug mode toggle
- Performance metrics display
- Database schema updates
- Document storage system
- Basic text extraction
- Genkit embedder setup
- Chunking algorithms
- Embedding generation
- Vector similarity search
- Basic retrieval system
- Chat session management
- RAG-powered responses
- Conversation history
- Context management
- Chat UI components
- Document management UI
- Debug dashboard
- Performance optimization
- Document processing speed: < 30 seconds per document
- Chat response time: < 5 seconds
- Retrieval accuracy: > 80% relevant results
- System uptime: > 99%
- Chat response relevance
- Source attribution accuracy
- Conversation flow quality
- Debug tool usability
- Secure file storage
- Embedding data encryption
- Session management security
- API rate limiting
- Document access controls
- Chat history privacy
- Embedding anonymization
- Audit logging
This implementation provides a complete, debuggable RAG system that enhances HEAL's capabilities while maintaining the existing document analysis functionality.