
Feature: AI Interactive Chat with RAG System for Selected Transcripts #52

@davidamacey

Feature Summary

Implement an AI-powered interactive chat system that allows users to select multiple media files from the gallery view and start a conversational AI session with those transcripts as context. The system should use Retrieval Augmented Generation (RAG) with OpenSearch to provide accurate, context-aware responses about the selected transcript content, mimicking the ChatGPT interface experience.

Problem Statement

Users often want to ask questions about their transcripts, extract specific information, or analyze content across multiple recordings. Currently, they must manually read through entire transcripts to find relevant information. An interactive AI chat system would allow users to:

  • Ask questions about specific topics across multiple transcripts
  • Get summaries of discussions on particular subjects
  • Find action items, decisions, or key points mentioned by specific speakers
  • Analyze trends and patterns across multiple meetings/recordings
  • Extract insights without manually searching through hours of content

Current State Analysis

Existing Infrastructure

  • Selection System: MediaLibrary.svelte already has complete multi-select functionality (selectedFiles Set, checkboxes, batch operations)
  • OpenSearch: Full-text search infrastructure for transcripts exists
  • WebSocket: Real-time communication infrastructure (backend/app/api/websockets.py)
  • LLM Integration: Will leverage the same multi-provider LLM system proposed in Issue #51 (Feature: Implement LLM-based Transcript Summarization with Multi-Provider Support and OpenSearch Integration)
  • Chat Interface: No existing chat UI components
  • RAG System: No retrieval augmented generation implementation
  • Chat Session Management: No backend chat session handling

Selection Infrastructure (Already Available)

// From MediaLibrary.svelte
let selectedFiles = new Set<number>();
// Complete multi-select with UI controls already implemented

Proposed Solution

User Experience Flow

  1. Gallery Selection: User selects one or more media files using existing checkbox system
  2. Chat Initiation: New "Start AI Chat" button appears when files are selected
  3. Chat Session: Modal or full-page chat interface opens with ChatGPT-like experience
  4. Context Loading: System loads selected transcripts as RAG context
  5. Interactive Chat: User asks questions, AI responds with context-aware answers
  6. Reference Links: Responses include links to specific transcript segments/timestamps

Technical Architecture

RAG System with OpenSearch

Following industry best practices for enterprise RAG implementation:

graph TD
    A[User Query] --> B[Query Processing]
    B --> C[OpenSearch Retrieval]
    C --> D[Context Ranking]
    D --> E[LLM + Context]
    E --> F[Response Generation]
    F --> G[Response with Citations]

OpenSearch RAG Implementation

{
  "query": {
    "bool": {
      "must": [
        {"terms": {"file_id": [123, 456, 789]}},
        {
          "multi_match": {
            "query": "user question about budget planning",
            "fields": ["text^2", "speaker_name", "summary"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text": {"fragment_size": 150, "number_of_fragments": 3}
    }
  },
  "_source": ["text", "speaker_name", "start_time", "end_time", "file_id", "filename"],
  "size": 10,
  "sort": [{"_score": {"order": "desc"}}]
}
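In Python, the query body above could be assembled by a small helper. The field names (`text`, `speaker_name`, `summary`, etc.) assume the transcript index schema shown in the example; this is a sketch, not the final service code.

```python
from typing import Any, Dict, List

def build_rag_query(question: str, file_ids: List[int], size: int = 10) -> Dict[str, Any]:
    """Build the OpenSearch query body shown above: restrict results to the
    selected files, then rank transcript segments against the user's question."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"file_id": file_ids}},
                    {
                        "multi_match": {
                            "query": question,
                            "fields": ["text^2", "speaker_name", "summary"],
                            "type": "best_fields",
                            "fuzziness": "AUTO",
                        }
                    },
                ]
            }
        },
        "highlight": {
            "fields": {"text": {"fragment_size": 150, "number_of_fragments": 3}}
        },
        "_source": ["text", "speaker_name", "start_time", "end_time", "file_id", "filename"],
        "size": size,
        "sort": [{"_score": {"order": "desc"}}],
    }
```

The returned dict can then be passed as the `body` argument of an `opensearch-py` `client.search(...)` call against the transcript index.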

Hybrid Search Strategy

  1. Semantic Search: Vector embeddings for conceptual matches
  2. Keyword Search: BM25 for exact term matching
  3. Contextual Filtering: File-specific and speaker-specific results
  4. Temporal Awareness: Time-based context understanding
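One common way to merge the keyword (BM25) and semantic result lists from steps 1-2 is reciprocal rank fusion (RRF). The sketch below illustrates the idea; the final service may choose a different scoring scheme.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of segment IDs into one combined ranking.
    Each list contributes 1 / (k + rank) per item; higher total score wins."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, segment_id in enumerate(ranking, start=1):
            scores[segment_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A segment that appears near the top of both the keyword and the semantic list outranks one that scores highly in only a single list, which is exactly the behavior a hybrid search strategy wants.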

Technical Implementation

Phase 1: Core Chat Infrastructure

  1. Chat Session Management

    • Session creation with selected file context
    • Conversation state management
    • Session cleanup and timeout handling
  2. RAG Service Implementation

    • OpenSearch retrieval with context ranking
    • Chunk management and relevance scoring
    • Citation tracking for response attribution
  3. Chat API Endpoints

    • Session creation and management
    • Message processing with RAG
    • Real-time streaming responses
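The "real-time streaming responses" item can be sketched as an async generator that relays model output in small pieces; the fixed-size chunking below is a stand-in for a real LLM token stream, and the `websocket.send_text` call mentioned in the comment is where delivery would happen in practice.

```python
import asyncio
from typing import AsyncIterator

async def stream_chunks(response_text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Yield a response in small pieces, simulating token-by-token streaming."""
    for i in range(0, len(response_text), chunk_size):
        yield response_text[i:i + chunk_size]
        await asyncio.sleep(0)  # yield control so each piece can be sent promptly

async def demo() -> str:
    received = []
    async for chunk in stream_chunks("The budget was approved in the Q3 meeting."):
        received.append(chunk)  # in practice: await websocket.send_text(chunk)
    return "".join(received)
```

Because the chunks concatenate back to the full response, the frontend can render them incrementally and still end up with the exact message to store in the conversation history.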

Phase 2: Frontend Chat Interface

  1. ChatGPT-like UI Components

    • Message thread display
    • Real-time typing indicators
    • Copy/paste functionality
    • Message actions (copy, regenerate)
  2. Gallery Integration

    • Enhanced selection controls
    • Chat initiation button
    • Context file display

Phase 3: Advanced Features

  1. Enhanced RAG Capabilities

    • Multi-turn conversation context
    • Cross-reference detection
    • Temporal query understanding
  2. Chat History (Optional)

    • Session persistence
    • Chat search and retrieval
    • Export functionality

Backend Architecture

1. Chat Session Service (backend/app/services/chat_service.py)

class ChatSession:
    def __init__(self, session_id: str, user_id: int, file_ids: List[int]):
        self.session_id = session_id
        self.user_id = user_id
        self.file_ids = file_ids
        self.conversation_history = []
        self.context_cache = {}
        
    async def process_message(self, message: str) -> ChatResponse:
        # 1. Retrieve relevant context from OpenSearch
        # 2. Combine with conversation history
        # 3. Generate LLM response
        # 4. Return with citations
        ...

2. RAG Service (backend/app/services/rag_service.py)

class RAGService:
    async def retrieve_context(self, query: str, file_ids: List[int], limit: int = 10) -> List[ContextChunk]:
        # Hybrid search: semantic + keyword
        # Relevance scoring and ranking
        # Context window management
        ...

    async def generate_response(self, query: str, context: List[ContextChunk], history: List[ChatMessage]) -> str:
        # LLM integration with context injection
        # Citation generation
        # Response streaming
        ...

3. Enhanced OpenSearch Service (backend/app/services/opensearch_chat_service.py)

class OpenSearchChatService:
    async def hybrid_search(self, query: str, file_ids: List[int]) -> SearchResults:
        # Combine semantic and keyword search
        # File-specific filtering
        # Relevance scoring
        ...

    async def get_context_window(self, segment_id: str, window_size: int = 3) -> List[Segment]:
        # Retrieve surrounding context for better understanding
        # Speaker continuity
        # Temporal context
        ...
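The windowing in `get_context_window` can be sketched as a pure function over an ordered segment list; the real implementation would fetch the neighboring segments from OpenSearch by their position in the transcript.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def context_window(segments: Sequence[T], index: int, window_size: int = 3) -> List[T]:
    """Return the segment at `index` plus up to `window_size` neighbors
    on each side, clamped to the transcript boundaries."""
    start = max(0, index - window_size)
    end = min(len(segments), index + window_size + 1)
    return list(segments[start:end])
```

Clamping at the boundaries matters: a hit in the first or last segment of a transcript should still return a valid, smaller window rather than raising an index error.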

4. WebSocket Chat Handler (backend/app/api/chat_websocket.py)

@router.websocket("/ws/chat/{session_id}")
async def chat_websocket(websocket: WebSocket, session_id: str, current_user: User):
    # Real-time chat communication
    # Streaming response delivery
    # Connection management
    ...
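A minimal shape for the handler's message loop, written against any object that provides `accept`, `receive_text`, and `send_json` (FastAPI's `WebSocket` does; so does a test stub). `session.stream_reply` is a hypothetical name for the RAG pipeline's streaming entry point, not an existing method.

```python
from typing import Optional

async def run_chat_loop(websocket, session, max_turns: Optional[int] = None) -> None:
    """Accept the connection, then relay each user message through the
    RAG session and stream the reply back chunk by chunk."""
    await websocket.accept()
    turns = 0
    while max_turns is None or turns < max_turns:
        message = await websocket.receive_text()
        async for chunk in session.stream_reply(message):
            await websocket.send_json({"type": "chunk", "content": chunk})
        await websocket.send_json({"type": "done"})
        turns += 1
```

The `max_turns` parameter exists only to make the loop testable; the production handler would instead run until the client disconnects or the session times out.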

Frontend Architecture

1. Chat Interface (frontend/src/components/ChatInterface.svelte)

<script lang="ts">
  export let sessionId: string;
  export let selectedFiles: number[];
  
  let messages: ChatMessage[] = [];
  let currentMessage = "";
  let isLoading = false;
  let wsConnection: WebSocket;
  
  // ChatGPT-like interface
  // Real-time message streaming
  // Copy/paste functionality
  // Message actions
</script>

2. Chat Message Component (frontend/src/components/ChatMessage.svelte)

<script lang="ts">
  export let message: ChatMessage;
  export let showCitations: boolean = true;
  
  // Message rendering with markdown
  // Citation links to transcript segments
  // Copy functionality
  // Regenerate option
</script>

3. Context Panel (frontend/src/components/ChatContextPanel.svelte)

<script lang="ts">
  export let selectedFiles: MediaFile[];
  export let currentContext: ContextChunk[];
  
  // Display selected files as context
  // Show current relevant segments
  // Jump to transcript functionality
</script>

4. Enhanced Gallery Integration (frontend/src/routes/MediaLibrary.svelte)

<!-- Add to existing selection controls -->
<div class="selection-controls">
  <!-- Existing buttons -->
  {#if selectedFiles.size > 0}
    <button 
      class="chat-btn"
      on:click={startChatSession}
      title="Start AI chat with {selectedFiles.size} selected file{selectedFiles.size === 1 ? '' : 's'}"
    >
      <ChatIcon />
      Start AI Chat ({selectedFiles.size})
    </button>
  {/if}
</div>

Database Schema

Chat Session Management (Optional - for history)

-- Optional: For persistent chat history (PostgreSQL syntax)
CREATE TABLE chat_session (
    id VARCHAR(255) PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
    title VARCHAR(255),
    file_ids INTEGER[] NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_chat_session_user_id ON chat_session(user_id);
CREATE INDEX idx_chat_session_created_at ON chat_session(created_at);

CREATE TABLE chat_message (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(255) NOT NULL REFERENCES chat_session(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
    content TEXT NOT NULL,
    context_used JSONB, -- Citations and context chunks used
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_chat_message_session_id ON chat_message(session_id);
CREATE INDEX idx_chat_message_created_at ON chat_message(created_at);

API Endpoints

Chat Session Management

  • POST /api/chat/sessions - Create new chat session with selected files
  • GET /api/chat/sessions/{session_id} - Get session details
  • DELETE /api/chat/sessions/{session_id} - Delete session
  • GET /api/chat/sessions - List user's chat sessions (if history enabled)

Chat Interaction

  • POST /api/chat/sessions/{session_id}/messages - Send message (alternative to WebSocket)
  • GET /api/chat/sessions/{session_id}/messages - Get conversation history
  • WebSocket /ws/chat/{session_id} - Real-time chat communication

Context & Search

  • POST /api/chat/search - Search within session context
  • GET /api/chat/sessions/{session_id}/context - Get current context files
  • POST /api/chat/sessions/{session_id}/regenerate - Regenerate last response

RAG Implementation Details

Context Retrieval Strategy

async def retrieve_context(self, query: str, file_ids: List[int]) -> List[ContextChunk]:
    # 1. Keyword search (BM25)
    keyword_results = await self.opensearch.search(
        query=query,
        file_ids=file_ids,
        search_type="keyword"
    )

    # 2. Semantic search (if embeddings available)
    semantic_results = []
    if self.embeddings_enabled:
        semantic_results = await self.opensearch.vector_search(
            query_embedding=await self.embed_query(query),
            file_ids=file_ids
        )

    # 3. Combine and rank results
    combined_results = self.rank_results(keyword_results, semantic_results)

    # 4. Add a contextual window around each hit
    enriched_results = []
    for result in combined_results[:10]:
        context_window = await self.get_surrounding_context(result.segment_id)
        enriched_results.append(ContextChunk(
            text=result.text,
            speaker=result.speaker,
            timestamp=result.timestamp,
            file_id=result.file_id,
            context_window=context_window,
            relevance_score=result.score
        ))

    return enriched_results

LLM Prompt Template

CHAT_SYSTEM_PROMPT = """
You are an AI assistant helping users analyze and understand their transcript content. 
You have access to transcript segments from the user's selected media files.

IMPORTANT GUIDELINES:
1. Base your responses on the provided transcript context
2. Always cite specific speakers, timestamps, and files when referencing information
3. If information isn't in the provided context, clearly state this
4. Provide specific quotes when relevant
5. Be conversational but accurate
6. Suggest follow-up questions when appropriate

CONTEXT FILES:
{file_context}

RELEVANT TRANSCRIPT SEGMENTS:
{transcript_context}

CONVERSATION HISTORY:
{conversation_history}

USER QUESTION: {user_query}

Provide a helpful, accurate response based on the transcript content above.
"""

Citation Format

interface ChatResponse {
  content: string;
  citations: Citation[];
  suggestions: string[];
}

interface Citation {
  file_id: number;
  filename: string;
  speaker: string;
  timestamp: string;
  text: string;
  relevance_score: number;
}
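A `Citation` can be turned into a deep link that jumps to the cited moment in the player. The URL scheme below (`/files/{file_id}?t=<seconds>`) is an assumption, not an existing route; the sketch is in Python for consistency with the backend examples.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "HH:MM:SS" (or "MM:SS") into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def citation_link(citation: dict) -> str:
    """Markdown link that jumps to the cited moment in the source file.
    NOTE: the /files/{id}?t= route is hypothetical."""
    t = timestamp_to_seconds(citation["timestamp"])
    return (
        f"[{citation['filename']} @ {citation['timestamp']}]"
        f"(/files/{citation['file_id']}?t={t})"
    )
```

Rendering citations as markdown keeps them compatible with the chat message renderer, which already displays AI responses as markdown.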

UI/UX Design

ChatGPT-like Interface Features

  1. Message Threading

    • User messages aligned right
    • AI responses aligned left
    • Timestamps and status indicators
  2. Interactive Elements

    • Copy message button
    • Regenerate response option
    • Citation links to transcript segments
    • Suggested follow-up questions
  3. Real-time Features

    • Typing indicators during AI processing
    • Streaming response display
    • Connection status indicators
  4. Context Display

    • Selected files sidebar
    • Current context highlights
    • Quick jump to transcript segments

Modal vs Full-Page Design

Recommended: Modal Approach

  • Overlay on gallery view
  • Maintain context of selected files
  • Easy to close and return to selection
  • Better for quick questions

Alternative: Full-Page

  • Dedicated chat route /chat/{session_id}
  • More space for complex conversations
  • Better for extended analysis sessions

Configuration

Environment Variables

# Chat Configuration
CHAT_ENABLED=true
CHAT_SESSION_TIMEOUT=3600  # 1 hour
CHAT_MAX_CONTEXT_LENGTH=8000
CHAT_MAX_HISTORY_MESSAGES=20

# RAG Configuration
RAG_CHUNK_SIZE=500
RAG_CHUNK_OVERLAP=50
RAG_MAX_CHUNKS=10
RAG_SIMILARITY_THRESHOLD=0.7

# LLM Configuration (from Issue #51)
LLM_PROVIDER=openai
LLM_MODEL=gpt-3.5-turbo
LLM_MAX_TOKENS=2000
LLM_TEMPERATURE=0.3

# WebSocket Configuration
WS_CHAT_MAX_CONNECTIONS=100
WS_CHAT_PING_INTERVAL=30
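The `RAG_CHUNK_SIZE` / `RAG_CHUNK_OVERLAP` settings map naturally onto a sliding-window splitter. Below is a character-based sketch; a production version would more likely split on sentence or segment boundaries.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to chunk_size characters, with
    `overlap` characters shared between consecutive chunks so that a
    sentence cut at a boundary still appears intact in one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be greater than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults above, a 1,000-character transcript yields three chunks: two full 500-character windows and a 100-character tail, each consecutive pair sharing 50 characters.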

Implementation Phases

Phase 1: Core Infrastructure (Sprint 1-2)

  • Chat session management service
  • Basic RAG service with OpenSearch integration
  • WebSocket chat handler
  • API endpoints for session management
  • Basic message processing

Acceptance Criteria:

  • Can create chat sessions with selected files
  • Basic question-answering works with transcript context
  • WebSocket communication functional
  • Context retrieval from OpenSearch accurate

Phase 2: Frontend Interface (Sprint 3)

  • ChatGPT-like UI components
  • Gallery integration with chat button
  • Real-time message display
  • Citation links to transcript segments
  • Copy/paste functionality

Acceptance Criteria:

  • Chat interface matches ChatGPT user experience
  • Messages stream in real-time
  • Citations link correctly to transcript segments
  • Copy functionality works reliably
  • UI responsive on mobile and desktop

Phase 3: Enhanced RAG (Sprint 4)

  • Hybrid search implementation
  • Context window optimization
  • Multi-turn conversation handling
  • Response quality improvements
  • Performance optimization

Acceptance Criteria:

  • Responses relevant and accurate
  • Conversation context maintained across turns
  • Search performance under 2 seconds
  • High-quality context retrieval

Phase 4: Advanced Features (Sprint 5)

  • Chat history persistence (optional)
  • Advanced search within chat
  • Export chat functionality
  • Analytics and usage tracking
  • Mobile optimization

Acceptance Criteria:

  • Chat history accessible across sessions
  • Export works in multiple formats
  • Mobile experience optimized
  • Usage analytics available

Testing Strategy

Unit Tests

  • RAG service context retrieval accuracy
  • Chat session management
  • Message processing and response generation
  • Citation generation and linking
  • WebSocket connection handling

Integration Tests

  • End-to-end chat workflow
  • OpenSearch + LLM integration
  • Frontend + Backend communication
  • Multi-file context handling
  • Real-time message streaming

Performance Tests

  • Context retrieval speed
  • Concurrent chat sessions
  • Large transcript handling
  • WebSocket connection limits
  • Memory usage optimization

User Experience Tests

  • Chat interface usability
  • Response quality assessment
  • Citation accuracy verification
  • Mobile responsiveness
  • Accessibility compliance

Security Considerations

  1. Session Security

    • Session-based authentication
    • File access verification per user
    • Rate limiting on chat requests
    • WebSocket connection limits
  2. Data Privacy

    • Context data handling
    • LLM provider data policies
    • Local vs cloud processing options
    • User consent for AI processing
  3. Input Validation

    • Message content sanitization
    • File ID validation
    • Session ownership verification
    • XSS prevention in chat interface

Success Metrics

  1. Functionality

    • 95%+ successful chat sessions
    • <3 second average response time
    • 90%+ citation accuracy
    • Support for 10+ concurrent users
  2. User Experience

    • <2 second chat interface load time
    • 95%+ message delivery success rate
    • Positive user feedback on response quality
    • High task completion rates
  3. Adoption

    • 60%+ of users try chat feature
    • 40%+ use chat regularly
    • Average 5+ messages per session
    • 80%+ user satisfaction rating

Future Enhancements

  1. Advanced AI Features

    • Multi-modal chat (text + audio)
    • Voice-to-text chat input
    • Automated question suggestions
    • Sentiment-aware responses
  2. Collaboration Features

    • Shared chat sessions
    • Team knowledge base
    • Chat templates for common queries
    • Integration with business tools
  3. Analytics & Insights

    • Chat usage analytics
    • Popular query patterns
    • Response quality metrics
    • Content gap identification

Files to Create/Modify

New Backend Files

  • backend/app/services/chat_service.py
  • backend/app/services/rag_service.py
  • backend/app/services/opensearch_chat_service.py
  • backend/app/api/chat_websocket.py
  • backend/app/api/endpoints/chat.py
  • backend/app/schemas/chat.py
  • backend/app/models/chat.py (if history enabled)

New Frontend Files

  • frontend/src/components/ChatInterface.svelte
  • frontend/src/components/ChatMessage.svelte
  • frontend/src/components/ChatContextPanel.svelte
  • frontend/src/components/ChatModal.svelte
  • frontend/src/lib/types/chat.ts
  • frontend/src/lib/services/chatService.ts
  • frontend/src/stores/chat.ts

Modified Files

  • frontend/src/routes/MediaLibrary.svelte - Add chat button to selection controls
  • backend/app/api/router.py - Include chat endpoints
  • backend/app/core/config.py - Add chat configuration
  • backend/app/services/opensearch_service.py - Extend for RAG functionality
  • database/init_db.sql - Add chat tables (if history enabled)

Priority

High Priority - This feature transforms the application from a passive transcription tool into an interactive AI assistant, significantly increasing user engagement and value proposition. It leverages existing infrastructure while providing a modern, ChatGPT-like experience that users expect from AI applications.

Dependencies

  1. Infrastructure (Already Available)

  2. External Services

    • LLM provider APIs (OpenAI, vLLM, Ollama, Claude)
    • Optional: Embedding service for semantic search
  3. Performance Requirements

    • Sufficient server resources for concurrent chat sessions
    • OpenSearch cluster capacity for real-time search
    • WebSocket connection handling

Labels

enhancement, ai-integration, chat, rag, opensearch, high-priority, backend, frontend, websocket, user-experience


Reporter: Claude Code Assistant
Epic: AI-Powered Interactive Features
Component: Chat & RAG System
Estimated Effort: 5 sprints (25-30 story points)
Related Issues: #51 (LLM Integration), #29 (OpenSearch Enhancement)
