Enhanced LLM Chat System with Source Attribution and Anti-Hallucination #76

@klappy

Problem Statement

The current LLM chat system has several critical issues:

  1. Hallucination Risk: The LLM can generate information not present in the supplied context
  2. DOM Dependency: Chat context extraction relies on visible DOM elements, causing incomplete context when tabs aren't active
  3. Inconsistent Citations: No standardized way to reference specific resources or verses
  4. Missing RC Links: Translation Words and Academy articles aren't linked for easy access
  5. Duplicate Loading: Each panel loads resources independently, causing inconsistency

Proposed Solution

Create a unified ResourcesContext that serves as the single source of truth for all translation resources, with enhanced citation capabilities and anti-hallucination measures.

Implementation Plan

Phase 1: Create Unified ResourcesContext

1.1 New ResourcesContext Structure

Create src-new/context/ResourcesContext.jsx with the following structure:

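// Example value provided by ResourcesContext (sample data for Genesis 1:1)
const resourcesContextValue = {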
  resources: {
    scripture: {
      resourceId: "ult",
      title: "unfoldingWord® Literal Text",  // From manifest
      languageId: "en",
      usfm: "\\c 1\\v 1...",  // Raw USFM for USFMRenderer
      verses: {               // Parsed verses for all consumers
        "1": "In the beginning God created the heavens and the earth.",
        "2": "The earth was formless and empty..."
      }
    },
    translationNotes: {
      resourceId: "tn",
      title: "unfoldingWord® Translation Notes",
      languageId: "en",
      items: [
        {
          id: 1,
          quote: "In the beginning",
          text: "This phrase establishes...",
          occurrence: "1",
          reference: "1:1"
        }
      ]
    },
    translationQuestions: {
      resourceId: "tq",
      title: "unfoldingWord® Translation Questions",
      languageId: "en",
      items: [
        {
          id: 1,
          question: "What did God create in the beginning?",
          answer: "God created the heavens and the earth."
        }
      ]
    },
    translationWords: {
      resourceId: "tw",
      title: "unfoldingWord® Translation Words",
      languageId: "en",
      items: [
        {
          id: 1,
          term: "God",
          definition: "The supreme being who created...",
          aliases: ["deity", "creator"],
          rcLink: "rc://en/tw/dict/bible/kt/god"  // ← Functional RC link
        }
      ]
    },
    translationWordLinks: {
      resourceId: "twl",
      title: "Translation Word Links",
      languageId: "en",
      items: [
        {
          id: 1,
          word: "God",
          occurrence: "1",
          twArticleId: "god",
          rcLink: "rc://en/tw/dict/bible/kt/god"
        }
      ]
    }
  },
  metadata: {
    organization: "unfoldingWord",
    languageId: "en",
    bookId: "gen",
    chapter: 1,
    verse: 1,
    manifestInfo: {
      // Store actual manifest metadata for proper citations
    }
  },
  isLoading: false,
  error: null
}

1.2 Resource Loading Logic

  • Watch ReferenceContext for changes
  • Fetch all resources in parallel using the existing services
  • Parse USFM into verses using Proskomma (same logic as USFMRenderer, for consistency)
  • Include manifest titles for natural language citations
  • Store RC links for clickable references
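A minimal sketch of the parallel fetch, assuming each resource is loaded by an async function that takes the current reference. The `loaders` map and the `loadAllResources` name are hypothetical; the real tnService, tqService, etc. signatures are not shown in this issue. `Promise.allSettled` keeps one failed resource from discarding the rest, so the context can report per-resource errors instead of failing as a whole.

```javascript
// Hypothetical helper: `loaders` maps a resource key (e.g. "scripture",
// "translationNotes") to an async function that fetches it for a reference.
async function loadAllResources(reference, loaders) {
  const keys = Object.keys(loaders);

  // allSettled waits for every loader and never rejects as a whole,
  // so one missing resource does not hide the others.
  const results = await Promise.allSettled(
    keys.map((key) => loaders[key](reference))
  );

  const resources = {};
  const errors = {};
  results.forEach((result, i) => {
    const key = keys[i];
    if (result.status === "fulfilled") {
      resources[key] = result.value;
    } else {
      const reason = result.reason;
      errors[key] = reason instanceof Error ? reason.message : String(reason);
    }
  });

  return { resources, errors };
}
```

Because the reference can change while a fetch is in flight, the provider would likely also need to ignore results from a stale request (for example by comparing a request counter) before updating state.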

Tasks:

  • Create ResourcesContext.jsx
  • Implement resource loading with manifest metadata
  • Add USFM parsing using Proskomma (reuse USFMRenderer logic)
  • Include RC link generation for TW and TWL items
  • Add comprehensive error handling
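A sketch of RC link generation for TW and TWL items. It assumes the usual unfoldingWord Translation Words layout (articles under bible/kt, bible/names, and bible/other) and that TWL rows may carry a wildcard language such as rc://*/tw/dict/bible/kt/god. `buildTwRcLink`, `localizeRcLink`, and `parseTwRcLink` are hypothetical names; the existing rcLinkUtils.jsx may already cover part of this.

```javascript
// unfoldingWord Translation Words articles are grouped under bible/kt,
// bible/names, and bible/other (assumed here; adjust if the catalog differs).
const TW_CATEGORIES = new Set(["kt", "names", "other"]);

// Build an RC link such as rc://en/tw/dict/bible/kt/god for a TW article.
function buildTwRcLink(languageId, category, articleId) {
  if (!TW_CATEGORIES.has(category)) {
    throw new Error(`Unknown Translation Words category: ${category}`);
  }
  return `rc://${languageId}/tw/dict/bible/${category}/${articleId}`;
}

// Replace a wildcard language (rc://*/...) with the active language.
function localizeRcLink(rcLink, languageId) {
  return rcLink.replace(/^rc:\/\/\*\//, `rc://${languageId}/`);
}

// Split a TW RC link into its parts (e.g. to fill twArticleId); null if not a TW link.
function parseTwRcLink(rcLink) {
  const match = /^rc:\/\/([^/]+)\/tw\/dict\/bible\/([^/]+)\/([^/]+)$/.exec(rcLink);
  return match
    ? { languageId: match[1], category: match[2], articleId: match[3] }
    : null;
}
```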

Phase 2: Update ChatContext Integration

2.1 Remove DOM Extraction

  • Delete all extractFromPanel functions in ChatContext
  • Remove dependency on visible tabs
  • Remove DOM-based content extraction

2.2 Enhanced Context Packaging

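// packageContext is a plain function, so it takes the ResourcesContext value as an
// argument. Calling useResourcesContext() inside it would break the Rules of Hooks
// whenever it runs outside render (for example, from a send-message handler).
const packageContext = ({ resources, metadata }) => ({
  reference: `${metadata.bookId} ${metadata.chapter}:${metadata.verse}`,
  scripture: formatScripture(resources.scripture),
  translationNotes: formatNotes(resources.translationNotes),
  translationQuestions: formatQuestions(resources.translationQuestions),
  translationWords: formatWords(resources.translationWords),
  translationWordLinks: formatWordLinks(resources.translationWordLinks),
});

// In ChatContext (inside the provider component):
//   const resourcesValue = useResourcesContext();
//   const context = packageContext(resourcesValue);

// Formatting functions preserve verse references
const formatScripture = (scripture) => {
  // Scripture may still be loading or may have failed to load
  if (!scripture?.verses) return "";
  // Integer-like keys ("1", "2", ...) iterate in ascending numeric order
  return Object.entries(scripture.verses)
    .map(([v, text]) => `[${v}] ${text}`)
    .join("\n");
};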
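Only formatScripture is spelled out in this issue. As an illustration, two of the other formatters might look like this, assuming the item fields from the ResourcesContext shape in 1.1. Prefixing each block with the manifest title and resource ID gives the LLM the exact names the citation rules ask it to use. The `withTitle` helper is hypothetical.

```javascript
// Hypothetical helper: prefix formatted lines with the manifest title so the
// model can cite the resource by name; returns "" when there is nothing to send.
const withTitle = (resource, lines) =>
  lines.length ? `${resource.title} (${resource.resourceId}):\n${lines.join("\n")}` : "";

// Keep the verse reference, quote, and occurrence with each note.
const formatNotes = (notes) => {
  if (!notes?.items) return "";
  return withTitle(
    notes,
    notes.items.map((n) => `[${n.reference}] "${n.quote}" (occurrence ${n.occurrence}): ${n.text}`)
  );
};

// Keep the RC link next to each term so it can be echoed back in citations.
const formatWords = (words) => {
  if (!words?.items) return "";
  return withTitle(
    words,
    words.items.map((w) => `${w.term} [${w.rcLink}]: ${w.definition}`)
  );
};
```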

Tasks:

  • Update ChatContext to use ResourcesContext
  • Implement context formatting functions
  • Ensure verse-level referenceability
  • Add validation for complete context
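One possible shape for the completeness check, assuming all five resource keys from 1.1 are expected. An empty items array is treated as valid, since a verse can legitimately have no notes or questions; only a resource that never loaded, or lacks the manifest title needed for citations, is reported. `validateContext` and `REQUIRED_RESOURCES` are hypothetical names.

```javascript
// The five resources the chat context is expected to carry (keys from 1.1).
const REQUIRED_RESOURCES = [
  "scripture",
  "translationNotes",
  "translationQuestions",
  "translationWords",
  "translationWordLinks",
];

// Report missing resources before a chat request is sent.
function validateContext(resources, metadata) {
  const problems = [];
  for (const key of REQUIRED_RESOURCES) {
    const resource = resources?.[key];
    if (!resource) {
      problems.push(`${key}: not loaded`);
    } else if (!resource.title) {
      // Without a manifest title the model cannot cite the resource by name
      problems.push(`${key}: missing manifest title`);
    }
  }
  // The selected verse itself must be present in the parsed scripture
  const verses = resources?.scripture?.verses;
  if (verses && metadata && !verses[String(metadata.verse)]) {
    problems.push(`scripture: verse ${metadata.verse} not found`);
  }
  return { isComplete: problems.length === 0, problems };
}
```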

Phase 3: Enhanced System Prompt

3.1 Core Anti-Hallucination Instructions

You are a Bible translation assistant. You MUST:
1. Only use information from the provided context
2. Never generate information not explicitly in the context
3. Always cite your sources using natural language with resource titles
4. Include RC links when referencing Translation Words or Academy articles
5. When you cannot answer from the provided context, explicitly state: "I don't have information about that in the current context."

CRITICAL: Do not make up, infer, or generate any information not directly present in the context.

3.2 Citation Format Instructions

Citation Examples:
- Scripture: "According to the unfoldingWord® Literal Text, verse 1 states..."
- Notes: "The unfoldingWord® Translation Notes explain that..."
- Questions: "The unfoldingWord® Translation Questions ask..."
- Words: "The term 'God' [rc://en/tw/dict/bible/kt/god] is defined as..."

Always include the resource title from the manifest when citing.
Resource titles available: {dynamically include actual manifest titles}
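A sketch of filling in the {dynamically include actual manifest titles} placeholder at request time. The rule text is copied from 3.1; `buildSystemPrompt` is a hypothetical name, and the citation examples from 3.2 could be appended the same way.

```javascript
// Rules copied from section 3.1 of this issue.
const CORE_RULES = [
  "Only use information from the provided context",
  "Never generate information not explicitly in the context",
  "Always cite your sources using natural language with resource titles",
  "Include RC links when referencing Translation Words or Academy articles",
  'When you cannot answer from the provided context, explicitly state: "I don\'t have information about that in the current context."',
];

// Build the system prompt, listing only the resources that actually loaded.
function buildSystemPrompt(resources) {
  const titles = Object.values(resources)
    .filter((r) => r && r.title)
    .map((r) => `- ${r.title} (${r.resourceId})`);

  return [
    "You are a Bible translation assistant. You MUST:",
    ...CORE_RULES.map((rule, i) => `${i + 1}. ${rule}`),
    "",
    "CRITICAL: Do not make up, infer, or generate any information not directly present in the context.",
    "",
    "Always include the resource title from the manifest when citing.",
    "Resource titles available:",
    ...(titles.length ? titles : ["- (no resources loaded)"]),
  ].join("\n");
}
```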

Tasks:

  • Update system prompt with anti-hallucination instructions
  • Add natural language citation requirements
  • Include resource titles dynamically from context
  • Add RC link formatting instructions

Phase 4: Component Updates

4.1 Update Panel Components

Transform panels from data loaders to pure presentational components:

  • TranslationNotesPanel: Remove loading logic, use useResourcesContext()
  • TranslationQuestionsPanel: Remove loading logic, use useResourcesContext()
  • TranslationWordsPanel: Remove loading logic, use useResourcesContext()
  • TWLPanel: Remove loading logic, use useResourcesContext()
  • Keep existing rendering logic intact

4.2 LLMChatPanel Enhancement

// Process RC links in responses
import { processRcLinks } from "../utils/rcLinkUtils";

const processedResponse = processRcLinks(response.text, handleRcLinkClick, languageId);

Tasks:

  • Update LLMChatPanel to process RC links in responses
  • Make Translation Words and Academy links clickable
  • Ensure proper RC link handling integration
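The signature of the existing processRcLinks is not shown in this issue, so this is a hypothetical illustration of the idea only: splitting a response into text and link segments lets LLMChatPanel render each rc:// link as a clickable element wired to handleRcLinkClick. `splitRcLinks` and its link pattern are assumptions, not the rcLinkUtils API.

```javascript
// Assumed pattern: an RC link runs until whitespace, a bracket, a parenthesis,
// a quote, or an angle bracket.
const RC_LINK_PATTERN = /rc:\/\/[^\s\]()"'<>]+/g;

// Split text into [{ type: "text" | "rcLink", value }] segments.
function splitRcLinks(text) {
  const segments = [];
  let lastIndex = 0;
  for (const match of text.matchAll(RC_LINK_PATTERN)) {
    // Drop sentence punctuation the pattern swallowed, e.g. "...kt/god."
    const link = match[0].replace(/[.,;:!?]+$/, "");
    const start = match.index;
    if (start > lastIndex) {
      segments.push({ type: "text", value: text.slice(lastIndex, start) });
    }
    segments.push({ type: "rcLink", value: link });
    lastIndex = start + link.length;
  }
  if (lastIndex < text.length) {
    segments.push({ type: "text", value: text.slice(lastIndex) });
  }
  return segments;
}
```

The panel would then map rcLink segments to buttons or anchors and pass text segments to its usual markdown renderer.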

Phase 5: Server-Side Updates

5.1 Update chat.js Endpoint

  • Include enhanced citation instructions in system prompt
  • Add validation for context-only responses
  • Log citations for debugging purposes
  • Improve error handling for incomplete context

5.2 Response Processing

  • Parse LLM responses for RC links
  • Validate citations against provided context
  • Maintain markdown formatting in responses
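A rough sketch of validating RC links in a response against the context that was sent, assuming the context items carry the rcLink fields from 1.1. Any link the model cites that was never supplied is a likely hallucination and could be logged or stripped. `findUnknownRcLinks` is a hypothetical name; Academy links will only pass if Academy items with rcLink fields are added to the context.

```javascript
// Assumed pattern: stop at whitespace, brackets, parentheses, quotes, and angle brackets.
const RC_LINK_IN_TEXT = /rc:\/\/[^\s\]()"'<>]+/g;

// Return RC links cited in `responseText` that do not appear in the context.
function findUnknownRcLinks(responseText, contextResources) {
  const allowed = new Set();
  for (const resource of Object.values(contextResources)) {
    for (const item of resource?.items ?? []) {
      if (item.rcLink) allowed.add(item.rcLink);
    }
  }

  const cited = (responseText.match(RC_LINK_IN_TEXT) ?? [])
    // Strip trailing sentence punctuation picked up by the pattern
    .map((link) => link.replace(/[.,;:!?]+$/, ""));

  // De-duplicate, then keep only links the context never provided
  return [...new Set(cited)].filter((link) => !allowed.has(link));
}
```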

Phase 6: Testing Strategy

6.1 Unit Tests

  • ResourcesContext loading logic
  • Citation formatting functions
  • RC link processing
  • Context packaging accuracy
  • Anti-hallucination prompt effectiveness

6.2 Integration Tests

  • Reference change triggers complete resource loading
  • Chat context generation includes all resources
  • Citation accuracy and consistency
  • RC link functionality
  • Error handling for missing resources
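For the citation checks, a simple heuristic helper could report which manifest titles a response mentions. It only shows that some resource was named; it cannot prove that every claim is attributed, so it complements rather than replaces review of sample responses. `citationReport` is a hypothetical name.

```javascript
// Report which loaded resource titles appear verbatim in a response.
function citationReport(responseText, resources) {
  const titles = Object.values(resources)
    .filter((r) => r && r.title)
    .map((r) => r.title);
  const cited = titles.filter((title) => responseText.includes(title));
  return { cited, hasCitation: cited.length > 0 };
}
```

In an integration test, the response to a known question could be asserted to have `hasCitation === true` and to cite the resource the answer actually came from.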

6.3 E2E Tests

  • User changes reference → Resources load → Chat has complete context
  • User sends chat message → Receives properly cited response
  • User clicks RC link → Opens correct article
  • Test with multiple resource types and languages

Success Criteria

  1. Zero Hallucination: LLM responses only contain information from provided context
  2. 100% Citation Coverage: Every factual claim includes proper source attribution
  3. Functional RC Links: Translation Words and Academy links open correct articles
  4. Performance Maintained: No degradation in resource loading times
  5. Data Consistency: All components display identical resource data
  6. Complete Context: Chat always has access to all loaded resources regardless of UI state

Implementation Notes

Technical Considerations

  1. Proskomma Integration: Reuse existing USFMRenderer parsing logic for consistency
  2. Manifest Integration: Use actual resource IDs and titles from manifest files
  3. RC Link Standard: Adhere to existing RC link specification
  4. Caching Strategy: Leverage existing service caching mechanisms
  5. Error Boundaries: Implement proper error handling for resource loading failures

Dependencies

  • Existing manifestService.js for resource metadata
  • Existing rcLinkUtils.jsx for link processing
  • Existing Proskomma integration in USFMRenderer
  • All current resource services (tnService, tqService, etc.)

Related Issues/PRs

  • References existing RC link implementation
  • Builds on current manifest service architecture
  • Extends USFMRenderer Proskomma integration

Definition of Done

  • All tests pass (unit, integration, E2E)
  • LLM responses only use provided context (verified through testing)
  • All resources include proper citations with manifest titles
  • RC links are functional for TW and TA resources
  • No performance regression in resource loading
  • Documentation updated for new architecture
  • Code review completed
  • QA testing completed

Estimated Timeline: 7-9 hours of implementation across all phases
Priority: High (addresses critical hallucination and citation issues)
Labels: enhancement, llm-chat, architecture, citations, anti-hallucination

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests