# Enhanced LLM Chat System with Source Attribution and Anti-Hallucination

## Problem Statement

The current LLM chat system has several critical issues:

1. **Hallucination Risk**: The LLM can generate information not present in the supplied context
2. **DOM Dependency**: Chat context extraction relies on visible DOM elements, causing incomplete context when tabs aren't active
3. **Inconsistent Citations**: No standardized way to reference specific resources or verses
4. **Missing RC Links**: Translation Words and Academy articles aren't linked for easy access
5. **Duplicate Loading**: Each panel loads resources independently, causing inconsistency

## Proposed Solution

Create a unified ResourcesContext that serves as the single source of truth for all translation resources, with enhanced citation capabilities and anti-hallucination measures.

## Implementation Plan

### Phase 1: Create Unified ResourcesContext

#### 1.1 New ResourcesContext Structure

Create `src-new/context/ResourcesContext.jsx` with the following structure:

```javascript
{
  resources: {
    scripture: {
      resourceId: "ult",
      title: "unfoldingWord® Literal Text",  // From manifest
      languageId: "en",
      usfm: "\\c 1\\v 1...",  // Raw USFM for USFMRenderer
      verses: {               // Parsed verses for all consumers
        "1": "In the beginning God created the heavens and the earth.",
        "2": "The earth was formless and empty..."
      }
    },
    translationNotes: {
      resourceId: "tn",
      title: "unfoldingWord® Translation Notes",
      languageId: "en",
      items: [
        {
          id: 1,
          quote: "In the beginning",
          text: "This phrase establishes...",
          occurrence: "1",
          reference: "1:1"
        }
      ]
    },
    translationQuestions: {
      resourceId: "tq",
      title: "unfoldingWord® Translation Questions",
      languageId: "en",
      items: [
        {
          id: 1,
          question: "What did God create in the beginning?",
          answer: "God created the heavens and the earth."
        }
      ]
    },
    translationWords: {
      resourceId: "tw",
      title: "unfoldingWord® Translation Words",
      languageId: "en",
      items: [
        {
          id: 1,
          term: "God",
          definition: "The supreme being who created...",
          aliases: ["deity", "creator"],
          rcLink: "rc://en/tw/dict/bible/kt/god"  // ← Functional RC link
        }
      ]
    },
    translationWordLinks: {
      resourceId: "twl",
      title: "Translation Word Links",
      languageId: "en",
      items: [
        {
          id: 1,
          word: "God",
          occurrence: "1",
          twArticleId: "god",
          rcLink: "rc://en/tw/dict/bible/kt/god"
        }
      ]
    }
  },
  metadata: {
    organization: "unfoldingWord",
    languageId: "en",
    bookId: "gen",
    chapter: 1,
    verse: 1,
    manifestInfo: {
      // Store actual manifest metadata for proper citations
    }
  },
  isLoading: false,
  error: null
}
```

#### 1.2 Resource Loading Logic

- Watch ReferenceContext for changes
- Fetch all resources in parallel using the existing services
- Parse USFM into verses with Proskomma (the same logic as USFMRenderer, for consistency)
- Include manifest titles for natural language citations
- Store RC links for clickable references
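
The loading steps above could look roughly like the sketch below. It is a minimal illustration that assumes each existing service can be wrapped in a function taking the current reference and returning a promise. `loadAllResources` and the `fetchers` map are hypothetical names, not the real service API. `Promise.allSettled` is used so that one failing resource does not discard the others.

```javascript
// Hypothetical loader: `fetchers` maps a resource key (e.g. "translationNotes")
// to an async function wrapping one of the existing services. The real
// service signatures may differ.
async function loadAllResources(reference, fetchers) {
  const keys = Object.keys(fetchers);

  // Start every fetch at once. allSettled never rejects, so a single
  // failing resource does not throw away the ones that did load.
  const results = await Promise.allSettled(
    keys.map((key) => fetchers[key](reference))
  );

  const resources = {};
  const errors = {};
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      resources[keys[i]] = result.value;
    } else {
      // Keep a readable message per resource for the context's `error` state.
      errors[keys[i]] = String(result.reason?.message ?? result.reason);
    }
  });
  return { resources, errors };
}
```

Collecting errors per resource (rather than a single `error`) would let the panels and the chat report exactly which resource is missing.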

**Tasks:**

- [ ] Create ResourcesContext.jsx
- [ ] Implement resource loading with manifest metadata
- [ ] Add USFM parsing using Proskomma (reuse USFMRenderer logic)
- [ ] Include RC link generation for TW and TWL items
- [ ] Add comprehensive error handling
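
For the RC link generation task, a Translation Words link can be built from the article's category folder and ID. This sketch assumes the `rc://<language>/tw/dict/bible/<category>/<articleId>` layout used in the Phase 1.1 example, with the `kt`, `names`, and `other` categories of the Translation Words repository. `buildTwRcLink` and `normalizeRcLink` are hypothetical helpers. TWL data commonly writes `*` in place of the language code, so the second helper substitutes the language currently loaded.

```javascript
// Hypothetical helpers; the layout follows the rcLink values in the
// Phase 1.1 example: rc://<language>/tw/dict/bible/<category>/<articleId>
const TW_CATEGORIES = new Set(["kt", "names", "other"]);

function buildTwRcLink(languageId, category, articleId) {
  if (!TW_CATEGORIES.has(category)) {
    throw new Error(`Unknown Translation Words category: ${category}`);
  }
  return `rc://${languageId}/tw/dict/bible/${category}/${articleId}`;
}

// Replace a "*" language wildcard (e.g. rc://*/tw/dict/bible/kt/god)
// with the language loaded in ResourcesContext. Other links are unchanged.
function normalizeRcLink(link, languageId) {
  return link.replace(/^rc:\/\/\*\//, `rc://${languageId}/`);
}
```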

### Phase 2: Update ChatContext Integration

#### 2.1 Remove DOM Extraction

- [ ] Delete all `extractFromPanel` functions in ChatContext
- [ ] Remove dependency on visible tabs
- [ ] Remove DOM-based content extraction

#### 2.2 Enhanced Context Packaging

```javascript
// packageContext is a plain function, not a hook: call useResourcesContext()
// inside ChatContext (a component or custom hook) and pass its value in,
// e.g. const chatContext = packageContext(useResourcesContext());
const packageContext = ({ resources, metadata }) => ({
  reference: `${metadata.bookId} ${metadata.chapter}:${metadata.verse}`,
  scripture: formatScripture(resources.scripture),
  translationNotes: formatNotes(resources.translationNotes),
  translationQuestions: formatQuestions(resources.translationQuestions),
  translationWords: formatWords(resources.translationWords),
  translationWordLinks: formatWordLinks(resources.translationWordLinks),
});

// Formatting functions preserve verse references. Integer-like keys such as
// "1", "2", "10" are iterated in ascending numeric order.
const formatScripture = (scripture) => {
  if (!scripture?.verses) return ""; // resource not loaded (yet)
  return Object.entries(scripture.verses)
    .map(([v, text]) => `[${v}] ${text}`)
    .join("\n");
};
```
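
The snippet above calls four more formatters. A minimal sketch of them follows, assuming the field names from the Phase 1.1 structure; these bodies are hypothetical, not existing code. Each line keeps its reference or RC link inline so the model has something concrete to cite.

```javascript
// Hypothetical formatters using the Phase 1.1 field names.
// Missing resources produce an empty string instead of throwing.
const refPrefix = (reference) => (reference ? `[${reference}] ` : "");

const formatNotes = (notes) =>
  (notes?.items ?? [])
    .map((n) => `${refPrefix(n.reference)}"${n.quote}": ${n.text}`)
    .join("\n");

const formatQuestions = (questions) =>
  (questions?.items ?? [])
    .map((q) => `${refPrefix(q.reference)}Q: ${q.question}\nA: ${q.answer}`)
    .join("\n");

// RC links stay inline so the model can copy them into its answer.
const formatWords = (words) =>
  (words?.items ?? [])
    .map((w) => `${w.term} [${w.rcLink}]: ${w.definition}`)
    .join("\n");

const formatWordLinks = (links) =>
  (links?.items ?? [])
    .map((l) => `${l.word} (occurrence ${l.occurrence}) [${l.rcLink}]`)
    .join("\n");
```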

**Tasks:**

- [ ] Update ChatContext to use ResourcesContext
- [ ] Implement context formatting functions
- [ ] Ensure verse-level referenceability
- [ ] Add validation for complete context
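
The "validation for complete context" task could be as small as listing which packaged sections came back empty. In this sketch, `findMissingSections` and `REQUIRED_SECTIONS` are hypothetical names, and the section keys are assumed to match the `packageContext()` output. Whether a gap should block the request or only produce a warning is a design decision left open here, since some verses may legitimately have no notes or questions.

```javascript
// Hypothetical completeness check, run before a chat request is sent.
// Keys match the object returned by packageContext().
const REQUIRED_SECTIONS = [
  "scripture",
  "translationNotes",
  "translationQuestions",
  "translationWords",
  "translationWordLinks",
];

// Returns the names of sections that are missing or contain only whitespace.
function findMissingSections(packaged) {
  return REQUIRED_SECTIONS.filter((key) => {
    const value = packaged[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```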

### Phase 3: Enhanced System Prompt

#### 3.1 Core Anti-Hallucination Instructions

```
You are a Bible translation assistant. You MUST:
1. Only use information from the provided context
2. Never generate information not explicitly in the context
3. Always cite your sources using natural language with resource titles
4. Include RC links when referencing Translation Words or Academy articles
5. When you cannot answer from the provided context, explicitly state: "I don't have information about that in the current context."

CRITICAL: Do not make up, infer, or generate any information not directly present in the context.
```

#### 3.2 Citation Format Instructions

```
Citation Examples:
- Scripture: "According to the unfoldingWord® Literal Text, verse 1 states..."
- Notes: "The unfoldingWord® Translation Notes explain that..."
- Questions: "The unfoldingWord® Translation Questions ask..."
- Words: "The term 'God' [rc://en/tw/dict/bible/kt/god] is defined as..."

Always include the resource title from the manifest when citing.
Resource titles available: {dynamically include actual manifest titles}
```
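
The `{dynamically include actual manifest titles}` placeholder could be filled from the `title` and `resourceId` fields that ResourcesContext already stores. The helper names below are hypothetical, and the sketch assumes unloaded resources are `null` or lack a title.

```javascript
// Hypothetical helpers that fill the "Resource titles available" line of the
// citation instructions from the manifest titles in ResourcesContext.
function listResourceTitles(resources) {
  return Object.values(resources ?? {})
    .filter((resource) => resource && resource.title) // skip unloaded resources
    .map((resource) => `- ${resource.title} (${resource.resourceId})`)
    .join("\n");
}

function buildSystemPrompt(basePrompt, resources) {
  return `${basePrompt}\n\nResource titles available:\n${listResourceTitles(resources)}`;
}
```

Listing only resources that actually loaded keeps the model from citing a title whose content it was never given.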

**Tasks:**

- [ ] Update system prompt with anti-hallucination instructions
- [ ] Add natural language citation requirements
- [ ] Include resource titles dynamically from context
- [ ] Add RC link formatting instructions

### Phase 4: Component Updates

#### 4.1 Update Panel Components

Transform panels from data loaders to pure presentational components:

- [ ] **TranslationNotesPanel**: Remove loading logic, use `useResourcesContext()`
- [ ] **TranslationQuestionsPanel**: Remove loading logic, use `useResourcesContext()`
- [ ] **TranslationWordsPanel**: Remove loading logic, use `useResourcesContext()`
- [ ] **TWLPanel**: Remove loading logic, use `useResourcesContext()`
- [ ] Keep existing rendering logic intact

#### 4.2 LLMChatPanel Enhancement

```javascript
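import { processRcLinks } from "../utils/rcLinkUtils";

// Inside LLMChatPanel: turn RC links in the LLM's reply into clickable links.
// `response`, `handleRcLinkClick`, and `languageId` come from the panel's scope.
const processedResponse = processRcLinks(response.text, handleRcLinkClick, languageId);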
```

**Tasks:**

- [ ] Update LLMChatPanel to process RC links in responses
- [ ] Make Translation Words and Academy links clickable
- [ ] Ensure proper RC link handling integration
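
To open the right article, `handleRcLinkClick` needs the parts of the clicked link. The real parsing belongs in the existing rcLinkUtils.jsx; this hypothetical `parseRcLink` only illustrates the assumed layout `rc://<language>/<resource>/<type>/<project>/<path>`, which fits both the TW example (`rc://en/tw/dict/bible/kt/god`) and Translation Academy links such as `rc://*/ta/man/translate/figs-metaphor`.

```javascript
// Hypothetical RC link parser; the project's real handling is in rcLinkUtils.jsx.
// Assumed layout: rc://<language>/<resource>/<type>/<project>/<path...>
function parseRcLink(link, fallbackLanguageId = "en") {
  const match = /^rc:\/\/([^\/]+)\/([^\/]+)\/([^\/]+)\/([^\/]+)\/(.+)$/.exec(link);
  if (!match) return null; // not an RC link

  const [, language, resourceId, type, project, path] = match;
  return {
    // "*" is a wildcard meaning "the currently selected language"
    languageId: language === "*" ? fallbackLanguageId : language,
    resourceId, // e.g. "tw" or "ta"
    type,       // e.g. "dict" or "man"
    project,    // e.g. "bible" or "translate"
    path,       // e.g. "kt/god" or "figs-metaphor"
  };
}
```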

### Phase 5: Server-Side Updates

#### 5.1 Update chat.js Endpoint

- [ ] Include enhanced citation instructions in system prompt
- [ ] Add validation for context-only responses
- [ ] Log citations for debugging purposes
- [ ] Improve error handling for incomplete context
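
One way the endpoint could assemble its request is sketched below. It assumes an OpenAI-style `messages` array of `{ role, content }` objects; the actual request format in chat.js is not shown in this issue, and `buildChatMessages` is a hypothetical name. Placing the packaged context in the system message next to the citation rules keeps the rules and the evidence together in every request.

```javascript
// Hypothetical request assembly for chat.js, assuming an OpenAI-style
// messages array. `packaged` is the object returned by packageContext().
function buildChatMessages(systemPrompt, packaged, history, userMessage) {
  // One labelled block per non-empty section, so the model can see
  // exactly which resource each line came from.
  const contextBlock = Object.entries(packaged)
    .filter(([, value]) => typeof value === "string" && value.trim() !== "")
    .map(([section, value]) => `### ${section}\n${value}`)
    .join("\n\n");

  return [
    { role: "system", content: `${systemPrompt}\n\n## Context\n${contextBlock}` },
    ...history, // earlier { role, content } turns, if any
    { role: "user", content: userMessage },
  ];
}
```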

#### 5.2 Response Processing

- [ ] Parse LLM responses for RC links
- [ ] Validate citations against provided context
- [ ] Maintain markdown formatting in responses
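
Parsing responses for RC links and validating them against the context can be combined in one check: any `rc://` link in the reply that was not supplied in the packaged context is a likely hallucinated citation. `findUnknownRcLinks` is a hypothetical helper, and the link pattern is an assumption that stops at whitespace, brackets, and quotes and trims trailing sentence punctuation.

```javascript
// Hypothetical server-side check: every rc:// link in the reply should be
// one that was supplied in the packaged context.
const RC_LINK_PATTERN = /rc:\/\/[^\s\])>"']+/g;

function findUnknownRcLinks(responseText, allowedLinks) {
  const allowed = new Set(allowedLinks);
  const found = (responseText.match(RC_LINK_PATTERN) ?? [])
    .map((link) => link.replace(/[.,;:]+$/, "")); // drop trailing punctuation
  // De-duplicate, then keep only links the context never contained.
  return [...new Set(found)].filter((link) => !allowed.has(link));
}
```

A non-empty result could be logged alongside the citations mentioned in 5.1, or used to flag the response for review.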

### Phase 6: Testing Strategy

#### 6.1 Unit Tests

- [ ] ResourcesContext loading logic
- [ ] Citation formatting functions
- [ ] RC link processing
- [ ] Context packaging accuracy
- [ ] Anti-hallucination prompt effectiveness

#### 6.2 Integration Tests

- [ ] Reference change triggers complete resource loading
- [ ] Chat context generation includes all resources
- [ ] Citation accuracy and consistency
- [ ] RC link functionality
- [ ] Error handling for missing resources

#### 6.3 E2E Tests

- [ ] User changes reference → Resources load → Chat has complete context
- [ ] User sends chat message → Receives properly cited response
- [ ] User clicks RC link → Opens correct article
- [ ] Test with multiple resource types and languages

## Success Criteria

1. **Zero Hallucination**: LLM responses contain only information from the provided context
2. **100% Citation Coverage**: Every factual claim includes proper source attribution
3. **Functional RC Links**: Translation Words and Academy links open correct articles
4. **Performance Maintained**: No degradation in resource loading times
5. **Data Consistency**: All components display identical resource data
6. **Complete Context**: Chat always has access to all loaded resources regardless of UI state

## Implementation Notes

### Technical Considerations

1. **Proskomma Integration**: Reuse existing USFMRenderer parsing logic for consistency
2. **Manifest Integration**: Use actual resource IDs and titles from manifest files
3. **RC Link Standard**: Adhere to the existing RC link specification
4. **Caching Strategy**: Leverage existing service caching mechanisms
5. **Error Boundaries**: Implement proper error handling for resource loading failures

### Dependencies

- Existing manifestService.js for resource metadata
- Existing rcLinkUtils.jsx for link processing
- Existing Proskomma integration in USFMRenderer
- All current resource services (tnService, tqService, etc.)

## Related Issues/PRs

- References existing RC link implementation
- Builds on current manifest service architecture
- Extends USFMRenderer Proskomma integration

## Definition of Done

- [ ] All tests pass (unit, integration, E2E)
- [ ] LLM responses only use provided context (verified through testing)
- [ ] All resources include proper citations with manifest titles
- [ ] RC links are functional for TW and TA resources
- [ ] No performance regression in resource loading
- [ ] Documentation updated for new architecture
- [ ] Code review completed
- [ ] QA testing completed

---

**Estimated Timeline**: 7-9 hours of implementation across all phases
**Priority**: High (addresses critical hallucination and citation issues)
**Labels**: enhancement, llm-chat, architecture, citations, anti-hallucination
**Issue**: #76