forked from unfoldingWord-box3/translation-helps
Description
Problem Statement
The current LLM chat system has several critical issues:
- Hallucination Risk: The LLM can generate information not present in the supplied context
- DOM Dependency: Chat context extraction relies on visible DOM elements, causing incomplete context when tabs aren't active
- Inconsistent Citations: No standardized way to reference specific resources or verses
- Missing RC Links: Translation Words and Academy articles aren't linked for easy access
- Duplicate Loading: Each panel loads resources independently, causing inconsistency
Proposed Solution
Create a unified ResourcesContext that serves as the single source of truth for all translation resources, with enhanced citation capabilities and anti-hallucination measures.
Implementation Plan
Phase 1: Create Unified ResourcesContext
1.1 New ResourcesContext Structure
Create src-new/context/ResourcesContext.jsx with the following structure:
{
  resources: {
    scripture: {
      resourceId: "ult",
      title: "unfoldingWord® Literal Text", // From manifest
      languageId: "en",
      usfm: "\\c 1\\v 1...", // Raw USFM for USFMRenderer
      verses: { // Parsed verses for all consumers
        "1": "In the beginning God created the heavens and the earth.",
        "2": "The earth was formless and empty..."
      }
    },
    translationNotes: {
      resourceId: "tn",
      title: "unfoldingWord® Translation Notes",
      languageId: "en",
      items: [
        {
          id: 1,
          quote: "In the beginning",
          text: "This phrase establishes...",
          occurrence: "1",
          reference: "1:1"
        }
      ]
    },
    translationQuestions: {
      resourceId: "tq",
      title: "unfoldingWord® Translation Questions",
      languageId: "en",
      items: [
        {
          id: 1,
          question: "What did God create in the beginning?",
          answer: "God created the heavens and the earth."
        }
      ]
    },
    translationWords: {
      resourceId: "tw",
      title: "unfoldingWord® Translation Words",
      languageId: "en",
      items: [
        {
          id: 1,
          term: "God",
          definition: "The supreme being who created...",
          aliases: ["deity", "creator"],
          rcLink: "rc://en/tw/dict/bible/kt/god" // ← Functional RC link
        }
      ]
    },
    translationWordLinks: {
      resourceId: "twl",
      title: "Translation Word Links",
      languageId: "en",
      items: [
        {
          id: 1,
          word: "God",
          occurrence: "1",
          twArticleId: "god",
          rcLink: "rc://en/tw/dict/bible/kt/god"
        }
      ]
    }
  },
  metadata: {
    organization: "unfoldingWord",
    languageId: "en",
    bookId: "gen",
    chapter: 1,
    verse: 1,
    manifestInfo: {
      // Store actual manifest metadata for proper citations
    }
  },
  isLoading: false,
  error: null
}
1.2 Resource Loading Logic
- Watch ReferenceContext for changes
- Parallel fetch all resources using existing services
- Parse USFM to verses using Proskomma (same logic as USFMRenderer for consistency)
- Include manifest titles for natural language citations
- Store RC links for clickable references
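The loading steps above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: `loadAllResources` and the `fetchers` map are hypothetical names, and it assumes each existing service can be wrapped as an async function that takes the current reference. `Promise.allSettled` is used so that one failing resource does not discard the others.

```javascript
// Hypothetical helper: fetch every resource for a reference in parallel.
// `fetchers` maps a resource key (e.g. "scripture") to an async function
// (reference) => data. The real services may have different signatures.
async function loadAllResources(reference, fetchers) {
  const keys = Object.keys(fetchers);

  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const settled = await Promise.allSettled(
    keys.map((key) => Promise.resolve().then(() => fetchers[key](reference)))
  );

  const resources = {};
  const errors = {};
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      resources[keys[i]] = result.value;
    } else {
      // Record the failure per resource instead of failing the whole load.
      errors[keys[i]] = String(result.reason?.message ?? result.reason);
    }
  });

  return {
    resources,
    error: Object.keys(errors).length > 0 ? errors : null,
    isLoading: false,
  };
}
```

A provider could call this whenever ReferenceContext changes and store the returned object as its state. Whether `error` should be a per-resource map or a single value is a design choice left open here.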
Tasks:
- Create ResourcesContext.jsx
- Implement resource loading with manifest metadata
- Add USFM parsing using Proskomma (reuse USFMRenderer logic)
- Include RC link generation for TW and TWL items
- Add comprehensive error handling
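For the RC link generation task, a small helper along these lines might be enough. It is a sketch under assumptions: `buildTwRcLink` and `withRcLinks` are hypothetical names, the path layout is inferred only from the example link `rc://en/tw/dict/bible/kt/god` shown in the structure above, and the `category` field (`kt` in that example) is assumed to be available on each item, which the structure above does not show.

```javascript
// Hypothetical helper: build an RC link for a Translation Words article.
// The layout mirrors the example rc://en/tw/dict/bible/kt/god.
function buildTwRcLink(languageId, category, articleId) {
  const parts = { languageId, category, articleId };
  for (const [name, value] of Object.entries(parts)) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return `rc://${languageId}/tw/dict/bible/${category}/${articleId}`;
}

// Attach an rcLink to each TWL item.
// ASSUMPTION: each item carries a `category` field alongside `twArticleId`.
function withRcLinks(items, languageId) {
  return items.map((item) => ({
    ...item,
    rcLink: buildTwRcLink(languageId, item.category, item.twArticleId),
  }));
}
```

Failing loudly on a missing segment keeps malformed links out of the context that is later sent to the LLM.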
Phase 2: Update ChatContext Integration
2.1 Remove DOM Extraction
- Delete all extractFromPanel functions in ChatContext
- Remove dependency on visible tabs
- Remove DOM-based content extraction
2.2 Enhanced Context Packaging
// Pure function: the caller reads { resources, metadata } from
// useResourcesContext() at the top level of a component or custom hook and
// passes them in, so no hook runs inside an event handler.
const packageContext = ({ resources, metadata }) => {
  return {
    reference: `${metadata.bookId} ${metadata.chapter}:${metadata.verse}`,
    scripture: formatScripture(resources.scripture),
    translationNotes: formatNotes(resources.translationNotes),
    translationQuestions: formatQuestions(resources.translationQuestions),
    translationWords: formatWords(resources.translationWords),
    translationWordLinks: formatWordLinks(resources.translationWordLinks),
  };
};
// Formatting functions preserve verse references
const formatScripture = (scripture) => {
  // Integer-like keys ("1", "2", ...) iterate in ascending numeric order
  return Object.entries(scripture.verses)
    .map(([v, text]) => `[${v}] ${text}`)
    .join("\n");
};
Tasks:
- Update ChatContext to use ResourcesContext
- Implement context formatting functions
- Ensure verse-level referenceability
- Add validation for complete context
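Two of the tasks above, a formatter that keeps references attached and a completeness check, might look roughly like this. Both functions are illustrative sketches: the output format of `formatNotes` and the name `findMissingContext` are assumptions, and the check assumes every field of the packaged context is a formatted string.

```javascript
// Hypothetical formatter: one line per note, prefixed with its reference so
// the model can cite at verse level.
function formatNotes(translationNotes) {
  return (translationNotes?.items ?? [])
    .map((note) => `[${note.reference}] "${note.quote}": ${note.text}`)
    .join("\n");
}

// Resource keys expected in the packaged context.
const REQUIRED_KEYS = [
  "scripture",
  "translationNotes",
  "translationQuestions",
  "translationWords",
  "translationWordLinks",
];

// Hypothetical check: return the keys whose formatted text is missing or
// empty, so the caller can warn or block the request before it is sent.
function findMissingContext(packaged) {
  return REQUIRED_KEYS.filter((key) => {
    const value = packaged[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

An empty array from `findMissingContext` means every resource contributed some text. What to do with a non-empty result (block, warn, or proceed) is left to the caller.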
Phase 3: Enhanced System Prompt
3.1 Core Anti-Hallucination Instructions
You are a Bible translation assistant. You MUST:
1. Only use information from the provided context
2. Never generate information not explicitly in the context
3. Always cite your sources using natural language with resource titles
4. Include RC links when referencing Translation Words or Academy articles
5. When you cannot answer from the provided context, explicitly state: "I don't have information about that in the current context."
CRITICAL: Do not make up, infer, or generate any information not directly present in the context.
3.2 Citation Format Instructions
Citation Examples:
- Scripture: "According to the unfoldingWord® Literal Text, verse 1 states..."
- Notes: "The unfoldingWord® Translation Notes explain that..."
- Questions: "The unfoldingWord® Translation Questions ask..."
- Words: "The term 'God' [rc://en/tw/dict/bible/kt/god] is defined as..."
Always include the resource title from the manifest when citing.
Resource titles available: {dynamically include actual manifest titles}
Tasks:
- Update system prompt with anti-hallucination instructions
- Add natural language citation requirements
- Include resource titles dynamically from context
- Add RC link formatting instructions
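Including the resource titles dynamically could be done by appending them to a fixed base prompt. The sketch below assumes the `resources` shape from section 1.1; `buildSystemPrompt` is a hypothetical name and `BASE_PROMPT` is an abbreviated stand-in for the full instructions in 3.1 and 3.2.

```javascript
// Abbreviated stand-in for the full instructions in 3.1 and 3.2.
const BASE_PROMPT = [
  "You are a Bible translation assistant. You MUST:",
  "1. Only use information from the provided context",
  "3. Always cite your sources using natural language with resource titles",
  "Always include the resource title from the manifest when citing.",
].join("\n");

// Hypothetical builder: append the manifest titles of the loaded resources.
function buildSystemPrompt(resources) {
  const titles = Object.values(resources ?? {})
    .map((resource) => resource?.title)
    .filter((title) => typeof title === "string" && title.length > 0);

  const titleLines =
    titles.length > 0
      ? titles.map((title) => `- ${title}`).join("\n")
      : "- (none loaded)";

  return `${BASE_PROMPT}\n\nResource titles available:\n${titleLines}`;
}
```

Because the titles come from the loaded resources, the prompt only ever names resources that are actually present in the context for the current reference.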
Phase 4: Component Updates
4.1 Update Panel Components
Transform panels from data loaders to pure presentational components:
- TranslationNotesPanel: Remove loading logic, use useResourcesContext()
- TranslationQuestionsPanel: Remove loading logic, use useResourcesContext()
- TranslationWordsPanel: Remove loading logic, use useResourcesContext()
- TWLPanel: Remove loading logic, use useResourcesContext()
- Keep existing rendering logic intact
4.2 LLMChatPanel Enhancement
// Process RC links in responses
import { processRcLinks } from "../utils/rcLinkUtils";
const processedResponse = processRcLinks(response.text, handleRcLinkClick, languageId);
Tasks:
- Update LLMChatPanel to process RC links in responses
- Make Translation Words and Academy links clickable
- Ensure proper RC link handling integration
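To illustrate what making links clickable involves, the sketch below splits a response into plain-text and RC-link segments that a renderer could map to text nodes and clickable elements. It is not the project's `processRcLinks`, whose behavior is not shown here: `splitOnRcLinks` and the link pattern are assumptions based only on the example link format used in this issue.

```javascript
// ASSUMED pattern: "rc://", a language segment (or "*"), then one or more
// path segments of letters, digits, "_" or "-".
const RC_LINK_PATTERN = /rc:\/\/[A-Za-z0-9*_-]+(?:\/[A-Za-z0-9_-]+)+/g;

// Hypothetical tokenizer: split text into { type, value } segments.
function splitOnRcLinks(text) {
  const segments = [];
  let lastIndex = 0;

  for (const match of text.matchAll(RC_LINK_PATTERN)) {
    if (match.index > lastIndex) {
      segments.push({ type: "text", value: text.slice(lastIndex, match.index) });
    }
    segments.push({ type: "rcLink", value: match[0] });
    lastIndex = match.index + match[0].length;
  }

  // Trailing text after the last link, or the whole string if no link matched.
  if (lastIndex < text.length) {
    segments.push({ type: "text", value: text.slice(lastIndex) });
  }
  return segments;
}
```

Keeping tokenization separate from rendering makes the link detection testable without a DOM.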
Phase 5: Server-Side Updates
5.1 Update chat.js Endpoint
- Include enhanced citation instructions in system prompt
- Add validation for context-only responses
- Log citations for debugging purposes
- Improve error handling for incomplete context
5.2 Response Processing
- Parse LLM responses for RC links
- Validate citations against provided context
- Maintain markdown formatting in responses
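One narrow, mechanical form of citation validation is checking that every RC link in a response was actually present in the supplied context. The sketch below shows only that check; it does not verify natural-language claims. `collectRcLinks` and `findUnknownRcLinks` are hypothetical names, the `resources` shape is the one from section 1.1, and the link pattern is an assumption based on the example links in this issue.

```javascript
// Hypothetical helper: gather every rcLink found on resource items.
function collectRcLinks(resources) {
  return Object.values(resources ?? {}).flatMap((resource) =>
    (resource?.items ?? []).map((item) => item.rcLink).filter(Boolean)
  );
}

// Hypothetical check: return RC links cited in the response that were not
// part of the context sent to the LLM (each unknown link listed once).
function findUnknownRcLinks(responseText, allowedRcLinks) {
  const allowed = new Set(allowedRcLinks);
  // ASSUMED link pattern, based on rc://en/tw/dict/bible/kt/god.
  const found =
    responseText.match(/rc:\/\/[A-Za-z0-9*_-]+(?:\/[A-Za-z0-9_-]+)+/g) ?? [];
  return [...new Set(found)].filter((link) => !allowed.has(link));
}
```

A non-empty result could be logged alongside the other citation debugging output, or used to flag the response before it is returned to the client.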
Phase 6: Testing Strategy
6.1 Unit Tests
- ResourcesContext loading logic
- Citation formatting functions
- RC link processing
- Context packaging accuracy
- Anti-hallucination prompt effectiveness
6.2 Integration Tests
- Reference change triggers complete resource loading
- Chat context generation includes all resources
- Citation accuracy and consistency
- RC link functionality
- Error handling for missing resources
6.3 E2E Tests
- User changes reference → Resources load → Chat has complete context
- User sends chat message → Receives properly cited response
- User clicks RC link → Opens correct article
- Test with multiple resource types and languages
Success Criteria
- Zero Hallucination: LLM responses only contain information from provided context
- 100% Citation Coverage: Every factual claim includes proper source attribution
- Functional RC Links: Translation Words and Academy links open correct articles
- Performance Maintained: No degradation in resource loading times
- Data Consistency: All components display identical resource data
- Complete Context: Chat always has access to all loaded resources regardless of UI state
Implementation Notes
Technical Considerations
- Proskomma Integration: Reuse existing USFMRenderer parsing logic for consistency
- Manifest Integration: Use actual resource IDs and titles from manifest files
- RC Link Standard: Adhere to existing RC link specification
- Caching Strategy: Leverage existing service caching mechanisms
- Error Boundaries: Implement proper error handling for resource loading failures
Dependencies
- Existing manifestService.js for resource metadata
- Existing rcLinkUtils.jsx for link processing
- Existing Proskomma integration in USFMRenderer
- All current resource services (tnService, tqService, etc.)
Related Issues/PRs
- References existing RC link implementation
- Builds on current manifest service architecture
- Extends USFMRenderer Proskomma integration
Definition of Done
- All tests pass (unit, integration, E2E)
- LLM responses only use provided context (verified through testing)
- All resources include proper citations with manifest titles
- RC links are functional for TW and TA resources
- No performance regression in resource loading
- Documentation updated for new architecture
- Code review completed
- QA testing completed
Estimated Timeline: 7-9 hours of implementation across all phases
Priority: High (addresses critical hallucination and citation issues)
Labels: enhancement, llm-chat, architecture, citations, anti-hallucination