@maxkulish

Summary

  • Implement multi-language support allowing users to configure multiple spoken languages
  • Add a "Single Pass + Intelligent Retry" strategy: Whisper auto-detects the language, and transcription is retried with a fallback language if the detected language is not in the user's list
  • Pass language context to Reasoning Service for semantic correction of acoustically similar languages (e.g., Russian vs Ukrainian)
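
The retry strategy in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Transcribe` callback, the `RetryOutcome` shape, and the choice of the default language as the fallback are all assumptions, and the callback is synchronous only to keep the sketch short (a real Whisper call would be asynchronous).

```typescript
// Hypothetical shapes for this sketch; the PR's real types may differ.
interface TranscriptionResult {
  text: string;
  detectedLanguage: string | null; // e.g. "uk"; null if nothing was reported
}

// Passing `null` asks Whisper to auto-detect the language.
type Transcribe = (language: string | null) => TranscriptionResult;

interface RetryOutcome {
  result: TranscriptionResult;
  language: string | null; // language the final text is attributed to
  retried: boolean;
}

function transcribeWithRetry(
  transcribe: Transcribe,
  selectedLanguages: string[],
  defaultLanguage: string
): RetryOutcome {
  // Pass 1: let Whisper detect the language on its own.
  const first = transcribe(null);
  const detected = first.detectedLanguage;

  // No configured languages is treated as "auto": accept whatever was detected.
  if (selectedLanguages.length === 0) {
    return { result: first, language: detected, retried: false };
  }

  // Hit: the detected language is one the user listed.
  if (detected !== null && selectedLanguages.indexOf(detected) !== -1) {
    return { result: first, language: detected, retried: false };
  }

  // Miss: run once more, forcing the fallback (assumed: the default language).
  const second = transcribe(defaultLanguage);
  return { result: second, language: defaultLanguage, retried: true };
}
```

The common case costs a single Whisper pass; the second pass only happens when detection lands outside the user's list.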

Changes

Phase 1: Core Infrastructure

  • Add LanguageDecision, LanguageContext, LanguageSettings types
  • Add helper functions: SELECTABLE_LANGUAGES, getLanguageName, isValidLanguageCode
  • Update useSettings.ts with selectedLanguages and defaultLanguage state
  • Add migration from old preferredLanguage format
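
A minimal sketch of how the migration from `preferredLanguage` might look. The field names come from this description, but the `StoredSettings` shape, the `"auto"` sentinel, and the `"en"` fallback are assumptions made for illustration.

```typescript
// Hypothetical persisted shape; the real storage format is not shown in this PR text.
interface StoredSettings {
  preferredLanguage?: string;   // old single-language format
  selectedLanguages?: string[]; // new format
  defaultLanguage?: string;     // new format
}

interface LanguageSettings {
  selectedLanguages: string[];
  defaultLanguage: string;
}

// Assumption: language used when nothing usable is stored.
const FALLBACK_LANGUAGE = "en";

function migrateLanguageSettings(stored: StoredSettings): LanguageSettings {
  // New format already present: keep it and only fill in a missing default.
  if (stored.selectedLanguages !== undefined) {
    const selected = stored.selectedLanguages;
    return {
      selectedLanguages: selected,
      defaultLanguage: stored.defaultLanguage ?? selected[0] ?? FALLBACK_LANGUAGE,
    };
  }

  const preferred = stored.preferredLanguage;

  // Old "auto" (or nothing stored) becomes an empty list, meaning auto-detect.
  if (preferred === undefined || preferred === "" || preferred === "auto") {
    return { selectedLanguages: [], defaultLanguage: FALLBACK_LANGUAGE };
  }

  // Old single language becomes a one-item list that is also the default.
  return { selectedLanguages: [preferred], defaultLanguage: preferred };
}
```

Checking for the new fields first keeps the migration idempotent, so running it on every startup would not overwrite settings that were already migrated.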

Phase 2: Whisper Integration

  • Update parseWhisperResult to extract detectedLanguage and detectedConfidence from multiple JSON locations
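
To illustrate what "multiple JSON locations" could mean in practice, here is a hedged sketch of the extraction step. The candidate paths (`language`, `result.language`, `detected_language`) and the assumption that `language_probs` maps language codes to probabilities are illustrative guesses, not a statement of what Whisper emits or what `parseWhisperResult` actually checks.

```typescript
interface DetectedLanguage {
  detectedLanguage: string | null;
  detectedConfidence: number | null;
}

type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Check the assumed locations in order and return the first non-empty string.
function findLanguage(raw: JsonObject): string | null {
  const nested = raw["result"];
  const candidates: unknown[] = [
    raw["language"],
    isObject(nested) ? nested["language"] : undefined,
    raw["detected_language"],
  ];
  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate.trim().toLowerCase();
    }
  }
  return null;
}

function extractDetectedLanguage(raw: unknown): DetectedLanguage {
  // Text-only or malformed output carries no language information.
  if (!isObject(raw)) {
    return { detectedLanguage: null, detectedConfidence: null };
  }

  const detectedLanguage = findLanguage(raw);
  let detectedConfidence: number | null = null;

  // Assumed shape: { "uk": 0.62, "ru": 0.35 }.
  const probs = raw["language_probs"];
  if (detectedLanguage !== null && isObject(probs)) {
    const probability = probs[detectedLanguage];
    if (typeof probability === "number") {
      detectedConfidence = probability;
    }
  }

  return { detectedLanguage, detectedConfidence };
}
```

Returning explicit `null` values instead of omitting the fields lets every result path carry the same shape, including failures and text-only output.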

Phase 3: AudioManager Refactor

  • Add resolveLanguage, getLanguageSettings, normalizeLanguageCode helpers
  • Implement Single Pass + Retry in processWithLocalWhisper
  • Update processWithOpenAIAPI for multi-language support
  • Pass languageContext through transcription pipeline
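
The helper names above are from this PR, but their bodies are not shown here, so the following is only a sketch of one plausible shape. The `NAME_TO_CODE` table, the fields of `LanguageDecision`, and the rule that single mode forces the language while multi and auto let Whisper detect it are assumptions.

```typescript
type LanguageMode = "auto" | "single" | "multi";

// Hypothetical decision shape; the PR's LanguageDecision may carry other fields.
interface LanguageDecision {
  mode: LanguageMode;
  whisperLanguage: string | null; // null means "let Whisper auto-detect"
}

// Assumption: some sources report full names instead of codes.
// Only three entries are listed here for illustration.
const NAME_TO_CODE: { [name: string]: string } = {
  english: "en",
  russian: "ru",
  ukrainian: "uk",
};

function normalizeLanguageCode(raw: string | null | undefined): string | null {
  if (raw === null || raw === undefined) return null;
  const cleaned = raw.trim().toLowerCase();
  if (cleaned === "" || cleaned === "auto") return null;

  // Full language name: map it to its code.
  const mapped = NAME_TO_CODE[cleaned];
  if (typeof mapped === "string") return mapped;

  // Drop a region suffix, so "en-US" becomes "en" and "uk_UA" becomes "uk".
  return cleaned.replace(/[-_].*$/, "");
}

function resolveLanguage(selectedLanguages: string[]): LanguageDecision {
  // Normalize and de-duplicate the configured languages.
  const codes: string[] = [];
  for (const raw of selectedLanguages) {
    const code = normalizeLanguageCode(raw);
    if (code !== null && codes.indexOf(code) === -1) codes.push(code);
  }

  if (codes.length === 0) return { mode: "auto", whisperLanguage: null };
  if (codes.length === 1) return { mode: "single", whisperLanguage: codes[0] ?? null };

  // Several candidates: detect first, then validate against the list.
  return { mode: "multi", whisperLanguage: null };
}
```

Normalizing before comparing matters because the detected value and the stored settings may not use the same spelling of a language.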

Phase 4: Reasoning Service

  • Add buildSystemPrompt method for language-aware prompts
  • Multi-language prompts include ALL candidate languages for semantic correction
  • Update all provider methods (OpenAI, Anthropic, Local, Gemini, Groq)
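
As a rough illustration of the three prompt styles, a language-aware prompt builder could look like the sketch below. The `LanguageContext` fields, the base prompt, and all prompt wording are assumptions; the actual `buildSystemPrompt` text in this PR is not reproduced here.

```typescript
// Hypothetical context shape; the PR's LanguageContext may differ.
interface LanguageContext {
  mode: "auto" | "single" | "multi";
  candidates: string[];    // display names, e.g. ["Russian", "Ukrainian"]
  detected: string | null; // display name of what the transcriber reported
}

// Assumed base instruction shared by all modes.
const BASE_PROMPT =
  "You clean up dictated text. Fix transcription errors without changing the meaning.";

function buildSystemPrompt(context: LanguageContext): string {
  const first = context.candidates[0];

  // Single language: short and specific.
  if (context.mode === "single" && first !== undefined) {
    return `${BASE_PROMPT} The text is in ${first}. Keep the output in ${first}.`;
  }

  // Multiple languages: list every candidate so the model can correct a
  // misdetection between similar-sounding languages.
  if (context.mode === "multi" && context.candidates.length > 1) {
    const list = context.candidates.join(", ");
    const hint =
      context.detected !== null
        ? ` The transcriber guessed ${context.detected}, which may be wrong.`
        : "";
    return (
      `${BASE_PROMPT} The speaker uses one of: ${list}.${hint}` +
      " Some of these languages sound alike, so decide from vocabulary and grammar" +
      " which one the text is in, then correct it in that language."
    );
  }

  // Auto mode (or incomplete context): generic multilingual prompt.
  return `${BASE_PROMPT} The text may be in any language. Keep the output in the language of the input.`;
}
```

Keeping this in one function means each provider method can call it with the same context instead of assembling its own language instructions.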

Phase 5: UI Components

  • Create MultiLanguageSelector component with search, chips, default star indicator
  • Mode indicator showing auto/single/multi with descriptions
  • Soft warning when >4 languages selected
  • Update SettingsPage to use new selector
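
The selection logic behind the chips and the default star can be sketched independently of the UI framework. This is a hypothetical helper, not the component's code: `SelectionState`, `toggleLanguage`, and `shouldWarn` are invented names, and the value of `MAX_RECOMMENDED_LANGUAGES` is inferred from the ">4 languages" warning.

```typescript
// Inferred from the ">4 languages" soft warning; the real constant may differ.
const MAX_RECOMMENDED_LANGUAGES = 4;

interface SelectionState {
  selectedLanguages: string[];
  defaultLanguage: string | null; // the starred chip; null when nothing is selected
}

// Add the language if it is not selected, remove it if it is.
function toggleLanguage(state: SelectionState, code: string): SelectionState {
  const isSelected = state.selectedLanguages.indexOf(code) !== -1;
  const selectedLanguages = isSelected
    ? state.selectedLanguages.filter((c) => c !== code)
    : state.selectedLanguages.concat(code);

  // Keep the star on a language that is still selected; otherwise move it
  // to the first remaining chip (assumed behaviour).
  let defaultLanguage = state.defaultLanguage;
  if (defaultLanguage === null || selectedLanguages.indexOf(defaultLanguage) === -1) {
    defaultLanguage = selectedLanguages[0] ?? null;
  }

  return { selectedLanguages, defaultLanguage };
}

// The warning is soft: the selection is still allowed, only flagged.
function shouldWarn(state: SelectionState): boolean {
  return state.selectedLanguages.length > MAX_RECOMMENDED_LANGUAGES;
}
```

Returning a new state object rather than mutating the old one fits a settings hook such as `useSettings.ts`, where state updates are expected to be immutable.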

Phase 6: Testing & Polish

  • Update OnboardingFlow to use new language system
  • Keep onboarding simple (single language, configure more in settings)

Test plan

  • Verify new users can select a primary language during onboarding
  • Verify existing users' preferredLanguage migrates correctly
  • Test MultiLanguageSelector in Settings page
  • Test transcription with single language selected
  • Test transcription with multiple languages (verify auto-detection)
  • Test fallback behavior when detected language not in list
  • Verify language context reaches Reasoning Service
  • Test with acoustically similar languages (e.g., Russian/Ukrainian)

- Add SELECTABLE_LANGUAGES, getLanguageName, isValidLanguageCode, MAX_RECOMMENDED_LANGUAGES to languages.ts
- Create src/types/language.ts with LanguageDecision, LanguageContext, LanguageSettings interfaces
- Add selectedLanguages and defaultLanguage to useSettings.ts
- Add migration logic from old preferredLanguage to new multi-language format
- Update parseWhisperResult to extract detectedLanguage from multiple JSON locations
- Extract detectedConfidence from language_probs when available
- Add debug logging for language detection
- Return language info in all result paths (success, failure, text-only)
- Add resolveLanguage, getLanguageSettings, normalizeLanguageCode helper functions
- Implement Single Pass + Retry on Miss strategy in processWithLocalWhisper
- Update processWithOpenAIAPI to use new language settings and verbose_json
- Add languageContext parameter to processTranscription and processWithReasoningModel
- Support single, multi, and auto language modes
- Log language detection and decision info for debugging
- Add buildSystemPrompt method for language-aware system prompts
- Update processText signature to accept LanguageContext
- Update all provider methods (OpenAI, Anthropic, Local, Gemini, Groq) to use language context
- Multi-language prompts include all candidate languages for semantic correction
- Single-language prompts are simple and specific
- Auto mode uses generic multilingual prompt
- Create MultiLanguageSelector with search, chips, and default language star
- Display mode indicator (auto/single/multi) with descriptions
- Add soft warning when >4 languages selected
- Quick select buttons for common languages when empty
- Update SettingsPage to use new selector with selectedLanguages/defaultLanguage
- Fix TypeScript null checks in ReasoningService for IPC results
- Replace preferredLanguage with selectedLanguages/defaultLanguage
- Update language selector to set both new settings on change
- Keep onboarding simple (single language selection, more in settings)
- Update completeOnboarding to persist new language settings
Implements a comprehensive command system for managing the OpenWhispr
development workflow, covering design document creation, review,
finalization, and implementation planning.