We'll implement real offline transcription using Whisper, run locally through Transformers.js (the @xenova/transformers package). This replaces the current dummy function in src/frontend/components/Chat/ChatInput.tsx. We'll also add a settings section that lets users choose between model sizes.
- src/frontend/components/Settings/SpeechRecognitionSettings.tsx - Settings UI for speech model options
- src/frontend/services/transcription/index.ts - Service to handle transcription logic
- src/frontend/services/transcription/whisperModel.ts - Whisper model management
- src/shared/types/settings/speech.ts - Type definitions for speech settings
- src/frontend/components/Chat/ChatInput.tsx - Update audio recording logic
- src/frontend/components/Settings/index.tsx - Add speech recognition settings section
- package.json - Add @xenova/transformers dependency
- src/shared/types/storage/storage.ts - Add speech settings types
npm install @xenova/transformers

In src/shared/types/settings/speech.ts:

export type WhisperModelSize = 'tiny' | 'base' | 'small' | 'medium' | 'large';

export interface SpeechRecognitionSettings {
  enabled: boolean;
  modelSize: WhisperModelSize;
  autoDownload: boolean;
  language?: string; // Optional language specification
}

Modify src/shared/types/storage/storage.ts to include speech settings.
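One possible shape for the storage change is sketched below. The `AppStorage` interface, the `speechRecognition` key, and the default values are assumptions made for illustration, since the real contents of storage.ts are not shown here. The settings types are repeated so the snippet stands alone.

```typescript
// Repeated from speech.ts so that this snippet is self-contained.
type WhisperModelSize = 'tiny' | 'base' | 'small' | 'medium' | 'large';

interface SpeechRecognitionSettings {
  enabled: boolean;
  modelSize: WhisperModelSize;
  autoDownload: boolean;
  language?: string;
}

// Hypothetical storage shape; the real interface in storage.ts may differ.
interface AppStorage {
  speechRecognition?: SpeechRecognitionSettings;
}

// Assumed defaults: feature off, smallest model, nothing downloaded
// until the user opts in.
const DEFAULT_SPEECH_SETTINGS: SpeechRecognitionSettings = {
  enabled: false,
  modelSize: 'tiny',
  autoDownload: false,
};

// Merge stored values over the defaults so that data saved before this
// feature existed still produces a complete settings object.
function resolveSpeechSettings(storage: AppStorage): SpeechRecognitionSettings {
  return { ...DEFAULT_SPEECH_SETTINGS, ...storage.speechRecognition };
}
```

Keeping the key optional means existing stored data needs no migration; readers go through `resolveSpeechSettings` instead of touching the raw value.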
In src/frontend/services/transcription/whisperModel.ts:
- Implement model downloading and caching
- Handle model loading states
- Create transcription function
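The three responsibilities above could be combined in a small manager class. This is a sketch under assumptions, not the final whisperModel.ts: `WhisperModelManager`, `ModelLoader`, and `modelIdFor` are hypothetical names, and the `Xenova/whisper-<size>` naming scheme for model IDs is assumed. The loader is injected so the class can be exercised without downloading a model; in the app it would wrap `pipeline('automatic-speech-recognition', modelId)` from @xenova/transformers.

```typescript
type WhisperModelSize = 'tiny' | 'base' | 'small' | 'medium' | 'large';
type ModelState = 'idle' | 'loading' | 'ready' | 'error';

// Takes 16 kHz mono samples and resolves to the recognised text.
type Transcriber = (audio: Float32Array) => Promise<string>;
// Injected dependency that downloads (or reads from cache) and builds a model.
type ModelLoader = (modelId: string) => Promise<Transcriber>;

// Assumed naming scheme for the converted checkpoints.
const modelIdFor = (size: WhisperModelSize): string => `Xenova/whisper-${size}`;

class WhisperModelManager {
  private readonly loader: ModelLoader;
  private state: ModelState = 'idle';
  private requestedSize: WhisperModelSize | null = null;
  private pending: Promise<Transcriber> | null = null;

  constructor(loader: ModelLoader) {
    this.loader = loader;
  }

  getState(): ModelState {
    return this.state;
  }

  // Load lazily; callers asking for the same size share one promise.
  load(size: WhisperModelSize): Promise<Transcriber> {
    if (this.pending !== null && this.requestedSize === size) return this.pending;
    this.requestedSize = size;
    this.state = 'loading';
    const attempt: Promise<Transcriber> = this.loader(modelIdFor(size)).then(
      (transcriber) => {
        // Only the most recent request may update the state.
        if (this.pending === attempt) this.state = 'ready';
        return transcriber;
      },
      (err: unknown) => {
        if (this.pending === attempt) {
          this.state = 'error';
          this.pending = null; // allow a later retry
        }
        throw err;
      },
    );
    this.pending = attempt;
    return attempt;
  }

  async transcribe(size: WhisperModelSize, audio: Float32Array): Promise<string> {
    const transcriber = await this.load(size);
    return transcriber(audio);
  }
}
```

Sharing the in-flight promise avoids starting a second download when the settings page and the chat input both request the model at the same time.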
In src/frontend/services/transcription/index.ts:
- Create a wrapper service for transcription logic
- Handle error cases and provide status updates
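A wrapper along these lines would keep error handling out of the UI. It is a minimal sketch: `transcribeAudio`, `TranscriptionStatus`, and `TranscriptionResult` are hypothetical names, and the error messages are placeholders. The function never throws; failures come back as a value the caller can render.

```typescript
type TranscriptionStatus = 'transcribing' | 'done' | 'failed';

type TranscriptionResult =
  | { ok: true; text: string }
  | { ok: false; error: string };

// Hypothetical dependency: anything that turns audio samples into text,
// for example a function backed by the model in whisperModel.ts.
type TranscribeFn = (audio: Float32Array) => Promise<string>;

async function transcribeAudio(
  audio: Float32Array,
  transcribe: TranscribeFn,
  onStatus: (status: TranscriptionStatus) => void = () => {},
): Promise<TranscriptionResult> {
  // Reject empty recordings before touching the model.
  if (audio.length === 0) {
    onStatus('failed');
    return { ok: false, error: 'No audio was recorded.' };
  }
  try {
    onStatus('transcribing');
    const text = (await transcribe(audio)).trim();
    onStatus('done');
    return { ok: true, text };
  } catch (err) {
    // Convert any thrown value into a message the UI can display.
    onStatus('failed');
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Transcription failed: ${reason}` };
  }
}
```

Returning a discriminated union rather than throwing forces the caller to handle the failure branch before it can read `text`.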
In src/frontend/components/Settings/SpeechRecognitionSettings.tsx:
- Create UI for selecting model size
- Add options for auto-download
- Include information about model sizes and requirements
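The information shown next to each option could be driven by a lookup table such as the one below. The names `MODEL_INFO`, `describeModel`, and `MODEL_OPTIONS` are hypothetical. The parameter counts are the ones published in the OpenAI Whisper README; actual download sizes depend on how the converted models are quantized, so they are deliberately left out.

```typescript
type WhisperModelSize = 'tiny' | 'base' | 'small' | 'medium' | 'large';

interface ModelInfo {
  label: string;
  // Approximate parameter count in millions, per the OpenAI Whisper README.
  parametersMillions: number;
}

const MODEL_INFO: Record<WhisperModelSize, ModelInfo> = {
  tiny: { label: 'Tiny', parametersMillions: 39 },
  base: { label: 'Base', parametersMillions: 74 },
  small: { label: 'Small', parametersMillions: 244 },
  medium: { label: 'Medium', parametersMillions: 769 },
  large: { label: 'Large', parametersMillions: 1550 },
};

// Text for one entry in the model size selector.
function describeModel(size: WhisperModelSize): string {
  const info = MODEL_INFO[size];
  return `${info.label} (~${info.parametersMillions}M parameters)`;
}

// Options in the order they were declared, ready to map into <option> elements.
const MODEL_OPTIONS = (Object.keys(MODEL_INFO) as WhisperModelSize[]).map((size) => ({
  value: size,
  text: describeModel(size),
}));
```

Typing the table as `Record<WhisperModelSize, ModelInfo>` makes the compiler flag a missing entry if a new size is added to the union.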
Modify src/frontend/components/Settings/index.tsx to include the new Speech Recognition settings panel.
Modify src/frontend/components/Chat/ChatInput.tsx to:
- Use the Whisper transcription service
- Handle loading states
- Display transcription results
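Before the recording reaches the transcription service it has to be in the format Whisper expects, which is 16 kHz mono audio. The helper below is a sketch of the downmixing step only; `toMono` and `WHISPER_SAMPLE_RATE` are hypothetical names, and the resampling is assumed to happen when the recording is decoded (see the comment at the end).

```typescript
// Whisper models take 16 kHz mono input.
const WHISPER_SAMPLE_RATE = 16000;

// Average all channels into a single mono channel.
// Assumes every channel has the same length, as AudioBuffer guarantees.
function toMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const channel = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Possible browser-side usage (not executed here):
//   const ctx = new AudioContext({ sampleRate: WHISPER_SAMPLE_RATE });
//   const buffer = await ctx.decodeAudioData(await recordedBlob.arrayBuffer());
//   const channels = Array.from({ length: buffer.numberOfChannels },
//     (_, i) => buffer.getChannelData(i));
//   const audio = toMono(channels);
```

Creating the `AudioContext` at the target sample rate lets the browser resample during decoding, so no separate resampler is needed in this sketch.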
The settings component will include:
- Model size selection (tiny, base, small, medium, large)
- Information about each model size and its requirements
- Option to pre-download models
- Progress indicators for downloads
The service will:
- Lazily load models when needed
- Cache models in the browser (Transformers.js uses the Cache API by default; IndexedDB would require a custom cache)
- Provide progress feedback during downloads
- Handle transcription processing
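A model download consists of several files, so progress feedback needs to combine per-file events into one figure. The sketch below assumes an event shape modelled on what Transformers.js passes to `progress_callback` (a `status` string plus `file`, `loaded`, and `total` on progress events); treat that shape as an assumption and check it against the installed version. `DownloadProgress` is a hypothetical helper.

```typescript
// Assumed event shape; only the fields used here are declared.
interface DownloadEvent {
  status: string;
  file?: string;
  loaded?: number;
  total?: number;
}

class DownloadProgress {
  // Bytes loaded and total bytes, tracked per file name.
  private files = new Map<string, { loaded: number; total: number }>();

  // Record one event and return the overall percentage (0 to 100).
  update(event: DownloadEvent): number {
    if (event.status === 'progress' && event.file && event.total) {
      this.files.set(event.file, { loaded: event.loaded ?? 0, total: event.total });
    }
    return this.percent();
  }

  percent(): number {
    let loaded = 0;
    let total = 0;
    this.files.forEach((f) => {
      loaded += f.loaded;
      total += f.total;
    });
    return total === 0 ? 0 : Math.round((loaded / total) * 100);
  }
}
```

Because the total is only known for files that have started downloading, the overall percentage can drop when a new file begins; a UI may prefer to clamp it so the bar never moves backwards.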
The updated component will:
- Use selected model size from settings
- Show appropriate loading indicators during transcription
- Handle errors gracefully
- Update UI based on transcription state
- Show download progress when models are being downloaded
- Provide clear feedback during transcription processing
- Show helpful error messages if transcription fails
- Allow cancellation of long-running operations
Possible future enhancements:
- Support for multiple languages
- Fine-tuning options for accuracy
- Transcription history
- Ability to edit transcriptions before adding to chat
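The UI states described for the updated component (download progress, transcription in progress, errors, and cancellation) could be modelled as a small reducer. This is a sketch with hypothetical state and action names; it should be compatible with React's `useReducer`, though ChatInput.tsx may manage state differently.

```typescript
type TranscriptionUiState =
  | { phase: 'idle' }
  | { phase: 'downloading'; percent: number }
  | { phase: 'transcribing' }
  | { phase: 'done'; text: string }
  | { phase: 'error'; message: string };

type TranscriptionAction =
  | { type: 'downloadProgress'; percent: number }
  | { type: 'transcriptionStarted' }
  | { type: 'transcriptionSucceeded'; text: string }
  | { type: 'transcriptionFailed'; message: string }
  | { type: 'cancelled' };

function transcriptionReducer(
  state: TranscriptionUiState,
  action: TranscriptionAction,
): TranscriptionUiState {
  switch (action.type) {
    case 'downloadProgress':
      return { phase: 'downloading', percent: action.percent };
    case 'transcriptionStarted':
      return { phase: 'transcribing' };
    case 'transcriptionSucceeded':
      // Drop a late result if the user cancelled in the meantime.
      return state.phase === 'transcribing' ? { phase: 'done', text: action.text } : state;
    case 'transcriptionFailed':
      return { phase: 'error', message: action.message };
    case 'cancelled':
      return { phase: 'idle' };
  }
}
```

Note that cancelling here only resets the UI; stopping the underlying work would need something extra, such as running the model in a Web Worker that can be terminated.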