feat(soniox): add Soniox real-time streaming STT provider#418
Open
DamianPala wants to merge 9 commits into OpenWhispr:main from
Add Soniox as a fourth cloud streaming provider alongside Deepgram, AssemblyAI, and OpenAI Realtime. Includes WebSocket streaming core with cold-start buffering, full Electron IPC pipeline, settings UI with API key management, onboarding validation, and BYOK detection.
- Remove Soniox-specific render branch in TranscriptionModelPicker, use same ModelCardList + API key maps as OpenAI/Groq/Mistral
- Replace hardcoded "stt-rt-v4" in UI with registry-based model selection
- Add Soniox "S" icon SVG (from official wordmark)
- Translate soniox_stt_rt_v4 model description in 9 locale files
When audioManager calls finalize() before disconnect(), the server has already received the finalize message. Sending it again in drainFinalTokens() caused a 3s timeout waiting for a response that would never come. Track finalize state with a _finalizeSent flag and skip the redundant call.
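The flag-based guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `FinalizeTracker`, the socket shape (any object with a `send` method), and the `{ type: "finalize" }` payload are assumptions; only `_finalizeSent`, `finalize()`, and `drainFinalTokens()` are names taken from the commit message.

```javascript
// Hypothetical sketch of the "send finalize at most once" guard.
class FinalizeTracker {
  constructor(socket) {
    this.socket = socket;        // assumed: any object exposing send(string)
    this._finalizeSent = false;  // flag named in the commit message
  }

  // Called by the audio layer when recording stops.
  finalize() {
    if (this._finalizeSent) return;
    this.socket.send(JSON.stringify({ type: "finalize" })); // assumed payload
    this._finalizeSent = true;
  }

  // Called during disconnect. Sends finalize only if nobody did so earlier,
  // so the client never waits on a reply to a duplicate request.
  drainFinalTokens() {
    if (!this._finalizeSent) this.finalize();
    // A real implementation would now wait, with a timeout, for final tokens.
  }
}
```

With this shape, calling `finalize()` and then `drainFinalTokens()` puts a single message on the wire, which is the behavior the fix is after.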
Soniox connects in ~250ms, so there is no benefit to keeping an idle WebSocket open between dictation sessions. This avoids unnecessary Soniox session usage and potential idle timeout issues.
- Remove closeResolve (never assigned, close handler check unreachable)
- Use getFullTranscript() instead of inline .map().join() duplicate
- Remove soniox special-case in handleCloudProviderChange (generic path handles it)
Soniox supports multi-language transcription via language_hints array. Add a secondary language selector in the Soniox provider tab so users can hint a second language (e.g. Polish + English) for code-switching.
- New sonioxSecondaryLanguage setting in store/hook
- LanguageSelector dropdown in Soniox tab (inline layout)
- Disabled when primary language is auto (no bias needed)
- Language codes normalized to base form (en-US → en)
- i18n keys added for all 10 locales
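As an illustration of the normalization and the "auto" rule above, the hints array could be assembled along these lines. The helper name `buildLanguageHints` and the `"auto"` sentinel string are assumptions, as is the choice to send no hints at all when the primary language is auto.

```javascript
// Hypothetical helper: builds a language_hints array from two settings.
function buildLanguageHints(primary, secondary) {
  // Reduce a locale code to its base language: "en-US" -> "en".
  const toBase = (code) => code.trim().toLowerCase().split(/[-_]/)[0];

  // Assumed behavior: with automatic detection there is nothing to bias.
  if (!primary || primary === "auto") return [];

  const hints = [toBase(primary)];
  if (secondary && secondary !== "auto") {
    const base = toBase(secondary);
    if (!hints.includes(base)) hints.push(base); // avoid duplicates like en-US + en-GB
  }
  return hints;
}

console.log(buildLanguageHints("pl-PL", "en-US")); // [ 'pl', 'en' ]
console.log(buildLanguageHints("auto", "en"));     // []
```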
- Add 30s idle timeout to Soniox keepalive to prevent zombie WebSocket connections surviving renderer hot-reload or crash
- Add cleanupAllStreaming() to close all streaming backends on app quit
- Add isDestroyed() guards to Soniox and dictation IPC callbacks, matching the pattern used by Deepgram and AssemblyAI
- Prefer cleanupAll() over cleanup() for backends that support it (Deepgram, AssemblyAI) to also clean warm connections and timers
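One way to picture the idle timeout from the first bullet is a small guard consulted on every keepalive tick. This is a sketch under assumptions: `IdleGuard`, `markActivity`, and `onKeepaliveTick` are hypothetical names, and the clock is injected only so the logic can be exercised without real timers.

```javascript
const IDLE_TIMEOUT_MS = 30000; // 30s, as stated in the commit message

// Hypothetical guard: closes the connection once no audio has been seen
// for IDLE_TIMEOUT_MS, instead of sending keepalives forever.
class IdleGuard {
  constructor(closeFn, now = Date.now) {
    this.closeFn = closeFn; // assumed: closes the WebSocket
    this.now = now;         // injectable clock
    this.lastActivity = now();
    this.closed = false;
  }

  // Call whenever audio is sent to the server.
  markActivity() {
    this.lastActivity = this.now();
  }

  // Call from the keepalive interval. Returns true if a keepalive
  // should still be sent, false if the connection was closed for idleness.
  onKeepaliveTick() {
    if (this.closed) return false;
    if (this.now() - this.lastActivity >= IDLE_TIMEOUT_MS) {
      this.closed = true;
      this.closeFn();
      return false;
    }
    return true;
  }
}
```

If the renderer reloads or crashes, nothing calls `markActivity()` any more, so the next tick after 30s closes the socket rather than leaving it open indefinitely.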
Soniox sends a U+FFFD replacement character as a final token when recording silence, which gets pasted as garbage. Filter out empty, whitespace-only, and replacement character tokens in Soniox handler. Also trim finalText before the paste guard in audioManager as a defensive check for all streaming providers.
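The filter described here might look something like the sketch below. The function name `isPasteableToken` and the token shape (objects with a `text` field) are assumptions for illustration. Tokens are only tested, never trimmed, so sub-word tokens that carry a leading space keep it.

```javascript
const REPLACEMENT_CHAR = "\uFFFD";

// Hypothetical predicate: true if a token's text contains anything other
// than whitespace and U+FFFD replacement characters.
function isPasteableToken(text) {
  if (typeof text !== "string") return false;
  const stripped = text.split(REPLACEMENT_CHAR).join("").trim();
  return stripped.length > 0;
}

// Assumed token shape: { text: string }
const tokens = [{ text: "Hello" }, { text: " " }, { text: "\uFFFD" }, { text: " world" }];
const kept = tokens.filter((t) => isPasteableToken(t.text));
console.log(kept.map((t) => t.text).join("")); // "Hello world"
```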
Strip hesitation fillers (uh, um, yyy, eee, mmm, hmm) from assembled transcript text. Soniox BPE tokenization splits fillers across sub-word tokens, so removal works on joined text using word boundaries. Capitalizes first letter after filler removal at sentence boundaries (.!?) and at text start, with full Unicode support (Polish ć/ó/ś, accented Latin, Cyrillic). Preserves real exclamations (Oh, Ah) and words containing filler substrings (umbrella, human, summer). Adds first test infrastructure (node:test, zero deps) with 25 tests.
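A rough sketch of this kind of text-level cleanup is shown below. It is not the PR's implementation: `stripFillers` is a hypothetical name, and as a simplification it capitalizes every sentence start in the string rather than only those where a filler was removed. JavaScript's `\b` is ASCII-only, so the sketch uses Unicode-aware lookarounds instead, which keeps a word like the Polish "umówić" intact.

```javascript
// Filler list taken from the commit message.
const FILLERS = ["uh", "um", "yyy", "eee", "mmm", "hmm"];

// A filler must not touch a letter or digit on either side (Unicode-aware),
// and may be followed by a comma and whitespace, which are removed with it.
const FILLER_RE = new RegExp(
  `(?<![\\p{L}\\p{N}])(?:${FILLERS.join("|")})(?![\\p{L}\\p{N}])[,]?\\s*`,
  "giu"
);

function stripFillers(text) {
  const cleaned = text
    .replace(FILLER_RE, "")
    .replace(/\s+([,.!?])/g, "$1") // no space left before punctuation
    .replace(/\s{2,}/g, " ")       // collapse doubled spaces
    .trim();
  // Simplification: capitalize at text start and after every . ! ?
  return cleaned.replace(
    /(^|[.!?]\s+)(\p{Ll})/gu,
    (_, lead, ch) => lead + ch.toUpperCase()
  );
}

console.log(stripFillers("Um, hello world"));                  // "Hello world"
console.log(stripFillers("I saw an umbrella, uh, yesterday.")); // "I saw an umbrella, yesterday."
console.log(stripFillers("yyy ćma umówić"));                   // "Ćma umówić"
```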
221b476 to 9d02380
Summary
Adds Soniox as a fifth cloud STT provider. Soniox offers strong accuracy on English as well as Slavic and Eastern European languages, competitive pricing (significantly cheaper than Deepgram/AssemblyAI for comparable quality), and sub-second cold start (~250ms, no warmup connection needed).
Key additions:
Also introduces the project's first unit tests (25 tests, Node built-in runner, zero new deps).
Changes
- Core streaming (src/helpers/sonioxStreaming.js): New 375-line module. WebSocket connection to Soniox RT API, cold-start PCM buffering (3s at 16kHz), keepalive with 30s idle timeout, graceful finalization with drain. Includes text-level filler word cleanup to handle Soniox BPE tokenization artifacts.
- IPC & audio (ipcHandlers.js, audioManager.js): Soniox handlers mirroring existing providers. isDestroyed() guards, cleanupAllStreaming() on app quit, defensive trim before paste.
- UI (TranscriptionModelPicker.tsx, SettingsPage.tsx, OnboardingFlow.tsx): Soniox tab with API key input, model selection via registry, secondary language selector for mixed-language transcription. Unified with existing provider card pattern.
- Tests (tests/helpers/sonioxStreaming.test.js): 25 tests for text processing using Node built-in test runner (zero new dependencies).
Test plan
npm test: 25 unit tests pass
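For reference, the cold-start PCM buffering mentioned under Core streaming (3s at 16kHz) can be sketched as a bounded queue that is flushed once the WebSocket opens. Several details here are assumptions rather than facts from the PR: 16-bit mono samples (which gives 16000 × 2 × 3 = 96000 bytes), the drop-oldest overflow policy, and the names `ColdStartBuffer`, `push`, and `flush`.

```javascript
const SAMPLE_RATE_HZ = 16000;  // from the PR description
const BYTES_PER_SAMPLE = 2;    // assumption: 16-bit mono PCM
const MAX_BUFFER_SECONDS = 3;  // from the PR description
const MAX_BUFFER_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * MAX_BUFFER_SECONDS; // 96000

// Hypothetical buffer for audio captured before the socket is ready.
class ColdStartBuffer {
  constructor(maxBytes = MAX_BUFFER_BYTES) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.bytes = 0;
  }

  // Queue a chunk (Buffer or Uint8Array). On overflow, drop the oldest
  // chunks first (assumed policy), always keeping at least the newest one.
  push(chunk) {
    this.chunks.push(chunk);
    this.bytes += chunk.length;
    while (this.bytes > this.maxBytes && this.chunks.length > 1) {
      this.bytes -= this.chunks.shift().length;
    }
  }

  // Once the socket is open, send everything in capture order and reset.
  flush(send) {
    for (const chunk of this.chunks) send(chunk);
    this.chunks = [];
    this.bytes = 0;
  }
}
```

Capping the buffer keeps memory bounded if the connection never opens, while the ~250ms connect time quoted in the summary means the cap should rarely be reached in practice.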