feat(soniox): add Soniox real-time streaming STT provider by DamianPala · Pull Request #418 · OpenWhispr/openwhispr

DamianPala · 2026-03-12T12:05:03Z

Summary

Adds Soniox as a fifth cloud STT provider. Soniox offers strong accuracy on English as well as Slavic and Eastern European languages, competitive pricing (significantly cheaper than Deepgram/AssemblyAI for comparable quality), and sub-second cold start (~250ms, no warmup connection needed).

Key additions:

- Secondary language hints for mixed-language transcription (e.g. Polish + English in the same session), useful for multilingual users who code-switch
- Full integration matching existing provider patterns: settings UI, onboarding, API key management, BYOK detection, icon, i18n (10 locales)

Also introduces the project's first unit tests (25 tests, Node built-in runner, zero new deps).

Changes

Core streaming (src/helpers/sonioxStreaming.js): New 375-line module. WebSocket connection to Soniox RT API, cold-start PCM buffering (3s at 16kHz), keepalive with 30s idle timeout, graceful finalization with drain. Includes text-level filler word cleanup to handle Soniox BPE tokenization artifacts.
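The cold-start buffering described above can be sketched roughly as follows. This is not the code from sonioxStreaming.js: the class name `ColdStartBuffer`, the drop-oldest policy, and the 16-bit mono PCM assumption (which gives a 96,000-byte cap for 3s at 16kHz) are all illustrative assumptions.

```javascript
// Hypothetical sketch, not the PR's implementation.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // assumption: 16-bit mono PCM
const MAX_BUFFER_SECONDS = 3;
const MAX_BUFFER_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * MAX_BUFFER_SECONDS;

class ColdStartBuffer {
  constructor(maxBytes = MAX_BUFFER_BYTES) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.size = 0;
  }

  // Queue a PCM chunk captured while the WebSocket is still connecting.
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Assumed policy: drop the oldest audio once the cap is exceeded.
    while (this.size > this.maxBytes && this.chunks.length > 1) {
      this.size -= this.chunks.shift().length;
    }
  }

  // Send everything in capture order once the socket is open, then reset.
  flush(send) {
    for (const chunk of this.chunks) send(chunk);
    this.chunks = [];
    this.size = 0;
  }
}
```

In this sketch the caller would `push()` audio until the socket's open event fires, then call `flush()` with the socket's send function and stream subsequent chunks directly.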

IPC & audio (ipcHandlers.js, audioManager.js): Soniox handlers mirroring existing providers. isDestroyed() guards, cleanupAllStreaming() on app quit, defensive trim before paste.

UI (TranscriptionModelPicker.tsx, SettingsPage.tsx, OnboardingFlow.tsx): Soniox tab with API key input, model selection via registry, secondary language selector for mixed-language transcription. Unified with existing provider card pattern.

Tests (tests/helpers/sonioxStreaming.test.js): 25 tests for text processing using Node built-in test runner (zero new dependencies).

Test plan

- npm test — 25 unit tests pass
- Manual: Add Soniox API key in Settings → Soniox tab, select stt-rt-v4 model
- Manual: Record speech with fillers ("uh", "um", "hmm") → verify they are stripped from transcript
- Manual: Record speech starting with a filler → verify first letter is capitalized
- Manual: Set secondary language (e.g. English + Polish), speak mixed-language → verify transcription
- Manual: Verify no WebSocket leak after multiple start/stop cycles (check DevTools Network tab)
- CI: Linux and Windows builds pass (build run)

Add Soniox as a fourth cloud streaming provider alongside Deepgram, AssemblyAI, and OpenAI Realtime. Includes WebSocket streaming core with cold-start buffering, full Electron IPC pipeline, settings UI with API key management, onboarding validation, and BYOK detection.

- Remove Soniox-specific render branch in TranscriptionModelPicker, use same ModelCardList + API key maps as OpenAI/Groq/Mistral
- Replace hardcoded "stt-rt-v4" in UI with registry-based model selection
- Add Soniox "S" icon SVG (from official wordmark)
- Translate soniox_stt_rt_v4 model description in 9 locale files

When audioManager calls finalize() before disconnect(), the server has already received the finalize request. Sending it again in drainFinalTokens() caused a 3s timeout waiting for a response that would never come. Track finalize state with a _finalizeSent flag and skip the redundant call.
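A minimal sketch of the guard, assuming a wrapper that owns the outgoing send function. The class name `FinalizeTracker` and the `{ "type": "finalize" }` message shape are assumptions for illustration. Only `_finalizeSent`, `finalize()`, and `drainFinalTokens()` are names taken from the description above, and the real `drainFinalTokens()` also waits for the final tokens, which is omitted here.

```javascript
// Hypothetical sketch, not the PR's implementation.
class FinalizeTracker {
  constructor(sendMessage) {
    this._sendMessage = sendMessage; // e.g. (msg) => ws.send(msg)
    this._finalizeSent = false;
  }

  // Request finalization at most once per session.
  // Returns true if a message was actually sent.
  finalize() {
    if (this._finalizeSent) return false;
    this._sendMessage(JSON.stringify({ type: "finalize" })); // assumed message shape
    this._finalizeSent = true;
    return true;
  }

  // On disconnect, only send finalize if nobody has done so already,
  // so we never wait for a second response that will not arrive.
  drainFinalTokens() {
    return this.finalize();
  }

  // Clear the flag when a new session starts.
  reset() {
    this._finalizeSent = false;
  }
}
```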

Soniox connects in ~250ms, so there is no benefit to keeping an idle WebSocket open between dictation sessions. Making warmup a no-op avoids unnecessary Soniox session usage and potential idle-timeout issues.

- Remove closeResolve (never assigned, close handler check unreachable)
- Use getFullTranscript() instead of inline .map().join() duplicate
- Remove soniox special-case in handleCloudProviderChange (generic path handles it)

Soniox supports multi-language transcription via language_hints array. Add a secondary language selector in the Soniox provider tab so users can hint a second language (e.g. Polish + English) for code-switching.

- New sonioxSecondaryLanguage setting in store/hook
- LanguageSelector dropdown in Soniox tab (inline layout)
- Disabled when primary language is auto (no bias needed)
- Language codes normalized to base form (en-US → en)
- i18n keys added for all 10 locales
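One way the hint list could be assembled is sketched below. The function names `toBaseLanguage` and `buildLanguageHints` are hypothetical, and returning an empty array for an auto primary language is an assumption based on the "no bias needed" note above.

```javascript
// Hypothetical sketch, not the PR's implementation.

// Reduce a locale code to its base language, e.g. "en-US" -> "en".
function toBaseLanguage(code) {
  return code.trim().toLowerCase().split(/[-_]/)[0];
}

// Build the language_hints array from the primary and optional secondary language.
function buildLanguageHints(primary, secondary) {
  // Assumption: with auto-detection there is nothing to bias, so send no hints.
  if (!primary || primary === "auto") return [];
  const hints = [toBaseLanguage(primary)];
  if (secondary && secondary !== "auto") {
    const base = toBaseLanguage(secondary);
    // Avoid duplicates such as en-US + en-GB.
    if (!hints.includes(base)) hints.push(base);
  }
  return hints;
}
```

For example, `buildLanguageHints("pl-PL", "en-US")` yields `["pl", "en"]` in this sketch.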

- Add 30s idle timeout to Soniox keepalive to prevent zombie WebSocket connections surviving renderer hot-reload or crash
- Add cleanupAllStreaming() to close all streaming backends on app quit
- Add isDestroyed() guards to Soniox and dictation IPC callbacks, matching the pattern used by Deepgram and AssemblyAI
- Prefer cleanupAll() over cleanup() for backends that support it (Deepgram, AssemblyAI) to also clean warm connections and timers
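The idle-timeout check can be sketched as a small helper that the keepalive timer consults on each tick. The class name `IdleWatchdog` and the injectable clock are assumptions for illustration. The PR only states that the keepalive gains a 30s idle timeout.

```javascript
// Hypothetical sketch, not the PR's implementation.
const IDLE_TIMEOUT_MS = 30_000;

class IdleWatchdog {
  // `now` is injectable so the logic can be tested without real timers.
  constructor({ timeoutMs = IDLE_TIMEOUT_MS, now = () => Date.now() } = {}) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  // Call whenever audio is sent, so an active session never expires.
  markActivity() {
    this.lastActivity = this.now();
  }

  // Call from each keepalive tick; true means the connection should be closed.
  isExpired() {
    return this.now() - this.lastActivity >= this.timeoutMs;
  }
}
```

In a keepalive interval, the caller would close the WebSocket and clear the timer when `isExpired()` returns true, and send the keepalive message otherwise.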

Soniox sends a U+FFFD replacement character as a final token when recording silence, which gets pasted as garbage. Filter out empty, whitespace-only, and replacement character tokens in Soniox handler. Also trim finalText before the paste guard in audioManager as a defensive check for all streaming providers.
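A rough sketch of both checks follows. The function names `isUsableTokenText` and `prepareFinalText` are hypothetical, and treating a token as unusable when nothing remains after removing U+FFFD and whitespace is one possible reading of the filter described above.

```javascript
// Hypothetical sketch, not the PR's implementation.
const REPLACEMENT_CHAR = "\uFFFD";

// True if the token carries real text, i.e. something other than
// whitespace and U+FFFD replacement characters.
function isUsableTokenText(text) {
  if (typeof text !== "string") return false;
  return text.split(REPLACEMENT_CHAR).join("").trim().length > 0;
}

// Defensive trim before the paste guard: returns the text to paste,
// or null when there is nothing worth pasting.
function prepareFinalText(finalText) {
  const trimmed = (finalText || "").trim();
  return trimmed.length > 0 ? trimmed : null;
}
```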

Strip hesitation fillers (uh, um, yyy, eee, mmm, hmm) from assembled transcript text. Soniox BPE tokenization splits fillers across sub-word tokens, so removal works on joined text using word boundaries. Capitalizes first letter after filler removal at sentence boundaries (.!?) and at text start, with full Unicode support (Polish ć/ó/ś, accented Latin, Cyrillic). Preserves real exclamations (Oh, Ah) and words containing filler substrings (umbrella, human, summer). Adds first test infrastructure (node:test, zero deps) with 25 tests.
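The text-level approach can be sketched as below. This is not the code from sonioxStreaming.js: the function name `cleanFillers` and the exact regular expressions are assumptions. As a simplification, it capitalizes every sentence start rather than only those where a filler was removed. Unicode-aware lookarounds stand in for `\b`, which only recognizes ASCII word characters in JavaScript.

```javascript
// Hypothetical sketch, not the PR's implementation.
const FILLERS = ["uh", "um", "yyy", "eee", "mmm", "hmm"];

// Match a filler only when it is not surrounded by letters or digits,
// so "umbrella", "human" and "summer" are left alone.
// Also consumes one trailing comma and any following whitespace.
const FILLER_RE = new RegExp(
  `(?<![\\p{L}\\p{N}])(?:${FILLERS.join("|")})(?![\\p{L}\\p{N}]),?\\s*`,
  "giu"
);

function cleanFillers(text) {
  const stripped = text
    .replace(FILLER_RE, "")
    .replace(/\s{2,}/g, " ")          // collapse leftover double spaces
    .replace(/\s+([,.!?;:])/g, "$1")  // no space before punctuation
    .trim();
  // Capitalize a lowercase letter at text start or after . ! ?
  // \p{Ll} covers Polish, accented Latin and Cyrillic lowercase letters.
  return stripped.replace(
    /(^|[.!?]\s+)(\p{Ll})/gu,
    (match, prefix, letter) => prefix + letter.toUpperCase()
  );
}
```

Because the function operates on the joined transcript, it does not matter how the tokenizer split a filler across sub-word tokens.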

DamianPala added 9 commits March 12, 2026 13:06

fix(soniox): make warmup a no-op for cold-start-only design

e3ef004

refactor(soniox): remove dead code and redundant special-case

cd5279f

DamianPala force-pushed the feat/soniox-streaming branch from 221b476 to 9d02380 Compare March 12, 2026 12:07

DamianPala marked this pull request as ready for review March 12, 2026 12:34

DamianPala wants to merge 9 commits into OpenWhispr:main from DamianPala:feat/soniox-streaming
