feat(soniox): add Soniox real-time streaming STT provider #418

Open
DamianPala wants to merge 9 commits into OpenWhispr:main from DamianPala:feat/soniox-streaming

DamianPala commented Mar 12, 2026

Summary

Adds Soniox as a fifth cloud STT provider. Soniox offers strong accuracy on English as well as Slavic and Eastern European languages, competitive pricing (significantly cheaper than Deepgram/AssemblyAI for comparable quality), and sub-second cold start (~250ms, no warmup connection needed).

Key additions:

  • Secondary language hints for mixed-language transcription (e.g. Polish + English in the same session), useful for multilingual users who code-switch
  • Full integration matching existing provider patterns: settings UI, onboarding, API key management, BYOK detection, icon, i18n (10 locales)

Also introduces the project's first unit tests (25 tests, Node built-in runner, zero new deps).

Changes

Core streaming (src/helpers/sonioxStreaming.js): New 375-line module. WebSocket connection to Soniox RT API, cold-start PCM buffering (3s at 16kHz), keepalive with 30s idle timeout, graceful finalization with drain. Includes text-level filler word cleanup to handle Soniox BPE tokenization artifacts.
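The cold-start buffering described above can be sketched roughly as follows. This is an illustration, not the code in sonioxStreaming.js: the class name, the 16-bit mono PCM assumption (2 bytes per sample), and the drop-oldest policy are all assumptions. Under those assumptions, 3s at 16kHz comes to 16000 × 2 × 3 = 96000 bytes.

```javascript
// Hypothetical sketch: names and the drop-oldest policy are assumptions,
// not taken from sonioxStreaming.js.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // assumes 16-bit mono PCM
const MAX_BUFFER_SECONDS = 3;
const MAX_BUFFER_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * MAX_BUFFER_SECONDS;

class ColdStartBuffer {
  constructor(maxBytes = MAX_BUFFER_BYTES) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.size = 0;
  }

  // Queue a chunk captured before the WebSocket is open.
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.byteLength;
    // Once the cap is exceeded, drop the oldest audio first.
    while (this.size > this.maxBytes && this.chunks.length > 1) {
      this.size -= this.chunks.shift().byteLength;
    }
  }

  // Hand every buffered chunk to `send` in capture order, then reset.
  flush(send) {
    for (const chunk of this.chunks) send(chunk);
    this.chunks = [];
    this.size = 0;
  }
}
```

In this sketch, the caller would push() audio chunks while the socket is connecting and call flush() with the socket's send function once it opens.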

IPC & audio (ipcHandlers.js, audioManager.js): Soniox handlers mirroring existing providers. isDestroyed() guards, cleanupAllStreaming() on app quit, defensive trim before paste.

UI (TranscriptionModelPicker.tsx, SettingsPage.tsx, OnboardingFlow.tsx): Soniox tab with API key input, model selection via registry, secondary language selector for mixed-language transcription. Unified with existing provider card pattern.

Tests (tests/helpers/sonioxStreaming.test.js): 25 tests for text processing using Node built-in test runner (zero new dependencies).

Test plan

  • npm test — 25 unit tests pass
  • Manual: Add Soniox API key in Settings → Soniox tab, select stt-rt-v4 model
  • Manual: Record speech with fillers ("uh", "um", "hmm") → verify they are stripped from transcript
  • Manual: Record speech starting with a filler → verify first letter is capitalized
  • Manual: Set secondary language (e.g. English + Polish), speak mixed-language → verify transcription
  • Manual: Verify no WebSocket leak after multiple start/stop cycles (check DevTools Network tab)
  • CI: Linux and Windows builds pass (build run)

Add Soniox as a fourth cloud streaming provider alongside Deepgram,
AssemblyAI, and OpenAI Realtime. Includes WebSocket streaming core with
cold-start buffering, full Electron IPC pipeline, settings UI with API
key management, onboarding validation, and BYOK detection.

- Remove Soniox-specific render branch in TranscriptionModelPicker,
  use same ModelCardList + API key maps as OpenAI/Groq/Mistral
- Replace hardcoded "stt-rt-v4" in UI with registry-based model selection
- Add Soniox "S" icon SVG (from official wordmark)
- Translate soniox_stt_rt_v4 model description in 9 locale files

When audioManager calls finalize() before disconnect(), the server has
already received the finalize request. Sending it again in
drainFinalTokens() caused a 3s timeout waiting for a response that would
never come. Track finalize state with _finalizeSent flag and skip the
redundant call.
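A minimal sketch of the send-once guard this commit describes. Only the _finalizeSent flag name comes from the commit message. The class name, the injected send callback, and the message payload are placeholders for illustration.

```javascript
// Hypothetical sketch: the class name, send callback, and message payload
// are assumptions, not the module's actual code.
class FinalizeTracker {
  constructor(send) {
    this._send = send;
    this._finalizeSent = false;
  }

  // Send the finalize request at most once per session.
  // Returns true only if this call actually sent the message.
  finalize() {
    if (this._finalizeSent) return false;
    this._finalizeSent = true;
    this._send(JSON.stringify({ type: "finalize" }));
    return true;
  }
}
```

With a guard like this, a drain step can call finalize() unconditionally. If audioManager already requested finalization, the second call is a no-op and nothing waits on a reply that will not arrive.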
Soniox connects in ~250ms, so there is no benefit from keeping an idle
WebSocket open between dictation sessions. Avoids unnecessary Soniox
session usage and potential idle timeout issues.

- Remove closeResolve (never assigned, close handler check unreachable)
- Use getFullTranscript() instead of inline .map().join() duplicate
- Remove soniox special-case in handleCloudProviderChange (generic path handles it)

Soniox supports multi-language transcription via language_hints array.
Add a secondary language selector in the Soniox provider tab so users
can hint a second language (e.g. Polish + English) for code-switching.

- New sonioxSecondaryLanguage setting in store/hook
- LanguageSelector dropdown in Soniox tab (inline layout)
- Disabled when primary language is auto (no bias needed)
- Language codes normalized to base form (en-US → en)
- i18n keys added for all 10 locales
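One way the normalization and hint assembly could look, as a sketch. The function names are hypothetical, and returning an empty array when the primary language is auto is an assumption drawn from the "no bias needed" note above. Only the language_hints name and the en-US → en normalization come from the commit message.

```javascript
// Hypothetical helpers: names and the empty-array behaviour for "auto"
// are assumptions for illustration.

// Reduce a locale tag such as "en-US" or "pl_PL" to its base language code.
function toBaseLanguage(code) {
  return code.trim().split(/[-_]/)[0].toLowerCase();
}

// Build the language_hints array from the primary and secondary settings.
function buildLanguageHints(primary, secondary) {
  // With auto-detection there is nothing to bias toward.
  if (!primary || primary === "auto") return [];
  const hints = [toBaseLanguage(primary)];
  if (secondary && secondary !== "auto") {
    const base = toBaseLanguage(secondary);
    // Avoid sending the same base language twice (e.g. en-US + en-GB).
    if (!hints.includes(base)) hints.push(base);
  }
  return hints;
}
```

For example, buildLanguageHints("pl-PL", "en-US") yields ["pl", "en"], while a primary of "auto" yields an empty array.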
- Add 30s idle timeout to Soniox keepalive to prevent zombie WebSocket
  connections surviving renderer hot-reload or crash
- Add cleanupAllStreaming() to close all streaming backends on app quit
- Add isDestroyed() guards to Soniox and dictation IPC callbacks,
  matching the pattern used by Deepgram and AssemblyAI
- Prefer cleanupAll() over cleanup() for backends that support it
  (Deepgram, AssemblyAI) to also clean warm connections and timers
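The 30s idle timeout in the first bullet can be sketched as a decision made on each keepalive tick. This is an illustration under stated assumptions: the class and method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without real timers.

```javascript
// Hypothetical sketch: names are assumptions, not the module's actual code.
const IDLE_TIMEOUT_MS = 30000;

class KeepaliveMonitor {
  constructor(idleTimeoutMs = IDLE_TIMEOUT_MS, now = Date.now()) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastAudioAt = now;
  }

  // Record that audio was just forwarded to the server.
  markAudio(now = Date.now()) {
    this.lastAudioAt = now;
  }

  // Called on every keepalive tick: keep the socket alive while audio is
  // recent, otherwise tell the caller to close it.
  onTick(now = Date.now()) {
    const idleFor = now - this.lastAudioAt;
    return idleFor >= this.idleTimeoutMs ? "close" : "keepalive";
  }
}
```

Because the decision depends only on the last audio timestamp, a connection whose renderer has gone away stops receiving markAudio() calls and is closed on the first tick after the timeout.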
Soniox sends a U+FFFD replacement character as a final token when
recording silence, which gets pasted as garbage. Filter out empty,
whitespace-only, and replacement character tokens in Soniox handler.
Also trim finalText before the paste guard in audioManager as a
defensive check for all streaming providers.
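A sketch of the token filter described in this commit, assuming each token carries its text as a plain string. The function name is hypothetical. The filter decides only whether to keep a token and leaves its text untouched, so leading spaces on sub-word tokens survive.

```javascript
// Hypothetical sketch: the function name and token shape are assumptions.
const REPLACEMENT_CHAR_RE = /\uFFFD/g;

// True if the token has anything left once U+FFFD and whitespace are ignored.
function isPasteableToken(text) {
  if (typeof text !== "string") return false;
  return text.replace(REPLACEMENT_CHAR_RE, "").trim().length > 0;
}
```

Applied with Array.prototype.filter, the tokens " Hello", "", "  ", "\uFFFD", " world" reduce to " Hello" and " world".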
Strip hesitation fillers (uh, um, yyy, eee, mmm, hmm) from assembled
transcript text. Soniox BPE tokenization splits fillers across sub-word
tokens, so removal works on joined text using word boundaries.

Capitalizes first letter after filler removal at sentence boundaries
(.!?) and at text start, with full Unicode support (Polish ć/ó/ś,
accented Latin, Cyrillic). Preserves real exclamations (Oh, Ah) and
words containing filler substrings (umbrella, human, summer).

Adds first test infrastructure (node:test, zero deps) with 25 tests.
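The filler cleanup could be approximated as below. This is a simplified sketch, not the module's implementation: the function name and regexes are assumptions, it capitalizes every sentence start rather than only those where a filler was removed, and it does not tidy punctuation left behind by a sentence-final filler. Because JavaScript's \b is ASCII-only, the sketch uses Unicode property lookarounds as word boundaries.

```javascript
// Hypothetical sketch: names and regexes are assumptions for illustration.
const FILLERS = ["uh", "um", "yyy", "eee", "mmm", "hmm"];

// A filler counts only when it is not glued to other letters or digits,
// so "umbrella", "human", and "summer" are left alone. An optional
// trailing comma and the following whitespace are removed with it.
const FILLER_RE = new RegExp(
  `(?<![\\p{L}\\p{N}])(?:${FILLERS.join("|")})(?![\\p{L}\\p{N}]),?\\s*`,
  "giu"
);

// A lowercase letter at text start or after sentence-ending punctuation.
const SENTENCE_START_RE = /(^|[.!?]\s+)(\p{Ll})/gu;

function stripFillers(text) {
  return text
    .replace(FILLER_RE, "")
    .replace(/\s{2,}/g, " ")
    .trim()
    .replace(SENTENCE_START_RE, (_, lead, letter) => lead + letter.toUpperCase());
}
```

The filler list is matched on the joined transcript rather than per token, which is the approach the commit message gives for handling fillers split across sub-word tokens.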
DamianPala force-pushed the feat/soniox-streaming branch from 221b476 to 9d02380 on March 12, 2026 12:07
DamianPala marked this pull request as ready for review on March 12, 2026 12:34