Releases: KoljaB/RealtimeTTS
v0.6.0
RealtimeTTS v0.6.0
Features
- New TTS Engines
  - Faster Qwen3 TTS: Added `FasterQwenEngine` engine. See tests/faster_qwen_emotions.py and tests/faster_qwen_test.py for implementation details. Demo video: https://www.youtube.com/watch?v=ZOKcUpJlrXQ
  - Cartesia: Added engine for Cartesia speech synthesis via WebSocket API. ([#348])
  - MiniMax Cloud TTS: Added `MiniMaxEngine` for MiniMax T2A v2 API with 2 models (`speech-2.8-hd`, `speech-2.8-turbo`), 12 voice presets, and runtime parameter control. ([#369])
  - ModelsLab TTS: Added `ModelsLabEngine` supporting 30+ voices in 9 languages, speed and emotion control, and lazy loading. ([#368])
  - CAMB AI MARS TTS: Added `CambEngine` and `CambVoice` for CAMB AI's MARS models, supporting 140+ languages and streaming output. ([#367])
  - NeuTTS: Added `NeuTTSEngine` for on-device TTS with voice cloning (3s reference audio, CPU/CUDA/MPS support). ([#359])
  - PocketTTS: Added `PocketTTSEngine` for CPU-optimized TTS (Kyutai Labs, 8 voices, voice cloning, low latency). ([#358])
- WebSocket Streaming
  - Real-time TTS streaming via WebSocket endpoint with multi-user support, bidirectional audio, and enhanced web UI. Includes Python demo client. ([#356])
Improvements
- Engine Usability
  - Allow `OpenAIEngine` to accept API key as parameter (fallback to env var). ([#361])
  - Add language parameter to `ZipvoiceEngine` for output and prompt speech. ([#362])
  - Add `mpv_audio_device` option to `AudioConfiguration`/`TextToAudioStream` for MPV playback device selection. ([#327])
  - Add adjustable volume parameter (0.0–1.0) to `TextToAudioStream`. ([#335])
  - PiperEngine: Streamline synthesis and add samplerate detection from model config for better quality with larger models. ([#346], [#347])
  - Conditional logging: Only print "SYNTHESIS FINISHED" if logging is enabled. ([#332])
- General
  - Install `portaudio` for macOS. ([#328])
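The API-key change for `OpenAIEngine` ([#361]) describes a common pattern: an explicitly passed key wins, otherwise the environment variable is used. Below is a minimal sketch of that pattern, independent of RealtimeTTS. The helper name `resolve_api_key` and its signature are hypothetical, and `OPENAI_API_KEY` is assumed as the variable name; this is not the engine's actual code.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env_var: str = "OPENAI_API_KEY",  # assumed variable name
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else fall back to the environment.

    Hypothetical helper for illustration; not part of RealtimeTTS.
    """
    env = os.environ if environ is None else environ
    # An explicit argument takes precedence over the environment variable.
    key = api_key or env.get(env_var)
    if not key:
        raise ValueError(f"No API key passed and {env_var} is not set")
    return key
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.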
Fixes
- Correct typo in `requirements.txt` for `pypinyin` version. ([#355])
- Fix missing comma in `__all__` that affected engine exports. ([#367])
- Add audio format detection/conversion for non-Kokoro TTS engines. ([#356])
- Fix voice retrieval errors and improve engine initialization logic. ([#356])
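For context on the `__all__` fix: Python concatenates adjacent string literals, so a missing comma silently merges two export names into one that does not exist. The sketch below reproduces the effect. The engine names are taken from these notes, but the lists are illustrative and are not the actual affected entries.

```python
# Adjacent string literals are joined at compile time, so the first two
# entries below become the single (nonexistent) name "CambEngineCambVoice".
broken_all = [
    "CambEngine"  # <- missing comma
    "CambVoice",
    "NeuTTSEngine",
]

# With the comma restored, each name is exported separately.
fixed_all = [
    "CambEngine",
    "CambVoice",
    "NeuTTSEngine",
]
```

No syntax error is raised for the broken list, which is why this kind of bug tends to surface only when an import fails.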
Other
- Updated documentation for new engines, playback options, and WebSocket usage.
- Added/updated test files and demo scripts for new engines.
- No breaking changes; all updates are backward compatible.
PRs:
[#369], [#368], [#367], [#362], [#361], [#359], [#358], [#356], [#355], [#348], [#347], [#346], [#335], [#332], [#328], [#327]
v0.5.7
RealtimeTTS v0.5.7
New Engine:
✨ Added ZipVoiceEngine - ZipVoice is small, fast and delivers high-quality output with voice cloning.
- see zipvoice_test file for an implementation example
- see the zipvoice docker folder for a realtime streaming fastapi server implementation for zipvoice
Bug Fixes & Improvements:
- fixes #320
v0.5.6
v0.5.5
RealtimeTTS v0.5.5
Bug Fixes & Improvements:
- Coqui Engine: Enhanced text normalization to improve handling of special characters (leading non-alphanumeric, smart quotes, em-dashes, etc.) and various whitespace types, leading to more reliable synthesis.
- Stream Status: Fixed logic in `TextToAudioStream` for more accurate reporting of the stream's active state.
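To illustrate the kind of text normalization described for the Coqui engine, here is a minimal sketch using only the standard library. The function name `normalize_for_tts` and the specific replacements are assumptions chosen for illustration; the engine's actual rules may differ.

```python
import re

# Map typographic punctuation to plain equivalents (illustrative choices).
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2014": ", ",                 # em dash becomes a short pause
    "\u2013": "-",                  # en dash
})


def normalize_for_tts(text: str) -> str:
    """Hypothetical normalizer; not the RealtimeTTS implementation."""
    text = text.translate(_PUNCT_MAP)
    # Collapse any run of Unicode whitespace (NBSP, tabs, newlines) to one space.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop leading characters that are neither letters nor digits.
    text = re.sub(r"^[\W_]+", "", text)
    return text
```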
Other Changes:
- Orpheus Engine: Relaxed strict validation for voice names (validation code commented out).
- Dependencies: Upgraded `openai` (to 1.77.0) and `edge-tts` (to 7.0.2).
v0.5.3
RealtimeTTS v0.5.3 Release Notes
Enhancements & Changes:
- Silence Control: Moved silence‑insertion out of CoquiEngine into `TextToAudioStream`; introduced configurable `comma`, `sentence`, and `default` silence durations in the play and play_async methods.
- KokoroEngine: Uses KokoroVoice now. #303 (thank you)
- OrpheusEngine: Lets you pick model now and has improved stop‑on‑demand checks.
- More Thread‑Safe Pipes (?): Added `SafePipe` for what I hope is more reliable inter‑process comm in CoquiEngine (needs to be tested more).
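The silence control above can be pictured as choosing a pause length from the punctuation that ends a text fragment, then emitting that much silent audio. The following is a rough sketch under stated assumptions: the function names, the default durations, and the 16-bit mono PCM format are all hypothetical and do not reflect the library's actual parameters.

```python
def silence_duration(
    fragment: str,
    comma: float = 0.3,      # assumed default, in seconds
    sentence: float = 0.6,   # assumed default, in seconds
    default: float = 0.1,    # assumed default, in seconds
) -> float:
    """Pick a pause length from the fragment's final character."""
    stripped = fragment.rstrip()
    if not stripped:
        return default
    last = stripped[-1]
    if last in ".!?":
        return sentence
    if last == ",":
        return comma
    return default


def silence_bytes(duration: float, sample_rate: int = 24000,
                  channels: int = 1, sample_width: int = 2) -> bytes:
    """Return `duration` seconds of silent PCM (all-zero samples)."""
    frames = int(round(duration * sample_rate))
    return b"\x00" * (frames * channels * sample_width)
```

Handling the pause at the stream level rather than inside one engine means the same durations can apply regardless of which engine produced the audio.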
v0.5.1
RealtimeTTS v0.5.1 Release Notes
Enhancements & Changes:
- Audio Trimming & Fading: Trim leading/trailing silence and apply fades to generated audio (configurable in Kokoro & StyleTTS engines).
- Stop Event Handling: Implemented fast reliable stop of ongoing synthesis especially for the Coqui engine.
- Dynamic Coqui Settings: Change language and stream chunk size for the Coqui engine on the fly using `engine.set_language()` and `engine.set_stream_chunk_size()`.
- Dependency Updates: Upgraded libraries (e.g., `openai`, `kokoro`, `elevenlabs`) to newer versions.
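As a rough picture of what trimming and fading involve, the sketch below operates on a plain list of float samples in the range -1.0 to 1.0. The function names, the amplitude threshold, and the linear fade shape are assumptions for illustration, not the options exposed by the Kokoro or StyleTTS engines.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of up to fade_len samples each."""
    out = list(samples)
    n = min(fade_len, len(out) // 2)  # never let the two fades overlap
    for i in range(n):
        gain = i / n
        out[i] *= gain        # fade in from the start
        out[-1 - i] *= gain   # fade out toward the end
    return out
```

Fading after trimming avoids the click that can occur when audio starts or stops at a non-zero sample.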
v0.5.0
v0.4.55
RealtimeTTS v0.4.55 Release Notes
- Enhanced OpenAI Engine Initialization:
  - Added optional parameters: `instructions`, `debug`, `speed`, `response_format`, and `timeout`.
  - Updated available voices to include "ash", "coral", and "sage".
  - Note: The `speed` and `timeout` parameters are not working at the moment. It is unclear why, since they are being submitted to the API.
- Text-to-Stream Improvements:
  - Introduced an `error_flag` to track errors during playback.
  - Set the synthesis worker thread as daemon to ensure proper thread termination.
- Dependency Update:
  - Upgraded the OpenAI package from version 1.66.3 to 1.68.2.
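The two Text-to-Stream changes combine naturally: a daemon worker thread does not keep the interpreter alive at exit, and an error flag lets the caller notice a failure that happened off the main thread. A minimal sketch of that combination follows; the class `SynthesisWorker` and its methods are hypothetical stand-ins, not the RealtimeTTS internals.

```python
import queue
import threading


class SynthesisWorker:
    """Hypothetical worker for illustration; not the RealtimeTTS class."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._queue = queue.Queue()
        self.error_flag = False
        self.results = []
        # daemon=True: this thread will not block interpreter shutdown.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def feed(self, text):
        self._queue.put(text)

    def close(self, timeout=5.0):
        self._queue.put(None)  # sentinel asks the worker to stop
        self._thread.join(timeout)

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            try:
                self.results.append(self._synthesize(item))
            except Exception:
                # Record the failure so the caller can check it later.
                self.error_flag = True
                break
```

After `close()`, the caller can inspect `error_flag` instead of assuming the work succeeded.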
v0.4.54
RealtimeTTS v0.4.54 Release Notes
New Engine:
✨ Added OrpheusEngine - Real-time TTS for Orpheus-3B model with:
- multiple voice presets (`zac`, `zoe`, `tara`, etc.)
- emotive speech tags support (`<laugh>`, `<gasp>`, etc.)
- low-latency streaming (<100ms time to first audio token)
- uses an external server → lets you generate tts on another network system
Installation:
`pip install realtimetts[orpheus]`
Requires: LM Studio (local) or compatible API server running Orpheus-3B-0.1-ft-Q8_0-GGUF. Load model in LM Studio before use.
Example code:
Here is a code example showcasing how you can use the OrpheusEngine.