Changelog

All notable changes to OpenSpeakers are documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

0.1.0 - 2026-04-12

Overview

Initial public release of OpenSpeakers — a unified TTS and voice cloning application
supporting 11 open-source models with GPU hot-swap, async job queuing, real-time
streaming, and a SvelteKit UI.

Added

11 TTS Models

Kokoro 82M — Fastest model (~1s), 50+ preset voices, standby mode
VibeVoice 0.5B — Real-time streaming TTS, 12 built-in voices, 10 languages
VibeVoice 1.5B — High-quality long-form, multi-speaker dialogue, zero-shot cloning
Fish Audio S2-Pro — Multilingual (80+ languages), emotion tags, voice cloning
Qwen3 TTS 1.7B — Expressive multilingual with instruct mode and voice cloning
Orpheus 3B — Emotional speech with laugh/sigh/gasp tags, vLLM backend
F5-TTS — Fast flow-matching (15x realtime), MIT license, reference-audio cloning
Chatterbox — Expressive TTS with exaggeration/CFG controls, voice cloning
CosyVoice 2.0 — Ultra-low latency (150ms), voice design via text description
Parler TTS Mini — Generate any voice from a text description, no reference audio
Dia 1.6B — Multi-speaker dialogue with [S1]/[S2] tags and nonverbal sounds

GPU Hot-Swap Architecture

ModelManager singleton with threading.Lock for GPU serialization
Automatic model unloading between tasks (gc.collect + torch.cuda.empty_cache)
60-second idle timer auto-unloads non-standby models
Kokoro standby mode — stays loaded permanently for instant responses
Ollama-style keep_alive TTL per model (-1 = indefinite, 0 = clear, N = seconds)
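
The hot-swap behaviour listed above can be sketched roughly as follows. This is an illustration only, not the project's code: the method names `run` and `evict_if_idle`, the `loader`/`task` callables, and the injectable `now` clock are hypothetical, and the keep_alive handling assumes Ollama's semantics (0 unloads right after the request, -1 never expires). The real unload step would also call torch.cuda.empty_cache(), which is omitted here so the sketch runs without a GPU.

```python
import gc
import threading
import time


class ModelManager:
    """Hypothetical sketch: one model on the GPU at a time, guarded by a lock."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Singleton: every caller shares the same manager and the same GPU lock.
        with cls._instance_lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst.gpu_lock = threading.Lock()
                inst.current_id = None
                inst.current_model = None
                inst.expires_at = None  # None means "no expiry scheduled"
                cls._instance = inst
        return cls._instance

    def _unload(self):
        # The changelog names gc.collect + torch.cuda.empty_cache; torch is
        # left out so this sketch has no GPU dependency.
        self.current_id = None
        self.current_model = None
        self.expires_at = None
        gc.collect()

    def run(self, model_id, loader, task, keep_alive=60, now=time.monotonic):
        """Load model_id if needed (swapping out any other model), then run task."""
        with self.gpu_lock:  # serialises all GPU work
            if self.current_id != model_id:
                self._unload()
                self.current_model = loader()
                self.current_id = model_id
            result = task(self.current_model)
            if keep_alive == 0:
                self._unload()          # assumed: 0 unloads right away
            elif keep_alive < 0:
                self.expires_at = None  # assumed: -1 keeps the model loaded
            else:
                self.expires_at = now() + keep_alive
            return result

    def evict_if_idle(self, now=time.monotonic):
        """Meant to be called periodically; unloads once the TTL has passed."""
        with self.gpu_lock:
            if self.expires_at is not None and now() >= self.expires_at:
                self._unload()
                return True
            return False
```

Under these assumptions, a standby model such as Kokoro would be run with keep_alive=-1, while other models would fall back to the 60-second default and be unloaded by the periodic idle check.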

Worker Architecture

7 dedicated Celery worker containers with model-specific queues
QUEUE_MAP routing — single source of truth for model-to-queue mapping
Shared GPU base image (torch 2.10+cu128) for all secondary workers
nvidia runtime on all containers for reliable GPU access
Startup validation warns about unrouted or stale QUEUE_MAP entries
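
As a rough illustration of QUEUE_MAP routing and the startup check, the sketch below uses a plain dictionary. The model ids, queue names, and the helper names `queue_for` and `validate_queue_map` are all hypothetical; the changelog does not give the real entries.

```python
import warnings

# Hypothetical mapping: model id -> Celery queue name. The real QUEUE_MAP
# keys and queue names are not listed in this changelog.
QUEUE_MAP = {
    "kokoro": "tts.kokoro",
    "dia-1.6b": "tts.dia",
    "f5-tts": "tts.f5",
}


def queue_for(model_id, queue_map=QUEUE_MAP):
    """Return the queue a job for model_id should be routed to."""
    try:
        return queue_map[model_id]
    except KeyError:
        raise ValueError(f"no queue configured for model {model_id!r}") from None


def validate_queue_map(registered_models, queue_map=QUEUE_MAP):
    """Warn about models with no queue and about entries for unknown models."""
    registered = set(registered_models)
    mapped = set(queue_map)
    unrouted = sorted(registered - mapped)  # registered but not routable
    stale = sorted(mapped - registered)     # routed but no longer registered
    for model_id in unrouted:
        warnings.warn(f"model {model_id!r} has no QUEUE_MAP entry")
    for model_id in stale:
        warnings.warn(f"QUEUE_MAP entry {model_id!r} matches no registered model")
    return unrouted, stale
```

With Celery, the looked-up name would typically be passed as the `queue` argument of a task's `apply_async` call, so the dictionary remains the only place where model-to-queue routing is defined.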

API Endpoints

POST /api/tts/generate — Submit TTS job (async, returns job_id)
GET /api/tts/jobs/{id} — Poll job status
GET /api/tts/jobs/{id}/audio — Download generated audio
GET /api/tts/jobs — List jobs with pagination, filtering, search
DELETE /api/tts/jobs/{id} — Cancel pending/running job (revokes Celery task)
POST /api/tts/batch — Submit up to 100 lines as a batch
GET /api/tts/batches/{id} — Batch status with aggregate counts
GET /api/tts/batches/{id}/zip — Download all completed audio as ZIP
POST /api/voices — Upload reference audio for voice cloning
GET /api/voices — List voice profiles
GET /api/voices/builtin/{model_id} — List preset voices per model
PATCH /api/voices/{id} — Update voice profile name/description/tags
DELETE /api/voices/{id} — Delete voice profile and files
GET /api/models — List all models with capabilities and status
POST /api/models/{id}/load — Pre-warm model into GPU VRAM
DELETE /api/models/{id}/load — Force-unload model
POST /v1/audio/speech — OpenAI-compatible endpoint (tts-1 → Kokoro, tts-1-hd → Orpheus)
GET /health — Docker health check
GET /api/system/info — GPU stats, disk usage, registered models
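
A client for the async job flow (submit, poll, download) might look roughly like the sketch below. The base URL, the request fields `model_id` and `text`, and the status values checked while polling are assumptions; only the endpoint paths and the returned `job_id` come from the list above. The `call` and `sleep` parameters are there so the flow can be exercised without a running server.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host and port


def http_json(method, path, body=None):
    """Send a JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def generate_and_wait(text, model_id, call=http_json, interval=1.0,
                      timeout=300.0, sleep=time.sleep):
    """Submit a job, poll until it finishes, and return the audio path to fetch."""
    # "model_id" and "text" are assumed field names; "job_id" is named in
    # the changelog entry for POST /api/tts/generate.
    job = call("POST", "/api/tts/generate", {"model_id": model_id, "text": text})
    job_id = job["job_id"]
    waited = 0.0
    while waited <= timeout:
        status = call("GET", f"/api/tts/jobs/{job_id}")
        state = status.get("status")  # assumed response field
        if state == "complete":
            return f"/api/tts/jobs/{job_id}/audio"
        if state in ("failed", "cancelled"):  # assumed terminal states
            raise RuntimeError(f"job {job_id} ended as {state}")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

The returned path can then be requested with a plain GET to download the generated audio.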

WebSocket Endpoints

/ws/jobs/{id} — Real-time job progress (queued, loading, generating, audio_chunk, complete)
/ws/gpu — Live GPU stats stream (1s interval)
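
On the client side, messages from /ws/jobs/{id} could be folded into a small state object like the one below. The event names are taken from the list above, but the message schema is an assumption: the sketch supposes JSON text frames with an `event` field and, for audio_chunk, base64 audio in a `data` field. The class name `JobProgress` is hypothetical.

```python
import base64
import json


class JobProgress:
    """Tracks one job's state from /ws/jobs/{id} messages (assumed JSON schema)."""

    EVENTS = ("queued", "loading", "generating", "audio_chunk", "complete")

    def __init__(self):
        self.stage = None
        self.audio = bytearray()
        self.done = False

    def handle(self, raw_message):
        """Apply one raw text frame and return the current stage."""
        msg = json.loads(raw_message)
        event = msg["event"]  # assumed field name
        if event not in self.EVENTS:
            raise ValueError(f"unknown event {event!r}")
        if event == "audio_chunk":
            # Assumed: chunks arrive base64-encoded in a "data" field.
            self.audio.extend(base64.b64decode(msg["data"]))
        elif event == "complete":
            self.done = True
        self.stage = event
        return self.stage
```

Each frame received from the socket would be passed to `handle`, and the accumulated bytes in `audio` could feed a streaming player once chunks start arriving.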

Frontend (SvelteKit 2 + Svelte 5)

TTS Page — Model selector with help text, voice picker, speed/pitch/language controls
Dialogue Editor — Structured multi-speaker turn editor for Dia and VibeVoice 1.5B
Batch Page — Dynamic add/remove text entries, per-job progress, ZIP download
Compare Page — Side-by-side generation across up to 4 models
Clone Page — Upload reference audio, manage voice profiles
History Page — Full job history with search, filter, pagination, audio playback
Models Page — Model catalog with help text, capability badges, VRAM bars, filters
Settings Page — Live GPU stats via WebSocket, storage paths, system info
About Page — Model descriptions and HuggingFace links
Dark mode default with theme toggle
Mobile responsive sidebar
Real-time streaming audio playback (Web Audio API) for VibeVoice 0.5B
Per-model parameter panels with emotion tag quick-insert
Keyboard shortcuts modal (press ?)
Toast notification system

Infrastructure

PostgreSQL for job history, voice profiles, batch tracking
Redis for Celery broker and WebSocket pub/sub
Alembic migrations (auto-run on backend startup)
pynvml GPU stats in API container (no torch dependency)
Path traversal guard on batch ZIP downloads
Extension whitelist on voice profile uploads
CORS configuration via environment variable
Pre-commit hooks: ruff, bandit, shellcheck, conventional commits
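
The two upload and download safeguards listed above can be illustrated with the standard library alone. This is a sketch under assumptions, not the project's implementation: the allowed extensions and the helper names `is_allowed_upload` and `resolve_inside` are hypothetical.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}  # assumed whitelist


def is_allowed_upload(filename):
    """Accept a file only if its extension is on the whitelist (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def resolve_inside(base_dir, relative_name):
    """Resolve relative_name under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / relative_name).resolve()
    # resolve() collapses ".." segments and symlinks, so a path such as
    # "../secret" ends up outside base and fails this check (Python 3.9+).
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {relative_name!r}")
    return candidate
```

Checking the resolved path, rather than searching the raw string for "..", also catches absolute paths and traversal hidden behind nested segments.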

Testing

18 fast API smoke tests
Kokoro end-to-end generation test
Full-matrix parametrized test for all 11 models (TEST_ALL_MODELS=1)


Releases: attevon-llc/OpenSpeakers

v0.1.0 — Initial Release: 11 TTS Models, GPU Hot-Swap, Voice Cloning
