18 Apr 11:48

367222a

v0.1.1 — production install hardening Latest


Production install hardening

End-to-end validated on a fresh install: 11/11 TTS models pass from an empty model cache, downloading all weights from HuggingFace Hub on first run.

Highlights

11/11 models validated end-to-end on a clean machine with an empty HuggingFace cache.
Hardened setup-openspeakers.sh: network reachability checks (github.com, hub.docker.com, huggingface.co), a 3-attempt retry loop for every file download, docker compose config validation before docker compose up, a 120-second backend health poll, OPENSPEAKERS_UNATTENDED=1 for CI and scripted installs, and an OPENSPEAKERS_BRANCH override for testing pre-release branches.
Fish Speech upgraded to fishaudio/s2-pro — the installed fish-speech library v2.0.0 expects this DAC architecture; the older fish-speech-1.5 checkpoint was incompatible.
All workers forward HF_TOKEN from .env, unblocking gated-model downloads (Orpheus 3B).
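The download retries and health poll in setup-openspeakers.sh are shell logic. As a rough illustration of the same control flow, here is a Python sketch. The names with_retries, wait_for_health and http_ok are hypothetical, and the 2-second delays are assumptions; only the 3 attempts and the 120-second budget come from the notes above.

```python
import time
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, delay_s: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times; re-raise the last error if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:  # a real installer would catch narrower network errors
            if attempt == attempts:
                raise
            sleep(delay_s)
    raise ValueError("attempts must be >= 1")


def wait_for_health(check: Callable[[], bool], timeout_s: float = 120.0,
                    interval_s: float = 2.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True (-> True) or timeout_s elapses (-> False)."""
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # backend not accepting connections yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """True if GET url answers 200; urlopen raises on errors, which the poller swallows."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status == 200
```

In practice the check would be something like lambda: http_ok(backend_url + "/health"), where the backend host and port depend on your compose file.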

Fixed

Frontend port mapping corrected to 5200:80 (was 5200:3000)
Workers could not connect to Redis/Celery (missing broker URL environment variables)
tts.kokoro queue was not registered in the Celery app
HF_HOME and HOME were not set in secondary workers
VibeVoice 0.5B voice files were looked up at the wrong path (correct: /app/demo/voices/..., not /opt/vibevoice/...)
Fish Speech crashed on first-run download (now uses snapshot_download())
Fish Speech decoder filename varies by model version; the worker now searches a list of candidate filenames
Qwen3 TTS blocked first-run download (local_files_only=True changed to False)
Orpheus 3B returned 401; workers now forward HF_TOKEN from .env
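The decoder "candidate search" fix can be pictured as a first-match lookup over a list of possible filenames in the downloaded checkpoint directory (for example, the local folder path that huggingface_hub.snapshot_download() returns). This is a hypothetical Python sketch: find_decoder is an illustrative name, and the two candidate filenames are assumptions, not the worker's actual list.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical candidates: which decoder file each Fish Speech checkpoint ships is an assumption.
DECODER_CANDIDATES = (
    "codec.pth",
    "firefly-gan-vq-fsq-8x1024-21hz-generator.pth",
)


def find_decoder(model_dir: Path, candidates: Iterable[str] = DECODER_CANDIDATES) -> Path:
    """Return the first candidate decoder checkpoint that exists in model_dir."""
    tried = []
    for name in candidates:
        tried.append(name)
        path = model_dir / name
        if path.is_file():
            return path
    # Fail loudly with the names we looked for, which makes version mismatches easy to spot.
    raise FileNotFoundError(f"no decoder checkpoint in {model_dir}; tried {tried}")
```

Ordering the list newest-first means a directory containing both files resolves to the newer decoder.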

Added

HF_TOKEN documented in .env.example with pointer to the Orpheus model license
scripts/fix-model-permissions.sh helper to repair model_cache/ ownership

See CHANGELOG.md for the full list.

Install

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenSpeakers/main/setup-openspeakers.sh | bash

Docker images

All images are tagged v0.1.1 and latest on Docker Hub under davidamacey/openspeakers-*:

openspeakers-backend, openspeakers-frontend
openspeakers-worker, openspeakers-worker-kokoro, openspeakers-worker-fish, openspeakers-worker-qwen3, openspeakers-worker-orpheus, openspeakers-worker-dia, openspeakers-worker-f5


12 Apr 22:39

davidamacey

v0.1.0

33870ac

v0.1.0 — Initial Release: 11 TTS Models, GPU Hot-Swap, Voice Cloning

Changelog

All notable changes to OpenSpeakers are documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

0.1.0 - 2026-04-12

Overview

Initial public release of OpenSpeakers — a unified TTS and voice cloning application
supporting 11 open-source models with GPU hot-swap, async job queuing, real-time
streaming, and a SvelteKit UI.

Added

11 TTS Models

Kokoro 82M — Fastest model (~1s), 50+ preset voices, standby mode
VibeVoice 0.5B — Real-time streaming TTS, 12 built-in voices, 10 languages
VibeVoice 1.5B — High-quality long-form, multi-speaker dialogue, zero-shot cloning
Fish Audio S2-Pro — Multilingual (80+ languages), emotion tags, voice cloning
Qwen3 TTS 1.7B — Expressive multilingual with instruct mode and voice cloning
Orpheus 3B — Emotional speech with laugh/sigh/gasp tags, vLLM backend
F5-TTS — Fast flow-matching (15x realtime), MIT license, reference-audio cloning
Chatterbox — Expressive TTS with exaggeration/CFG controls, voice cloning
CosyVoice 2.0 — Ultra-low latency (150ms), voice design via text description
Parler TTS Mini — Generate any voice from a text description, no reference audio
Dia 1.6B — Multi-speaker dialogue with [S1]/[S2] tags and nonverbal sounds

GPU Hot-Swap Architecture

ModelManager singleton with threading.Lock for GPU serialization
Automatic model unloading between tasks (gc.collect + torch.cuda.empty_cache)
60-second idle timer auto-unloads non-standby models
Kokoro standby mode — stays loaded permanently for instant responses
Ollama-style keep_alive TTL per model (-1 = indefinite, 0 = clear, N = seconds)
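The hot-swap rules above (one model at a time behind a lock, unload on switch, 60-second idle expiry, standby models that never expire, per-model keep_alive) can be summarized in a small hypothetical Python sketch. It is not the project's ModelManager: the loader callback, the reap_idle method and the standby set are assumptions, a real worker would also call torch.cuda.empty_cache() after gc.collect(), and keep_alive=0 is read here as "unload as soon as the job finishes", following Ollama's convention.

```python
import gc
import threading
import time
from typing import Any, Callable, Iterable, Optional

DEFAULT_IDLE_S = 60.0  # non-standby models unload after 60 s idle (from the notes)


class ModelManager:
    """Hypothetical sketch: a single GPU slot guarded by a lock."""

    _instance: Optional["ModelManager"] = None
    _instance_lock = threading.Lock()

    @classmethod
    def instance(cls) -> "ModelManager":
        """Process-wide singleton accessor."""
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 loader: Optional[Callable[[str], Any]] = None,
                 standby: Iterable[str] = ("kokoro",)) -> None:
        self._lock = threading.Lock()  # serializes all GPU work
        self._clock = clock
        self._loader = loader or (lambda model_id: object())  # placeholder loader
        self._standby = set(standby)
        self.loaded_id: Optional[str] = None
        self._model: Any = None
        self._expires_at: Optional[float] = None  # None = never expires

    def _unload(self) -> None:
        self._model = None
        self.loaded_id = None
        self._expires_at = None
        gc.collect()  # a real worker would follow this with torch.cuda.empty_cache()

    def _set_expiry(self, model_id: str, keep_alive: Optional[float]) -> None:
        if keep_alive is None:  # no explicit TTL: standby forever, others 60 s
            keep_alive = -1 if model_id in self._standby else DEFAULT_IDLE_S
        self._expires_at = None if keep_alive < 0 else self._clock() + keep_alive

    def run(self, model_id: str, task: Callable[[Any], Any],
            keep_alive: Optional[float] = None) -> Any:
        """Swap in model_id if needed, run task(model), then apply the TTL."""
        with self._lock:
            if self.loaded_id != model_id:
                if self.loaded_id is not None:
                    self._unload()  # hot-swap: free the slot first
                self._model = self._loader(model_id)
                self.loaded_id = model_id
            result = task(self._model)
            self._set_expiry(model_id, keep_alive)
            if self._expires_at is not None and self._expires_at <= self._clock():
                self._unload()  # keep_alive=0: unload immediately
            return result

    def reap_idle(self) -> None:
        """Meant to be called periodically (e.g. from a timer) to drop expired models."""
        with self._lock:
            if self._expires_at is not None and self._clock() >= self._expires_at:
                self._unload()
```

Taking the same lock in run() and reap_idle() means an idle unload can never fire in the middle of a generation.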

Worker Architecture

7 dedicated Celery worker containers with model-specific queues
QUEUE_MAP routing — single source of truth for model-to-queue mapping
Shared GPU base image (torch 2.10+cu128) for all secondary workers
nvidia runtime on all containers for reliable GPU access
Startup validation warns about unrouted or stale QUEUE_MAP entries
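A QUEUE_MAP used as the single source of truth for routing, plus a startup check, might look like the Python sketch below. Only the tts.kokoro queue name appears in these notes; every other model ID and queue name, the "tts" default queue, and the reading of "unrouted" (registered model with no entry) and "stale" (entry for a model that is no longer registered) are assumptions.

```python
import logging
from typing import Dict, Iterable, List, Tuple

log = logging.getLogger("queue_map")

# Hypothetical entries; only "tts.kokoro" is taken from the release notes.
QUEUE_MAP: Dict[str, str] = {
    "kokoro": "tts.kokoro",
    "fish-speech": "tts.fish",
    "qwen3-tts": "tts.qwen3",
    "orpheus-3b": "tts.orpheus",
    "dia-1.6b": "tts.dia",
    "f5-tts": "tts.f5",
}
DEFAULT_QUEUE = "tts"  # hypothetical queue consumed by the main worker


def queue_for(model_id: str) -> str:
    """Route a model to its dedicated queue, falling back to the default worker."""
    return QUEUE_MAP.get(model_id, DEFAULT_QUEUE)


def validate_queue_map(registered: Iterable[str],
                       queue_map: Dict[str, str] = QUEUE_MAP) -> Tuple[List[str], List[str]]:
    """Warn about registered models without a route and routes for unknown models."""
    reg = set(registered)
    routed = set(queue_map)
    unrouted = sorted(reg - routed)
    stale = sorted(routed - reg)
    for model_id in unrouted:
        log.warning("model %s has no QUEUE_MAP entry; using %s", model_id, DEFAULT_QUEUE)
    for model_id in stale:
        log.warning("QUEUE_MAP entry %s matches no registered model", model_id)
    return unrouted, stale
```

With Celery, the router result would typically be passed as task.apply_async(args, queue=queue_for(model_id)).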

API Endpoints

POST /api/tts/generate — Submit TTS job (async, returns job_id)
GET /api/tts/jobs/{id} — Poll job status
GET /api/tts/jobs/{id}/audio — Download generated audio
GET /api/tts/jobs — List jobs with pagination, filtering, search
DELETE /api/tts/jobs/{id} — Cancel pending/running job (revokes Celery task)
POST /api/tts/batch — Submit up to 100 lines as a batch
GET /api/tts/batches/{id} — Batch status with aggregate counts
GET /api/tts/batches/{id}/zip — Download all completed audio as ZIP
POST /api/voices — Upload reference audio for voice cloning
GET /api/voices — List voice profiles
GET /api/voices/builtin/{model_id} — List preset voices per model
PATCH /api/voices/{id} — Update voice profile name/description/tags
DELETE /api/voices/{id} — Delete voice profile and files
GET /api/models — List all models with capabilities and status
POST /api/models/{id}/load — Pre-warm model into GPU VRAM
DELETE /api/models/{id}/load — Force-unload model
POST /v1/audio/speech — OpenAI-compatible endpoint (tts-1 → Kokoro, tts-1-hd → Orpheus)
GET /health — Docker health check
GET /api/system/info — GPU stats, disk usage, registered models
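A minimal client for the asynchronous job flow (submit, poll, download) might look like the Python sketch below. Only the endpoint paths, the job_id field and the 100-line batch limit come from the list above; the request payload keys, the status field and its values (complete, failed, cancelled) are assumptions to check against the actual API schema.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Iterator, List, Optional

# A transport takes (method, url, json_body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


def http_json(method: str, url: str, body: Optional[Dict[str, Any]] = None) -> Any:
    """Minimal JSON-over-HTTP transport using only the standard library."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate_and_wait(
    base_url: str,
    payload: Dict[str, Any],
    call: Transport = http_json,
    poll_s: float = 1.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit a job, poll until it finishes, and return the audio download URL."""
    job_id = call("POST", f"{base_url}/api/tts/generate", payload)["job_id"]
    deadline = clock() + timeout_s
    while True:
        job = call("GET", f"{base_url}/api/tts/jobs/{job_id}", None)
        status = job.get("status")  # hypothetical field name and values
        if status == "complete":
            return f"{base_url}/api/tts/jobs/{job_id}/audio"
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s} s")
        sleep(poll_s)


def chunk_lines(lines: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split a long script into chunks that fit the 100-line batch limit."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]
```

Passing the transport in as a parameter keeps the polling logic testable without a server; each chunk from chunk_lines could be submitted to POST /api/tts/batch.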

WebSocket Endpoints

/ws/jobs/{id} — Real-time job progress (queued, loading, generating, audio_chunk, complete)
/ws/gpu — Live GPU stats stream (1s interval)

Frontend (SvelteKit 2 + Svelte 5)

TTS Page — Model selector with help text, voice picker, speed/pitch/language controls
Dialogue Editor — Structured multi-speaker turn editor for Dia and VibeVoice 1.5B
Batch Page — Dynamic add/remove text entries, per-job progress, ZIP download
Compare Page — Side-by-side generation across up to 4 models
Clone Page — Upload reference audio, manage voice profiles
History Page — Full job history with search, filter, pagination, audio playback
Models Page — Model catalog with help text, capability badges, VRAM bars, filters
Settings Page — Live GPU stats via WebSocket, storage paths, system info
About Page — Model descriptions and HuggingFace links
Dark mode default with theme toggle
Mobile responsive sidebar
Real-time streaming audio playback (Web Audio API) for VibeVoice 0.5B
Per-model parameter panels with emotion tag quick-insert
Keyboard shortcuts modal (press ?)
Toast notification system

Infrastructure

PostgreSQL for job history, voice profiles, batch tracking
Redis for Celery broker and WebSocket pub/sub
Alembic migrations (auto-run on backend startup)
pynvml GPU stats in API container (no torch dependency)
Path traversal guard on batch ZIP downloads
Extension whitelist on voice profile uploads
CORS configuration via environment variable
Pre-commit hooks: ruff, bandit, shellcheck, conventional commits
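The path traversal guard and extension whitelist can be illustrated with pathlib. This Python sketch is an assumption about the technique, not the project's code: safe_join and check_upload_name are hypothetical names, and the allowed extensions are illustrative. It resolves the requested path and rejects anything that lands outside the storage root, which also covers absolute paths and ../ segments.

```python
from pathlib import Path

# Hypothetical whitelist; the project's actual list may differ.
ALLOWED_AUDIO_EXTENSIONS = frozenset({".wav", ".mp3", ".flac", ".ogg"})


def safe_join(root: Path, relative: str) -> Path:
    """Join relative onto root, refusing any result that escapes root."""
    root = root.resolve()
    candidate = (root / relative).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes {root}: {relative!r}")
    return candidate


def check_upload_name(filename: str) -> str:
    """Return the lower-cased extension if it is whitelisted, else raise."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        raise ValueError(f"extension {ext!r} is not allowed")
    return ext
```

Checking after resolve() rather than scanning the string for ".." also catches symlinks that point outside the root.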

Testing

18 fast API smoke tests
Kokoro end-to-end generation test
Full-matrix parametrized test for all 11 models (TEST_ALL_MODELS=1)


Releases: attevon-llc/OpenSpeakers
