Releases: jamiepine/voicebox
v0.3.0
This release rewrites the backend into a modular architecture, overhauls the settings UI into routed sub-pages, fixes audio player freezing, migrates documentation to Fumadocs, and ships a batch of bug fixes targeting the most-reported issues from the tracker.
The backend's 3,000-line monolith `main.py` has been decomposed into domain routers, a services layer, and a proper database package. A style guide and ruff configuration now enforce consistency. On the frontend, settings have been split into dedicated routed pages with server logs, a changelog viewer, and an about page. The audio player no longer freezes mid-playback, and model loading status is now visible in the UI. Seven user-reported bugs have been fixed, including server crashes during sample uploads, generation list staleness, cryptic error messages, and missing CUDA support for RTX 50-series GPUs.
Settings Overhaul (#294)
- Split settings into routed sub-tabs: General, Generation, GPU, Logs, Changelog, About
- Added live server log viewer with auto-scroll
- Added in-app changelog page that parses `CHANGELOG.md` at build time
- Added About page with version info, license, and generation folder quick-open
- Extracted reusable `SettingRow` component for consistent setting layouts
Audio Player Fix (#293)
- Fixed audio player freezing during playback
- Improved playback UX with better state management and listener cleanup
- Fixed restart race condition during regeneration
- Added stable keys for audio element re-rendering
- Improved accessibility across player controls
Backend Refactor (#285)
- Extracted all routes from `main.py` into 13 domain routers under `backend/routes/` — `main.py` dropped from ~3,100 lines to ~10
- Moved CRUD and service modules into `backend/services/`, platform detection into `backend/utils/`
- Split monolithic `database.py` into a `database/` package with separate `models`, `session`, `migrations`, and `seed` modules
- Added `backend/STYLE_GUIDE.md` and `pyproject.toml` with ruff linting config
- Removed dead code: unused `_get_cuda_dll_excludes`, stale `studio.py`, `example_usage.py`, old `Makefile`
- Deduplicated shared logic across TTS backends into `backends/base.py`
- Improved startup logging with version, platform, data directory, and database stats
- Fixed startup database session leak — sessions now roll back and close in a `finally` block
- Isolated shutdown unload calls so one backend failure doesn't block the others
- Handled null duration in `story_items` migration
- Rejected model migration when the target is a subdirectory of the source cache
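The session-leak fix follows the standard acquire/rollback/close shape. A minimal sketch using stdlib `sqlite3` in place of the real SQLAlchemy session (the function name and schema here are hypothetical, not Voicebox's code):

```python
import sqlite3

def run_startup_migrations(db_path: str) -> None:
    """Run startup work inside a session that always cleans up.

    The pattern from the fix: roll back on failure, close in `finally`,
    so no database session leaks past startup.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.execute("INSERT OR REPLACE INTO meta VALUES ('schema_version', '2')")
        conn.commit()
    except Exception:
        conn.rollback()   # leave the database untouched on failure
        raise
    finally:
        conn.close()      # always release the connection, success or not
```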
Documentation Rewrite (#288)
- Migrated docs site from Mintlify to Fumadocs (Next.js-based)
- Rewrote introduction and root page with content from README
- Added "Edit on GitHub" links and last-updated timestamps on all pages
- Generated OpenAPI spec and auto-generated API reference pages
- Removed stale planning docs (`CUDA_BACKEND_SWAP`, `EXTERNAL_PROVIDERS`, `MLX_AUDIO`, `TTS_PROVIDER_ARCHITECTURE`, etc.)
- Sidebar groups now expand by default; root redirects to `/docs`
- Added OG image metadata and `/og` preview page
UI & Frontend
- Added model loading status indicator and effects preset dropdown (3187344)
- Fixed take-label race condition during regeneration
- Added accessible focus styling to select component
- Softened select focus indicator opacity
- Addressed 4 critical and 12 major issues from CodeRabbit review
Bug Fixes (#295)
- Fixed sample uploads crashing the server — audio decoding now runs in a thread pool instead of blocking the async event loop (#278)
- Fixed generation list not updating when a generation completes — switched to `refetchQueries` for reliable cache busting, added SSE error fallback, and page reset on completion (#231)
- Fixed error toasts showing `[object Object]` instead of the actual error message (#290)
- Added Whisper model selection (`base`, `small`, `medium`, `large`, `turbo`) and expanded language support for the `/transcribe` endpoint (#233)
- Upgraded CUDA backend build from cu121 to cu126 for RTX 50-series (Blackwell) GPU support (#289)
- Handled client disconnects in SSE and streaming endpoints to suppress `[Errno 32] Broken Pipe` errors (#248)
- Fixed Docker build failure from pip hash mismatch on Qwen3-TTS dependencies (#286)
- Added 50 MB upload size limit with chunked reads to prevent unbounded memory allocation on sample uploads
- Eliminated redundant double audio decode in sample processing pipeline
Platform Fixes
- Replaced `netstat` with `TcpStream` + PowerShell for Windows port detection (#277)
- Fixed Docker frontend build and cleaned up Docker docs
- Fixed macOS download links to use `.dmg` instead of `.app.tar.gz`
- Added dynamic download redirect routes to landing site
Release Tooling
- Added `draft-release-notes` and `release-bump` agent skills
- Wired CI release workflow to extract notes from `CHANGELOG.md` for GitHub Releases
- Backfilled changelog with all historical releases
v0.2.3
The "it works in dev but not in prod" release. This version fixes a series of PyInstaller bundling issues that prevented model downloading, loading, generation, and progress tracking from working in production builds.
Model Downloads Now Actually Work
The v0.2.1/v0.2.2 builds could not download or load models that weren't already cached from a dev install. This release fixes the entire chain:
- Chatterbox, Chatterbox Turbo, and LuxTTS all download, load, and generate correctly in bundled builds
- Real-time download progress — byte-level progress bars now work in production. The root cause: `huggingface_hub` silently disables tqdm progress bars based on logger level, which prevented our progress tracker from receiving byte updates. We now force-enable the internal counter regardless.
- Fixed Python 3.12.0 `code.replace()` bug — the macOS build was on Python 3.12.0, which has a known CPython bug that corrupts bytecode when PyInstaller rewrites code objects. This caused `NameError: name 'obj' is not defined` crashes during scipy/torch imports. Upgraded to Python 3.12.13.
PyInstaller Fixes
- Collect all `inflect` files — `typeguard`'s `@typechecked` decorator calls `inspect.getsource()` at import time, which needs `.py` source files, not just bytecode. Fixes LuxTTS "could not get source code" error.
- Collect all `perth` files — bundles the pretrained watermark model (`hparams.yaml`, `.pth.tar`) needed by Chatterbox at runtime
- Collect all `piper_phonemize` files — bundles `espeak-ng-data/` (phoneme tables, language dicts) needed by LuxTTS for text-to-phoneme conversion
- Set `ESPEAK_DATA_PATH` in frozen builds so the espeak-ng C library finds the bundled data instead of looking at `/usr/share/espeak-ng-data/`
- Collect all `linacodec` files — fixes `inspect.getsource` error in Vocos codec
- Collect all `zipvoice` files — fixes source code lookup in LuxTTS voice cloning
- Copy metadata for `requests`, `transformers`, `huggingface-hub`, `tokenizers`, `safetensors`, `tqdm` — fixes `importlib.metadata` lookups in frozen binary
- Add hidden imports for `chatterbox`, `chatterbox_turbo`, `luxtts`, `zipvoice` backends
- Add `multiprocessing.freeze_support()` to fix resource_tracker subprocess crash in frozen binary
- `--noconsole` now only applied on Windows — macOS/Linux need stdout/stderr for Tauri sidecar log capture
- Hardened `sys.stdout`/`sys.stderr` devnull redirect to test writability, not just a `None` check
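Most of these bundling fixes map to a handful of PyInstaller spec hooks. An illustrative spec excerpt, assuming the project uses `collect_all` and `copy_metadata` as described (this is not the actual spec file):

```python
# Illustrative .spec excerpt; package lists come from the notes above.
import sys
from PyInstaller.utils.hooks import collect_all, copy_metadata

datas, binaries, hiddenimports = [], [], []

# collect_all pulls in data files, .py sources, and submodules —
# the .py sources are what inspect.getsource() needs in frozen builds
for pkg in ("inflect", "perth", "piper_phonemize", "linacodec", "zipvoice"):
    d, b, h = collect_all(pkg)
    datas += d; binaries += b; hiddenimports += h

# dist-info metadata keeps importlib.metadata lookups working when frozen
for dist in ("requests", "transformers", "huggingface-hub",
             "tokenizers", "safetensors", "tqdm"):
    datas += copy_metadata(dist)

hiddenimports += ["chatterbox", "chatterbox_turbo", "luxtts", "zipvoice"]

# --noconsole only on Windows; macOS/Linux keep stdout/stderr for Tauri
console = sys.platform != "win32"
```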
Updater
- Fixed updater artifact generation with `v1Compatible` for `tauri-action` signature files
- Updated `tauri-action` to v0.6 to fix updater JSON and `.sig` generation
Other Fixes
- Full traceback logging on all backend model loading errors (was just `str(e)` before)
v0.2.2
UPDATE: I'm working on a rewrite of model downloading. It's absolute hell and takes a while to test, as it always works in dev and never in prod builds. Will have a solution up ASAP. If you're eager to test 0.2.x, please compile from source. The next update will solve model downloading and the updater issue for good.
- Fix Chatterbox model support in bundled builds [SIKE fixed in 0.2.3]
- Fix LuxTTS/ZipVoice support in bundled builds [SIKE fixed in 0.2.3]
- Auto-update CUDA binary when app version changes
- CUDA download progress bar
- Fix server process staying alive on macOS (SIGHUP handling, watchdog grace period)
- Hide console window when running CUDA binary on Windows
v0.2.1
The best local voice cloning tool, just got better...
See the new website: https://voicebox.sh
Released 2026-03-15 — v0.2.1 on GitHub (version bump due to an immutable release tag on GitHub)
Voicebox v0.1.x was a single-engine voice cloning app built around Qwen3-TTS. v0.2.0 is a ground-up rethink: four TTS engines, 23 languages, paralinguistic emotion controls, a post-processing effects pipeline, unlimited generation length, an async generation queue, and support for every major GPU vendor. Plus Docker.
New TTS Engines
Multi-Engine Architecture
Voicebox now runs four independent TTS engines behind a thread-safe per-engine backend registry. Switch engines per-generation from a single dropdown — no restart required.
| Engine | Languages | Size | Key Strengths |
|---|---|---|---|
| Qwen3-TTS 1.7B | 10 | ~3.5 GB | Highest quality, delivery instructions ("speak slowly", "whisper") |
| Qwen3-TTS 0.6B | 10 | ~1.2 GB | Lighter, faster variant |
| LuxTTS | English | ~300 MB | CPU-friendly, 48 kHz output, 150x realtime |
| Chatterbox Multilingual | 23 | ~3.2 GB | Broadest language coverage, zero-shot cloning |
| Chatterbox Turbo | English | ~1.5 GB | 350M params, low latency, paralinguistic tags |
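The per-engine registry could look roughly like this (class and method names are hypothetical; the release only states that engines sit behind a thread-safe per-engine backend registry):

```python
import threading

class BackendRegistry:
    """Minimal sketch of a thread-safe per-engine backend registry."""

    def __init__(self, factories):
        self._factories = factories   # engine name -> backend constructor
        self._instances = {}
        # One lock per engine: loading one engine never blocks another
        self._locks = {name: threading.Lock() for name in factories}

    def get(self, engine: str):
        with self._locks[engine]:
            if engine not in self._instances:
                # Lazily construct the backend on first use
                self._instances[engine] = self._factories[engine]()
            return self._instances[engine]

# Stand-in constructors; real factories would load the TTS models
registry = BackendRegistry({
    "luxtts": lambda: "LuxTTS backend",
    "chatterbox": lambda: "Chatterbox backend",
})
```

Switching engines per generation then reduces to calling `registry.get(engine)` with the name from the dropdown.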
Chatterbox Multilingual — 23 Languages (#257)
Zero-shot voice cloning in Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, and Turkish. The language dropdown dynamically filters to show only languages supported by the selected engine.
LuxTTS — Lightweight English TTS (#254)
A fast, CPU-friendly English engine. ~300 MB download, 48 kHz output, runs at 150x realtime on CPU. Good for quick drafts and machines without a GPU.
Chatterbox Turbo — Expressive English (#258)
A fast 350M-parameter English model with inline paralinguistic tags.
Paralinguistic Tags Autocomplete (#265)
Type `/` in the text input with Chatterbox Turbo selected to open an autocomplete for 9 expressive tags that the model synthesizes inline with speech:
[laugh] [chuckle] [gasp] [cough] [sigh] [groan] [sniff] [shush] [clear throat]
Tags render as inline badges in a rich text editor and serialize cleanly to the API.
Generation
Unlimited Generation Length — Auto-Chunking (#266)
Long text is now automatically split at sentence boundaries, generated per-chunk, and crossfaded back together. Engine-agnostic — works with all four engines.
- Auto-chunking limit slider — 100–5,000 chars (default 800)
- Crossfade slider — 0–200ms (default 50ms), or 0 for a hard cut
- Max text length raised to 50,000 characters
- Smart splitting respects abbreviations (Dr., e.g., a.m.), CJK punctuation, and never breaks inside
[tags]
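The chunking above can be sketched as follows. This is a simplified illustration (the real implementation's abbreviation list, CJK whitespace handling, and `[tags]` guard are necessarily richer than this):

```python
import re

# Illustrative subset; the real list is presumably much longer
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "a.m.", "p.m."}

def _tail(s: str) -> str:
    words = s.split()
    return words[-1].lower() if words else ""

def split_sentences(text: str) -> list[str]:
    # Split after sentence-ending punctuation (incl. CJK forms) + whitespace,
    # then merge back any false split caused by a known abbreviation
    raw = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    sentences: list[str] = []
    for piece in raw:
        if sentences and _tail(sentences[-1]) in ABBREVIATIONS:
            sentences[-1] += " " + piece   # undo the split after e.g. "Dr."
        else:
            sentences.append(piece)
    return sentences

def chunk_text(text: str, limit: int = 800) -> list[str]:
    # Greedily pack whole sentences into chunks of at most `limit` characters;
    # each chunk is generated separately, then crossfaded back together
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) + 1 > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```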
Asynchronous Generation Queue (#269)
Generation is now fully non-blocking. Submit a generation and start typing the next one immediately.
- Serial execution queue prevents GPU contention
- Real-time SSE status streaming (`generating` → `completed`/`failed`)
- Failed generations can be retried without re-entering text
- Stale generations from crashes are auto-recovered on startup
- Generating status pill shown inline in the story editor
Generation Versions
Every generation now supports multiple versions with provenance tracking:
- Original — the unprocessed TTS output, always preserved
- Effects versions — apply different effects chains to create new versions from any source
- Takes — regenerate with the same text/voice but a new seed
- Source tracking — each version records which version it was derived from
- Version pinning in stories — pin a specific version to a story track clip
- Favorites — star generations for quick access
Language Parameter Fix
Qwen TTS models now correctly receive the selected language. The generation form syncs with the voice profile's language setting.
Post-Processing Effects (#271)
A full audio effects system powered by Spotify's pedalboard library. Apply effects after generation, preview in real time, and build reusable presets.
| Effect | Description |
|---|---|
| Pitch Shift | ±12 semitones |
| Reverb | Room size, damping, wet/dry mix |
| Delay | Adjustable time, feedback, mix |
| Chorus / Flanger | Modulated delay — short for metallic, long for lush |
| Compressor | Threshold, ratio, attack, release |
| Gain | -40 to +40 dB |
| High-Pass Filter | Configurable cutoff frequency |
| Low-Pass Filter | Configurable cutoff frequency |
- 4 built-in presets — Robotic, Radio, Echo Chamber, Deep Voice
- Custom presets — create unlimited drag-and-drop effect chains
- Per-profile default effects — assign a chain to a voice profile, auto-applies to every generation
- Live preview — audition effects against existing audio before committing
- Source version selection — apply effects to any version of a generation, not just the latest
Platform Support
Windows Support (#272)
Full Windows support with CUDA GPU detection, cross-platform justfile, and clean server shutdown using `taskkill /T` for the process tree.
Linux (#262)
Pre-built Linux binaries are not available for this release — the release CI is still broken on Linux and we're working on fixing it. However, this release includes significant Linux improvements that make compiling from source much easier:
- AMD ROCm GPU acceleration with automatic `HSA_OVERRIDE_GFX_VERSION` for unlisted GPUs
- NVIDIA GBM buffer crash fix (#210)
- WebKitGTK microphone access for voice sample recording
- Cross-platform justfile with Linux-specific setup targets
- See the README for build-from-source instructions — we'll ship Linux CI builds as soon as we can
NVIDIA CUDA Backend Swap (#252)
The CPU-only release can download and swap in a CUDA-accelerated backend from within the app. The download is split into parts to work around GitHub's 2 GB asset limit, SHA-256 checksums are verified, and the server restarts automatically.
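The assemble-and-verify step can be sketched like this (the function name and part-file naming are assumptions; only the split-parts and SHA-256 details come from the notes):

```python
import hashlib
from pathlib import Path

def assemble_and_verify(parts: list[Path], out: Path, expected_sha256: str) -> None:
    """Join split download parts and verify the result's SHA-256."""
    digest = hashlib.sha256()
    with out.open("wb") as dst:
        # Parts are assumed to sort in order, e.g. backend.bin.000, .001, ...
        for part in sorted(parts):
            data = part.read_bytes()
            digest.update(data)
            dst.write(data)
    if digest.hexdigest() != expected_sha256:
        out.unlink()   # never leave a corrupt binary on disk
        raise ValueError("checksum mismatch, re-download required")
```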
Intel Arc (XPU) and DirectML
The PyTorch backend supports Intel Arc GPUs via IPEX/XPU, and any GPU on Windows via DirectML.
Docker + Web Deployment (#161)
Run Voicebox headless with `docker compose up`. 3-stage build, non-root runtime, health checks, persistent model cache. Binds to localhost only by default.
Whisper Turbo
Added `openai/whisper-large-v3-turbo` as a transcription model option.
Model Management (#268)
- Per-model unload — free GPU memory without deleting downloaded models
- Custom models directory — set `VOICEBOX_MODELS_DIR` to store models anywhere
- Model folder migration — move all models to a new location with progress tracking
- Download cancel/clear UI — cancel in-progress downloads, VS Code-style problems panel for errors (#238)
- Restructured settings UI — server settings and model management split into cleaner sections
Security & Reliability
- CORS hardening — explicit allowlist of local origins instead of wildcard `*`; extensible via `VOICEBOX_CORS_ORIGINS` (#88)
- Network access toggle — fully disable outbound requests for air-gapped deployments (#133)
- Offline crash fix — Voicebox no longer crashes when HuggingFace is unreachable (#152)
- Atomic audio saves — two-phase write prevents corrupted files on crash or disk-full (#263)
- Filesystem health endpoint — proactive disk space and directory writability checks
- Errno-specific error messages — clear feedback for permission denied, disk full, missing directory
- Chatterbox float64 dtype fix — patches S3Tokenizer and VoiceEncoder to cast float64→float32, preventing crashes on certain audio inputs (#264)
- Watchdog respects keep-server-running — `/watchdog/disable` endpoint prevents the server from shutting down when the app window closes, if configured
- Server shutdown on Windows — clean process tree termination with `taskkill /T` and `os._exit` fallback
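The atomic two-phase save from #263 is worth spelling out. A generic sketch of the pattern, not the actual Voicebox code:

```python
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, data: bytes) -> None:
    """Two-phase atomic save: temp file + fsync + rename.

    A crash or full disk mid-write leaves the previous file intact,
    because the target is only replaced by a completed temp file.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data hits the disk before the rename
        os.replace(tmp, path)      # atomic on both POSIX and Windows
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # drop the partial file on failure
        raise
```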
Accessibility (#243)
- Screen reader support (tested with NVDA/Narrator) across all major UI surfaces
- Keyboard navigation for voice cards, history rows, model management, and story editor
- State-aware `aria-label` attributes on all interactive controls
UI Polish
- Redesigned landing page with animated ControlUI hero, multi-engine copy, model cards, and voice creator section (#274)
- Glassmorphic active state for sidebar buttons with accent border shine
- Voices tab overhaul with inline inspector
- Re...
v0.1.13
What's Changed
Stability and reliability
- #95 Fix: selecting 0.6B model still downloads and uses 1.7B
- #93 fix(mlx): bundle native libs and broaden error handling for Apple Silicon
- #79 fix: handle non-ASCII filenames in Content-Disposition headers
- #78 fix: guard getUserMedia call against undefined mediaDevices in non-secure contexts
- #77 fix: await for confirmation before deleting voices and channels
- #128 fix: resolve multiple issues (#96, #119, #111, #108, #121, #125, #127)
- #40 Fix: audio export path resolution
v0.1.12
Model Download UX Overhaul
- Real-time download progress tracking with accurate percentage and speed info
- No more "downloading" notifications during generation when nothing is actually being downloaded
- Better error handling and status reporting throughout the download process
Other Improvements
- Enhanced health check endpoint with GPU type information
- Improved model caching verification
- More reliable SSE progress updates
- Actual update notifications: no more going to Settings to check manually
Note: CUDA support for Windows is coming in the next update; see the issue and my plan.
v0.1.11
- Fixed transcriptions on MLX
- Fixed model download progress (finally)
v0.1.10
Faster generation on Apple Silicon
Massive speed gains, from around 20s per generation to 2-3s!
Added native MLX backend support for Apple Silicon, providing significantly faster TTS and STT generation on M-series macOS machines.
Note: this update broke transcriptions on Apple Silicon only; the patch is in the oven as we speak, and 0.1.11 will follow.
Features
- MLX Backend: New backend implementation optimized for Apple Silicon using MLX framework
- Dynamic Backend Selection: Automatically detects platform and selects between MLX (macOS) and PyTorch (other platforms)
- Improved Performance: Leverages Apple's unified memory architecture for faster model inference
Backend Changes
- Refactored TTS and STT logic into modular backend implementations (`mlx_backend.py`, `pytorch_backend.py`)
- Added platform detection system to handle backend selection at runtime
- Updated model loading and caching to support both backend types
- Enhanced health check endpoints to report active backend type
Build & Release
- Updated build process to include MLX-specific dependencies for macOS builds
- Modified release workflow to handle platform-specific backend bundling
- Added `requirements-mlx.txt` for MLX dependencies
Documentation
- Updated setup and building guides with MLX-specific instructions
- Added troubleshooting guidance for MLX-related issues
- Enhanced architecture documentation to explain backend selection
v0.1.9
Improved voice profile creation flow:
- Voice create drafts: no longer lose work if you close the modal
- Fixed Whisper only transcribing English or Chinese; all languages are now supported
Improved Stories editor:
- Added spacebar for play/pause
- Timeline now auto-scrolls to follow playhead during playback
- Fixed items misaligning with the mouse when picked up
- Fixed hitbox for selecting an item
- Fixed playhead jumping forward when pressing play (the timing anchors bug)
Generation box improvements
- Instruct mode no longer wipes prompt text
- Improved UI cleanliness
Misc
- Fixed "Model downloading" toast during generation when model is already downloaded
v0.1.8
🐛 Bug Fixes
Model Download Timeout Issues
Fixed critical issue where model downloads would fail with "Failed to fetch" errors on Windows:
- Root Cause: Multi-GB model downloads exceeded the HTTP request timeout (30-60s), causing the frontend to show errors even though downloads were continuing in the background
- Solution: Refactored download endpoints to return immediately and continue downloads in background
- `/models/download` endpoint now returns instantly, with the download starting in the background
- `/generate` and `/transcribe` endpoints now auto-start model downloads when needed
- Returns 202 Accepted status with download progress information for better UX
- Frontend can track download progress via SSE endpoint and retry when complete
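The return-immediately shape can be sketched with plain `asyncio` (endpoint and model names are illustrative; the real code uses FastAPI endpoints and streams progress over SSE):

```python
import asyncio

# model id -> progress record; the app streams this to the UI over SSE
progress: dict[str, dict] = {}

async def download_model(model_id: str) -> None:
    """Stand-in for the real multi-GB download; sleep(0) replaces network reads."""
    progress[model_id] = {"status": "downloading", "pct": 0}
    for pct in (25, 50, 75, 100):
        await asyncio.sleep(0)
        progress[model_id]["pct"] = pct
    progress[model_id]["status"] = "complete"

async def start_download(model_id: str) -> dict:
    """Return a 202-style payload immediately; the work continues in background."""
    progress[model_id] = {"status": "queued", "pct": 0}
    asyncio.create_task(download_model(model_id))
    return {"status_code": 202, "detail": f"download of {model_id} started"}

async def demo() -> dict:
    resp = await start_download("qwen3-tts-0.6b")
    assert resp["status_code"] == 202            # the caller was never blocked
    while progress["qwen3-tts-0.6b"]["status"] != "complete":
        await asyncio.sleep(0)                   # the UI polls via SSE instead
    return progress["qwen3-tts-0.6b"]
```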
Cross-Platform Cache Path Issues
- Fixed hardcoded `~/.cache/huggingface/hub` paths that don't work on Windows
- All cache paths now use `hf_constants.HF_HUB_CACHE` for proper cross-platform support
- Windows: uses `%USERPROFILE%\.cache\huggingface\hub` or `%LOCALAPPDATA%`
- macOS/Linux: uses `~/.cache/huggingface/hub`
- Ensures the HuggingFace cache directory exists on startup (defensive fix)
✨ Features
Windows Process Management
- Added `/shutdown` endpoint for graceful server shutdown on Windows
- Improved process lifecycle management for bundled server binary
GPU Detection Improvements
- Added `gpu_type` field to health check response
- Now shows specific GPU type: "CUDA (GPU Name)", "MPS (Apple Silicon)", or None
- Fixes UI showing "GPU: Not Available" when MPS/CUDA is actually detected
