feat: split CUDA backend into independently versioned server + libs archives (#298)
Conversation
Enrich tts-engines.mdx with patterns discovered during TADA integration:
- Phase 0.2: new greps for `@torch.jit.script`, `torchaudio.load`, gated repos
- Phase 3.4: model naming inconsistency warning
- Phase 5.2: TADA shim failure added to lessons table
- Phase 6: four new workaround sections (gated repos, torchcodec, torch.jit.script, toxic dependency shim pattern)
- Checklist: four new items matching the new scan patterns
- Remove TADA from upcoming engines (now shipped)

Add CUDA_LIBS_ADDON.md exploring the --onedir split to avoid 2.4GB redownloads on every version bump.
…rchives

Switch CUDA builds from PyInstaller --onefile to --onedir and split the output into two separately versioned archives:
1. Server core (~200-400MB) — versioned with the app, redownloaded on every app update
2. CUDA libs (~2GB) — versioned independently (cu126-v1), only redownloaded when the CUDA toolkit or torch version changes

This eliminates the ~2.4GB full redownload on every version bump. After initial setup, most app updates only need ~200-400MB.

Closes #297
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Converts the CUDA build from a PyInstaller --onefile binary to an --onedir build split into two archives (server core + CUDA libs), updates CI packaging and release uploads, adds packaging and extraction tooling, revamps backend download/extract/manifest logic, and adapts runtime spawn/CWD and build tooling.
Sequence Diagram(s)

sequenceDiagram
participant App as App Client
participant Service as CUDA Service
participant Downloader as HTTP Downloader
participant Extractor as Archive Extractor
participant FS as File System
App->>Service: check_and_update_cuda_binary()
rect rgba(100,150,200,0.5)
Note over Service: Server component check
Service->>FS: get_cuda_binary_path() / get installed server version (cwd=onedir)
Service->>Service: _needs_server_download()?
end
alt Server needs update
Service->>Downloader: download server archive + .sha256
Downloader->>Downloader: verify SHA-256
Downloader->>Extractor: extract server archive -> cuda_dir
Extractor->>FS: write server files
end
rect rgba(150,100,200,0.5)
Note over Service: CUDA libs component check
Service->>FS: read cuda-libs.json / get_installed_cuda_libs_version()
Service->>Service: _needs_cuda_libs_download()?
end
alt Libs need update
Service->>Downloader: download cuda-libs archive + .sha256
Downloader->>Downloader: verify SHA-256
Downloader->>Extractor: extract libs archive -> cuda_dir
Extractor->>FS: write NVIDIA libs
end
rect rgba(200,150,100,0.5)
Note over Service: Finalize install
Service->>FS: set executable permissions (Unix)
Service->>FS: write/update cuda-libs.json manifest
end
Service->>App: return updated status (includes cuda_libs_version)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tauri/src-tauri/src/main.rs (1)
209-242: ⚠️ Potential issue | 🟠 Major: Validate the full CUDA layout before preferring the downloaded backend.

`--version` only proves the executable can start from `cuda_dir`; it does not prove the extracted `nvidia/` tree or top-level CUDA DLLs are still present. With the new split packaging, a half-extracted install can pass this check and then fail during startup or first inference. Gate `Some(exe_path)` on an explicit structure check for both the core archive and the CUDA-libs payload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tauri/src-tauri/src/main.rs` around lines 209 - 242, The version check currently only verifies the executable starts from cuda_dir but can miss a half-extracted payload; before returning Some(exe_path) inside the version_ok branch, add an explicit layout validation that checks for the presence of the expected extracted files (e.g. the nvidia/ directory tree and the top-level CUDA DLLs or shared libraries and the core archive + cuda-libs payload) under cuda_dir. Implement a helper like validate_cuda_layout(cuda_dir) (called right before Some(exe_path)) that returns false if any required files or directories are missing, and only return Some(exe_path) when both version_ok and validate_cuda_layout(cuda_dir) are true. Ensure the check uses the same cuda_dir and exe_path context so startup/inference won’t fail due to a half-extracted install.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/release.yml:
- Around line 217-219: The workflow step named "Upload archives to GitHub
Release" currently uses softprops/action-gh-release@v1 which depends on Node 16
and will fail on current runners; update the action reference to
softprops/action-gh-release@v2 (replace softprops/action-gh-release@v1 with
softprops/action-gh-release@v2 in that step) so the step runs on the Node
20-based runtime used by v2.
In `@backend/services/cuda.py`:
- Around line 208-223: The extraction currently unpacks archives directly into
dest_dir (in backend/services/cuda.py) which can leave stale files; instead
extract into a staging directory (e.g., create a unique temp staging dir in the
same parent as dest_dir using tempfile.mkdtemp), update progress using
PROGRESS_KEY and progress_offset while extracting to that staging dir, verify
the extract completed successfully, then atomically swap the staging dir into
place (remove or rename the old dest_dir and os.replace or shutil.move the
staging dir to dest_dir) and only then unlink temp_path and log success; ensure
cleanup on failure (remove staging dir) so partial extracts never overlay the
live CUDA directory.
- Around line 159-168: The current code silently skips checksum verification
when fetching/parsing the .sha256 fails (sha256_url, sha_resp, expected_sha) and
proceeds to extract an unverified archive; change this to fail fast: in the
try/except around client.get(sha256_url) (and parsing the response into
expected_sha) propagate the error and abort the install instead of logging a
warning—e.g., on any non-200 status or exception from client.get or from parsing
sha_resp.text, log an error via logger.error including label and the underlying
exception/message and raise an exception (or return a failure sentinel) so
extraction never proceeds without a verified expected_sha. Ensure all callers of
the function that rely on expected_sha (the code paths that extract the archive)
handle the raised exception/false return.
In `@scripts/package_cuda.py`:
- Around line 102-110: When nvidia_files is empty in scripts/package_cuda.py,
stop the packaging process instead of just printing warnings: replace the
current warning-only branch that prints to sys.stderr with a fail-fast exit
(e.g., call sys.exit(1) or raise an exception) so the script aborts and no
cuda-libs-*.tar.gz is created/uploaded; ensure this change is applied where
nvidia_files is checked so the subsequent archive creation logic is not
executed.
---
Outside diff comments:
In `@tauri/src-tauri/src/main.rs`:
- Around line 209-242: The version check currently only verifies the executable
starts from cuda_dir but can miss a half-extracted payload; before returning
Some(exe_path) inside the version_ok branch, add an explicit layout validation
that checks for the presence of the expected extracted files (e.g. the nvidia/
directory tree and the top-level CUDA DLLs or shared libraries and the core
archive + cuda-libs payload) under cuda_dir. Implement a helper like
validate_cuda_layout(cuda_dir) (called right before Some(exe_path)) that returns
false if any required files or directories are missing, and only return
Some(exe_path) when both version_ok and validate_cuda_layout(cuda_dir) are true.
Ensure the check uses the same cuda_dir and exe_path context so
startup/inference won’t fail due to a half-extracted install.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3a3deb32-ad11-4644-b2f6-93f7ad574c26
📒 Files selected for processing (8)
- .github/workflows/release.yml
- backend/build_binary.py
- backend/services/cuda.py
- docs/content/docs/developer/tts-engines.mdx
- docs/plans/CUDA_LIBS_ADDON.md
- scripts/package_cuda.py
- scripts/split_binary.py
- tauri/src-tauri/src/main.rs
backend/services/cuda.py
Outdated
```python
# Extract (use data filter for path traversal protection on Python 3.12+)
progress.update_progress(
    PROGRESS_KEY,
    current=progress_offset + downloaded,
    total=total_size,
    filename=f"Extracting {label}...",
    status="downloading",
)
with tarfile.open(temp_path, "r:gz") as tar:
    if sys.version_info >= (3, 12):
        tar.extractall(path=dest_dir, filter="data")
    else:
        tar.extractall(path=dest_dir)

temp_path.unlink()
logger.info(f"{label}: extracted to {dest_dir}")
```
Avoid overlay extraction into the live CUDA directory.
This helper unpacks directly into backends/cuda/. When a new PyInstaller build removes or renames files, or extraction fails midway, stale files from the previous server/libs version survive and create a mixed onedir tree that can still look “installed”. Stage each archive and only swap or prune the component after a successful extract.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 217-217: Version block is outdated for minimum Python version
Remove outdated version block
(UP036)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/cuda.py` around lines 208 - 223, The extraction currently
unpacks archives directly into dest_dir (in backend/services/cuda.py) which can
leave stale files; instead extract into a staging directory (e.g., create a
unique temp staging dir in the same parent as dest_dir using tempfile.mkdtemp),
update progress using PROGRESS_KEY and progress_offset while extracting to that
staging dir, verify the extract completed successfully, then atomically swap the
staging dir into place (remove or rename the old dest_dir and os.replace or
shutil.move the staging dir to dest_dir) and only then unlink temp_path and log
success; ensure cleanup on failure (remove staging dir) so partial extracts
never overlay the live CUDA directory.
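A compact sketch of the staged-swap flow the prompt describes: extract into a sibling staging directory, then replace the live directory only after the extract succeeds. The helper name and the omission of progress reporting are simplifications, not the PR's actual code:

```python
import shutil
import sys
import tarfile
import tempfile
from pathlib import Path

def extract_component(temp_path: Path, dest_dir: Path) -> None:
    """Extract into a sibling staging dir, then swap it into place."""
    # Staging dir lives next to dest_dir so the final rename stays on
    # the same filesystem (a same-filesystem rename is atomic on POSIX).
    staging = Path(tempfile.mkdtemp(
        prefix=f"{dest_dir.name}.staging.", dir=dest_dir.parent))
    try:
        with tarfile.open(temp_path, "r:gz") as tar:
            if sys.version_info >= (3, 12):
                tar.extractall(path=staging, filter="data")
            else:
                tar.extractall(path=staging)
        if dest_dir.exists():
            shutil.rmtree(dest_dir)  # drop files removed in the new build
        staging.replace(dest_dir)
    except BaseException:
        # A failed extract must never leave a half-written staging tree.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

With this shape, a mid-extract failure leaves the previous install untouched, and files that disappeared between component versions cannot survive as stale overlay content.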
…lit_binary.py

- Fix is_nvidia_file() to match NVIDIA DLLs in _internal/torch/lib/ (PyInstaller 6.18 + torch 2.10 no longer uses nvidia/ subdirectories)
- Remove deprecated split_binary.py (both archives are under 2GB)
- Update torch_compat range to >=2.6.0,<2.11.0
- Update build docs for new dual-archive packaging flow
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/package_cuda.py (1)
166-176: Manifest fields `torch_compat`, `archive`, and `sha256` are currently unused.

The manifest includes these fields, but `backend/services/cuda.py:get_installed_cuda_libs_version()` only reads the `"version"` field. Consider whether these additional fields should be used for validation (e.g., verifying torch compatibility at runtime, or checking archive integrity during updates).

This isn't a bug — the extra fields provide useful metadata for future enhancements or manual debugging — but documenting their intended use would clarify whether they're placeholders or expected to be consumed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/package_cuda.py` around lines 166 - 176, The manifest written in scripts/package_cuda.py includes fields "torch_compat", "archive", and "sha256" that are not consumed by backend/services/cuda.py:get_installed_cuda_libs_version(); either remove or document them as metadata or update the consumer to validate using them. Fix by choosing one: (A) if they are intended as metadata, add a brief comment in scripts/package_cuda.py describing their purpose and intended future use so reviewers know they are placeholders; or (B) if they should be used, update get_installed_cuda_libs_version() to read and validate "torch_compat" against the runtime torch version and verify "archive"/"sha256" (or at least checksum presence) during install/update; reference the manifest keys "torch_compat", "archive", "sha256" and the function get_installed_cuda_libs_version to locate where to apply the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/package_cuda.py`:
- Around line 53-84: The nvidia/ subtree check in is_nvidia_file incorrectly
tests membership against a list slice; replace the faulty condition ("nvidia/"
in rel_lower.split("/", 1)[-1:]) with a proper substring check such as
("nvidia/" in rel_lower.split("/", 1)[-1]) or simply ("nvidia/" in rel_lower")
so nested nvidia/ directories (e.g. _internal/nvidia/...) are detected; update
the branch in is_nvidia_file that also references NVIDIA_KEEP_IN_CORE and
NVIDIA_DLL_PREFIXES accordingly.
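The bug the comment points at is easy to miss: slicing with `[-1:]` yields a one-element list, so `"nvidia/" in ...[-1:]` compares the whole tail string for equality instead of doing a substring search, and nested `nvidia/` paths never match. A sketch of the corrected check (helper name illustrative, not the PR's exact function):

```python
def is_nvidia_path(rel_path: str) -> bool:
    """True when the path sits under any nvidia/ directory, however deep."""
    rel_lower = rel_path.lower().replace("\\", "/")
    # Buggy form: membership in a one-element *list* slice, which tests
    # whole-string equality against the tail, never a substring:
    #     "nvidia/" in rel_lower.split("/", 1)[-1:]
    # Correct form: a plain substring test on the normalized path.
    return "nvidia/" in rel_lower
```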
---
Nitpick comments:
In `@scripts/package_cuda.py`:
- Around line 166-176: The manifest written in scripts/package_cuda.py includes
fields "torch_compat", "archive", and "sha256" that are not consumed by
backend/services/cuda.py:get_installed_cuda_libs_version(); either remove or
document them as metadata or update the consumer to validate using them. Fix by
choosing one: (A) if they are intended as metadata, add a brief comment in
scripts/package_cuda.py describing their purpose and intended future use so
reviewers know they are placeholders; or (B) if they should be used, update
get_installed_cuda_libs_version() to read and validate "torch_compat" against
the runtime torch version and verify "archive"/"sha256" (or at least checksum
presence) during install/update; reference the manifest keys "torch_compat",
"archive", "sha256" and the function get_installed_cuda_libs_version to locate
where to apply the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78fbc69c-fc65-4059-bb50-380478dc6126
📒 Files selected for processing (6)
- .github/workflows/release.yml
- .gitignore
- docs/content/docs/developer/building.mdx
- docs/plans/CUDA_LIBS_ADDON.md
- scripts/package_cuda.py
- scripts/split_binary.py
💤 Files with no reviewable changes (1)
- scripts/split_binary.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@justfile`:
- Around line 211-214: The Copy-Item call is creating an extra top-level folder
so the executable ends up at
.../backends/cuda/voicebox-server-cuda/voicebox-server-cuda.exe, but the
launcher in main.rs expects .../backends/cuda/voicebox-server-cuda.exe; to fix,
ensure $dest is created first (e.g., New-Item -ItemType Directory -Force $dest)
and then copy the contents of backend/dist/voicebox-server-cuda into $dest
(e.g., Copy-Item "backend/dist/voicebox-server-cuda/*" $dest -Recurse -Force or
pipe Get-ChildItem "backend/dist/voicebox-server-cuda" | Copy-Item -Destination
$dest -Recurse -Force) so the binary lands directly at the expected path rather
than nested under an extra directory.
- Upgrade softprops/action-gh-release@v1 to @v2 (Node 16 EOL)
- Fail-fast on checksum fetch failure instead of extracting unverified archives
- Abort packaging if no NVIDIA files found (prevents empty cuda-libs archive)
- Fix nvidia/ path detection bug (list membership vs substring check)
- Fix justfile Copy-Item nesting (copy contents, not the directory itself)
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/services/cuda.py (1)
208-223: ⚠️ Potential issue | 🟠 Major: Don't extract directly into the live CUDA tree.

Extracting each archive straight into `backends/cuda/` means a server/libs update cannot remove files that disappeared from the previous component version, and a mid-extract failure leaves a mixed onedir tree that still looks installed. Please stage the component update and swap it into place only after a successful extract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/cuda.py` around lines 208 - 223, The code currently extracts archives directly into dest_dir which can leave a partially-updated tree; change the flow to extract into a temporary staging directory (e.g., dest_dir.with_name(f"{dest_dir.name}.staging.{uuid4()}") or similar) instead of dest_dir, using the same tarfile.open(temp_path, "r:gz") and the existing progress.update_progress calls, then after successful extract and cleanup of temp_path, atomically replace the live directory by moving/staging_dir -> dest_dir with os.replace/pathlib.Path.replace (or rename on same filesystem) so the swap is atomic; ensure any existing dest_dir is removed or moved to a backup before the replace, and clean up the staging directory on failure. Use the existing symbols temp_path, dest_dir, label, PROGRESS_KEY, and progress.update_progress to locate and update the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/services/cuda.py`:
- Around line 153-223: The staged temp file (.download-*.tmp) can be left behind
on any exception after creation; wrap the download/verify/extract sequence (the
async with client.stream(...) block, the integrity check using hashlib.sha256(),
and tarfile.open(...).extractall()) in a try/finally so the finally always
checks temp_path.exists() and unlink()s it before re-raising/returning; remove
the existing temp_path.unlink() at the end and instead rely on the finally to
delete the file on both success and failure (or delete conditionally in finally
if you prefer), referencing temp_path, client.stream, hashlib.sha256, and
tarfile.open to locate the code to change.
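The cleanup pattern the prompt asks for can be sketched as a thin wrapper: create the staged `.download-*.tmp` file, run the download/verify/extract steps, and delete the temp file in a `finally` block so it disappears on both success and failure. The function name and callback shape here are illustrative, not the PR's actual API:

```python
import os
import tempfile
from pathlib import Path

def download_and_extract(dest_dir: Path, do_download, do_extract) -> None:
    """Always remove the staged temp file, even when a step raises."""
    fd, name = tempfile.mkstemp(
        prefix=".download-", suffix=".tmp", dir=dest_dir.parent)
    os.close(fd)
    temp_path = Path(name)
    try:
        do_download(temp_path)  # stream the archive to disk
        do_extract(temp_path)   # verify checksum + unpack
    finally:
        # Runs on success and on any exception from download or extract.
        temp_path.unlink(missing_ok=True)
```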
---
Duplicate comments:
In `@backend/services/cuda.py`:
- Around line 208-223: The code currently extracts archives directly into
dest_dir which can leave a partially-updated tree; change the flow to extract
into a temporary staging directory (e.g.,
dest_dir.with_name(f"{dest_dir.name}.staging.{uuid4()}") or similar) instead of
dest_dir, using the same tarfile.open(temp_path, "r:gz") and the existing
progress.update_progress calls, then after successful extract and cleanup of
temp_path, atomically replace the live directory by moving/staging_dir ->
dest_dir with os.replace/pathlib.Path.replace (or rename on same filesystem) so
the swap is atomic; ensure any existing dest_dir is removed or moved to a backup
before the replace, and clean up the staging directory on failure. Use the
existing symbols temp_path, dest_dir, label, PROGRESS_KEY, and
progress.update_progress to locate and update the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b5322fa8-3f34-4169-ad5d-96e7bb20b717
📒 Files selected for processing (4)
.github/workflows/release.ymlbackend/services/cuda.pyjustfilescripts/package_cuda.py
🚧 Files skipped from review as they are similar to previous changes (1)
- justfile
…r path

- Wrap download/verify/extract in try/finally so .download-*.tmp is always deleted, even on mid-download or extraction failures
- Fix justfile build-server-cuda to use sh.voicebox.app (production path)
Summary
Splits the CUDA backend from a single ~2.4GB --onefile binary into two independently versioned archives, eliminating full redownloads on every app version bump.

- Switch CUDA builds from --onefile to --onedir, then package into two .tar.gz archives: server core (~200-400MB, versioned with the app) and CUDA libs (~2GB, versioned independently as cu126-v1)
- Rework the cuda.py download service for dual-archive extraction with independent version checks — only downloads what actually changed
- Update main.rs to launch the CUDA backend from the backends/cuda/ directory with .current_dir() so PyInstaller finds NVIDIA DLLs relative to the exe

Download impact
Files changed
- backend/build_binary.py — CUDA builds use --onedir instead of --onefile
- scripts/package_cuda.py — New script to split onedir output into server-core + cuda-libs archives with a cuda-libs.json manifest
- backend/services/cuda.py — Rewritten for dual-archive download, dual version checking, tar.gz extraction
- tauri/src-tauri/src/main.rs — CUDA binary detection at backends/cuda/, .current_dir() on spawn
- .github/workflows/release.yml — CI produces two archives instead of split binary parts
- scripts/split_binary.py — Deprecated (replaced by package_cuda.py)

Closes #297