@AnthonyRonning AnthonyRonning commented Dec 4, 2025

Summary

Adds text-to-speech functionality to the Maple desktop app using Supertonic, a fast on-device TTS engine.

Features

  • On-device TTS: All processing happens locally, no cloud API calls
  • One-time download: ~265MB models downloaded from HuggingFace on first use
  • Speaker button: Added to assistant messages (desktop only)
  • High quality: F2 voice, 10 inference steps, 1.2x speed, 50ms pause between chunks

Technical Details

  • Rust backend using ONNX Runtime for inference
  • React context for state management
  • Downloads models to app data directory
  • Supports 4 voice styles (F1, F2, M1, M2) - currently using F2

Testing

  • Desktop build works
  • iOS build (need to test)
  • Android build (need to test)

Notes

  • TTS button only appears in desktop app (Tauri environment)
  • Models stored in ~/Library/Application Support/cloud.opensecret.maple/tts_models/ on macOS

Summary by CodeRabbit

  • New Features

    • On-device Text-to-Speech: synthesize WAV (base64), play audio, and manage playback.
    • Model lifecycle controls: check status, download with progress, load, unload, and delete local models.
    • UI integration: in-app TTS setup/download dialog, per-message read-aloud controls, and app-wide TTS provider/hook.
  • Documentation

    • Added comprehensive TTS research and integration guidance.
  • Chores

    • Added desktop debug build targets and Linux bundling tweak for media framework.

coderabbitai bot commented Dec 4, 2025

Walkthrough

Adds an on-device TTS subsystem: Rust/Tauri backend with ONNX-based synthesis and Tauri commands, a React TTS context/hook for lifecycle and playback, UI components for model download/setup and per-message playback, debug desktop build targets, an AppImage bundling flag, and a TTS research doc.

Changes

Cohort / File(s) | Summary
Rust TTS Backend: frontend/src-tauri/Cargo.toml, frontend/src-tauri/src/lib.rs, frontend/src-tauri/src/tts.rs | New desktop-target dependencies and a full TTS module: model discovery/download, load/unload, ONNX multi-stage synthesis pipeline, TTS state (TTSState), and Tauri commands: tts_get_status, tts_download_models, tts_load_models, tts_synthesize, tts_unload_models, tts_delete_models.
Frontend TTS Context: frontend/src/services/tts/TTSContext.tsx, frontend/src/services/tts/index.ts | New TTSProvider, useTTS hook and TTSStatus type; manages env detection, IPC invokes/events, status lifecycle (checking → downloading → loading → ready → deleting/error), download progress, synthesize invocation, WebAudio playback, and cleanup.
UI Components & Chat Integration: frontend/src/components/TTSDownloadDialog.tsx, frontend/src/components/UnifiedChat.tsx, frontend/src/app.tsx | Adds TTSDownloadDialog modal and per-message TTSButton; extends message-list props with TTS callbacks (onTTSSetupOpen, onTTSManage); wraps app with TTSProvider and wires TTS flows into chat UI.
Build & Bundling: justfile, frontend/src-tauri/tauri.conf.json | Adds desktop-build-debug and desktop-build-debug-no-cc just targets; enables linux.appimage.bundleMediaFramework: true in Tauri bundling config.
Documentation: docs/tts-research.md | New research document comparing local TTS model options, architectures, deployment strategies, mobile constraints, and integration guidance.

Sequence Diagram(s)

sequenceDiagram
    participant UI as TTSDownloadDialog
    participant React as TTS Context\n(useTTS)
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend\n(tts.rs)
    participant FS as File System

    UI->>React: startDownload()
    React->>Tauri: invoke tts_download_models()
    Tauri->>Rust: request download
    Rust->>FS: stream & save model files
    loop progress events
      Rust-->>Tauri: emit download progress
      Tauri-->>React: tts-download-progress event
      React-->>UI: update progress UI
    end
    Rust-->>Tauri: download complete
    React->>Tauri: invoke tts_load_models()
    Tauri->>Rust: load ONNX sessions / init engine
    Rust-->>Tauri: load complete
    Tauri-->>React: ready
    React-->>UI: status = ready
sequenceDiagram
    participant User as User
    participant UI as TTSButton
    participant React as TTS Context
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend
    participant Audio as Browser Audio

    User->>UI: click play(text)
    UI->>React: speak(text, id)
    React->>Tauri: invoke tts_synthesize(text)
    Tauri->>Rust: synthesize request
    Rust->>Rust: preprocess → duration → encode → denoise → vocoder
    Rust-->>Tauri: TTSSynthesizeResponse (audio_base64)
    Tauri-->>React: deliver response
    React->>Audio: create Blob/URL and play
    Audio-->>React: onended
    React-->>UI: update playing state

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: add on-device TTS using Supertonic' accurately and clearly describes the primary change: adding on-device text-to-speech functionality using the Supertonic engine to the application.


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 016e4e4 and 503318f.

📒 Files selected for processing (1)
  • frontend/src-tauri/tauri.conf.json
✅ Files skipped from review due to trivial changes (1)
  • frontend/src-tauri/tauri.conf.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages

cloudflare-workers-and-pages bot commented Dec 4, 2025

Deploying maple with Cloudflare Pages

Latest commit: 503318f
Status: ✅  Deploy successful!
Preview URL: https://bf79f6bb.maple-ca8.pages.dev
Branch Preview URL: https://feature-tts-supertonic.maple-ca8.pages.dev

greptile-apps bot commented Dec 4, 2025

Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine. The implementation downloads ~265MB of models from HuggingFace on first use and performs all synthesis locally on the desktop app.

Key Changes:

  • Rust backend (tts.rs): Complete TTS pipeline with ONNX Runtime for inference, Unicode text processing, model downloading with progress tracking, and WAV encoding
  • React context (TTSContext.tsx): State management for TTS lifecycle, download progress, and Web Audio API playback
  • UI integration: Download dialog, per-message speaker buttons with long-press for settings
  • Desktop-only: Dependencies properly scoped to desktop platforms; gracefully degrades on web/mobile

Previous Review Feedback Addressed:

  • Regex compilation optimized using Lazy static (no longer recompiled on every call)
  • Memory leaks fixed: object URLs properly revoked on stop and unmount
  • File integrity: explicit flush before renaming temp files to final paths
  • Sentence splitting preserves punctuation correctly

Minor Issues:

  • State inconsistency when preprocessed text is empty (one logic issue found)
  • Voice selection hardcoded to F2 (style improvement suggested)
  • Duplicate markdown removal split between frontend and backend (acceptable design choice)

Confidence Score: 4/5

  • This PR is safe to merge with one minor logic fix recommended
  • The implementation is solid with comprehensive error handling, proper resource cleanup, and desktop-only scoping. Previous review concerns have been addressed. One logic issue exists (state inconsistency on empty text) but won't cause crashes. The ONNX models are downloaded from the official Supertonic HuggingFace repo with integrity checks.
  • frontend/src/services/tts/TTSContext.tsx - fix state inconsistency when preprocessed text is empty (line 217)

Important Files Changed

Filename | Overview
frontend/src-tauri/src/tts.rs | Core TTS implementation with ONNX inference, text preprocessing, model downloading, and WAV encoding. Previous review feedback addressed (regex compilation, URL revocation, flush on download).
frontend/src/services/tts/TTSContext.tsx | React context managing TTS state, download progress, and audio playback via Web Audio API. All memory leak issues from previous reviews have been fixed.
frontend/src/components/TTSDownloadDialog.tsx | Modal dialog for TTS setup with download progress, status indicators, and model management. Clean implementation with proper state handling.
frontend/src-tauri/Cargo.toml | Added desktop-only TTS dependencies (ort, ndarray, hound, etc.) properly scoped to desktop platforms only.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as React UI
    participant TTSContext
    participant Tauri as Tauri Bridge
    participant Rust as Rust Backend
    participant HF as HuggingFace
    participant ONNX as ONNX Runtime

    Note over User,ONNX: Initial Setup Flow
    User->>UI: Opens TTS Dialog
    UI->>TTSContext: checkStatus()
    TTSContext->>Tauri: invoke("tts_get_status")
    Tauri->>Rust: tts_get_status()
    Rust->>Rust: Check if models exist
    Rust-->>Tauri: TTSStatusResponse
    Tauri-->>TTSContext: {models_downloaded, models_loaded}
    TTSContext-->>UI: Update status

    alt Models not downloaded
        User->>UI: Click Download
        UI->>TTSContext: startDownload()
        TTSContext->>Tauri: invoke("tts_download_models")
        Tauri->>Rust: tts_download_models()
        loop For each model file
            Rust->>HF: HTTP GET model file
            HF-->>Rust: Stream chunks
            Rust->>Rust: Write to temp file
            Rust->>Tauri: emit("tts-download-progress")
            Tauri-->>TTSContext: Progress event
            TTSContext-->>UI: Update progress bar
        end
        Rust->>Rust: Flush and rename temp files
        Rust-->>Tauri: Success
    end

    Note over User,ONNX: Model Loading Flow
    TTSContext->>Tauri: invoke("tts_load_models")
    Tauri->>Rust: tts_load_models()
    Rust->>Rust: Load tts.json config
    Rust->>Rust: Load unicode_indexer.json
    Rust->>ONNX: Load duration_predictor.onnx
    Rust->>ONNX: Load text_encoder.onnx
    Rust->>ONNX: Load vector_estimator.onnx
    Rust->>ONNX: Load vocoder.onnx
    Rust->>Rust: Load F2.json voice style
    Rust->>Rust: Store in TTSState
    Rust-->>Tauri: Success
    Tauri-->>TTSContext: Models loaded
    TTSContext-->>UI: status = "ready"

    Note over User,ONNX: Text-to-Speech Synthesis
    User->>UI: Click speaker icon
    UI->>TTSContext: speak(text, messageId)
    TTSContext->>TTSContext: preprocessTextForTTS()
    TTSContext->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Rust: tts_synthesize(text)
    Rust->>Rust: chunk_text(text, 300)
    loop For each chunk
        Rust->>Rust: preprocess_text(chunk)
        Rust->>Rust: UnicodeProcessor.call()
        Rust->>ONNX: Run duration_predictor
        ONNX-->>Rust: Duration predictions
        Rust->>ONNX: Run text_encoder
        ONNX-->>Rust: Text embeddings
        Rust->>Rust: sample_noisy_latent()
        loop 10 denoising steps
            Rust->>ONNX: Run vector_estimator
            ONNX-->>Rust: Denoised latent
        end
        Rust->>ONNX: Run vocoder
        ONNX-->>Rust: Audio waveform
        Rust->>Rust: Concatenate with silence
    end
    Rust->>Rust: wav_to_base64()
    Rust-->>Tauri: TTSSynthesizeResponse
    Tauri-->>TTSContext: {audio_base64, sample_rate}
    TTSContext->>TTSContext: base64ToBlob()
    TTSContext->>TTSContext: Create AudioContext
    TTSContext->>TTSContext: Decode audio buffer
    TTSContext->>User: Play audio via Web Audio API
    TTSContext-->>UI: isPlaying = true

    Note over User,ONNX: Stop/Cleanup
    User->>UI: Click stop or unmount
    UI->>TTSContext: stop()
    TTSContext->>TTSContext: sourceNode.stop()
    TTSContext->>TTSContext: audioContext.close()
    TTSContext->>TTSContext: URL.revokeObjectURL()
    TTSContext-->>UI: isPlaying = false

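The "Concatenate with silence" step in the diagram, together with the 50ms pause mentioned in the PR description, can be sketched roughly as follows. This is an illustration only, not the code in tts.rs: silence_samples and join_chunks are hypothetical names, and the sample rate is passed in as a parameter because the PR text does not state it.

```rust
/// Number of zero-valued samples needed for a pause of `pause_ms` milliseconds.
/// Hypothetical helper; integer math avoids float rounding surprises.
fn silence_samples(sample_rate: u32, pause_ms: u32) -> usize {
    (u64::from(sample_rate) * u64::from(pause_ms) / 1000) as usize
}

/// Joins per-chunk waveforms into one buffer, inserting silence between
/// consecutive chunks (but not before the first or after the last).
fn join_chunks(chunks: &[Vec<f32>], sample_rate: u32, pause_ms: u32) -> Vec<f32> {
    let gap = silence_samples(sample_rate, pause_ms);
    let mut out = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        if i > 0 {
            out.extend(std::iter::repeat(0.0f32).take(gap));
        }
        out.extend_from_slice(chunk);
    }
    out
}
```

With two chunks and a 50ms pause, the output length is the sum of both chunk lengths plus sample_rate / 20 samples of silence.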
@greptile-apps greptile-apps bot left a comment

10 files reviewed, 2 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
frontend/src/components/UnifiedChat.tsx (1)

179-233: TTSButton implementation looks solid with proper environment check and state handling.

The component correctly:

  • Returns null when not in Tauri environment (line 193-195)
  • Handles different TTS states appropriately
  • Provides proper aria-labels for accessibility

However, there's a potential issue with the async handleClick function:

The speak call on line 207 is awaited but errors are not handled. Consider adding error handling:

  const handleClick = async () => {
    if (status === "not_downloaded" || status === "error") {
      onNeedsSetup();
      return;
    }

    if (status === "ready") {
      if (isThisPlaying) {
        stop();
      } else {
-        await speak(text, messageId);
+        try {
+          await speak(text, messageId);
+        } catch (error) {
+          console.error("TTS playback failed:", error);
+        }
      }
    }
  };
frontend/src-tauri/src/tts.rs (3)

115-219: Consider compiling regexes once to improve performance.

The preprocess_text function creates multiple Regex objects on every call. Since this function is called for each text chunk during synthesis, this causes repeated compilation overhead.

Consider using lazy_static or once_cell::sync::Lazy to compile regexes once:

use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});

static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});

// ... other regexes

fn preprocess_text(text: &str) -> String {
    let mut text: String = text.nfkd().collect();
    text = EMOJI_PATTERN.replace_all(&text, "").to_string();
    // ...
}

This is especially important since chunk_text may split long messages into many chunks, each processed separately.
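If adding once_cell is undesirable, the same build-once idea can be sketched with the standard library's OnceLock. The example below is a hedged illustration only: a small character set stands in for a compiled regex, and strip_set, build_strip_set, strip_markup, and BUILD_COUNT are hypothetical names that do not appear in the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the "expensive" builder runs; only here to make the
/// build-once behaviour observable.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an expensive construction such as compiling a regex.
fn build_strip_set() -> Vec<char> {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    vec!['*', '_', '`', '#']
}

/// Returns the shared set, building it on first use and reusing it afterwards.
fn strip_set() -> &'static [char] {
    static SET: OnceLock<Vec<char>> = OnceLock::new();
    SET.get_or_init(build_strip_set)
}

/// Removes the characters in the shared set from `text`.
fn strip_markup(text: &str) -> String {
    let set = strip_set();
    text.chars().filter(|c| !set.contains(c)).collect()
}
```

However many chunks pass through strip_markup, build_strip_set runs a single time, which is the property the suggestion above is after.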


734-771: Mutex held during CPU-intensive synthesis may cause contention.

The mutex guard is acquired at line 739 and only released at line 759, after the potentially long-running synthesize() call completes. For longer texts, ONNX inference across multiple chunks could take several seconds, blocking all other TTS commands.

Consider whether the ONNX sessions truly need mutable access. If Session::run only requires &self, you could restructure to minimize lock duration:

// Clone what's needed, release lock quickly
let (tts_data, style) = {
    let guard = state.lock().map_err(|e| e.to_string())?;
    let style = guard.style.as_ref().ok_or("Voice style not loaded")?.clone();
    // If possible, clone or Arc the session references
    // ...
};

// Synthesize outside the lock
let audio = /* ... */;

Alternatively, if mutable access is truly required by ONNX Runtime, this is acceptable for now but document the limitation. For production, consider using tokio::task::spawn_blocking to avoid blocking the async runtime.
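A minimal sketch of the "clone a handle, then release the lock" option, assuming the engine can run through a shared reference. Engine, SharedState, and synthesize_unlocked are hypothetical stand-ins, not the types in tts.rs, and the arithmetic in synthesize is a placeholder for real inference.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the loaded engine; the real one wraps ONNX sessions.
struct Engine {
    gain: f32,
}

impl Engine {
    /// Takes `&self`, so it does not need exclusive access to run.
    fn synthesize(&self, text: &str) -> Vec<f32> {
        text.bytes().map(|b| f32::from(b) / 255.0 * self.gain).collect()
    }
}

/// Hypothetical shared state: the engine sits behind an `Arc` so a caller
/// can take a cheap handle and drop the mutex guard right away.
struct SharedState {
    engine: Mutex<Option<Arc<Engine>>>,
}

fn synthesize_unlocked(state: &SharedState, text: &str) -> Result<Vec<f32>, String> {
    // Hold the lock only long enough to clone the handle.
    let engine = {
        let guard = state.engine.lock().map_err(|e| e.to_string())?;
        match guard.as_ref() {
            Some(engine) => Arc::clone(engine),
            None => return Err("TTS models not loaded".to_string()),
        }
    }; // the guard is dropped here

    // The long-running work happens with the mutex released.
    Ok(engine.synthesize(text))
}
```

This only helps if the inference call really accepts a shared reference; if it needs mutable access, the lock has to stay in place and the limitation should be documented instead.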


536-566: Consider configuring ONNX session options for better performance.

The ONNX sessions are created with default settings. For a better user experience, you might want to configure the execution:

use ort::session::SessionBuilder;

let dp_ort = SessionBuilder::new()?
    .with_intra_threads(4)?  // Limit thread usage
    .commit_from_file(models_dir.join("duration_predictor.onnx"))?;

This is optional but could help with:

  • Controlling CPU usage during inference
  • Potentially enabling GPU acceleration in the future
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad67576 and fc94dfa.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • frontend/src-tauri/Cargo.toml (1 hunks)
  • frontend/src-tauri/src/lib.rs (4 hunks)
  • frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
  • frontend/src-tauri/src/tts.rs (1 hunks)
  • frontend/src/app.tsx (2 hunks)
  • frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
  • frontend/src/components/UnifiedChat.tsx (8 hunks)
  • frontend/src/services/tts/TTSContext.tsx (1 hunks)
  • frontend/src/services/tts/index.ts (1 hunks)
  • justfile (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling

Files:

  • frontend/src/services/tts/index.ts
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/app.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/UnifiedChat.tsx
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use strict TypeScript typing and avoid any when possible

Files:

  • frontend/src/services/tts/index.ts
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/app.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/UnifiedChat.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (4)
frontend/src/components/TTSDownloadDialog.tsx (1)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/app.tsx (5)
frontend/src/services/tts/index.ts (1)
  • TTSProvider (1-1)
frontend/src/components/BillingServiceProvider.tsx (1)
  • BillingServiceProvider (5-14)
frontend/src/components/ProxyEventListener.tsx (1)
  • ProxyEventListener (7-58)
frontend/src/components/UpdateEventListener.tsx (1)
  • UpdateEventListener (11-65)
frontend/src/components/DeepLinkHandler.tsx (1)
  • DeepLinkHandler (9-109)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
  • TTSStatus (2-2)
  • TTSProvider (1-1)
  • useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
  • new (86-88)
  • new (341-346)
  • tts_get_status (602-620)
  • tts_download_models (631-704)
  • tts_load_models (707-725)
  • tts_synthesize (735-771)
  • tts_unload_models (774-780)
🔇 Additional comments (16)
justfile (1)

46-57: Desktop debug build targets look consistent and useful

The new debug and debug-no-cc recipes mirror the existing release targets and should help diagnose desktop issues without impacting current workflows.

frontend/src/services/tts/TTSContext.tsx (1)

60-72: Overall TTS provider shape and status lifecycle look solid

The context cleanly encapsulates TTS state (status, errors, progress, playback) and exposes a small surface (checkStatus, startDownload, speak, stop). Checking isTauri() once, defaulting to "not_available" off-Tauri, and auto-checking status on mount is consistent with the existing platform utilities and React-context usage.

frontend/src/app.tsx (1)

17-17: TTSProvider placement in the provider tree looks appropriate

Wrapping BillingServiceProvider (and thus ProxyEventListener, UpdateEventListener, DeepLinkHandler, and InnerApp) with TTSProvider ensures all consumers of useTTS sit under the context without altering existing provider ordering. This is a good integration point.

Also applies to: 102-109

frontend/src-tauri/Cargo.toml (1)

42-53: TTS dependency set looks coherent; please verify cross-platform builds and versions

The added crates (ONNX Runtime via ort, ndarray with rayon, reqwest with streaming, dirs, etc.) are consistent with a local Supertonic TTS pipeline. Since they’re in the shared [dependencies] section (not cfg-gated), they’ll be compiled for all targets, including Android/iOS.

Please verify:

  • That ort = "2.0.0-rc.7" and the other new crates build cleanly for your mobile targets.
  • That there are no unexpected size or licensing issues introduced by this set.

Consider adding cfg-gating or feature flags later if you decide TTS should remain desktop-only.

frontend/src-tauri/src/pdf_extractor.rs (1)

19-27: PDF extractor changes are formatting-only and safe

The adjustments here are purely stylistic (parameter layout and whitespace) and do not affect decoding, file-type branching, or response construction.

Also applies to: 33-38, 43-51

frontend/src/services/tts/index.ts (1)

1-2: Barrel re-exports for TTS are clean and useful

Re-exporting TTSProvider, useTTS, and TTSStatus from the TTS module simplifies imports for consumers and keeps the public surface centralized.

frontend/src-tauri/src/lib.rs (2)

5-8: TTS state management and command wiring on desktop look correct

Registering mod tts;, managing tts::TTSState::new(), and adding the TTS commands to the desktop invoke_handler cleanly integrates the backend TTS pipeline. Scoping these commands to the desktop builder matches the “desktop-only TTS” goal.

Also applies to: 39-54


387-399: Update-ready event emission and logging structure are clear

The app_handle.emit("update-ready", UpdateReadyPayload { ... }) block, along with the success/failure logging, is straightforward and keeps the updater behavior unchanged while improving observability.

frontend/src/components/TTSDownloadDialog.tsx (1)

1-160: TTS download/setup dialog behavior matches the context state machine well

The dialog cleanly maps TTSStatus into UX states (not available, error, processing, ready, initial), wires startDownload appropriately, and prevents accidental closure while work is in progress. The progress bar and messaging around the one-time ~MB download and local-only processing are clear.

frontend/src/components/UnifiedChat.tsx (3)

412-427: LGTM!

The MessageList signature update correctly adds the onTTSSetupOpen callback with proper typing. The integration follows the existing pattern for handling dialogs in this component.


626-633: LGTM!

The TTSButton integration mirrors the existing CopyButton pattern, reusing the same text extraction logic. The button is properly placed within the hover-reveal action group for assistant messages.


3023-3024: LGTM!

The TTS setup dialog follows the same pattern as other dialogs in the component (e.g., WebSearchInfoDialog, ContextLimitDialog). State management is consistent with the rest of the codebase.

frontend/src-tauri/src/tts.rs (4)

335-347: LGTM!

The TTSState design with Mutex<Self> is appropriate here. While using std::sync::Mutex in async contexts can be problematic if held across .await points, the current implementation correctly performs all mutex-guarded operations synchronously before any awaits.


281-333: Text chunking logic is functional but has the same regex compilation overhead.

The function works correctly for splitting text into manageable chunks. The regex objects (para_re, sentence_re) are recreated on each call - same recommendation as preprocess_text applies.

Note: The sentence boundary regex ([.!?])\s+ may not handle all edge cases (e.g., abbreviations like "Dr. Smith"), but this is acceptable for TTS where minor imperfections in chunk boundaries won't significantly affect output quality.
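For illustration, the packing half of such a chunker can be sketched without any regex at all. This is an assumption-laden sketch rather than the chunk_text in tts.rs: pack_sentences is a hypothetical name, it takes already-split sentences, and it counts characters rather than bytes.

```rust
/// Greedily packs sentences into chunks of at most `max_chars` characters,
/// joining sentences inside a chunk with a single space.
/// A single sentence longer than `max_chars` becomes its own oversized chunk.
fn pack_sentences(sentences: &[&str], max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    for sentence in sentences.iter().map(|s| s.trim()).filter(|s| !s.is_empty()) {
        // Length if this sentence were appended, including the joining space.
        let needed = current.chars().count()
            + sentence.chars().count()
            + usize::from(!current.is_empty());

        if !current.is_empty() && needed > max_chars {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(sentence);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because the sentences arrive with their punctuation intact, a slightly wrong boundary (for example after "Dr.") changes where a chunk breaks but not the text that gets synthesized.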


568-588: LGTM!

The WAV encoding function correctly clamps audio samples to prevent overflow and uses proper 16-bit PCM encoding. The in-memory buffer approach is efficient for the expected audio sizes.
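As a rough picture of what clamped 16-bit PCM encoding involves, here is a standard-library-only sketch. It is not the PR's function (the PR lists hound among its dependencies); to_pcm16 and encode_wav_mono are hypothetical names, and the output is a minimal mono RIFF/WAVE layout.

```rust
/// Converts float samples to 16-bit PCM, clamping values outside [-1.0, 1.0]
/// so they cannot wrap around when cast.
fn to_pcm16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * f32::from(i16::MAX)) as i16)
        .collect()
}

/// Wraps mono 16-bit PCM data in a minimal 44-byte RIFF/WAVE header.
fn encode_wav_mono(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let pcm = to_pcm16(samples);
    let data_len = (pcm.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + pcm.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // remaining file size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for sample in pcm {
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

The clamp is the important part: without it, a sample such as 2.5 would be scaled past the i16 range before the cast.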


706-725: LGTM!

The model loading function is straightforward. While it doesn't explicitly check if models are downloaded first, the error messages from load_tts_engine and load_voice_style will be informative if files are missing. The frontend's TTSContext guards against calling this before download completes.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
frontend/src/services/tts/TTSContext.tsx (2)

10-10: Use isTauriDesktop() instead of isTauri() to prevent mobile TTS errors.

According to past review comments, TTS commands are only registered in the desktop builder. Using isTauri() returns true for both desktop and mobile Tauri environments, which would cause "unknown command" errors on mobile when invoking tts_get_status.

Replace isTauri with isTauriDesktop throughout this file (lines 10, 62, 78, 112, 118, 163).

-import { isTauri } from "@/utils/platform";
+import { isTauriDesktop } from "@/utils/platform";

And update usage:

-  const isTauriEnv = isTauri();
+  const isTauriEnv = isTauriDesktop();

208-215: Memory leak: Audio object URL not revoked on stop() or unmount.

When stop() is called or the provider unmounts during playback, the object URL created in speak() is never revoked, causing memory leaks over repeated play/stop cycles.

Add a ref to track the URL and revoke it in stop() and the unmount effect:

  const audioRef = useRef<HTMLAudioElement | null>(null);
+  const audioUrlRef = useRef<string | null>(null);

In stop():

  const stop = useCallback(() => {
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current = null;
    }
+    if (audioUrlRef.current) {
+      URL.revokeObjectURL(audioUrlRef.current);
+      audioUrlRef.current = null;
+    }
    setIsPlaying(false);
    setCurrentPlayingId(null);
  }, []);

And store the URL in speak():

  const audioUrl = URL.createObjectURL(audioBlob);
+  audioUrlRef.current = audioUrl;
frontend/src-tauri/src/tts.rs (2)

295-347: Sentence splitting loses punctuation marks.

The regex pattern r"([.!?])\s+" uses a capture group, but split() discards captured groups, causing sentences to lose their ending punctuation.

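Note that the regex crate does not support look-around, so a look-behind pattern such as (?<=[.!?])\s+ would fail to compile. Consider iterating over the matches and slicing each sentence through its punctuation mark instead of splitting:

-    let sentence_re = Regex::new(r"([.!?])\s+").unwrap();
+    let sentence_re = Regex::new(r"[.!?]\s+").unwrap();
+    // iterate sentence_re.find_iter(text); each sentence ends at m.start() + 1

Alternatively, str::split_inclusive with a closure matching '.', '!' and '?' avoids the regex entirely. This affects the naturalness of TTS output since sentences will otherwise be synthesized without proper endings.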
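One way to preserve the punctuation without relying on regex features is a manual scan. The sketch below uses only the standard library and is meant as an illustration; split_sentences is a hypothetical name and does not mirror the function in tts.rs.

```rust
/// Splits `text` after '.', '!' or '?' when the mark is followed by whitespace.
/// Each returned slice keeps its ending punctuation; the whitespace between
/// sentences is dropped.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();

    while let Some((idx, ch)) = chars.next() {
        let is_terminator = matches!(ch, '.' | '!' | '?');
        let followed_by_space =
            matches!(chars.peek(), Some((_, next)) if next.is_whitespace());

        if is_terminator && followed_by_space {
            // Slice through the punctuation mark itself.
            sentences.push(&text[start..idx + ch.len_utf8()]);

            // Skip the whitespace run so the next sentence starts cleanly.
            while let Some(&(next_idx, next)) = chars.peek() {
                if !next.is_whitespace() {
                    break;
                }
                chars.next();
                start = next_idx + next.len_utf8();
            }
        }
    }

    if start < text.len() {
        sentences.push(&text[start..]);
    }
    sentences
}
```

Like the regex version, this still splits after abbreviations such as "Dr.", but a decimal like "3.14" stays intact because the dot is not followed by whitespace.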


650-668: Add request timeout and handle partial downloads.

Two reliability concerns from past reviews still apply:

  1. No HTTP timeout: If the server hangs, the download blocks indefinitely.

  2. Partial file on failure: If download fails mid-stream, a partial file remains. Next tts_get_status will report models as downloaded.

+ use std::time::Duration;

  let client = reqwest::Client::builder()
+     .timeout(Duration::from_secs(300))
+     .connect_timeout(Duration::from_secs(30))
      .build()
      .map_err(|e| format!("Failed to create HTTP client: {}", e))?;

For partial files, download to a .part file and rename on success:

+ let temp_path = file_path.with_extension("part");
- let mut file = File::create(&file_path)
+ let mut file = File::create(&temp_path)
      ...
+ // After successful download:
+ fs::rename(&temp_path, &file_path)?;
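The temp-file-then-rename idea can be sketched end to end with the standard library. This is an illustration under assumptions, not the download code in tts.rs: write_atomically is a hypothetical name, the chunk iterator stands in for the HTTP body stream, and the sketch appends ".part" to the full file name rather than replacing the extension.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes all chunks to `<final_path>.part`, then renames the file into place.
/// If any chunk fails, the partial file is removed and `final_path` is never
/// created, so an existence check cannot mistake it for a complete download.
fn write_atomically<I>(final_path: &Path, chunks: I) -> io::Result<()>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let mut temp_name = final_path.as_os_str().to_owned();
    temp_name.push(".part");
    let temp_path = PathBuf::from(temp_name);

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&temp_path)?;
        for chunk in chunks {
            file.write_all(&chunk?)?;
        }
        file.flush()?;
        file.sync_all()?; // push the data to disk before the rename
        fs::rename(&temp_path, final_path)
    })();

    if result.is_err() {
        // Best-effort cleanup; the original error is what gets reported.
        let _ = fs::remove_file(&temp_path);
    }
    result
}
```

Appending the suffix keeps the temp names of files that share a stem distinct, which with_extension would not guarantee; that is a design choice of this sketch, not something the PR specifies.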
🧹 Nitpick comments (1)
frontend/src/components/UnifiedChat.tsx (1)

626-633: Consider extracting the text content logic to avoid duplication.

The same text extraction logic is duplicated between CopyButton and TTSButton. While acceptable for now, consider extracting this to a helper if more buttons are added.

const messageText = message.content
  .filter((p) => "text" in p && p.text)
  .map((p) => ("text" in p ? p.text : ""))
  .join("");
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc94dfa and e607185.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • frontend/src-tauri/Cargo.toml (1 hunks)
  • frontend/src-tauri/src/lib.rs (7 hunks)
  • frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
  • frontend/src-tauri/src/tts.rs (1 hunks)
  • frontend/src/app.tsx (2 hunks)
  • frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
  • frontend/src/components/UnifiedChat.tsx (8 hunks)
  • frontend/src/services/tts/TTSContext.tsx (1 hunks)
  • frontend/src/services/tts/index.ts (1 hunks)
  • justfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • frontend/src/app.tsx
  • frontend/src/services/tts/index.ts
  • justfile
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/pdf_extractor.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use strict TypeScript typing and avoid any when possible

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
  • TTSStatus (2-2)
  • TTSProvider (1-1)
  • useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
  • init_tts_backend (20-30)
  • new (100-102)
  • new (355-360)
  • tts_get_status (616-634)
  • tts_download_models (645-718)
  • tts_load_models (721-739)
  • tts_synthesize (749-785)
  • tts_unload_models (788-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (11)
frontend/src/components/UnifiedChat.tsx (2)

179-233: TTSButton implementation looks solid.

The component properly:

  • Guards rendering with isTauriEnv check
  • Handles all TTS status states appropriately
  • Provides clear visual feedback (spinner during loading states, stop icon during playback)
  • Uses the correct disabled states during transitional operations

3023-3024: TTS dialog integration is clean.

The dialog is properly controlled via state and placed appropriately in the component tree.

frontend/src/services/tts/TTSContext.tsx (1)

259-267: Helper function implementation is correct.

The base64ToBlob function correctly decodes base64 and creates a Blob with the appropriate MIME type.

frontend/src/components/TTSDownloadDialog.tsx (3)

18-24: Dialog implementation is well-structured.

The component properly handles all TTS status states with appropriate UI feedback. The handleDownload wrapper is a simple passthrough, which is fine but could be inlined if no additional logic is needed.


70-78: Error display is correctly guarded.

The condition hasError && error ensures the error panel only shows when there's an actual error message to display, preventing empty error states.


93-105: Progress bar behavior is well-designed.

The progress bar correctly shows 100% during the loading phase (after download completes), and the percentage text is appropriately hidden during non-download states.

frontend/src-tauri/src/lib.rs (2)

396-407: Minor formatting changes to emit call.

These are style/formatting adjustments with no functional impact.


264-273: Likely an incorrect or invalid review comment.

frontend/src-tauri/src/tts.rs (3)

765-767: Synthesis parameters are hardcoded as per design.

The hardcoded values (10 inference steps, 1.2x speed) match the PR description. Consider exposing these as optional parameters in the future for user customization.
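
One possible shape for that future change, as a minimal sketch only: the command could accept optional overrides and fall back to the current defaults. The names `SynthesisParams` and `resolve_params`, the integer/float types, and the clamp bounds are all assumptions made for illustration; only the default values (10 steps, 1.2x speed) come from the PR description.

```rust
/// Hypothetical parameter bundle; the real tts_synthesize signature may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SynthesisParams {
    steps: u32,
    speed: f32,
}

// Defaults taken from the PR description.
const DEFAULT_STEPS: u32 = 10;
const DEFAULT_SPEED: f32 = 1.2;

/// Fill in defaults for missing values and clamp to illustrative bounds
/// (the bounds 1..=50 and 0.5..=2.0 are assumptions, not values from the PR).
fn resolve_params(steps: Option<u32>, speed: Option<f32>) -> SynthesisParams {
    SynthesisParams {
        steps: steps.unwrap_or(DEFAULT_STEPS).clamp(1, 50),
        speed: speed.unwrap_or(DEFAULT_SPEED).clamp(0.5, 2.0),
    }
}
```

Calling `resolve_params(None, None)` keeps today's behavior, so existing callers would not need to change.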


550-580: TTS engine loading is straightforward and correct.

The ONNX session creation uses default builder settings. Consider adding thread configuration or optimization options if performance tuning is needed later.


18-30: Platform-specific TTS backend initialization is correct.

The conditional compilation properly selects tract for iOS/Android and ONNX Runtime for desktop. This aligns with the cross-platform architecture.
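
For readers unfamiliar with the pattern, the selection described above can be sketched with `#[cfg]` attributes. This is an illustration, not the code under review: `backend_name` is a hypothetical helper, and the real `init_tts_backend` presumably does more than return a label.

```rust
/// Mobile targets: per the comment above, these would use tract.
#[cfg(any(target_os = "ios", target_os = "android"))]
fn backend_name() -> &'static str {
    "tract"
}

/// Every other target (desktop): ONNX Runtime.
#[cfg(not(any(target_os = "ios", target_os = "android")))]
fn backend_name() -> &'static str {
    "onnxruntime"
}
```

Exactly one of the two definitions is compiled for any given target, so callers need no runtime branching.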

@greptile-apps greptile-apps bot left a comment

10 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 3 comments

@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from fc94dfa to ca87d46 Compare January 7, 2026 18:27

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 3 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Around line 1-30: The Gemini section has many run-on paragraphs and missing
blank lines (e.g., "2026The trajectory", "SuccessOn desktop") and improperly
formatted code and URLs; fix by inserting blank lines between all headings (like
"## Gemini"), subheadings (e.g., "2. Architectural Deconstruction"), and
paragraph blocks so each logical paragraph is separated, correct concatenated
sentences into proper sentences, convert bare URLs into markdown links with
descriptive text (replace bare URLs with [link text](url)), and ensure every
code block uses fenced code blocks with language identifiers and closing ```
markers (e.g., ```rust, ```json) where the diff shows example snippets (the
Cargo.toml and Rust/JavaScript examples) so the document is valid Markdown and
readable.

In @frontend/src-tauri/src/tts.rs:
- Around line 281-333: The sentence-splitting regex in chunk_text discards
terminal punctuation because sentence_re = r"([.!?])\s+" is used with split();
update chunk_text to preserve sentence-ending punctuation by either using a
lookbehind regex like r"(?<=[.!?])\s+" for splitting (this needs an engine with
look-around support, which the regex crate does not provide) or by using
split_inclusive/iterating matches so the punctuation stays part of each part; keep
the surrounding logic (current accumulation, trimming, max_len checks, and
fallback behavior) the same but operate on parts that include their terminal
punctuation so TTS prosody is preserved.

In @frontend/src/services/tts/TTSContext.tsx:
- Around line 161-215: The speak function creates an object URL (audioUrl) but
only revokes it in onended/onerror, causing leaks when stop() is called,
audio.play() throws, or the component unmounts; fix by tracking the created URL
in a ref (e.g., audioUrlRef) and centralize cleanup in a helper (e.g.,
revokeAudioUrl) that revokes URL, clears audioRef and audioUrlRef, and updates
state; call that helper from onended, onerror, the catch block in speak, stop(),
and a useEffect cleanup on unmount so all exit paths revoke the object URL.
🧹 Nitpick comments (3)
docs/tts-research.md (2)

3-27: Trim verbosity in the Gemini section for better readability.

The opening Gemini section (lines 3–27) reads as lengthy, marketing-oriented prose with excessive punctuation and repetitive framing. The content is technically sound, but 25 lines of introductory preamble before substantive technical details may obscure the key findings. Consider condensing the executive summary to 3–5 sentences and moving deeper narrative to subsections if detail is necessary.

Example: Replace the verbose opening with a concise summary:

-Definitive Comparative Analysis of Local TTS Architectures for Cross-Platform Tauri Integration: Supertonic 2 vs. Chatterbox Turbo1. Executive Strategic Overview: The Local Inference Convergence in 2026The trajectory of computational linguistics... [excessive framing]
+## Executive Summary

+For Tauri developers building cross-platform local TTS, Supertonic 2 (ONNX-native, 44M params) outweighs Chatterbox Turbo (PyTorch-based, 350M params) due to superior mobile compatibility, smaller footprint, and direct Rust integration. Chatterbox excels only when voice cloning or paralinguistic features are essential requirements.

29-180: Consolidate or clearly distinguish the three AI research perspectives.

The document contains three separate comprehensive analyses (Gemini, Claude, Grok) covering largely overlapping ground: architecture, model sizes, cross-platform support, performance, and Tauri integration. While multiple perspectives can add depth, the repetition may reduce clarity. If all three are retained for documentation completeness, add a brief preamble explaining their purpose (e.g., "The following sections represent independent research analyses to validate findings across sources").

Alternatively, extract consensus findings into a single unified recommendation, placing detailed comparative tables and architecture discussions in a "Detailed Comparisons" section, and move the three analyses to an appendix or separate file if historical record is important.

frontend/src-tauri/src/tts.rs (1)

734-771: Synthesis command holds mutex during potentially slow inference.

The tts_synthesize command acquires a mutex lock at line 739 and holds it through the entire synthesis operation (lines 751-753), which involves multiple ONNX model runs. This blocks other potential TTS operations.

The lock is released before base64 encoding (line 759), which is good, but consider whether the synthesis itself could be done with a shorter lock duration by cloning necessary state upfront.

For the current single-user desktop app context, this is acceptable. If concurrent TTS requests become possible in the future, consider restructuring to minimize lock duration.
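
A minimal sketch of the shorter-lock idea, under stated assumptions: the engine can be shared behind an `Arc` and run through `&self`, and the style is cheap to clone. `Engine`, `Style`, `TtsState`, and `synthesize_with_short_lock` are hypothetical stand-ins, not the types in tts.rs. If the real ONNX sessions need exclusive (`&mut`) access, this restructuring does not apply as written.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the loaded engine.
struct Engine {
    gain: f32,
}

/// Hypothetical stand-in for the voice style.
#[derive(Clone)]
struct Style {
    speed: f32,
}

struct TtsState {
    engine: Option<Arc<Engine>>,
    style: Option<Style>,
}

impl Engine {
    /// Placeholder for the slow multi-model inference.
    fn synthesize(&self, text: &str, style: &Style) -> Vec<f32> {
        text.bytes()
            .map(|b| b as f32 * self.gain * style.speed)
            .collect()
    }
}

fn synthesize_with_short_lock(
    state: &Mutex<TtsState>,
    text: &str,
) -> Result<Vec<f32>, String> {
    // Hold the lock only long enough to clone the handles we need.
    let (engine, style) = {
        let guard = state.lock().map_err(|e| e.to_string())?;
        let engine = guard.engine.clone().ok_or("models not loaded")?;
        let style = guard.style.clone().ok_or("style not loaded")?;
        (engine, style)
    }; // guard is dropped here, before the slow work starts

    Ok(engine.synthesize(text, &style))
}
```

With this shape, a status query could take the lock while synthesis is still running, at the cost of keeping the engine alive until the in-flight call finishes even if the models are unloaded meanwhile.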

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e607185 and ca87d46.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/tts-research.md
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/app.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • justfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • frontend/src-tauri/Cargo.toml
  • frontend/src/services/tts/index.ts
  • justfile
  • frontend/src/app.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
  • TTSDownloadDialog (18-160)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
  • new (86-88)
  • new (341-346)
  • tts_get_status (602-620)
  • tts_download_models (631-704)
  • tts_load_models (707-725)
  • tts_synthesize (735-771)
  • tts_unload_models (774-780)
frontend/src/components/TTSDownloadDialog.tsx (2)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
🪛 LanguageTool
docs/tts-research.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...

(QB_NEW_EN_DASH_RULE_EN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md

26-26: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


169-169: Bare URL used

(MD034, no-bare-urls)


175-175: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
docs/tts-research.md (2)

200-220: Verify performance benchmarks and resource claims against primary sources.

The document cites specific metrics (e.g., RTF of 0.006 on M4 Pro WebGPU at line 119, ~5GB VRAM for Chatterbox at line 199, 0.3 RTF on Raspberry Pi at line 120). While generally reasonable, these claims lack inline citations to benchmark reports or official documentation. For a research document, adding references or footnotes (e.g., linking to the GitHub repos' own benchmarks or published evaluations) strengthens credibility.


233-246: Ensure all citations are accurate and up-to-date.

The "Key Citations" section (lines 233–246) lists GitHub repos, blog posts, and videos. These are helpful, but verify:

  1. GitHub repo URLs match the latest stable branches (e.g., Supertone Inc. repo path).
  2. Blog and video dates align with the document's "January 2026" context (e.g., "Supertonic TTS is incredible" post should be recent).
  3. All links remain active; add a note if any are known to be archived or updated.
frontend/src-tauri/src/lib.rs (1)

6-6: TTS module integration looks correct for desktop-only functionality.

The TTS module, state management, and command handlers are appropriately scoped to the desktop build path. The mobile invoke handler (lines 261-263) correctly excludes TTS commands, aligning with the PR objective that TTS is desktop-only.

Also applies to: 38-38, 48-52

frontend/src/components/UnifiedChat.tsx (2)

192-246: Well-structured TTSButton component with proper state handling.

The component correctly:

  • Guards against non-Tauri environments by returning null early (line 206-208)
  • Handles all status states appropriately in handleClick
  • Disables the button during transitional states (checking, downloading, loading)
  • Shows appropriate icons for playing/stopped/loading states

The implementation follows React best practices with proper hook usage and conditional rendering.


756-764: TTS integration wired correctly through the component tree.

The TTS setup flow is properly connected:

  • ttsSetupDialogOpen state manages dialog visibility
  • onTTSSetupOpen callback is passed through MessageList to TTSButton
  • TTSDownloadDialog is rendered with appropriate props

Also applies to: 880-880, 3346-3348

frontend/src/components/TTSDownloadDialog.tsx (1)

18-159: Clean implementation of the TTS download dialog with comprehensive state handling.

The dialog properly handles all TTS lifecycle states:

  • Prevents dismissal during processing (line 34)
  • Shows appropriate messaging for each state (not_available, error, checking, downloading, loading, ready)
  • Progress bar and percentage display are conditionally rendered based on state
  • Footer buttons adapt to the current status

The component follows coding guidelines with proper path aliases, TypeScript types, and React patterns.

frontend/src/services/tts/TTSContext.tsx (2)

259-266: base64ToBlob helper is correctly implemented.

The function properly decodes base64 to bytes and creates a Blob with the specified MIME type.


62-62: Use isTauriDesktop() instead of isTauri() to prevent mobile invoke errors.

TTS commands are only registered in the desktop invoke handler (lib.rs lines 48-52), not in the mobile handler (lines 261-263). The current isTauri() check returns true for both desktop and mobile Tauri environments, which causes "unknown command" errors when tts_get_status is invoked on mobile.

Replace isTauri() with isTauriDesktop() at all usage points (lines 62, 78, 112, 118, 163) to ensure TTS operations are only attempted on desktop.

Also applies to: 78-78, 112-112, 118-118, 163-163

⛔ Skipped due to learnings
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to src-tauri/**/*.rs : Use `#[tauri::command]` attribute for Tauri command handlers
frontend/src-tauri/src/tts.rs (1)

1-588: Solid TTS engine implementation with proper ONNX integration.

The TTS pipeline is well-structured:

  • Text preprocessing handles Unicode normalization, emoji removal, and punctuation cleanup
  • Multi-stage inference (duration prediction → text encoding → denoising loop → vocoder)
  • Proper chunking for long texts with silence between chunks
  • WAV encoding with correct sample format

The ONNX session management and ndarray operations follow established patterns.

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 1 comment

@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from eab4fbc to 4898520 Compare January 7, 2026 19:11

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)

115-219: Consider caching compiled regexes for performance.

Multiple regexes are compiled on each call to preprocess_text(). For better performance, especially with longer texts or frequent calls, consider using lazy_static or once_cell::sync::Lazy to compile regexes once.

♻️ Example using once_cell
use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});

static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});

// Then use &*EMOJI_PATTERN in the function
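
If an extra dependency is unwanted, the same build-once idea can be sketched with the standard library's `std::sync::OnceLock` (stable since Rust 1.70). The replacement table below is a hypothetical stand-in for the compiled regexes; with the regex crate, the `HashMap` would simply be replaced by a `Regex` value built inside `get_or_init`.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Built on first use, then reused for the life of the process.
/// The entries are illustrative, not the mappings used in preprocess_text().
fn symbol_table() -> &'static HashMap<char, &'static str> {
    static TABLE: OnceLock<HashMap<char, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| {
        HashMap::from([('&', " and "), ('@', " at "), ('%', " percent ")])
    })
}

/// Replace each known symbol with its spoken form; copy other chars through.
fn replace_symbols(text: &str) -> String {
    let table = symbol_table();
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match table.get(&c) {
            Some(word) => out.push_str(word),
            None => out.push(c),
        }
    }
    out
}
```

Every call after the first returns the same `&'static` reference, so the construction cost is paid once regardless of how often text is preprocessed.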
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eab4fbc and 4898520.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (17)
frontend/src/services/tts/TTSContext.tsx (8)

1-13: LGTM!

Imports are clean and follow the coding guidelines with path aliases (@/utils/platform).


14-56: LGTM!

Type definitions are well-structured and align with the Rust backend response types.


60-74: LGTM!

Good pattern to check isTauriDesktop() once at mount and derive initial status accordingly.


76-108: LGTM!

The status checking logic correctly handles the various model states with appropriate error handling.


117-159: LGTM!

Download flow correctly manages the event listener lifecycle and handles both success and error paths.


161-213: Memory leak fixes properly implemented.

The speak callback now correctly revokes object URLs in all exit paths:

  • When stopping current audio to play new (lines 168-170)
  • On playback end/error (lines 190, 197)
  • In the catch block (lines 206-208)

This addresses the previous review feedback.


215-240: LGTM!

The stop() callback and unmount cleanup effect properly revoke object URLs, addressing the previous memory leak concerns.


264-280: LGTM!

The useTTS hook follows the standard context hook pattern with proper error handling for missing provider, and the base64ToBlob helper is clear and functional.

frontend/src-tauri/src/tts.rs (9)

1-41: LGTM!

Imports and constants are well-organized. Model file definitions with expected sizes enable proper progress tracking.


43-113: LGTM!

Configuration structs and UnicodeProcessor are well-structured. The processor gracefully handles out-of-range Unicode characters by assigning -1.
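
To make the -1 fallback concrete, here is a small sketch of a code-point lookup table. It is an assumption about how such a processor could work, not a copy of the reviewed code: `UnicodeIndexer` and `text_to_ids` are hypothetical names, and the table contents are invented.

```rust
/// Hypothetical indexer: position = Unicode code point, value = token id.
struct UnicodeIndexer {
    table: Vec<i64>,
}

impl UnicodeIndexer {
    /// Map each character to its token id, or -1 when the code point
    /// falls outside the table.
    fn text_to_ids(&self, text: &str) -> Vec<i64> {
        text.chars()
            .map(|c| self.table.get(c as usize).copied().unwrap_or(-1))
            .collect()
    }
}
```

Using `Vec::get` rather than indexing means an unexpected character yields a sentinel instead of a panic.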


281-350: Punctuation preservation fix confirmed.

The sentence splitting now correctly preserves terminal punctuation by taking para[last_end..m.start() + 1] which includes the punctuation character. This addresses the previous review feedback about lost sentence endings.
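
The slicing idea can be illustrated without any regex at all. The sketch below uses a manual character scan instead of the `find_iter` approach described above, so it is an analogous technique rather than the reviewed implementation; `split_sentences` is a hypothetical name. It shows the same `last_end..index + 1` slice that keeps the terminator attached.

```rust
/// Split `para` into sentences, keeping the terminal `.`, `!` or `?`.
/// A boundary is a terminator immediately followed by whitespace.
fn split_sentences(para: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut last_end = 0;
    let mut chars = para.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            if let Some(&(_, next)) = chars.peek() {
                if next.is_whitespace() {
                    // The terminators are ASCII (one byte), so `i + 1`
                    // is a valid boundary that includes the punctuation.
                    let sentence = para[last_end..i + 1].trim();
                    if !sentence.is_empty() {
                        parts.push(sentence);
                    }
                    last_end = i + 1;
                }
            }
        }
    }

    // Whatever remains after the last boundary is the final sentence.
    let tail = para[last_end..].trim();
    if !tail.is_empty() {
        parts.push(tail);
    }
    parts
}
```

Because a boundary requires trailing whitespace, a string such as "v1.2 is out" stays in one piece.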


352-511: LGTM!

The TTS engine implementation is well-structured with proper state management. The synthesis flow correctly:

  • Chunks text for manageable processing
  • Adds inter-chunk silence for natural pacing
  • Runs the multi-stage ONNX pipeline (duration prediction → text encoding → denoising → vocoder)

521-551: Voice style is hardcoded to F2.

The voice style is hardcoded to "F2.json" (line 522). Per the PR summary, this is intentional ("default/currently using F2"). If voice selection is planned for the future, consider accepting the style name as a parameter or adding a TODO comment.


647-738: Download reliability improvements confirmed.

The download implementation now includes all the previously requested reliability improvements:

  • HTTP timeouts (300s total, 30s connect) at lines 655-658
  • Partial file cleanup (.part extension) at lines 664, 682
  • Atomic file finalization (flush + rename) at lines 727-731

This prevents corrupted model files from being treated as valid.
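
The temp-file-then-rename pattern can be sketched with the standard library alone. This is an illustration under assumptions: `part_path` and `write_atomically` are hypothetical helpers, the chunks stand in for the HTTP response stream, and the timeout handling mentioned above is out of scope here.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// "model.onnx" -> "model.onnx.part"
fn part_path(final_path: &Path) -> PathBuf {
    let mut name = final_path.as_os_str().to_owned();
    name.push(".part");
    PathBuf::from(name)
}

/// Write all chunks to a `.part` file, then rename it into place.
/// The final path only ever appears once the data is fully written.
fn write_atomically(final_path: &Path, chunks: &[&[u8]]) -> io::Result<()> {
    let tmp = part_path(final_path);

    let result = (|| -> io::Result<()> {
        {
            let mut file = File::create(&tmp)?;
            for chunk in chunks {
                file.write_all(chunk)?;
            }
            file.flush()?;
            file.sync_all()?;
        } // file is closed here, before the rename
        fs::rename(&tmp, final_path)
    })();

    // On any failure, remove the partial file so it is never mistaken
    // for a completed download.
    if result.is_err() {
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

A status check that tests only for the final file name then never sees a half-written model.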


768-805: Lock is held during synthesis.

The mutex guard is held at line 773 through the synthesis call at lines 785-787, which could take a noticeable amount of time for longer texts. While the guard is correctly dropped before encoding (line 793), this means concurrent TTS commands (e.g., status checks) will block during synthesis.

For a single-user desktop app this is likely acceptable, but consider documenting this behavior or adding a comment noting the intentional lock scope.


618-637: LGTM!

The status command correctly reports download and load state. The file existence check combined with the .part file mechanism in downloads should be sufficient to detect incomplete downloads.


740-759: LGTM!

Model loading and unloading commands are straightforward with proper error handling and logging.

Also applies to: 807-814

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)

618-637: Consider: Optional file integrity verification.

The status check uses simple file existence (line 626). While the .part temp-file download pattern mitigates partial-file issues, you could optionally verify file sizes match expected_size from MODEL_FILES for additional robustness against corruption.

Optional enhancement for file integrity
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        if let Ok(metadata) = models_dir.join(name).metadata() {
+            metadata.len() == *expected_size
+        } else {
+            false
+        }
+    });
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4898520 and 0214737.

📒 Files selected for processing (5)
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
  • new (86-88)
  • new (358-363)
  • tts_get_status (619-637)
  • tts_download_models (648-738)
  • tts_load_models (741-759)
  • tts_synthesize (769-805)
  • tts_unload_models (808-814)
  • tts_delete_models (817-834)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-android
  • GitHub Check: build-ios
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
frontend/src/services/tts/TTSContext.tsx (4)

1-58: LGTM: Clean type definitions and imports.

The imports follow the project's path alias conventions, and the TypeScript type definitions are strict and well-structured. The exported TTSStatus type provides a clear public API surface.


62-77: Well-addressed: Correct desktop-only detection.

The use of isTauriDesktop() (line 64) correctly ensures TTS functionality is only available on desktop Tauri environments, preventing the "unknown command" errors that would occur on mobile. This addresses the concern raised in previous reviews.


190-269: Excellent: Comprehensive audio resource cleanup.

The audio playback implementation now properly revokes object URLs in all exit paths:

  • When stopping previous audio (line 198)
  • On playback end (line 219) and error (line 226)
  • In the error catch block (line 236)
  • In the explicit stop() function (line 248)
  • On component unmount (line 265)

This comprehensively addresses the memory leak concerns raised in previous reviews.


302-310: LGTM: Correct base64-to-blob conversion.

The helper function uses standard browser APIs correctly to convert the base64-encoded audio data from the backend into a playable Blob.

frontend/src-tauri/src/tts.rs (4)

1-79: LGTM: Well-structured configuration and dependencies.

The model definitions, constants, and configuration types are clearly organized. The TOTAL_MODEL_SIZE of ~265 MB aligns with the PR description, and the serde annotations correctly support JSON serialization for config files and voice styles.


281-350: Well-fixed: Sentence chunking now preserves punctuation.

The chunk_text function now uses find_iter with manual slice construction (line 308: para[last_end..m.start() + 1]) rather than split(), ensuring terminal punctuation is preserved in each sentence. This addresses the concern raised in previous reviews about lost punctuation affecting TTS prosody.


647-738: Excellent: Robust download implementation with timeouts and atomic completion.

The download function now includes:

  • HTTP timeouts (300s total, 30s connect) to prevent indefinite hangs (lines 656-657)
  • Atomic file completion using a .part temp file pattern (line 664)
  • Explicit flush and rename on success (lines 727-731)

This prevents both indefinite blocking and partial-file corruption issues raised in previous reviews. The existence check (line 667) correctly tests the final file_path, not the temp file.
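The temp-file pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's download function: `part_path` and `write_atomically` are hypothetical names, and the network stream is replaced by an in-memory iterator of byte chunks.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Build the temp path by appending ".part" to the final file name.
fn part_path(final_path: &Path) -> PathBuf {
    let mut name = final_path.as_os_str().to_os_string();
    name.push(".part");
    PathBuf::from(name)
}

/// Write `chunks` to a ".part" file, then rename it into place.
/// The final path only ever appears once the data is fully written.
fn write_atomically<'a, I>(final_path: &Path, chunks: I) -> io::Result<()>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    // The existence check tests the final path, not the temp file.
    if final_path.exists() {
        return Ok(());
    }
    let temp = part_path(final_path);
    // Remove any leftover from an interrupted attempt; a missing file is fine.
    let _ = fs::remove_file(&temp);
    let mut file = File::create(&temp)?;
    for chunk in chunks {
        file.write_all(chunk)?;
    }
    // Flush buffered data and close the handle before renaming.
    file.flush()?;
    drop(file);
    fs::rename(&temp, final_path)
}
```

A status check that looks only at the final path never mistakes a half-written `.part` file for a finished model.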


740-834: LGTM: Clean command implementations with proper locking.

The Tauri commands are well-structured:

  • tts_load_models correctly loads both engine and style into shared state
  • tts_synthesize properly clones the style (line 780) to avoid borrow conflicts and drops the guard early (line 793) to minimize lock contention during encoding
  • tts_delete_models correctly unloads models before filesystem removal

The hardcoded synthesis parameters (10 steps, 1.2× speed at line 786) align with the PR description's configuration.
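The locking pattern praised above (clone what is needed, then release the guard before slow work) can be sketched as follows. All types here are stand-ins: `Style`, `TtsState`, `encode`, and `synthesize` are hypothetical and greatly simplified relative to the real command, which also holds the ONNX engine in state.

```rust
use std::sync::Mutex;

#[derive(Clone)]
struct Style {
    name: String,
}

struct TtsState {
    style: Option<Style>,
}

/// Stand-in for the slow encoding step that should run without the lock.
fn encode(style: &Style, samples: &[f32]) -> String {
    format!("{}:{}", style.name, samples.len())
}

fn synthesize(state: &Mutex<TtsState>, text: &str) -> Result<String, String> {
    let guard = state.lock().map_err(|e| e.to_string())?;
    // Clone the style so the result does not borrow from the guard.
    let style = guard
        .style
        .clone()
        .ok_or_else(|| "TTS models not loaded".to_string())?;
    // Placeholder "synthesis": one silent sample per input character.
    let samples: Vec<f32> = text.chars().map(|_| 0.0).collect();
    // Release the lock before the encoding step.
    drop(guard);
    Ok(encode(&style, &samples))
}
```

Dropping the guard explicitly documents the intent; other commands can take the lock while encoding runs.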

frontend/src-tauri/src/lib.rs (1)

6-6: LGTM: Correct desktop-only TTS integration.

The TTS module is properly integrated into the desktop builder only:

  • Module declaration (line 6)
  • State initialization with TTSState::new() (line 38)
  • Six TTS commands registered in the desktop invoke_handler (lines 48-53)

The mobile builder (lines 262-264) correctly excludes the TTS commands. Combined with the frontend's isTauriDesktop() gate, this prevents "unknown command" errors on mobile.

Also applies to: 38-53

Contributor

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 1 comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
frontend/src-tauri/src/tts.rs (7)

114-241: Consider refactoring into smaller helper functions.

This 127-line function handles normalization, markdown removal, emoji removal, symbol replacement, punctuation fixing, and more. Breaking it into focused helpers (e.g., remove_markdown, normalize_symbols, fix_punctuation) would improve readability and testability.


434-537: Optional: Extract pipeline stages into helper methods.

The 103-line infer method orchestrates a multi-stage pipeline (duration prediction, text encoding, latent sampling, denoising loop, vocoding). Consider extracting stages like predict_duration, encode_text, denoise_latent, and generate_waveform to improve readability.


548-578: Make voice style configurable.

The voice style is hardcoded to F2.json (line 549), but the PR summary mentions four available styles (F1, F2, M1, M2). Consider adding a parameter so users can select their preferred voice, or document why F2 is the only supported choice.

Example: Add voice parameter
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice));
     let file = File::open(&style_path).context("Failed to open voice style file")?;
     // ... rest of function

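Then expose voice selection in Tauri commands or config, and validate the name against the known styles (F1, F2, M1, M2) before building the path so arbitrary input cannot select other files.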


651-653: Consider verifying file size in addition to existence.

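The status check only verifies file existence. If a previous download was interrupted despite the temp file safeguards, or a file is truncated on disk, models_downloaded will be true but loading will fail. A size comparison catches truncation but not same-size corruption, and it requires the sizes in MODEL_FILES to be exact byte counts.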

Optional: Add size verification
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        let path = models_dir.join(name);
+        path.exists() && path.metadata().map(|m| m.len() == *expected_size).unwrap_or(false)
+    });

795-838: Consider making synthesis parameters configurable.

The total_step (10) and speed (1.2) parameters are hardcoded at line 815. While these match the PR's documented defaults, exposing them as optional parameters would allow users to trade quality/speed or adjust playback rate.

Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ... existing code ...
     
+    let speed = speed.unwrap_or(1.2);
+    let steps = steps.unwrap_or(10);
+    
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;

674-765: Optional: Add cancellation support for long downloads.

Downloading 265 MB can take significant time on slower connections. Consider adding cancellation support so users can abort if needed, for example by having a Tauri event or command set a shared flag that the download loop checks between chunks.
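One way to make a long download abortable is a shared flag that the loop checks between chunks. The sketch below is an assumption about how this could be wired, not an existing API in the PR: `copy_with_cancel` is a hypothetical helper, and the network stream is replaced by in-memory chunks.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Copy chunks into `out`, stopping early if `cancel` is set.
/// Returns Ok(total bytes) on completion, or Err(bytes written so far)
/// if the operation was cancelled.
fn copy_with_cancel(
    chunks: &[&[u8]],
    cancel: &AtomicBool,
    out: &mut Vec<u8>,
) -> Result<usize, usize> {
    let mut written = 0;
    for chunk in chunks {
        // Check the flag before each chunk so a cancel takes effect quickly.
        if cancel.load(Ordering::Relaxed) {
            return Err(written);
        }
        out.extend_from_slice(chunk);
        written += chunk.len();
    }
    Ok(written)
}
```

In a Tauri app the flag could live in managed state and be set by a separate cancel command; on cancellation the partial temp file would then be removed.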


767-786: Consider checking available memory before loading models.

Loading ~265 MB of ONNX models could cause issues on memory-constrained devices. While this is an on-device feature for desktop, checking available memory and providing a clear error message would improve user experience.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0214737 and 803408d.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-ios
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages

@marksftw
Contributor

marksftw commented Jan 7, 2026

Two small nits:

  1. I think the "copy" and "speaker" buttons should always be on the screen. I didn't know where to find the speaker icon at first until I moved the mouse. It looks like ChatGPT has settled on having them ever-present on the screen underneath each response.
  2. The colors on the model download progress bar felt like they should be flipped. In dark mode, blue filling up the white. In light mode, purple filling up black.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
frontend/src-tauri/src/tts.rs (2)

549-550: Consider making voice selection configurable.

The voice style is hardcoded to F2.json, but all four voice models (F1, F2, M1, M2) are downloaded. Consider exposing voice selection as a parameter in the tts_load_models or tts_synthesize commands to allow users to choose their preferred voice style.

Example: Add voice parameter to load command
 #[tauri::command]
-pub async fn tts_load_models(state: tauri::State<'_, Mutex<TTSState>>) -> Result<(), String> {
+pub async fn tts_load_models(
+    state: tauri::State<'_, Mutex<TTSState>>,
+    voice: Option<String>
+) -> Result<(), String> {
     let models_dir = get_tts_models_dir().map_err(|e| e.to_string())?;
+    let voice_name = voice.unwrap_or_else(|| "F2".to_string());
     
-    let style = load_voice_style(&models_dir)
+    let style = load_voice_style(&models_dir, &voice_name)
         .map_err(|e| format!("Failed to load voice style: {}", e))?;

Then update load_voice_style signature:

-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice_name: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice_name));

815-816: Consider exposing synthesis parameters for user customization.

The inference steps (10) and speed multiplier (1.2) are hardcoded, but users might want to adjust the quality/speed tradeoff. Consider adding optional parameters to tts_synthesize:

Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
     state: tauri::State<'_, Mutex<TTSState>>,
+    inference_steps: Option<usize>,
+    speed: Option<f32>
 ) -> Result<TTSSynthesizeResponse, String> {
+    let steps = inference_steps.unwrap_or(10);
+    let speed_multiplier = speed.unwrap_or(1.2);
+    
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed_multiplier)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;

Suggested ranges: inference_steps 5-20 (higher means better quality but slower synthesis); speed 0.8-1.5 (lower means slower speech, which may sound more natural).

frontend/src/components/TTSDownloadDialog.tsx (2)

56-66: Simplify nested ternary logic for better readability.

The deeply nested ternary operators (4 levels) make this code harder to read and maintain. Consider extracting the description text into a helper function or using early returns.

Refactor suggestion
+  const getDialogDescription = () => {
+    if (isNotAvailable) {
+      return "TTS is only available in the desktop app.";
+    }
+    if (isReady) {
+      return "TTS is ready! You can now listen to assistant messages.";
+    }
+    if (hasError) {
+      return "There was an error setting up TTS.";
+    }
+    if (isProcessing) {
+      return "Setting up TTS. Please keep this window open.";
+    }
+    return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
+  };
+
   <DialogDescription className="text-base">
-    {isNotAvailable
-      ? "TTS is only available in the desktop app."
-      : isReady
-        ? "TTS is ready! You can now listen to assistant messages."
-        : hasError
-          ? "There was an error setting up TTS."
-          : isProcessing
-            ? "Setting up TTS. Please keep this window open."
-            : `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`}
+    {getDialogDescription()}
   </DialogDescription>

170-199: Footer button logic follows similar nested ternary pattern.

While this section is somewhat more manageable than the description text, consider applying a similar extraction pattern for consistency and maintainability if refactoring the DialogDescription (lines 56-66).

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between bcbbd55 and c83843c.

📒 Files selected for processing (3)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
🧬 Code graph analysis (2)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (304-310)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
  • TTSDownloadDialog (18-203)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (304-310)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (8)
frontend/src-tauri/src/tts.rs (3)

100-104: LGTM! OOV handling now uses padding token.

The change from -1 to 0 for out-of-vocabulary characters correctly uses the padding token instead of an invalid sentinel value, which prevents potential embedding lookup failures in the ONNX models.
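As a rough illustration of the fallback, the sketch below maps each character to a token id and substitutes the padding id for anything outside the vocabulary. The `HashMap` vocabulary, `PAD_ID`, and `encode_text` are assumptions made for this example; the PR's actual indexer may be structured differently.

```rust
use std::collections::HashMap;

/// Id used for padding, and here also for out-of-vocabulary characters.
const PAD_ID: i64 = 0;

/// Map each character to its token id, falling back to PAD_ID so that
/// every id is a valid index into the model's embedding table.
fn encode_text(text: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    text.chars()
        .map(|c| vocab.get(&c).copied().unwrap_or(PAD_ID))
        .collect()
}
```

With a sentinel of -1 the same lookup would produce an index that no embedding row corresponds to, which is the failure mode the change avoids.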


304-373: Sentence boundary handling preserves punctuation correctly.

The implementation at line 331 uses para[last_end..m.start() + 1] which correctly includes the terminal punctuation character in each sentence chunk, addressing the previous concern about lost punctuation.


675-766: LGTM! Download reliability improvements properly implemented.

The implementation now includes:

  • HTTP timeouts (300s request, 30s connect) to prevent indefinite hangs
  • Temp file with .part extension to avoid treating partial downloads as complete
  • Cleanup of partial downloads before retry (line 710)
  • Explicit flush and atomic rename (lines 754-759) to finalize downloads

These changes address all previous reliability concerns.

frontend/src/components/UnifiedChat.tsx (5)

208-215: LGTM! Timer cleanup properly implemented.

The useEffect cleanup function now properly clears the longPressTimer on component unmount, preventing the timeout from firing after unmount and attempting to call onManage(). This addresses the previous review concern.


217-220: Correct platform check for TTS availability.

The early return for non-Tauri environments properly ensures the TTS button only renders in the desktop app, where the TTS backend is available.


222-235: Click handler logic correctly manages TTS states.

The implementation properly handles three cases:

  1. Not ready/error → trigger setup flow
  2. Playing current message → stop playback
  3. Ready and idle → start playback

790-800: LGTM! Copy and TTS buttons now always visible.

The buttons are rendered without hover states, making them consistently visible as requested in the PR feedback. The flex layout cleanly presents both action buttons together.


3382-3383: TTS dialog properly integrated into component hierarchy.

The dialog is correctly wired with state managed at the UnifiedChat level and callbacks passed to child components. The integration follows React best practices for component composition.

Contributor

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 6 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @frontend/src/services/tts/TTSContext.tsx:
- Around line 230-266: The code creates an object URL (audioUrl) from audioBlob
but only revokes it in the source.onended handler, so if an error occurs before
playback the URL leaks; update the try/catch flow in the function that uses
base64ToBlob, audioUrl, audioContextRef, and sourceNodeRef so that audioUrl is
revoked in the catch block (and any early-return/error paths) — ensure you track
audioUrl in a variable scoped outside the try and call
URL.revokeObjectURL(audioUrl) in the catch before cleaning up
audioContextRef.current and sourceNodeRef.current, guarding against a
null/undefined audioUrl.
- Around line 271-286: The stop() function can leak object URLs because it stops
the AudioBufferSourceNode directly and onended may not fire, so revoke the audio
URL created in speak(); add an audioUrlRef (e.g., audioUrlRef.current set in
speak() where audioUrl is created) and in stop() check audioUrlRef.current, call
URL.revokeObjectURL(audioUrlRef.current) and clear it, then proceed to
stop/close and null refs (sourceNodeRef, audioContextRef) and reset state; also
ensure the onended handler clears and revokes audioUrlRef.current as well to
avoid double-revoke issues.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c2944a8 and 903ad6d.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: Greptile Review
  • GitHub Check: build-android
  • GitHub Check: build-ios
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
frontend/src/services/tts/TTSContext.tsx (6)

1-58: LGTM on imports and type definitions.

The imports correctly use path aliases, and the type definitions are well-structured with proper TypeScript strictness. The use of isTauriDesktop is correct for desktop-only TTS functionality.


62-77: Appropriate state and ref initialization.

The Web Audio API refs (audioContextRef, sourceNodeRef) correctly replace the previous HTMLAudioElement approach to avoid hijacking macOS media controls. State initialization is clean.


79-118: Status checking logic is well-implemented.

The status flow correctly handles all states (ready → loading → not_downloaded) with proper error handling. The auto-load when models are downloaded but not loaded provides good UX.


120-162: Download flow handles event listener cleanup correctly.

The progress event listener is properly cleaned up in both success (line 141) and error (lines 157-159) paths.


338-361: Helper functions are well-implemented.

The preprocessTextForTTS helper correctly handles both complete and streaming (unclosed) think blocks. The separation of concerns is appropriate: the frontend strips reasoning content, and the backend handles TTS-specific text normalization. The base64ToBlob implementation is standard and correct.


288-305: Unmount cleanup should also revoke the object URL.

As with stop(), if the component unmounts while audio is playing, the audioUrl is never revoked.

Likely an incorrect or invalid review comment.

frontend/src-tauri/src/tts.rs (8)

1-49: Regex pre-compilation is well-implemented.

All regexes are now compiled once using once_cell::sync::Lazy, addressing the previous performance concern. Good coverage of markdown patterns, emojis, diacritics, and punctuation normalization.


50-74: Model file definitions are adequate for progress tracking.

The expected sizes are used for progress estimation rather than validation. Minor discrepancies between actual and expected sizes will only affect progress bar accuracy, not functionality.
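A small sketch of how expected sizes can drive a progress bar without acting as validation. The file names and sizes below are made up for illustration, and the tuple shape is simplified relative to the PR's MODEL_FILES; `total_expected_bytes` and `progress_percent` are hypothetical helpers.

```rust
/// (file name, expected size in bytes). Placeholder values for illustration.
const MODEL_FILES: [(&str, u64); 3] = [
    ("a.onnx", 100),
    ("b.onnx", 300),
    ("c.json", 600),
];

/// Sum of all expected sizes, used as the progress denominator.
fn total_expected_bytes() -> u64 {
    MODEL_FILES.iter().map(|(_, size)| *size).sum()
}

/// Percentage complete, clamped to 100 because the real byte count may
/// slightly exceed the expected total.
fn progress_percent(downloaded: u64, total: u64) -> u8 {
    if total == 0 {
        return 0;
    }
    let pct = downloaded.saturating_mul(100) / total;
    pct.min(100) as u8
}
```

The clamp is what makes small size discrepancies harmless: the bar can finish slightly early or late, but never exceeds 100%.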


131-137: OOV handling correctly uses padding token.

Using 0 (padding token) for out-of-vocabulary characters is the right approach, as it avoids potential embedding lookup failures that could occur with invalid indices like -1.


304-387: Text chunking properly preserves sentence punctuation.

The sentence splitting at line 330 now correctly includes the punctuation mark (m.start() + 1). The split_by_words fallback handles long sentences gracefully. The remaining edge case (single word exceeding max_len) is acceptable since it's rare and preferable to truncating words.
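The word-level fallback can be sketched as a greedy packer. This is an assumed shape for illustration, not the PR's split_by_words: lengths here are measured in bytes, and a single word longer than `max_len` becomes its own oversized chunk rather than being cut.

```rust
/// Greedily pack whitespace-separated words into chunks of at most
/// `max_len` bytes. A single word longer than `max_len` is kept whole.
fn split_by_words(text: &str, max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Length of `current` if this word were appended with a space.
        let needed = if current.is_empty() {
            word.len()
        } else {
            current.len() + 1 + word.len()
        };
        // Start a new chunk when the word does not fit in the current one.
        if needed > max_len && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Keeping an overlong word intact matches the trade-off noted above: an occasional oversized chunk is preferable to a truncated word.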


691-782: Download implementation is robust with proper safeguards.

The implementation correctly addresses all previous concerns:

  • HTTP timeouts (300s overall, 30s connect) prevent indefinite blocking
  • Temp file pattern (.part extension) with atomic rename prevents partial file corruption
  • Explicit flush before rename ensures data integrity
  • Previous partial downloads are cleaned up before retry

812-855: Synthesis command has good lock handling.

Cloning style to avoid borrow conflicts and dropping the guard before CPU-intensive base64 encoding minimizes lock contention. The synthesis parameters (10 steps, 1.2× speed) align with PR objectives.


563-566: Voice selection acknowledged as future work.

The TODO comment appropriately documents that voice selection is currently hardcoded to F2. This is acceptable for the initial implementation.


866-884: Model deletion correctly unloads before removing files.

The sequence (unload from memory → delete directory) ensures no file handles are held when deletion occurs.
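The ordering can be sketched as below. `Engine`, `TtsState`, and `delete_models` are hypothetical stand-ins for the PR's types and command; the point is only that the in-memory engine is dropped, and the lock released, before the directory is removed.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// Stand-in for the loaded ONNX sessions, which may hold open files.
struct Engine;

struct TtsState {
    engine: Option<Engine>,
}

fn delete_models(state: &Mutex<TtsState>, models_dir: &Path) -> io::Result<()> {
    {
        // Unload first so nothing keeps the model files open.
        let mut guard = state
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "TTS state lock poisoned"))?;
        guard.engine = None;
    } // The guard is released here, before any filesystem work.
    if models_dir.exists() {
        fs::remove_dir_all(models_dir)?;
    }
    Ok(())
}
```

Checking `exists()` first makes the call idempotent, so a second delete request succeeds quietly.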

AnthonyRonning and others added 7 commits January 7, 2026 19:24
- Add Rust TTS module with Supertonic ONNX inference
- Download models (~265MB) from HuggingFace on first use
- Add TTSContext for React state management
- Add TTSDownloadDialog for model setup
- Add speaker button to assistant messages (desktop only)
- Settings: F2 voice, 10 inference steps, 1.2x speed

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Preserve sentence punctuation when chunking text
- Add download timeout (5min) and partial file cleanup with .part temp files
- Fix audio URL memory leaks in stop/unmount/error handlers
- Use isTauriDesktop instead of isTauri (TTS is desktop-only)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Long-press speaker button to open TTS management dialog
- Add delete option to free up storage (~264 MB)
- Add tts_delete_models Rust command
- Show storage info when TTS is ready

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add TypeScript preprocessing to remove <think>...</think> blocks
- Add Rust preprocessing to strip markdown formatting (**bold**, *italic*, etc.)
- Remove code blocks, headers, emojis from TTS input
- Handle unclosed think tags (streaming edge case)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add #[cfg(desktop)] to tts module declaration
- Move TTS dependencies to desktop-only target in Cargo.toml
- Prevents TTS code from being compiled for iOS/Android

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add useEffect cleanup for longPressTimer on unmount
- Use 0 (padding token) instead of -1 for OOV characters
- Make copy/speaker buttons always visible (not hover-only)
- Fix progress bar colors (use primary/20 background)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Clear mediaSession metadata and action handlers so TTS audio
doesn't intercept system play/pause and track skip controls.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

AnthonyRonning and others added 3 commits January 7, 2026 19:24

- Use lazy_static regexes (once_cell) for one-time compilation
- Replace while loops with single-pass regex for duplicate quotes
- Add split_by_words() to handle text longer than max_len in chunks
- Add TODO for voice selection (currently hardcoded to F2)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Replace HTMLAudioElement with Web Audio API (AudioContext + AudioBufferSourceNode)
which doesn't register with the Media Session API.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from 903ad6d to 016e4e4 Compare January 8, 2026 01:44
Contributor

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 2 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
frontend/src-tauri/src/tts.rs (1)

832-832: Consider making TTS parameters configurable.

The synthesis currently uses hardcoded values: 10 inference steps and 1.2× speed. Consider adding optional parameters to tts_synthesize to allow users to adjust quality (inference steps) and playback speed based on their preferences.

frontend/src/components/TTSDownloadDialog.tsx (1)

56-66: Consider refactoring nested ternary for readability.

The description text uses deeply nested ternary operators. While functional, extracting this logic into a helper function would improve readability:

♻️ Optional refactor
const getDescription = () => {
  if (isNotAvailable) return "TTS is only available in the desktop app.";
  if (isReady) return "TTS is ready! You can now listen to assistant messages.";
  if (hasError) return "There was an error setting up TTS.";
  if (isProcessing) return "Setting up TTS. Please keep this window open.";
  return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
};

// Then in JSX:
<DialogDescription className="text-base">
  {getDescription()}
</DialogDescription>
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 903ad6d and 016e4e4.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/tts-research.md
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/app.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • justfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • frontend/src/app.tsx
  • justfile
  • frontend/src-tauri/src/lib.rs
  • frontend/src/services/tts/index.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (317-323)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
🪛 LanguageTool
docs/tts-research.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md

26-26: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


169-169: Bare URL used

(MD034, no-bare-urls)


175-175: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Greptile Review
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-android
  • GitHub Check: build-ios
🔇 Additional comments (2)
frontend/src-tauri/Cargo.toml (1)

44-44: Update ort from 2.0.0-rc.7 to a stable release or confirm pre-release acceptability.

ort 2.0.0 has not been stabilized—it remains in pre-release with the latest version at 2.0.0-rc.10. Using a pre-release version in production carries stability and support risks. Either upgrade to rc.10 if staying on pre-release is acceptable, or switch to a stable version if available.

frontend/src/components/UnifiedChat.tsx (1)

192-278: Well-implemented TTS button component.

The implementation demonstrates several good practices:

  • Proper cleanup: The useEffect at lines 208-215 correctly cleans up the long-press timer on unmount, addressing the concern from the previous review
  • Long-press handling: The pointer event handlers properly clear the timer on both pointerUp and pointerLeave, providing good UX
  • Platform awareness: Returns null for non-Tauri environments, ensuring the feature is desktop-only
  • Accessibility: Includes descriptive aria-label for screen readers
  • State management: Correctly uses the TTS context hook for shared state

The logic flow is clear and handles all the necessary states (not downloaded, loading, playing, etc.).

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a lightweight TTS engine that runs inference via ONNX Runtime. The implementation is desktop-only, downloading ~265MB of models from HuggingFace on first use and processing all audio locally.

Key Changes

  • Rust TTS Engine (frontend/src-tauri/src/tts.rs): Complete ONNX-based inference pipeline with text preprocessing, chunking, denoising loop, and WAV encoding
  • React Integration (TTSContext.tsx): Context provider managing download, model loading, and audio playback via Web Audio API
  • UI Components: Download dialog with progress tracking and speaker buttons on assistant messages (desktop only)
  • Platform Detection: Properly scoped to desktop environments using isTauriDesktop() checks

Issues Found

  • Race condition in tts_synthesize: sample_rate is read while holding the lock but used after the lock is released, so it could mismatch the audio data if TTS is unloaded concurrently
  • Performance: multiple large array clones inside the 10-iteration denoising loop (lines 515-520) allocate significant memory
  • File handling: .with_extension("part") may produce unexpected results for multi-dot filenames
  • Cleanup edge case: event listener cleanup is not guaranteed in the error path (lines 152-161)
  • RC dependency: ort = "2.0.0-rc.7" is a release candidate used in production

Architecture Notes

The implementation follows a sound architecture pattern: desktop-only feature with proper platform detection, atomic file downloads with .part extension, and clean separation between Rust inference and React playback. The 10-step denoising with 1.2x speed and 50ms chunk pause delivers high-quality output as documented.

Confidence Score: 3/5

  • This PR has multiple performance and correctness issues that should be addressed before merging
  • The implementation introduces a race condition in audio synthesis that could cause crashes or corrupted audio, uses an unstable RC dependency, and has performance concerns with repeated large allocations in hot paths. While the architecture is sound and the feature works, these issues warrant fixes before production deployment.
  • Pay close attention to frontend/src-tauri/src/tts.rs (race condition on lines 839-843, performance issues in the denoising loop) and frontend/src/services/tts/TTSContext.tsx (event listener cleanup)

Important Files Changed

File Analysis

Filename | Score | Overview
frontend/src-tauri/src/tts.rs | 3/5 | TTS engine implementation with multiple clones in hot paths and potential race condition in audio data handling
frontend/src/services/tts/TTSContext.tsx | 4/5 | TTS React context with proper cleanup but potential memory leak in error paths
frontend/src-tauri/Cargo.toml | 3/5 | Dependencies added for TTS - using release candidate version which may have stability concerns
frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Clean download UI component with proper state management
frontend/src/components/UnifiedChat.tsx | 5/5 | TTS button integration into chat UI, properly scoped to desktop only
frontend/src/app.tsx | 5/5 | TTSProvider added to React component tree

Sequence Diagram

sequenceDiagram
    participant User
    participant React as React Frontend<br/>(TTSContext)
    participant Tauri as Tauri Backend<br/>(Rust)
    participant HF as HuggingFace<br/>(Models)
    participant ONNX as ONNX Runtime<br/>(Inference)

    User->>React: Open TTS Dialog
    React->>Tauri: tts_get_status()
    Tauri->>Tauri: Check models_downloaded
    Tauri->>Tauri: Check models_loaded
    Tauri-->>React: Status response

    alt Models not downloaded
        User->>React: Click Download
        React->>Tauri: tts_download_models()
        loop For each model file
            Tauri->>HF: HTTP GET model file
            HF-->>Tauri: Stream chunks
            Tauri->>Tauri: Write to .part file
            Tauri->>React: Emit progress event
            React->>User: Update progress bar
        end
        Tauri->>Tauri: Rename .part to final
        Tauri-->>React: Download complete
        
        React->>Tauri: tts_load_models()
        Tauri->>ONNX: Load 4 ONNX sessions
        Tauri->>Tauri: Load voice style JSON
        Tauri-->>React: Models loaded
    end

    User->>React: Click speaker icon
    React->>Tauri: tts_synthesize(text)
    Tauri->>Tauri: Preprocess text
    Tauri->>Tauri: Chunk text (300 chars)
    loop For each chunk
        Tauri->>ONNX: Duration prediction
        ONNX-->>Tauri: Duration values
        Tauri->>ONNX: Text encoding
        ONNX-->>Tauri: Text embeddings
        Tauri->>Tauri: Sample noisy latent
        loop 10 denoising steps
            Tauri->>ONNX: Vector estimation
            ONNX-->>Tauri: Denoised latent
        end
        Tauri->>ONNX: Vocoder (latent→wav)
        ONNX-->>Tauri: Audio samples
    end
    Tauri->>Tauri: Concatenate chunks
    Tauri->>Tauri: Encode to base64 WAV
    Tauri-->>React: Audio base64 + metadata
    React->>React: Decode to AudioBuffer
    React->>React: Play via Web Audio API
    React-->>User: Audio playback


[target.'cfg(any(target_os = "macos", target_os = "linux", target_os = "windows"))'.dependencies]
# TTS dependencies (Supertonic) - desktop only
ort = "2.0.0-rc.7"

Using release candidate version 2.0.0-rc.7 in production may introduce stability issues. Consider switching to a stable release if one is available.

Comment on lines +515 to +520
let xt_value = Value::from_array(xt.clone())?;
let text_emb_value = Value::from_array(text_emb.clone())?;
let latent_mask_value = Value::from_array(latent_mask.clone())?;
let text_mask_value2 = Value::from_array(text_mask.clone())?;
let current_step_value = Value::from_array(current_step_array)?;
let total_step_value = Value::from_array(total_step_array.clone())?;

Multiple large arrays are cloned on every pass of the inference loop (10 iterations): xt.clone(), text_emb.clone(), latent_mask.clone(), and text_mask.clone(). Consider reusing Values or using references to reduce allocations.

Comment on lines +839 to +843
let duration_seconds = audio.len() as f32 / tts.sample_rate as f32;
let sample_rate = tts.sample_rate;

// Drop the guard before encoding to release the lock
drop(guard);

Potential race condition: sample_rate is read from tts.sample_rate while holding the lock, but the audio vector is used after drop(guard). If another thread modifies or unloads TTS between these operations, the sample_rate may not match the audio data.
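
One way to rule out the mismatch described above is to copy every value the encoder needs out of the guarded state under a single guard, then release the lock. The sketch below is a minimal illustration using only the standard library. Engine, synthesize_snapshot, and the 44100 Hz rate are hypothetical stand-ins, not the types or values in tts.rs.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the engine state guarded by the mutex.
struct Engine {
    sample_rate: u32,
}

impl Engine {
    // Placeholder synthesis: one second of silence at the engine's rate.
    fn synthesize(&self, _text: &str) -> Vec<f32> {
        vec![0.0; self.sample_rate as usize]
    }
}

/// Returns (samples, sample_rate, duration_seconds), all captured while the
/// same guard is held, so they describe one consistent engine state.
fn synthesize_snapshot(
    state: &Arc<Mutex<Option<Engine>>>,
    text: &str,
) -> Result<(Vec<f32>, u32, f32), String> {
    let guard = state.lock().map_err(|e| e.to_string())?;
    let engine = guard.as_ref().ok_or("TTS not loaded")?;

    // Owned copies: nothing below borrows from the guarded state.
    let audio = engine.synthesize(text);
    let sample_rate = engine.sample_rate;
    let duration = audio.len() as f32 / sample_rate as f32;

    // Release the lock before any slow work such as WAV/base64 encoding.
    drop(guard);
    Ok((audio, sample_rate, duration))
}

fn main() {
    let state = Arc::new(Mutex::new(Some(Engine { sample_rate: 44100 })));
    let (audio, rate, duration) = synthesize_snapshot(&state, "hello").unwrap();

    // Unloading afterwards does not affect the values already copied out.
    *state.lock().unwrap() = None;
    println!("{} samples at {} Hz ({:.2} s)", audio.len(), rate, duration);
}
```

Because the samples and the rate are owned values taken under the same guard, a later unload or reload cannot change them. Whether the code in the PR already satisfies this depends on how audio is produced before line 839, which is not shown here.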

Comment on lines +466 to +467
let text_mask_value = Value::from_array(text_mask.clone())?;
let style_dp_value = Value::from_array(style.dp.clone())?;

Large arrays are cloned for ONNX value creation (text_mask.clone() and style.dp.clone()). Consider whether references can be used to avoid the allocations.

Comment on lines +706 to +708
for (file_name, url_path, expected_size) in MODEL_FILES {
let file_path = models_dir.join(file_name);
let temp_path = file_path.with_extension("part");

Missing file extension validation: with_extension("part") replaces only the last extension, so it could produce unexpected results if file_name contains multiple dots (e.g., model.v2.onnx becomes model.v2.part, but the user might expect model.v2.onnx.part).
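
The behavior described above can be checked with the standard library alone. The sketch below contrasts with_extension("part") with appending ".part" to the full file name. part_path is a hypothetical helper, and the file names are illustrative rather than taken from MODEL_FILES.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: append ".part" to the whole file name instead of
/// replacing the text after the last dot.
fn part_path(final_path: &Path) -> PathBuf {
    let mut name: OsString = final_path
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(".part");
    final_path.with_file_name(name)
}

fn main() {
    let model = Path::new("models/model.v2.onnx");

    // with_extension swaps only the last extension.
    assert_eq!(
        model.with_extension("part"),
        PathBuf::from("models/model.v2.part")
    );

    // Appending keeps the original name intact.
    assert_eq!(part_path(model), PathBuf::from("models/model.v2.onnx.part"));

    // Two files that differ only by extension share one temp path under
    // with_extension, but stay distinct when ".part" is appended.
    let a = Path::new("voice.json");
    let b = Path::new("voice.onnx");
    assert_eq!(a.with_extension("part"), b.with_extension("part"));
    assert_ne!(part_path(a), part_path(b));
}
```

Appending also avoids a temp-path collision between two files that share a stem, which matters only if such a pair exists in the download list.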

Comment on lines +704 to +777
let mut total_downloaded: u64 = 0;

for (file_name, url_path, expected_size) in MODEL_FILES {
let file_path = models_dir.join(file_name);
let temp_path = file_path.with_extension("part");

// Skip if already downloaded
if file_path.exists() {
total_downloaded += expected_size;
let _ = app.emit(
"tts-download-progress",
DownloadProgress {
downloaded: total_downloaded,
total: TOTAL_MODEL_SIZE,
file_name: file_name.to_string(),
percent: (total_downloaded as f64 / TOTAL_MODEL_SIZE as f64) * 100.0,
},
);
continue;
}

// Clean up any partial download from previous attempt
let _ = fs::remove_file(&temp_path);

let url = format!("{}/{}", HUGGINGFACE_BASE_URL, url_path);
log::info!("Downloading TTS model: {}", file_name);

let response = client
.get(&url)
.send()
.await
.map_err(|e| format!("Failed to download {}: {}", file_name, e))?;

if !response.status().is_success() {
return Err(format!(
"Failed to download {}: HTTP {}",
file_name,
response.status()
));
}

let mut file = File::create(&temp_path)
.map_err(|e| format!("Failed to create file {}: {}", file_name, e))?;

let mut stream = response.bytes_stream();
let mut file_downloaded: u64 = 0;

while let Some(chunk) = stream.next().await {
let chunk = chunk.map_err(|e| format!("Download error: {}", e))?;
file.write_all(&chunk)
.map_err(|e| format!("Write error: {}", e))?;

file_downloaded += chunk.len() as u64;
let current_total = total_downloaded + file_downloaded;

let _ = app.emit(
"tts-download-progress",
DownloadProgress {
downloaded: current_total,
total: TOTAL_MODEL_SIZE,
file_name: file_name.to_string(),
percent: (current_total as f64 / TOTAL_MODEL_SIZE as f64) * 100.0,
},
);
}

// Flush and rename temp file to final path
file.flush()
.map_err(|e| format!("Failed to flush file {}: {}", file_name, e))?;
drop(file);
fs::rename(&temp_path, &file_path)
.map_err(|e| format!("Failed to finalize {}: {}", file_name, e))?;

total_downloaded += expected_size;

No atomic write protection: if the download is interrupted between file.flush() and fs::rename(), the .part file may be left behind in an inconsistent state, and the downloaded byte count is not checked against expected_size in this snippet. Consider using a transaction pattern or storing checksums.
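
A minimal sketch of one such pattern, assuming the expected size is known up front: write to the temp path, force the data to disk with sync_all, verify the size, and only then rename. finalize_download and its parameters are hypothetical, and the body is passed as a byte slice here for brevity, whereas the PR streams chunks.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: the final path only ever appears with a complete,
/// size-checked file behind it.
fn finalize_download(
    temp_path: &Path,
    final_path: &Path,
    data: &[u8],
    expected_size: u64,
) -> Result<(), String> {
    let mut file = File::create(temp_path).map_err(|e| format!("create: {}", e))?;
    file.write_all(data).map_err(|e| format!("write: {}", e))?;
    // sync_all asks the OS to persist data and metadata; flush alone does not.
    file.sync_all().map_err(|e| format!("sync: {}", e))?;
    drop(file);

    let actual = fs::metadata(temp_path)
        .map_err(|e| format!("stat: {}", e))?
        .len();
    if actual != expected_size {
        // Discard the truncated or oversized temp file instead of promoting it.
        let _ = fs::remove_file(temp_path);
        return Err(format!(
            "size mismatch: expected {}, got {}",
            expected_size, actual
        ));
    }

    fs::rename(temp_path, final_path).map_err(|e| format!("rename: {}", e))
}

fn main() -> Result<(), String> {
    let dir = std::env::temp_dir().join(format!("tts_sketch_{}", std::process::id()));
    fs::create_dir_all(&dir).map_err(|e| e.to_string())?;
    let final_path = dir.join("model.onnx");
    let temp_path = dir.join("model.onnx.part");

    // A short body is rejected and leaves nothing at either path.
    assert!(finalize_download(&temp_path, &final_path, b"abc", 5).is_err());
    assert!(!temp_path.exists() && !final_path.exists());

    // A complete body is promoted to the final path.
    finalize_download(&temp_path, &final_path, b"hello", 5)?;
    assert!(final_path.exists() && !temp_path.exists());

    fs::remove_dir_all(&dir).map_err(|e| e.to_string())
}
```

A size check catches truncation but not corruption. A stored checksum, as the comment suggests, would be the stronger option if the expected hashes are available.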

Comment on lines +152 to +161
} catch (err) {
console.error("TTS download failed:", err);
setStatus("error");
setError(err instanceof Error ? err.message : "Failed to download TTS models");

// Clean up listener on error
if (unlistenRef.current) {
unlistenRef.current();
unlistenRef.current = null;
}

Event listener cleanup is not guaranteed on error: if invoke("tts_download_models") throws, unlisten() is called, but unlistenRef.current may already have been set to null in the catch block. Consider using try/finally so the cleanup runs exactly once on every path.

Comment on lines +233 to +262
// Use Web Audio API instead of HTMLAudioElement to avoid hijacking media controls
const audioContext = new AudioContext();
const arrayBuffer = await audioBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);

// Store context and source for stop functionality
audioContextRef.current = audioContext;
sourceNodeRef.current = source;

source.onended = () => {
if (sourceNodeRef.current !== source) {
return;
}
setIsPlaying(false);
setCurrentPlayingId(null);

if (audioUrlRef.current === audioUrl) {
URL.revokeObjectURL(audioUrlRef.current);
audioUrlRef.current = null;
}
audioContext.close();
audioContextRef.current = null;
sourceNodeRef.current = null;
};

source.start(0);

Audio context cleanup may fail silently: audioContext.close() at line 257 returns a promise that rejects if the context is already closed, and it runs inside the source.onended callback, where the rejection is not caught.

Comment on lines +217 to +222
const processedText = preprocessTextForTTS(text);
if (!processedText) {
setIsPlaying(false);
setCurrentPlayingId(null);
return;
}

An empty preprocessing result is silently ignored: if preprocessTextForTTS(text) returns an empty string, playback state is set and then immediately unset without any user feedback.

greptile-apps bot commented Jan 8, 2026

Additional Comments (1)

frontend/src-tauri/src/tts.rs
The HuggingFace URL is hardcoded, with no explicit HTTPS verification configuration. Ensure TLS certificate validation stays enabled in reqwest::Client (it is on by default unless danger_accept_invalid_certs(true) is set).


3 participants