feat: add on-device TTS using Supertonic #335
base: master
📝 Walkthrough
Adds an on-device TTS subsystem: a Rust/Tauri backend with ONNX-based synthesis and Tauri commands, a React TTS context/hook for lifecycle and playback, UI components for model download/setup and per-message playback, debug desktop build targets, an AppImage bundling flag, and a TTS research doc.
Sequence Diagram(s)
sequenceDiagram
participant UI as TTSDownloadDialog
participant React as TTS Context\n(useTTS)
participant Tauri as Tauri IPC
participant Rust as TTS Backend\n(tts.rs)
participant FS as File System
UI->>React: startDownload()
React->>Tauri: invoke tts_download_models()
Tauri->>Rust: request download
Rust->>FS: stream & save model files
loop progress events
Rust-->>Tauri: emit download progress
Tauri-->>React: tts-download-progress event
React-->>UI: update progress UI
end
Rust-->>Tauri: download complete
React->>Tauri: invoke tts_load_models()
Tauri->>Rust: load ONNX sessions / init engine
Rust-->>Tauri: load complete
Tauri-->>React: ready
React-->>UI: status = ready
sequenceDiagram
participant User as User
participant UI as TTSButton
participant React as TTS Context
participant Tauri as Tauri IPC
participant Rust as TTS Backend
participant Audio as Browser Audio
User->>UI: click play(text)
UI->>React: speak(text, id)
React->>Tauri: invoke tts_synthesize(text)
Tauri->>Rust: synthesize request
Rust->>Rust: preprocess → duration → encode → denoise → vocoder
Rust-->>Tauri: TTSSynthesizeResponse (audio_base64)
Tauri-->>React: deliver response
React->>Audio: create Blob/URL and play
Audio-->>React: onended
React-->>UI: update playing state
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ Passed checks (2 passed)
Deploying maple with Cloudflare Pages
| Latest commit: | 503318f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://bf79f6bb.maple-ca8.pages.dev |
| Branch Preview URL: | https://feature-tts-supertonic.maple-ca8.pages.dev |
Greptile Summary
This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine. The implementation downloads ~265 MB of models from HuggingFace on first use and performs all synthesis locally in the desktop app.
Key Changes:
Previous Review Feedback Addressed:
Minor Issues:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant UI as React UI
participant TTSContext
participant Tauri as Tauri Bridge
participant Rust as Rust Backend
participant HF as HuggingFace
participant ONNX as ONNX Runtime
Note over User,ONNX: Initial Setup Flow
User->>UI: Opens TTS Dialog
UI->>TTSContext: checkStatus()
TTSContext->>Tauri: invoke("tts_get_status")
Tauri->>Rust: tts_get_status()
Rust->>Rust: Check if models exist
Rust-->>Tauri: TTSStatusResponse
Tauri-->>TTSContext: {models_downloaded, models_loaded}
TTSContext-->>UI: Update status
alt Models not downloaded
User->>UI: Click Download
UI->>TTSContext: startDownload()
TTSContext->>Tauri: invoke("tts_download_models")
Tauri->>Rust: tts_download_models()
loop For each model file
Rust->>HF: HTTP GET model file
HF-->>Rust: Stream chunks
Rust->>Rust: Write to temp file
Rust->>Tauri: emit("tts-download-progress")
Tauri-->>TTSContext: Progress event
TTSContext-->>UI: Update progress bar
end
Rust->>Rust: Flush and rename temp files
Rust-->>Tauri: Success
end
Note over User,ONNX: Model Loading Flow
TTSContext->>Tauri: invoke("tts_load_models")
Tauri->>Rust: tts_load_models()
Rust->>Rust: Load tts.json config
Rust->>Rust: Load unicode_indexer.json
Rust->>ONNX: Load duration_predictor.onnx
Rust->>ONNX: Load text_encoder.onnx
Rust->>ONNX: Load vector_estimator.onnx
Rust->>ONNX: Load vocoder.onnx
Rust->>Rust: Load F2.json voice style
Rust->>Rust: Store in TTSState
Rust-->>Tauri: Success
Tauri-->>TTSContext: Models loaded
TTSContext-->>UI: status = "ready"
Note over User,ONNX: Text-to-Speech Synthesis
User->>UI: Click speaker icon
UI->>TTSContext: speak(text, messageId)
TTSContext->>TTSContext: preprocessTextForTTS()
TTSContext->>Tauri: invoke("tts_synthesize", {text})
Tauri->>Rust: tts_synthesize(text)
Rust->>Rust: chunk_text(text, 300)
loop For each chunk
Rust->>Rust: preprocess_text(chunk)
Rust->>Rust: UnicodeProcessor.call()
Rust->>ONNX: Run duration_predictor
ONNX-->>Rust: Duration predictions
Rust->>ONNX: Run text_encoder
ONNX-->>Rust: Text embeddings
Rust->>Rust: sample_noisy_latent()
loop 10 denoising steps
Rust->>ONNX: Run vector_estimator
ONNX-->>Rust: Denoised latent
end
Rust->>ONNX: Run vocoder
ONNX-->>Rust: Audio waveform
Rust->>Rust: Concatenate with silence
end
Rust->>Rust: wav_to_base64()
Rust-->>Tauri: TTSSynthesizeResponse
Tauri-->>TTSContext: {audio_base64, sample_rate}
TTSContext->>TTSContext: base64ToBlob()
TTSContext->>TTSContext: Create AudioContext
TTSContext->>TTSContext: Decode audio buffer
TTSContext->>User: Play audio via Web Audio API
TTSContext-->>UI: isPlaying = true
Note over User,ONNX: Stop/Cleanup
User->>UI: Click stop or unmount
UI->>TTSContext: stop()
TTSContext->>TTSContext: sourceNode.stop()
TTSContext->>TTSContext: audioContext.close()
TTSContext->>TTSContext: URL.revokeObjectURL()
TTSContext-->>UI: isPlaying = false
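The "Concatenate with silence" step in the synthesis loop of the diagram can be pictured with a small std-only sketch. This is an illustration under assumptions, not the code in tts.rs: the function name `concat_with_silence`, the mono `f32` sample layout, and the gap length parameter are all hypothetical.

```rust
// Hypothetical helper: join per-chunk audio with a fixed gap of silence
// between consecutive chunks. Samples are assumed to be mono f32.
fn concat_with_silence(chunks: &[Vec<f32>], sample_rate: u32, gap_secs: f32) -> Vec<f32> {
    // Number of zero samples inserted between chunks.
    let gap = (sample_rate as f32 * gap_secs).round() as usize;
    let mut out = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        if i > 0 {
            // Silence only goes between chunks, never before the first one.
            out.extend(std::iter::repeat(0.0f32).take(gap));
        }
        out.extend_from_slice(chunk);
    }
    out
}
```

The gap is only added between chunks, so a single-chunk message comes back unchanged.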
10 files reviewed, 2 comments
Actionable comments posted: 3
🧹 Nitpick comments (4)
frontend/src/components/UnifiedChat.tsx (1)
179-233: TTSButton implementation looks solid with proper environment check and state handling.
The component correctly:
- Returns null when not in Tauri environment (lines 193-195)
- Handles different TTS states appropriately
- Provides proper aria-labels for accessibility
However, there's a potential issue with the async handleClick function: the speak call on line 207 is awaited but errors are not handled. Consider adding error handling:

     const handleClick = async () => {
       if (status === "not_downloaded" || status === "error") {
         onNeedsSetup();
         return;
       }
       if (status === "ready") {
         if (isThisPlaying) {
           stop();
         } else {
    -      await speak(text, messageId);
    +      try {
    +        await speak(text, messageId);
    +      } catch (error) {
    +        console.error("TTS playback failed:", error);
    +      }
         }
       }
     };

frontend/src-tauri/src/tts.rs (3)
115-219: Consider compiling regexes once to improve performance.
The preprocess_text function creates multiple Regex objects on every call. Since this function is called for each text chunk during synthesis, this causes repeated compilation overhead.
Consider using lazy_static or once_cell::sync::Lazy to compile regexes once:

    use once_cell::sync::Lazy;

    static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
    });
    static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"[\u{0302}...]").unwrap()
    });
    // ... other regexes

    fn preprocess_text(text: &str) -> String {
        let mut text: String = text.nfkd().collect();
        text = EMOJI_PATTERN.replace_all(&text, "").to_string();
        // ...
    }

This is especially important since chunk_text may split long messages into many chunks, each processed separately.
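The same initialize-once idea can also be expressed with the standard library's `OnceLock`, without an extra crate. The sketch below is std-only, so a `HashSet<char>` stands in for a compiled `Regex`; `stripped_chars`, `strip_marks`, and `BUILD_COUNT` are hypothetical names used only to show that the expensive setup runs a single time.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the expensive setup runs (illustration only).
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled regex: the set of characters to strip.
fn stripped_chars() -> &'static HashSet<char> {
    static CHARS: OnceLock<HashSet<char>> = OnceLock::new();
    CHARS.get_or_init(|| {
        // Runs on the first call only; later calls reuse the stored value.
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        HashSet::from(['\u{0302}', '\u{0303}', '\u{0304}'])
    })
}

// Remove every character in the shared set from the input text.
fn strip_marks(text: &str) -> String {
    let set = stripped_chars();
    text.chars().filter(|c| !set.contains(c)).collect()
}
```

With a real regex, the closure passed to `get_or_init` would hold the `Regex::new(...)` call, so each chunk reuses the compiled pattern.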
734-771: Mutex held during CPU-intensive synthesis may cause contention.
The mutex guard is held at line 739 and only released at line 759, after the potentially long-running synthesize() call completes. For longer texts, ONNX inference across multiple chunks could take several seconds, blocking all other TTS commands.
Consider whether the ONNX sessions truly need mutable access. If Session::run only requires &self, you could restructure to minimize lock duration:

    // Clone what's needed, release lock quickly
    let (tts_data, style) = {
        let guard = state.lock().map_err(|e| e.to_string())?;
        let style = guard.style.as_ref().ok_or("Voice style not loaded")?.clone();
        // If possible, clone or Arc the session references
        // ...
    };
    // Synthesize outside the lock
    let audio = /* ... */;

Alternatively, if mutable access is truly required by ONNX Runtime, this is acceptable for now but document the limitation. For production, consider using tokio::task::spawn_blocking to avoid blocking the async runtime.
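A minimal std-only sketch of the "clone a handle, then release the lock" pattern follows. It assumes the engine can run through `&self`, which the comment above leaves open for ONNX Runtime; `Engine`, `TtsState`, and `synthesize_with_short_lock` are hypothetical stand-ins, not the types in tts.rs.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the loaded sessions; synthesis only needs `&self`.
struct Engine {
    gain: f32,
}

impl Engine {
    fn synthesize(&self, text: &str) -> Vec<f32> {
        // Placeholder for the slow inference work: one sample per character.
        text.chars().map(|_| self.gain).collect()
    }
}

// Shared state: the engine sits behind an Arc so callers can clone a handle.
struct TtsState {
    engine: Option<Arc<Engine>>,
}

fn synthesize_with_short_lock(state: &Mutex<TtsState>, text: &str) -> Result<Vec<f32>, String> {
    // Hold the lock only long enough to clone the Arc handle.
    let engine = {
        let guard = state.lock().map_err(|e| e.to_string())?;
        guard.engine.clone()
    }; // guard is dropped here
    let engine = engine.ok_or("TTS engine not loaded")?;

    // The slow work runs without the lock, so other commands are not blocked.
    Ok(engine.synthesize(text))
}
```

If the sessions do need `&mut self`, this pattern does not apply as written and the longer lock (or a per-engine mutex) remains necessary.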
536-566: Consider configuring ONNX session options for better performance.
The ONNX sessions are created with default settings. For a better user experience, you might want to configure the execution:

    use ort::session::SessionBuilder;

    let dp_ort = SessionBuilder::new()?
        .with_intra_threads(4)? // Limit thread usage
        .commit_from_file(models_dir.join("duration_predictor.onnx"))?;

This is optional but could help with:
- Controlling CPU usage during inference
- Potentially enabling GPU acceleration in the future
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
- frontend/src-tauri/Cargo.toml (1 hunks)
- frontend/src-tauri/src/lib.rs (4 hunks)
- frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
- frontend/src-tauri/src/tts.rs (1 hunks)
- frontend/src/app.tsx (2 hunks)
- frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
- frontend/src/components/UnifiedChat.tsx (8 hunks)
- frontend/src/services/tts/TTSContext.tsx (1 hunks)
- frontend/src/services/tts/index.ts (1 hunks)
- justfile (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling
Files:
- frontend/src/services/tts/index.ts
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/app.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/UnifiedChat.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use strict TypeScript typing and avoid any when possible
Files:
- frontend/src/services/tts/index.ts
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/app.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/UnifiedChat.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed
Applied to files:
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (4)
frontend/src/components/TTSDownloadDialog.tsx (1)
- frontend/src/services/tts/TTSContext.tsx (1): useTTS (251-257)
frontend/src/app.tsx (5)
- frontend/src/services/tts/index.ts (1): TTSProvider (1-1)
- frontend/src/components/BillingServiceProvider.tsx (1): BillingServiceProvider (5-14)
- frontend/src/components/ProxyEventListener.tsx (1): ProxyEventListener (7-58)
- frontend/src/components/UpdateEventListener.tsx (1): UpdateEventListener (11-65)
- frontend/src/components/DeepLinkHandler.tsx (1): DeepLinkHandler (9-109)
frontend/src/services/tts/TTSContext.tsx (1)
- frontend/src/services/tts/index.ts (3): TTSStatus (2-2), TTSProvider (1-1), useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
- frontend/src-tauri/src/tts.rs (7): new (86-88), new (341-346), tts_get_status (602-620), tts_download_models (631-704), tts_load_models (707-725), tts_synthesize (735-771), tts_unload_models (774-780)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-ios
- GitHub Check: build-linux
- GitHub Check: build-android
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (16)
justfile (1)
46-57: Desktop debug build targets look consistent and useful
The new debug and debug-no-cc recipes mirror the existing release targets and should help diagnose desktop issues without impacting current workflows.
frontend/src/services/tts/TTSContext.tsx (1)
60-72: Overall TTS provider shape and status lifecycle look solid
The context cleanly encapsulates TTS state (status, errors, progress, playback) and exposes a small surface (checkStatus, startDownload, speak, stop). Checking isTauri() once, defaulting to "not_available" off-Tauri, and auto-checking status on mount is consistent with the existing platform utilities and React-context usage.
frontend/src/app.tsx (1)
17-17: TTSProvider placement in the provider tree looks appropriate
Wrapping BillingServiceProvider (and thus ProxyEventListener, UpdateEventListener, DeepLinkHandler, and InnerApp) with TTSProvider ensures all consumers of useTTS sit under the context without altering existing provider ordering. This is a good integration point.
Also applies to: 102-109
frontend/src-tauri/Cargo.toml (1)
42-53: TTS dependency set looks coherent; please verify cross-platform builds and versions
The added crates (ONNX Runtime via ort, ndarray with rayon, reqwest with streaming, dirs, etc.) are consistent with a local Supertonic TTS pipeline. Since they’re in the shared [dependencies] section (not cfg-gated), they’ll be compiled for all targets, including Android/iOS.
Please verify:
- That ort = "2.0.0-rc.7" and the other new crates build cleanly for your mobile targets.
- That there are no unexpected size or licensing issues introduced by this set.
Consider adding cfg-gating or feature flags later if you decide TTS should remain desktop-only.
frontend/src-tauri/src/pdf_extractor.rs (1)
19-27: PDF extractor changes are formatting-only and safe
The adjustments here are purely stylistic (parameter layout and whitespace) and do not affect decoding, file-type branching, or response construction.
Also applies to: 33-38, 43-51
frontend/src/services/tts/index.ts (1)
1-2: Barrel re-exports for TTS are clean and useful
Re-exporting TTSProvider, useTTS, and TTSStatus from the TTS module simplifies imports for consumers and keeps the public surface centralized.
frontend/src-tauri/src/lib.rs (2)
5-8: TTS state management and command wiring on desktop look correct
Registering mod tts;, managing tts::TTSState::new(), and adding the TTS commands to the desktop invoke_handler cleanly integrates the backend TTS pipeline. Scoping these commands to the desktop builder matches the “desktop-only TTS” goal.
Also applies to: 39-54
387-399: Update-ready event emission and logging structure are clear
The app_handle.emit("update-ready", UpdateReadyPayload { ... }) block, along with the success/failure logging, is straightforward and keeps the updater behavior unchanged while improving observability.
frontend/src/components/TTSDownloadDialog.tsx (1)
1-160: TTS download/setup dialog behavior matches the context state machine well
The dialog cleanly maps TTSStatus into UX states (not available, error, processing, ready, initial), wires startDownload appropriately, and prevents accidental closure while work is in progress. The progress bar and messaging around the one-time ~MB download and local-only processing are clear.
frontend/src/components/UnifiedChat.tsx (3)
412-427: LGTM!
The MessageList signature update correctly adds the onTTSSetupOpen callback with proper typing. The integration follows the existing pattern for handling dialogs in this component.
626-633: LGTM!
The TTSButton integration mirrors the existing CopyButton pattern, reusing the same text extraction logic. The button is properly placed within the hover-reveal action group for assistant messages.
3023-3024: LGTM!
The TTS setup dialog follows the same pattern as other dialogs in the component (e.g., WebSearchInfoDialog, ContextLimitDialog). State management is consistent with the rest of the codebase.
frontend/src-tauri/src/tts.rs (4)
335-347: LGTM!
The TTSState design with Mutex<Self> is appropriate here. While using std::sync::Mutex in async contexts can be problematic if held across .await points, the current implementation correctly performs all mutex-guarded operations synchronously before any awaits.
281-333: Text chunking logic is functional but has the same regex compilation overhead.
The function works correctly for splitting text into manageable chunks. The regex objects (para_re, sentence_re) are recreated on each call, so the same recommendation as for preprocess_text applies.
Note: The sentence boundary regex ([.!?])\s+ may not handle all edge cases (e.g., abbreviations like "Dr. Smith"), but this is acceptable for TTS where minor imperfections in chunk boundaries won't significantly affect output quality.
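For readers unfamiliar with this kind of chunking, the accumulation step can be sketched as a greedy packer over already-split sentences. This is a std-only illustration, not the chunk_text implementation: the name `pack_chunks`, counting length in characters, and letting an over-long sentence form its own oversized chunk are all assumptions.

```rust
// Greedily pack sentences into chunks of at most `max_len` characters.
// Assumption: a single sentence longer than `max_len` becomes its own chunk.
fn pack_chunks(sentences: &[&str], max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for s in sentences {
        let s = s.trim();
        if s.is_empty() {
            continue;
        }
        // Length if this sentence were appended with a separating space.
        let needed = current.chars().count() + s.chars().count() + 1;
        if !current.is_empty() && needed > max_len {
            // Close the current chunk and start a new one.
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(s);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a limit of 300, as in the `chunk_text(text, 300)` call shown in the sequence diagram, short messages stay in a single chunk.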
568-588: LGTM!
The WAV encoding function correctly clamps audio samples to prevent overflow and uses proper 16-bit PCM encoding. The in-memory buffer approach is efficient for the expected audio sizes.
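As a rough picture of what clamped 16-bit PCM encoding into an in-memory buffer involves, here is a hand-rolled std-only sketch. It is not the function under review (which may well use a WAV crate); `encode_wav_pcm16` and the mono layout are assumptions made for the example.

```rust
// Hypothetical encoder: f32 samples in [-1.0, 1.0] to a mono 16-bit PCM WAV.
fn encode_wav_pcm16(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // remaining file size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        // Clamp first so out-of-range samples saturate instead of wrapping.
        let clamped = s.max(-1.0).min(1.0);
        let v = (clamped * 32767.0) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The 44-byte header is followed directly by the little-endian samples, so the buffer can be base64-encoded as is.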
706-725: LGTM!
The model loading function is straightforward. While it doesn't explicitly check if models are downloaded first, the error messages from load_tts_engine and load_voice_style will be informative if files are missing. The frontend's TTSContext guards against calling this before download completes.
fc94dfa to
e607185
Actionable comments posted: 0
♻️ Duplicate comments (4)
frontend/src/services/tts/TTSContext.tsx (2)
10-10: Use isTauriDesktop() instead of isTauri() to prevent mobile TTS errors.
According to past review comments, TTS commands are only registered in the desktop builder. isTauri() returns true for both desktop and mobile Tauri environments, so using it would cause "unknown command" errors on mobile when invoking tts_get_status.
Replace isTauri with isTauriDesktop throughout this file (lines 10, 62, 78, 112, 118, 163).

    -import { isTauri } from "@/utils/platform";
    +import { isTauriDesktop } from "@/utils/platform";

And update usage:

    - const isTauriEnv = isTauri();
    + const isTauriEnv = isTauriDesktop();
208-215: Memory leak: Audio object URL not revoked on stop() or unmount.
When stop() is called or the provider unmounts during playback, the object URL created in speak() is never revoked, causing memory leaks over repeated play/stop cycles.
Add a ref to track the URL and revoke it in stop() and the unmount effect:

      const audioRef = useRef<HTMLAudioElement | null>(null);
    + const audioUrlRef = useRef<string | null>(null);

In stop():

      const stop = useCallback(() => {
        if (audioRef.current) {
          audioRef.current.pause();
          audioRef.current = null;
        }
    +   if (audioUrlRef.current) {
    +     URL.revokeObjectURL(audioUrlRef.current);
    +     audioUrlRef.current = null;
    +   }
        setIsPlaying(false);
        setCurrentPlayingId(null);
      }, []);

And store the URL in speak():

      const audioUrl = URL.createObjectURL(audioBlob);
    + audioUrlRef.current = audioUrl;

frontend/src-tauri/src/tts.rs (2)
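295-347: Sentence splitting loses punctuation marks.
The regex pattern r"([.!?])\s+" uses a capture group, but split() discards captured groups, causing sentences to lose their ending punctuation.
Note that Rust's regex crate does not support look-around, so a lookbehind pattern such as r"(?<=[.!?])\s+" is rejected by Regex::new, and Regex itself has no split_inclusive method. Instead, iterate over the matches and end each sentence just after the punctuation byte (the names parts and last below are illustrative):

    let sentence_re = Regex::new(r"[.!?]\s+").unwrap();
    let (mut parts, mut last) = (Vec::new(), 0);
    for m in sentence_re.find_iter(text) { parts.push(&text[last..=m.start()]); last = m.end(); }
    parts.push(&text[last..]);

This affects the naturalness of TTS output since sentences will otherwise be synthesized without proper endings.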
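A punctuation-preserving split can also be written without any regex at all. The sketch below is std-only and hypothetical (`split_sentences` is not a function in tts.rs): it breaks after `.`, `!`, or `?` only when whitespace follows, and keeps the punctuation attached to the sentence it ends.

```rust
// Split after '.', '!' or '?' when followed by whitespace, keeping the
// punctuation attached to the sentence it terminates.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        let is_end = c == '.' || c == '!' || c == '?';
        let next_is_space = chars.peek().map_or(false, |&(_, n)| n.is_whitespace());
        if is_end && next_is_space {
            // Include the punctuation character in the current sentence.
            let end = i + c.len_utf8();
            let piece = text[start..end].trim();
            if !piece.is_empty() {
                sentences.push(piece.to_string());
            }
            start = end;
        }
    }
    // Whatever remains after the last boundary is the final sentence.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

Because a boundary requires trailing whitespace, decimals such as "3.14" are not split, though abbreviations like "Dr. Smith" still are.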
650-668: Add request timeout and handle partial downloads.
Two reliability concerns from past reviews still apply:
- No HTTP timeout: If the server hangs, the download blocks indefinitely.
- Partial file on failure: If the download fails mid-stream, a partial file remains. The next tts_get_status call will then report models as downloaded.

    + use std::time::Duration;
      let client = reqwest::Client::builder()
    +     .timeout(Duration::from_secs(300))
    +     .connect_timeout(Duration::from_secs(30))
          .build()
          .map_err(|e| format!("Failed to create HTTP client: {}", e))?;

For partial files, download to a .part file and rename on success:

    + let temp_path = file_path.with_extension("part");
    - let mut file = File::create(&file_path)
    + let mut file = File::create(&temp_path)
      ...
    + // After successful download:
    + fs::rename(&temp_path, &file_path)?;
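The write-to-temp-then-rename idea can be shown end to end with a std-only sketch. Everything here is hypothetical scaffolding: `part_path` and `save_atomically` are illustrative names, and the chunk iterator stands in for the HTTP byte stream. Unlike `with_extension("part")`, this variant appends `.part` to the full file name, which is a design choice of the sketch rather than something the review asks for.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

// Build "<name>.part" next to the final path, keeping the real extension.
fn part_path(final_path: &Path) -> PathBuf {
    let mut name = final_path.as_os_str().to_os_string();
    name.push(".part");
    PathBuf::from(name)
}

// Write all chunks to the temp file, sync, then rename into place.
// On any error the temp file is removed so it never looks like a finished download.
fn save_atomically<I>(final_path: &Path, chunks: I) -> io::Result<()>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let tmp = part_path(final_path);
    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        for chunk in chunks {
            file.write_all(&chunk?)?;
        }
        file.sync_all()?;
        fs::rename(&tmp, final_path)
    })();
    if result.is_err() {
        // Best-effort cleanup of the partial file.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

A status check that looks only for the final file name then never sees a half-written model.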
🧹 Nitpick comments (1)
frontend/src/components/UnifiedChat.tsx (1)
626-633: Consider extracting the text content logic to avoid duplication.
The same text extraction logic is duplicated between CopyButton and TTSButton. While acceptable for now, consider extracting this to a helper if more buttons are added.

    const messageText = message.content
      .filter((p) => "text" in p && p.text)
      .map((p) => ("text" in p ? p.text : ""))
      .join("");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
- frontend/src-tauri/Cargo.toml (1 hunks)
- frontend/src-tauri/src/lib.rs (7 hunks)
- frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
- frontend/src-tauri/src/tts.rs (1 hunks)
- frontend/src/app.tsx (2 hunks)
- frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
- frontend/src/components/UnifiedChat.tsx (8 hunks)
- frontend/src/services/tts/TTSContext.tsx (1 hunks)
- frontend/src/services/tts/index.ts (1 hunks)
- justfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- frontend/src/app.tsx
- frontend/src/services/tts/index.ts
- justfile
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/pdf_extractor.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling
Files:
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use strict TypeScript typing and avoid any when possible
Files:
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed
Applied to files:
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/TTSDownloadDialog.tsx (3)
- frontend/src/services/tts/TTSContext.tsx (1): useTTS (251-257)
- frontend/src/components/ui/dialog.tsx (6): Dialog (92-92), DialogContent (97-97), DialogHeader (98-98), DialogTitle (100-100), DialogDescription (101-101), DialogFooter (99-99)
- frontend/src/components/ui/button.tsx (1): Button (62-62)
frontend/src/services/tts/TTSContext.tsx (1)
- frontend/src/services/tts/index.ts (3): TTSStatus (2-2), TTSProvider (1-1), useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
- frontend/src-tauri/src/tts.rs (8): init_tts_backend (20-30), new (100-102), new (355-360), tts_get_status (616-634), tts_download_models (645-718), tts_load_models (721-739), tts_synthesize (749-785), tts_unload_models (788-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: Cloudflare Pages
🔇 Additional comments (11)
frontend/src/components/UnifiedChat.tsx (2)
179-233: TTSButton implementation looks solid.
The component properly:
- Guards rendering with isTauriEnv check
- Handles all TTS status states appropriately
- Provides clear visual feedback (spinner during loading states, stop icon during playback)
- Uses the correct disabled states during transitional operations
3023-3024: TTS dialog integration is clean.
The dialog is properly controlled via state and placed appropriately in the component tree.
frontend/src/services/tts/TTSContext.tsx (1)
259-267: Helper function implementation is correct.
The base64ToBlob function correctly decodes base64 and creates a Blob with the appropriate MIME type.
frontend/src/components/TTSDownloadDialog.tsx (3)
18-24: Dialog implementation is well-structured.
The component properly handles all TTS status states with appropriate UI feedback. The handleDownload wrapper is a simple passthrough, which is fine but could be inlined if no additional logic is needed.
70-78: Error display is correctly guarded.
The condition hasError && error ensures the error panel only shows when there's an actual error message to display, preventing empty error states.
93-105: Progress bar behavior is well-designed.
The progress bar correctly shows 100% during the loading phase (after download completes), and the percentage text is appropriately hidden during non-download states.
frontend/src-tauri/src/lib.rs (2)
396-407: Minor formatting changes to emit call.
These are style/formatting adjustments with no functional impact.
264-273: > Likely an incorrect or invalid review comment.
frontend/src-tauri/src/tts.rs (3)
765-767: Synthesis parameters are hardcoded as per design.
The hardcoded values (10 inference steps, 1.2x speed) match the PR description. Consider exposing these as optional parameters in the future for user customization.
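One possible shape for such optional parameters is sketched below. The struct, field names, and validation rules are assumptions made for illustration; only the default values (10 steps, 1.2x speed) come from the comment above.

```rust
// Hypothetical request options; field names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SynthesisOptions {
    inference_steps: u32,
    speed: f32,
}

impl Default for SynthesisOptions {
    fn default() -> Self {
        // Defaults mirror the values quoted in the review comment.
        SynthesisOptions { inference_steps: 10, speed: 1.2 }
    }
}

// Merge optional caller overrides onto the defaults, with basic range checks.
fn resolve_options(steps: Option<u32>, speed: Option<f32>) -> Result<SynthesisOptions, String> {
    let defaults = SynthesisOptions::default();
    let opts = SynthesisOptions {
        inference_steps: steps.unwrap_or(defaults.inference_steps),
        speed: speed.unwrap_or(defaults.speed),
    };
    if opts.inference_steps == 0 {
        return Err("inference_steps must be at least 1".to_string());
    }
    // Written as a negated comparison so NaN is rejected as well.
    if !(opts.speed > 0.0) {
        return Err("speed must be positive".to_string());
    }
    Ok(opts)
}
```

Callers that pass nothing keep the current behavior, so a command signature could grow these arguments without changing existing call sites.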
550-580: TTS engine loading is straightforward and correct.
The ONNX session creation uses default builder settings. Consider adding thread configuration or optimization options if performance tuning is needed later.
18-30: Platform-specific TTS backend initialization is correct.
The conditional compilation properly selects tract for iOS/Android and ONNX Runtime for desktop. This aligns with the cross-platform architecture.
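Conditional compilation of this kind usually looks like the following minimal sketch. The function name `init_backend` and the returned strings are placeholders, not the contents of init_tts_backend; only the split between iOS/Android and other targets is taken from the comment.

```rust
// Compiled only for mobile targets (placeholder for a tract-based setup).
#[cfg(any(target_os = "ios", target_os = "android"))]
fn init_backend() -> &'static str {
    "tract"
}

// Compiled for every other target (placeholder for ONNX Runtime setup).
#[cfg(not(any(target_os = "ios", target_os = "android")))]
fn init_backend() -> &'static str {
    "onnxruntime"
}
```

Exactly one of the two definitions exists in any given build, so callers use `init_backend()` without their own platform checks.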
10 files reviewed, no comments
e607185 to
fc94dfa
10 files reviewed, 3 comments
fc94dfa to
ca87d46
10 files reviewed, 3 comments
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Around line 1-30: The Gemini section has many run-on paragraphs and missing
blank lines (e.g., "2026The trajectory", "SuccessOn desktop") and improperly
formatted code and URLs; fix by inserting blank lines between all headings (like
"## Gemini"), subheadings (e.g., "2. Architectural Deconstruction"), and
paragraph blocks so each logical paragraph is separated, correct concatenated
sentences into proper sentences, convert bare URLs into markdown links with
descriptive text (replace bare URLs with [link text](url)), and ensure every
code block uses fenced code blocks with language identifiers and closing ```
markers (e.g., ```rust, ```json) where the diff shows example snippets (the
Cargo.toml and Rust/JavaScript examples) so the document is valid Markdown and
readable.
In @frontend/src-tauri/src/tts.rs:
- Around line 281-333: The sentence-splitting regex in chunk_text discards
terminal punctuation because sentence_re = r"([.!?])\s+" is used with split();
update chunk_text to preserve sentence-ending punctuation. Rust's regex crate
does not support look-around, so do not use a lookbehind such as
r"(?<=[.!?])\s+"; instead iterate over find_iter matches of r"[.!?]\s+" and end
each part just after the punctuation character. Keep the surrounding logic
(current accumulation, trimming, max_len checks, and fallback behavior) the same
but operate on parts that include their terminal punctuation so TTS prosody is
preserved.
In @frontend/src/services/tts/TTSContext.tsx:
- Around line 161-215: The speak function creates an object URL (audioUrl) but
only revokes it in onended/onerror, causing leaks when stop() is called,
audio.play() throws, or the component unmounts; fix by tracking the created URL
in a ref (e.g., audioUrlRef) and centralize cleanup in a helper (e.g.,
revokeAudioUrl) that revokes URL, clears audioRef and audioUrlRef, and updates
state; call that helper from onended, onerror, the catch block in speak, stop(),
and a useEffect cleanup on unmount so all exit paths revoke the object URL.
🧹 Nitpick comments (3)
docs/tts-research.md (2)
3-27: Trim verbosity in the Gemini section for better readability.
The opening Gemini section (lines 3–27) reads as lengthy, marketing-oriented prose with excessive punctuation and repetitive framing. The content is technically sound, but 25 lines of introductory preamble before substantive technical details may obscure the key findings. Consider condensing the executive summary to 3–5 sentences and moving deeper narrative to subsections if detail is necessary.
Example: Replace the verbose opening with a concise summary:

    -Definitive Comparative Analysis of Local TTS Architectures for Cross-Platform Tauri Integration: Supertonic 2 vs. Chatterbox Turbo1. Executive Strategic Overview: The Local Inference Convergence in 2026The trajectory of computational linguistics... [excessive framing]
    +## Executive Summary
    +For Tauri developers building cross-platform local TTS, Supertonic 2 (ONNX-native, 44M params) outweighs Chatterbox Turbo (PyTorch-based, 350M params) due to superior mobile compatibility, smaller footprint, and direct Rust integration. Chatterbox excels only when voice cloning or paralinguistic features are essential requirements.
29-180: Consolidate or clearly distinguish the three AI research perspectives.
The document contains three separate comprehensive analyses (Gemini, Claude, Grok) covering largely overlapping ground: architecture, model sizes, cross-platform support, performance, and Tauri integration. While multiple perspectives can add depth, the repetition may reduce clarity. If all three are retained for documentation completeness, add a brief preamble explaining their purpose (e.g., "The following sections represent independent research analyses to validate findings across sources").
Alternatively, extract consensus findings into a single unified recommendation, placing detailed comparative tables and architecture discussions in a "Detailed Comparisons" section, and move the three analyses to an appendix or separate file if historical record is important.
frontend/src-tauri/src/tts.rs (1)
734-771: Synthesis command holds mutex during potentially slow inference.
The tts_synthesize command acquires a mutex lock at line 739 and holds it through the entire synthesis operation (lines 751-753), which involves multiple ONNX model runs. This blocks other potential TTS operations.
The lock is released before base64 encoding (line 759), which is good, but consider whether the synthesis itself could be done with a shorter lock duration by cloning necessary state upfront.
For the current single-user desktop app context, this is acceptable. If concurrent TTS requests become possible in the future, consider restructuring to minimize lock duration.
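A minimal sketch of the "clone state upfront" idea, using only the standard library. `Engine`, `Style`, `TtsState`, and `synthesize_short_lock` are hypothetical stand-ins rather than the PR's actual types, and the sketch assumes inference needs only a shared reference to the engine; if the real ONNX sessions require mutable access, this pattern would need an inner lock or a different design.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the PR's engine and voice style types.
struct Engine;

#[derive(Clone)]
struct Style {
    name: String,
}

impl Engine {
    // Placeholder for slow multi-model inference; assumed to need only `&self`.
    fn synthesize(&self, text: &str, style: &Style) -> Vec<f32> {
        vec![0.0; text.len() + style.name.len()]
    }
}

#[derive(Default)]
struct TtsState {
    engine: Option<Arc<Engine>>,
    style: Option<Style>,
}

fn synthesize_short_lock(state: &Mutex<TtsState>, text: &str) -> Result<Vec<f32>, String> {
    // Hold the lock only long enough to clone cheap handles.
    let (engine, style) = {
        let guard = state.lock().map_err(|e| e.to_string())?;
        let engine = guard.engine.clone().ok_or("TTS engine not loaded")?;
        let style = guard.style.clone().ok_or("voice style not loaded")?;
        (engine, style)
    }; // guard is dropped here

    // The slow work runs without the lock, so status checks are not blocked.
    Ok(engine.synthesize(text, &style))
}
```

The trade-off is that an unload request arriving mid-synthesis no longer waits for the lock; the `Arc` keeps the engine alive until the in-flight call finishes.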
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
- docs/tts-research.md
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src/app.tsx
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/services/tts/index.ts
- justfile
🚧 Files skipped from review as they are similar to previous changes (4)
- frontend/src-tauri/Cargo.toml
- frontend/src/services/tts/index.ts
- justfile
- frontend/src/app.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
useTTS(251-257)
frontend/src/components/ui/button.tsx (1)
Button(62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
TTSDownloadDialog(18-160)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
new(86-88), new(341-346), tts_get_status(602-620), tts_download_models(631-704), tts_load_models(707-725), tts_synthesize(735-771), tts_unload_models(774-780)
frontend/src/components/TTSDownloadDialog.tsx (2)
frontend/src/components/ui/dialog.tsx (6)
Dialog(92-92), DialogContent(97-97), DialogHeader(98-98), DialogTitle(100-100), DialogDescription(101-101), DialogFooter(99-99)
frontend/src/components/ui/button.tsx (1)
Button(62-62)
🪛 LanguageTool
docs/tts-research.md
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...
(QB_NEW_EN_DASH_RULE_EN)
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...
(QB_NEW_EN_HYPHEN)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md
26-26: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
169-169: Bare URL used
(MD034, no-bare-urls)
175-175: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
docs/tts-research.md (2)
200-220: Verify performance benchmarks and resource claims against primary sources.
The document cites specific metrics (e.g., RTF of 0.006 on M4 Pro WebGPU at line 119, ~5GB VRAM for Chatterbox at line 199, 0.3 RTF on Raspberry Pi at line 120). While generally reasonable, these claims lack inline citations to benchmark reports or official documentation. For a research document, adding references or footnotes (e.g., linking to the GitHub repos' own benchmarks or published evaluations) strengthens credibility.
233-246: Ensure all citations are accurate and up-to-date.
The "Key Citations" section (lines 233–246) lists GitHub repos, blog posts, and videos. These are helpful, but verify:
- GitHub repo URLs match the latest stable branches (e.g., Supertone Inc. repo path).
- Blog and video dates align with the document's "January 2026" context (e.g., "Supertonic TTS is incredible" post should be recent).
- All links remain active; add a note if any are known to be archived or updated.
frontend/src-tauri/src/lib.rs (1)
6-6: TTS module integration looks correct for desktop-only functionality.
The TTS module, state management, and command handlers are appropriately scoped to the desktop build path. The mobile invoke handler (lines 261-263) correctly excludes TTS commands, aligning with the PR objective that TTS is desktop-only.
Also applies to: 38-38, 48-52
frontend/src/components/UnifiedChat.tsx (2)
192-246: Well-structured TTSButton component with proper state handling.
The component correctly:
- Guards against non-Tauri environments by returning null early (lines 206-208)
- Handles all status states appropriately in `handleClick`
- Disables the button during transitional states (checking, downloading, loading)
- Shows appropriate icons for playing/stopped/loading states
The implementation follows React best practices with proper hook usage and conditional rendering.
756-764: TTS integration wired correctly through the component tree.
The TTS setup flow is properly connected:
- `ttsSetupDialogOpen` state manages dialog visibility
- `onTTSSetupOpen` callback is passed through MessageList to TTSButton
- TTSDownloadDialog is rendered with appropriate props
Also applies to: 880-880, 3346-3348
frontend/src/components/TTSDownloadDialog.tsx (1)
18-159: Clean implementation of the TTS download dialog with comprehensive state handling.
The dialog properly handles all TTS lifecycle states:
- Prevents dismissal during processing (line 34)
- Shows appropriate messaging for each state (not_available, error, checking, downloading, loading, ready)
- Progress bar and percentage display are conditionally rendered based on state
- Footer buttons adapt to the current status
The component follows coding guidelines with proper path aliases, TypeScript types, and React patterns.
frontend/src/services/tts/TTSContext.tsx (2)
259-266: `base64ToBlob` helper is correctly implemented.
The function properly decodes base64 to bytes and creates a Blob with the specified MIME type.
62-62: Use `isTauriDesktop()` instead of `isTauri()` to prevent mobile invoke errors.
TTS commands are only registered in the desktop invoke handler (`lib.rs` lines 48-52), not in the mobile handler (lines 261-263). The current `isTauri()` check returns true for both desktop and mobile Tauri environments, which causes "unknown command" errors when `tts_get_status` is invoked on mobile.
Replace `isTauri()` with `isTauriDesktop()` at all usage points (lines 62, 78, 112, 118, 163) to ensure TTS operations are only attempted on desktop.
Also applies to: 78-78, 112-112, 118-118, 163-163
⛔ Skipped due to learnings
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to src-tauri/**/*.rs : Use `#[tauri::command]` attribute for Tauri command handlers
frontend/src-tauri/src/tts.rs (1)
1-588: Solid TTS engine implementation with proper ONNX integration.
The TTS pipeline is well-structured:
- Text preprocessing handles Unicode normalization, emoji removal, and punctuation cleanup
- Multi-stage inference (duration prediction → text encoding → denoising loop → vocoder)
- Proper chunking for long texts with silence between chunks
- WAV encoding with correct sample format
The ONNX session management and ndarray operations follow established patterns.
10 files reviewed, 1 comment
eab4fbc to 4898520
Actionable comments posted: 0
🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)
115-219: Consider caching compiled regexes for performance.
Multiple regexes are compiled on each call to `preprocess_text()`. For better performance, especially with longer texts or frequent calls, consider using `lazy_static` or `once_cell::sync::Lazy` to compile regexes once.
♻️ Example using once_cell
use once_cell::sync::Lazy;
use regex::Regex;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});
static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});
// Then use &*EMOJI_PATTERN in the function
📒 Files selected for processing (2)
- frontend/src-tauri/src/tts.rs
- frontend/src/services/tts/TTSContext.tsx
🔇 Additional comments (17)
frontend/src/services/tts/TTSContext.tsx (8)
1-13: LGTM!
Imports are clean and follow the coding guidelines with path aliases (`@/utils/platform`).
14-56: LGTM!
Type definitions are well-structured and align with the Rust backend response types.
60-74: LGTM!
Good pattern to check `isTauriDesktop()` once at mount and derive initial status accordingly.
76-108: LGTM!
The status checking logic correctly handles the various model states with appropriate error handling.
117-159: LGTM!
Download flow correctly manages the event listener lifecycle and handles both success and error paths.
161-213: Memory leak fixes properly implemented.
The `speak` callback now correctly revokes object URLs in all exit paths:
- When stopping current audio to play new (lines 168-170)
- On playback end/error (lines 190, 197)
- In the catch block (lines 206-208)
This addresses the previous review feedback.
215-240: LGTM!
The `stop()` callback and unmount cleanup effect properly revoke object URLs, addressing the previous memory leak concerns.
264-280: LGTM!
The `useTTS` hook follows the standard context hook pattern with proper error handling for missing provider, and the `base64ToBlob` helper is clear and functional.
frontend/src-tauri/src/tts.rs (9)
1-41: LGTM!
Imports and constants are well-organized. Model file definitions with expected sizes enable proper progress tracking.
43-113: LGTM!
Configuration structs and `UnicodeProcessor` are well-structured. The processor gracefully handles out-of-range Unicode characters by assigning -1.
281-350: Punctuation preservation fix confirmed.
The sentence splitting now correctly preserves terminal punctuation by taking `para[last_end..m.start() + 1]`, which includes the punctuation character. This addresses the previous review feedback about lost sentence endings.
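The punctuation-preserving split can be illustrated with a small standalone sketch. This version scans characters with the standard library instead of the regex `find_iter` approach the review describes, and `split_sentences` is a hypothetical name, not the PR's `chunk_text`. It treats every `.`, `!`, or `?` as a sentence end, so abbreviations and ellipses are split more aggressively than a production chunker probably should.

```rust
/// Split text into sentences, keeping each terminal punctuation mark
/// attached to the sentence it ends.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut last_end = 0;
    for (idx, ch) in text.char_indices() {
        if matches!(ch, '.' | '!' | '?') {
            // Slice through the end of the punctuation character so it is
            // kept, rather than dropped as a plain split on the delimiter would.
            let end = idx + ch.len_utf8();
            let sentence = text[last_end..end].trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            last_end = end;
        }
    }
    // Keep any trailing text that has no terminal punctuation.
    let tail = text[last_end..].trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

Calling `split_sentences("Hello world. How are you? Fine")` yields three entries, the first two still ending in their punctuation.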
352-511: LGTM!
The TTS engine implementation is well-structured with proper state management. The synthesis flow correctly:
- Chunks text for manageable processing
- Adds inter-chunk silence for natural pacing
- Runs the multi-stage ONNX pipeline (duration prediction → text encoding → denoising → vocoder)
521-551: Voice style is hardcoded to F2.
The voice style is hardcoded to `"F2.json"` (line 522). Per the PR summary, this is intentional ("default/currently using F2"). If voice selection is planned for the future, consider accepting the style name as a parameter or adding a TODO comment.
647-738: Download reliability improvements confirmed.
The download implementation now includes all the previously requested reliability improvements:
- HTTP timeouts (300s total, 30s connect) at lines 655-658
- Partial file cleanup (`.part` extension) at lines 664, 682
- Atomic file finalization (flush + rename) at lines 727-731
This prevents corrupted model files from being treated as valid.
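The `.part` plus rename pattern noted above can be sketched with the standard library alone. `write_atomically` is a hypothetical helper that takes the whole payload as a byte slice; the PR's downloader streams chunks over HTTP instead, and its exact temp-file naming is an assumption here.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `<dest>.part`, flush, then rename onto `dest`.
/// A reader that only checks for `dest` never sees a half-written file.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build "<dest>.part" without disturbing the existing extension.
    let mut part = dest.as_os_str().to_owned();
    part.push(".part");
    let part_path = PathBuf::from(part);

    // Remove a stale partial file left by an earlier interrupted attempt.
    if part_path.exists() {
        fs::remove_file(&part_path)?;
    }

    let mut file = File::create(&part_path)?;
    file.write_all(bytes)?;
    file.flush()?;
    file.sync_all()?; // ask the OS to persist the data before the rename
    drop(file);

    // The rename is the commit point; both paths share a directory,
    // so they are on the same filesystem.
    fs::rename(&part_path, dest)
}
```

If the process dies before the rename, only the `.part` file exists, so an existence check on the final path still reports the model as missing.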
768-805: Lock is held during synthesis.
The mutex guard is held at line 773 through the synthesis call at lines 785-787, which could take a noticeable amount of time for longer texts. While the guard is correctly dropped before encoding (line 793), this means concurrent TTS commands (e.g., status checks) will block during synthesis.
For a single-user desktop app this is likely acceptable, but consider documenting this behavior or adding a comment noting the intentional lock scope.
618-637: LGTM!
The status command correctly reports download and load state. The file existence check combined with the `.part` file mechanism in downloads should be sufficient to detect incomplete downloads.
740-759: LGTM!
Model loading and unloading commands are straightforward with proper error handling and logging.
Also applies to: 807-814
Actionable comments posted: 0
🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)
618-637: Consider: Optional file integrity verification.
The status check uses simple file existence (line 626). While the `.part` temp-file download pattern mitigates partial-file issues, you could optionally verify file sizes match `expected_size` from `MODEL_FILES` for additional robustness against corruption.
Optional enhancement for file integrity
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        if let Ok(metadata) = models_dir.join(name).metadata() {
+            metadata.len() == *expected_size
+        } else {
+            false
+        }
+    });
📒 Files selected for processing (5)
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
🧬 Code graph analysis (1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
new(86-88), new(358-363), tts_get_status(619-637), tts_download_models(648-738), tts_load_models(741-759), tts_synthesize(769-805), tts_unload_models(808-814), tts_delete_models(817-834)
🔇 Additional comments (9)
frontend/src/services/tts/TTSContext.tsx (4)
1-58: LGTM: Clean type definitions and imports.
The imports follow the project's path alias conventions, and the TypeScript type definitions are strict and well-structured. The exported `TTSStatus` type provides a clear public API surface.
62-77: Well-addressed: Correct desktop-only detection.
The use of `isTauriDesktop()` (line 64) correctly ensures TTS functionality is only available on desktop Tauri environments, preventing the "unknown command" errors that would occur on mobile. This addresses the concern raised in previous reviews.
190-269: Excellent: Comprehensive audio resource cleanup.
The audio playback implementation now properly revokes object URLs in all exit paths:
- When stopping previous audio (line 198)
- On playback end (line 219) and error (line 226)
- In the error catch block (line 236)
- In the explicit `stop()` function (line 248)
- On component unmount (line 265)
This comprehensively addresses the memory leak concerns raised in previous reviews.
302-310: LGTM: Correct base64-to-blob conversion.
The helper function uses standard browser APIs correctly to convert the base64-encoded audio data from the backend into a playable Blob.
frontend/src-tauri/src/tts.rs (4)
1-79: LGTM: Well-structured configuration and dependencies.
The model definitions, constants, and configuration types are clearly organized. The TOTAL_MODEL_SIZE of ~265 MB aligns with the PR description, and the serde annotations correctly support JSON serialization for config files and voice styles.
281-350: Well-fixed: Sentence chunking now preserves punctuation.
The `chunk_text` function now uses `find_iter` with manual slice construction (line 308: `para[last_end..m.start() + 1]`) rather than `split()`, ensuring terminal punctuation is preserved in each sentence. This addresses the concern raised in previous reviews about lost punctuation affecting TTS prosody.
647-738: Excellent: Robust download implementation with timeouts and atomic completion.
The download function now includes:
- HTTP timeouts (300s total, 30s connect) to prevent indefinite hangs (lines 656-657)
- Atomic file completion using a `.part` temp file pattern (line 664)
- Explicit flush and rename on success (lines 727-731)
This prevents both indefinite blocking and partial-file corruption issues raised in previous reviews. The existence check (line 667) correctly tests the final `file_path`, not the temp file.
740-834: LGTM: Clean command implementations with proper locking.
The Tauri commands are well-structured:
- `tts_load_models` correctly loads both engine and style into shared state
- `tts_synthesize` properly clones the style (line 780) to avoid borrow conflicts and drops the guard early (line 793) to minimize lock contention during encoding
- `tts_delete_models` correctly unloads models before filesystem removal
The hardcoded synthesis parameters (10 steps, 1.2× speed at line 786) align with the PR description's configuration.
frontend/src-tauri/src/lib.rs (1)
6-6: LGTM: Correct desktop-only TTS integration.
The TTS module is properly integrated into the desktop builder only:
- Module declaration (line 6)
- State initialization with `TTSState::new()` (line 38)
- Six TTS commands registered in the desktop `invoke_handler` (lines 48-53)
The mobile builder (lines 262-264) correctly excludes TTS commands, preventing "unknown command" errors when the frontend's `isTauriDesktop()` check correctly gates TTS usage.
Also applies to: 38-53
10 files reviewed, 1 comment
Actionable comments posted: 1
🧹 Nitpick comments (7)
frontend/src-tauri/src/tts.rs (7)
114-241: Consider refactoring into smaller helper functions.
This 127-line function handles normalization, markdown removal, emoji removal, symbol replacement, punctuation fixing, and more. Breaking it into focused helpers (e.g., `remove_markdown`, `normalize_symbols`, `fix_punctuation`) would improve readability and testability.
434-537: Optional: Extract pipeline stages into helper methods.
The 103-line `infer` method orchestrates a multi-stage pipeline (duration prediction, text encoding, latent sampling, denoising loop, vocoding). Consider extracting stages like `predict_duration`, `encode_text`, `denoise_latent`, and `generate_waveform` to improve readability.
548-578: Make voice style configurable.
The voice style is hardcoded to F2.json (line 549), but the PR summary mentions four available styles (F1, F2, M1, M2). Consider adding a parameter to allow users to select their preferred voice, or document why F2 is the default-only choice.
Example: Add voice parameter
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice));
     let file = File::open(&style_path).context("Failed to open voice style file")?;
     // ... rest of function
Then expose voice selection in Tauri commands or config.
651-653: Consider verifying file size in addition to existence.
The status check only verifies file existence. If a previous download was interrupted despite the temp file safeguards, or if files become corrupted on disk, `models_downloaded` will return true but loading will fail.
Optional: Add size verification
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        let path = models_dir.join(name);
+        path.exists() && path.metadata().map(|m| m.len() == *expected_size).unwrap_or(false)
+    });
795-838: Consider making synthesis parameters configurable.
The `total_step` (10) and `speed` (1.2) parameters are hardcoded at line 815. While these match the PR's documented defaults, exposing them as optional parameters would allow users to trade quality/speed or adjust playback rate.
Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ... existing code ...
+    let speed = speed.unwrap_or(1.2);
+    let steps = steps.unwrap_or(10);
+
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
674-765: Optional: Add cancellation support for long downloads.
Downloading 265 MB can take significant time on slower connections. Consider adding cancellation support so users can abort if needed. A cancel request could be delivered through Tauri's event system.
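One way to picture cancellation is a shared flag checked between chunks. The sketch below is a synchronous, standard-library stand-in: `copy_with_cancel` is a hypothetical helper, and the PR's downloader is async, so a real version would check the flag inside its own streaming loop and set it from a separate command or event handler.

```rust
use std::io::{self, Read, Write};
use std::sync::atomic::{AtomicBool, Ordering};

/// Copy `reader` into `writer` in chunks, stopping early if `cancel` is set.
/// Returns the number of bytes copied on success.
fn copy_with_cancel<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    cancel: &AtomicBool,
) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        // Check the flag between chunks so a cancel takes effect promptly.
        if cancel.load(Ordering::Relaxed) {
            return Err(io::Error::new(io::ErrorKind::Other, "download cancelled"));
        }
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(total); // end of stream
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

On cancellation the caller would still need to delete the partial file, which fits naturally with a temp-file download pattern.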
767-786: Consider checking available memory before loading models.
Loading ~265 MB of ONNX models could cause issues on memory-constrained devices. While this is an on-device feature for desktop, checking available memory and providing a clear error message would improve user experience.
📒 Files selected for processing (2)
- frontend/src-tauri/src/tts.rs
- frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/src/services/tts/TTSContext.tsx
two small nits:
Actionable comments posted: 0
🧹 Nitpick comments (4)
frontend/src-tauri/src/tts.rs (2)
549-550: Consider making voice selection configurable.
The voice style is hardcoded to `F2.json`, but all four voice models (F1, F2, M1, M2) are downloaded. Consider exposing voice selection as a parameter in the `tts_load_models` or `tts_synthesize` commands to allow users to choose their preferred voice style.
Example: Add voice parameter to load command
 #[tauri::command]
-pub async fn tts_load_models(state: tauri::State<'_, Mutex<TTSState>>) -> Result<(), String> {
+pub async fn tts_load_models(
+    state: tauri::State<'_, Mutex<TTSState>>,
+    voice: Option<String>
+) -> Result<(), String> {
     let models_dir = get_tts_models_dir().map_err(|e| e.to_string())?;
+    let voice_name = voice.unwrap_or_else(|| "F2".to_string());
-    let style = load_voice_style(&models_dir)
+    let style = load_voice_style(&models_dir, &voice_name)
         .map_err(|e| format!("Failed to load voice style: {}", e))?;
Then update `load_voice_style` signature:
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice_name: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice_name));
815-816: Consider exposing synthesis parameters for user customization.
The inference steps (`10`) and speed multiplier (`1.2`) are hardcoded, but users might want to adjust the quality/speed tradeoff. Consider adding optional parameters to `tts_synthesize`:
Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
     state: tauri::State<'_, Mutex<TTSState>>,
+    inference_steps: Option<usize>,
+    speed: Option<f32>
 ) -> Result<TTSSynthesizeResponse, String> {
+    let steps = inference_steps.unwrap_or(10);
+    let speed_multiplier = speed.unwrap_or(1.2);
+
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed_multiplier)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
Typical ranges: `inference_steps: 5-20` (higher = better quality, slower), `speed: 0.8-1.5` (lower = slower speech, more natural).
frontend/src/components/TTSDownloadDialog.tsx (2)
56-66: Simplify nested ternary logic for better readability.
The deeply nested ternary operators (4 levels) make this code harder to read and maintain. Consider extracting the description text into a helper function or using early returns.
Refactor suggestion
+  const getDialogDescription = () => {
+    if (isNotAvailable) {
+      return "TTS is only available in the desktop app.";
+    }
+    if (isReady) {
+      return "TTS is ready! You can now listen to assistant messages.";
+    }
+    if (hasError) {
+      return "There was an error setting up TTS.";
+    }
+    if (isProcessing) {
+      return "Setting up TTS. Please keep this window open.";
+    }
+    return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
+  };
+
   <DialogDescription className="text-base">
-    {isNotAvailable
-      ? "TTS is only available in the desktop app."
-      : isReady
-        ? "TTS is ready! You can now listen to assistant messages."
-        : hasError
-          ? "There was an error setting up TTS."
-          : isProcessing
-            ? "Setting up TTS. Please keep this window open."
-            : `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`}
+    {getDialogDescription()}
   </DialogDescription>
170-199: Footer button logic follows similar nested ternary pattern.
While this section is somewhat more manageable than the description text, consider applying a similar extraction pattern for consistency and maintainability if refactoring the DialogDescription (lines 56-66).
📒 Files selected for processing (3)
- frontend/src-tauri/src/tts.rs
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/components/UnifiedChat.tsx
frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/components/UnifiedChat.tsx
🧬 Code graph analysis (2)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
useTTS (304-310)
frontend/src/components/ui/button.tsx (1)
Button (62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
TTSDownloadDialog (18-203)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
useTTS (304-310)
frontend/src/components/ui/dialog.tsx (6)
Dialog (92-92), DialogContent (97-97), DialogHeader (98-98), DialogTitle (100-100), DialogDescription (101-101), DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
Button (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-linux
- GitHub Check: build-ios
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (8)
frontend/src-tauri/src/tts.rs (3)
100-104: LGTM! OOV handling now uses padding token. The change from -1 to 0 for out-of-vocabulary characters correctly uses the padding token instead of an invalid sentinel value, which prevents potential embedding lookup failures in the ONNX models.
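As a rough illustration of the fallback described in this comment, the sketch below maps characters to vocabulary indices and substitutes the padding index for anything missing. The helper name `text_to_ids`, the `PAD_ID` constant, and the `HashMap<char, i64>` vocabulary are assumptions made for the example; they are not taken from the PR's `tts.rs`.

```rust
use std::collections::HashMap;

/// Padding index substituted for out-of-vocabulary characters (assumed to be 0).
const PAD_ID: i64 = 0;

/// Hypothetical helper: map each character to its vocabulary index,
/// falling back to the padding index when the character is unknown.
fn text_to_ids(text: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    text.chars()
        .map(|c| vocab.get(&c).copied().unwrap_or(PAD_ID))
        .collect()
}
```

Because every produced index is a valid row of the embedding table, an unknown character degrades to padding rather than to an out-of-range lookup.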
304-373: Sentence boundary handling preserves punctuation correctly. The implementation at line 331 uses para[last_end..m.start() + 1] which correctly includes the terminal punctuation character in each sentence chunk, addressing the previous concern about lost punctuation.
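The slicing idea in this comment can be sketched without the regex crate. The version below is a simplified stand-in, not the PR's implementation: `split_sentences` is a hypothetical name, and it treats `.`, `!`, or `?` as a boundary only when followed by whitespace or the end of input.

```rust
/// Hypothetical sketch: split a paragraph into sentences, keeping the
/// terminal punctuation attached to the sentence it ends.
fn split_sentences(para: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut last_end = 0;
    let mut chars = para.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        let is_terminal = matches!(c, '.' | '!' | '?');
        // Only split when the punctuation is followed by whitespace or ends the input.
        let at_boundary = chars.peek().map_or(true, |&(_, next)| next.is_whitespace());
        if is_terminal && at_boundary {
            // The slice end is one past the punctuation, so the mark is kept.
            let end = i + c.len_utf8();
            let sentence = para[last_end..end].trim();
            if !sentence.is_empty() {
                sentences.push(sentence);
            }
            last_end = end;
        }
    }

    // Keep trailing text that has no terminal punctuation.
    let rest = para[last_end..].trim();
    if !rest.is_empty() {
        sentences.push(rest);
    }
    sentences
}
```

Ending the slice one position past the match start is what keeps the punctuation; slicing up to the match start would drop it.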
675-766: LGTM! Download reliability improvements properly implemented. The implementation now includes:
- HTTP timeouts (300s request, 30s connect) to prevent indefinite hangs
- Temp file with .part extension to avoid treating partial downloads as complete
- Cleanup of partial downloads before retry (line 710)
- Explicit flush and atomic rename (lines 754-759) to finalize downloads
These changes address all previous reliability concerns.
frontend/src/components/UnifiedChat.tsx (5)
208-215: LGTM! Timer cleanup properly implemented. The useEffect cleanup function now properly clears the longPressTimer on component unmount, preventing the timeout from firing after unmount and attempting to call onManage(). This addresses the previous review concern.
217-220: Correct platform check for TTS availability. The early return for non-Tauri environments properly ensures the TTS button only renders in the desktop app, where the TTS backend is available.
222-235: Click handler logic correctly manages TTS states. The implementation properly handles three cases:
- Not ready/error → trigger setup flow
- Playing current message → stop playback
- Ready and idle → start playback
790-800: LGTM! Copy and TTS buttons now always visible. The buttons are rendered without hover states, making them consistently visible as requested in the PR feedback. The flex layout cleanly presents both action buttons together.
3382-3383: TTS dialog properly integrated into component hierarchy. The dialog is correctly wired with state managed at the UnifiedChat level and callbacks passed to child components. The integration follows React best practices for component composition.
10 files reviewed, 6 comments
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @frontend/src/services/tts/TTSContext.tsx:
- Around line 230-266: The code creates an object URL (audioUrl) from audioBlob
but only revokes it in the source.onended handler, so if an error occurs before
playback the URL leaks; update the try/catch flow in the function that uses
base64ToBlob, audioUrl, audioContextRef, and sourceNodeRef so that audioUrl is
revoked in the catch block (and any early-return/error paths) — ensure you track
audioUrl in a variable scoped outside the try and call
URL.revokeObjectURL(audioUrl) in the catch before cleaning up
audioContextRef.current and sourceNodeRef.current, guarding against a
null/undefined audioUrl.
- Around line 271-286: The stop() function can leak object URLs because it stops
the AudioBufferSourceNode directly and onended may not fire, so revoke the audio
URL created in speak(); add an audioUrlRef (e.g., audioUrlRef.current set in
speak() where audioUrl is created) and in stop() check audioUrlRef.current, call
URL.revokeObjectURL(audioUrlRef.current) and clear it, then proceed to
stop/close and null refs (sourceNodeRef, audioContextRef) and reset state; also
ensure the onended handler clears and revokes audioUrlRef.current as well to
avoid double-revoke issues.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
frontend/src-tauri/src/tts.rs
frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: Greptile Review
- GitHub Check: build-android
- GitHub Check: build-ios
- GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
frontend/src/services/tts/TTSContext.tsx (6)
1-58: LGTM on imports and type definitions. The imports correctly use path aliases, and the type definitions are well-structured with proper TypeScript strictness. The use of isTauriDesktop is correct for desktop-only TTS functionality.
62-77: Appropriate state and ref initialization. The Web Audio API refs (audioContextRef, sourceNodeRef) correctly replace the previous HTMLAudioElement approach to avoid hijacking macOS media controls. State initialization is clean.
79-118: Status checking logic is well-implemented. The status flow correctly handles all states (ready → loading → not_downloaded) with proper error handling. The auto-load when models are downloaded but not loaded provides good UX.
120-162: Download flow handles event listener cleanup correctly. The progress event listener is properly cleaned up in both success (line 141) and error (lines 157-159) paths.
338-361: Helper functions are well-implemented. The preprocessTextForTTS correctly handles both complete and streaming (unclosed) think blocks. The separation is appropriate: the frontend strips reasoning content, and the backend handles TTS-specific text normalization. The base64ToBlob implementation is standard and correct.
288-305: Unmount cleanup should also revoke the object URL. Same issue as stop(): if the component unmounts while audio is playing, the audioUrl is never revoked.
Likely an incorrect or invalid review comment.
frontend/src-tauri/src/tts.rs (8)
1-49: Regex pre-compilation is well-implemented. All regexes are now compiled once using once_cell::sync::Lazy, addressing the previous performance concern. Good coverage of markdown patterns, emojis, diacritics, and punctuation normalization.
50-74: Model file definitions are adequate for progress tracking. The expected sizes are used for progress estimation rather than validation. Minor discrepancies between actual and expected sizes will only affect progress bar accuracy, not functionality.
131-137: OOV handling correctly uses padding token. Using 0 (padding token) for out-of-vocabulary characters is the right approach, as it avoids potential embedding lookup failures that could occur with invalid indices like -1.
304-387: Text chunking properly preserves sentence punctuation. The sentence splitting at line 330 now correctly includes the punctuation mark (m.start() + 1). The split_by_words fallback handles long sentences gracefully. The remaining edge case (single word exceeding max_len) is acceptable since it's rare and preferable to truncating words.
691-782: Download implementation is robust with proper safeguards. The implementation correctly addresses all previous concerns:
- HTTP timeouts (300s overall, 30s connect) prevent indefinite blocking
- Temp file pattern (.part extension) with atomic rename prevents partial file corruption
- Explicit flush before rename ensures data integrity
- Previous partial downloads are cleaned up before retry
812-855: Synthesis command has good lock handling. Cloning style to avoid borrow conflicts and dropping the guard before CPU-intensive base64 encoding minimizes lock contention. The synthesis parameters (10 steps, 1.2× speed) align with PR objectives.
563-566: Voice selection acknowledged as future work. The TODO comment appropriately documents that voice selection is currently hardcoded to F2. This is acceptable for the initial implementation.
866-884: Model deletion correctly unloads before removing files. The sequence (unload from memory → delete directory) ensures no file handles are held when deletion occurs.
- Add Rust TTS module with Supertonic ONNX inference
- Download models (~265MB) from HuggingFace on first use
- Add TTSContext for React state management
- Add TTSDownloadDialog for model setup
- Add speaker button to assistant messages (desktop only)
- Settings: F2 voice, 10 inference steps, 1.2x speed
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Preserve sentence punctuation when chunking text
- Add download timeout (5min) and partial file cleanup with .part temp files
- Fix audio URL memory leaks in stop/unmount/error handlers
- Use isTauriDesktop instead of isTauri (TTS is desktop-only)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Long-press speaker button to open TTS management dialog
- Add delete option to free up storage (~264 MB)
- Add tts_delete_models Rust command
- Show storage info when TTS is ready
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add TypeScript preprocessing to remove <think>...</think> blocks
- Add Rust preprocessing to strip markdown formatting (**bold**, *italic*, etc.)
- Remove code blocks, headers, emojis from TTS input
- Handle unclosed think tags (streaming edge case)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add #[cfg(desktop)] to tts module declaration
- Move TTS dependencies to desktop-only target in Cargo.toml
- Prevents TTS code from being compiled for iOS/Android
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add useEffect cleanup for longPressTimer on unmount
- Use 0 (padding token) instead of -1 for OOV characters
- Make copy/speaker buttons always visible (not hover-only)
- Fix progress bar colors (use primary/20 background)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Clear mediaSession metadata and action handlers so TTS audio doesn't intercept system play/pause and track skip controls.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Use lazy_static regexes (once_cell) for one-time compilation
- Replace while loops with single-pass regex for duplicate quotes
- Add split_by_words() to handle text longer than max_len in chunks
- Add TODO for voice selection (currently hardcoded to F2)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Replace HTMLAudioElement with Web Audio API (AudioContext + AudioBufferSourceNode) which doesn't register with the Media Session API.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
903ad6d to 016e4e4
10 files reviewed, 2 comments
Actionable comments posted: 0
🧹 Nitpick comments (2)
frontend/src-tauri/src/tts.rs (1)
832-832: Consider making TTS parameters configurable. The synthesis currently uses hardcoded values: 10 inference steps and 1.2× speed. Consider adding optional parameters to tts_synthesize to allow users to adjust quality (inference steps) and playback speed based on their preferences.
frontend/src/components/TTSDownloadDialog.tsx (1)
56-66: Consider refactoring nested ternary for readability. The description text uses deeply nested ternary operators. While functional, extracting this logic into a helper function would improve readability:
♻️ Optional refactor

    const getDescription = () => {
      if (isNotAvailable) return "TTS is only available in the desktop app.";
      if (isReady) return "TTS is ready! You can now listen to assistant messages.";
      if (hasError) return "There was an error setting up TTS.";
      if (isProcessing) return "Setting up TTS. Please keep this window open.";
      return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
    };

    // Then in JSX:
    <DialogDescription className="text-base">
      {getDescription()}
    </DialogDescription>
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
docs/tts-research.md
frontend/src-tauri/Cargo.toml
frontend/src-tauri/src/lib.rs
frontend/src-tauri/src/tts.rs
frontend/src/app.tsx
frontend/src/components/TTSDownloadDialog.tsx
frontend/src/components/UnifiedChat.tsx
frontend/src/services/tts/TTSContext.tsx
frontend/src/services/tts/index.ts
justfile
🚧 Files skipped from review as they are similar to previous changes (4)
- frontend/src/app.tsx
- justfile
- frontend/src-tauri/src/lib.rs
- frontend/src/services/tts/index.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/components/TTSDownloadDialog.tsx
frontend/src/components/UnifiedChat.tsx
frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/components/UnifiedChat.tsx
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
useTTS (317-323)
frontend/src/components/ui/dialog.tsx (6)
Dialog (92-92), DialogContent (97-97), DialogHeader (98-98), DialogTitle (100-100), DialogDescription (101-101), DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
Button (62-62)
🪛 LanguageTool
docs/tts-research.md
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...
(QB_NEW_EN_HYPHEN)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md
26-26: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
169-169: Bare URL used
(MD034, no-bare-urls)
175-175: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Greptile Review
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: build-android
- GitHub Check: build-ios
🔇 Additional comments (2)
frontend/src-tauri/Cargo.toml (1)
44-44: Update ort from 2.0.0-rc.7 to a stable release or confirm pre-release acceptability. ort 2.0.0 has not been stabilized; it remains in pre-release with the latest version at 2.0.0-rc.10. Using a pre-release version in production carries stability and support risks. Either upgrade to rc.10 if staying on pre-release is acceptable, or switch to a stable version if available.
frontend/src/components/UnifiedChat.tsx (1)
192-278: Well-implemented TTS button component. The implementation demonstrates several good practices:
- Proper cleanup: The useEffect at lines 208-215 correctly cleans up the long-press timer on unmount, addressing the concern from the previous review
- Long-press handling: The pointer event handlers properly clear the timer on both pointerUp and pointerLeave, providing good UX
- Platform awareness: Returns null for non-Tauri environments, ensuring the feature is desktop-only
- Accessibility: Includes descriptive aria-label for screen readers
- State management: Correctly uses the TTS context hook for shared state
The logic flow is clear and handles all the necessary states (not downloaded, loading, playing, etc.).
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This PR adds on-device text-to-speech functionality using Supertonic, a lightweight TTS engine that runs inference via ONNX Runtime. The implementation is desktop-only, downloading ~265MB of models from HuggingFace on first use and processing all audio locally.
Key Changes
- Rust TTS Engine (frontend/src-tauri/src/tts.rs): Complete ONNX-based inference pipeline with text preprocessing, chunking, denoising loop, and WAV encoding
- React Integration (TTSContext.tsx): Context provider managing download, model loading, and audio playback via Web Audio API
- UI Components: Download dialog with progress tracking and speaker buttons on assistant messages (desktop only)
- Platform Detection: Properly scoped to desktop environments using isTauriDesktop() checks
Issues Found
- Race condition in tts_synthesize: sample_rate read while holding lock but used after lock released - could mismatch with audio data if TTS unloaded concurrently
- Performance: Multiple large array clones inside 10-iteration denoising loop (line 515-520) - allocates significant memory
- File handling: .with_extension("part") may produce unexpected results for multi-dot filenames
- Cleanup edge case: Event listener cleanup not guaranteed in error path (line 152-161)
- RC dependency: Using ort = "2.0.0-rc.7" release candidate in production
Architecture Notes
The implementation follows a sound architecture pattern: desktop-only feature with proper platform detection, atomic file downloads with .part extension, and clean separation between Rust inference and React playback. The 10-step denoising with 1.2x speed and 50ms chunk pause delivers high-quality output as documented.
Confidence Score: 3/5
- This PR has multiple performance and correctness issues that should be addressed before merging
- The implementation introduces a race condition in audio synthesis that could cause crashes or corrupted audio, uses an unstable RC dependency, and has performance concerns with repeated large allocations in hot paths. While the architecture is sound and the feature works, these issues warrant fixes before production deployment.
- Pay close attention to frontend/src-tauri/src/tts.rs (race condition on line 839-843, performance issues in denoising loop) and frontend/src/services/tts/TTSContext.tsx (event listener cleanup)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 3/5 | TTS engine implementation with multiple clones in hot paths and potential race condition in audio data handling |
| frontend/src/services/tts/TTSContext.tsx | 4/5 | TTS React context with proper cleanup but potential memory leak in error paths |
| frontend/src-tauri/Cargo.toml | 3/5 | Dependencies added for TTS - using release candidate version which may have stability concerns |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Clean download UI component with proper state management |
| frontend/src/components/UnifiedChat.tsx | 5/5 | TTS button integration into chat UI, properly scoped to desktop only |
| frontend/src/app.tsx | 5/5 | TTSProvider added to React component tree |
Sequence Diagram
sequenceDiagram
participant User
participant React as React Frontend<br/>(TTSContext)
participant Tauri as Tauri Backend<br/>(Rust)
participant HF as HuggingFace<br/>(Models)
participant ONNX as ONNX Runtime<br/>(Inference)
User->>React: Open TTS Dialog
React->>Tauri: tts_get_status()
Tauri->>Tauri: Check models_downloaded
Tauri->>Tauri: Check models_loaded
Tauri-->>React: Status response
alt Models not downloaded
User->>React: Click Download
React->>Tauri: tts_download_models()
loop For each model file
Tauri->>HF: HTTP GET model file
HF-->>Tauri: Stream chunks
Tauri->>Tauri: Write to .part file
Tauri->>React: Emit progress event
React->>User: Update progress bar
end
Tauri->>Tauri: Rename .part to final
Tauri-->>React: Download complete
React->>Tauri: tts_load_models()
Tauri->>ONNX: Load 4 ONNX sessions
Tauri->>Tauri: Load voice style JSON
Tauri-->>React: Models loaded
end
User->>React: Click speaker icon
React->>Tauri: tts_synthesize(text)
Tauri->>Tauri: Preprocess text
Tauri->>Tauri: Chunk text (300 chars)
loop For each chunk
Tauri->>ONNX: Duration prediction
ONNX-->>Tauri: Duration values
Tauri->>ONNX: Text encoding
ONNX-->>Tauri: Text embeddings
Tauri->>Tauri: Sample noisy latent
loop 10 denoising steps
Tauri->>ONNX: Vector estimation
ONNX-->>Tauri: Denoised latent
end
Tauri->>ONNX: Vocoder (latent→wav)
ONNX-->>Tauri: Audio samples
end
Tauri->>Tauri: Concatenate chunks
Tauri->>Tauri: Encode to base64 WAV
Tauri-->>React: Audio base64 + metadata
React->>React: Decode to AudioBuffer
React->>React: Play via Web Audio API
React-->>User: Audio playback

    [target.'cfg(any(target_os = "macos", target_os = "linux", target_os = "windows"))'.dependencies]
    # TTS dependencies (Supertonic) - desktop only
    ort = "2.0.0-rc.7"
using release candidate version 2.0.0-rc.7 in production may introduce stability issues - consider using a stable release
    let xt_value = Value::from_array(xt.clone())?;
    let text_emb_value = Value::from_array(text_emb.clone())?;
    let latent_mask_value = Value::from_array(latent_mask.clone())?;
    let text_mask_value2 = Value::from_array(text_mask.clone())?;
    let current_step_value = Value::from_array(current_step_array)?;
    let total_step_value = Value::from_array(total_step_array.clone())?;
multiple large array clones inside inference loop (10 iterations) - xt.clone(), text_emb.clone(), latent_mask.clone(), text_mask.clone() - consider reusing Values or using references to reduce allocations
    let duration_seconds = audio.len() as f32 / tts.sample_rate as f32;
    let sample_rate = tts.sample_rate;

    // Drop the guard before encoding to release the lock
    drop(guard);
potential race condition - sample_rate is read from tts.sample_rate while holding lock, but audio vector is used after drop(guard) - if another thread modifies or unloads TTS between these operations, the sample_rate may not match the audio data
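To make the concern concrete, here is a minimal sketch of the copy-out pattern, assuming a simplified stand-in for the PR's state. `Engine`, its placeholder `synthesize`, and `synthesize_snapshot` are hypothetical names. The audio buffer and the sample rate are both taken from the same engine while the guard is held, so the owned values stay consistent with each other after the lock is released, even if another thread unloads the engine later.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the engine state guarded by the mutex.
struct Engine {
    sample_rate: u32,
}

impl Engine {
    /// Placeholder synthesis: returns one second of silence.
    fn synthesize(&self) -> Vec<f32> {
        vec![0.0; self.sample_rate as usize]
    }
}

/// Copies everything needed out of the guarded state, then releases the lock
/// before any further work on the owned values.
fn synthesize_snapshot(state: &Mutex<Option<Engine>>) -> Result<(Vec<f32>, u32, f32), String> {
    let guard = state.lock().map_err(|e| e.to_string())?;
    let engine = guard.as_ref().ok_or("TTS engine not loaded")?;

    // Both values come from the same engine while the lock is held.
    let audio = engine.synthesize();
    let sample_rate = engine.sample_rate;
    let duration = audio.len() as f32 / sample_rate as f32;

    // From here on only owned copies are used; later changes to the
    // guarded state cannot affect them.
    drop(guard);
    Ok((audio, sample_rate, duration))
}
```

Whether the snippet under review already satisfies this depends on whether anything read after `drop(guard)` still refers to the shared state instead of a local copy.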
    let text_mask_value = Value::from_array(text_mask.clone())?;
    let style_dp_value = Value::from_array(style.dp.clone())?;
cloning large arrays for ONNX value creation - text_mask.clone() and style.dp.clone() - consider if references can be used to avoid allocations
    for (file_name, url_path, expected_size) in MODEL_FILES {
        let file_path = models_dir.join(file_name);
        let temp_path = file_path.with_extension("part");
missing file extension validation - with_extension("part") could produce unexpected results if file_name contains multiple dots (e.g., model.v2.onnx becomes model.v2.part but user might expect model.v2.onnx.part)
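The behavior described here follows from how `Path::with_extension` works: it replaces only the part after the last dot. A small sketch of an alternative that appends the suffix to the whole file name is shown below; `part_path` is a hypothetical helper, not something defined in the PR.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: append ".part" to the full file name
/// instead of replacing the last extension.
fn part_path(path: &Path) -> PathBuf {
    let mut name: OsString = path.as_os_str().to_owned();
    name.push(".part");
    PathBuf::from(name)
}
```

With this helper, `model.v2.onnx` maps to `model.v2.onnx.part`, whereas `with_extension("part")` yields `model.v2.part`. Appending also keeps temp names distinct for files that share a stem but differ in extension.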
    let mut total_downloaded: u64 = 0;

    for (file_name, url_path, expected_size) in MODEL_FILES {
        let file_path = models_dir.join(file_name);
        let temp_path = file_path.with_extension("part");

        // Skip if already downloaded
        if file_path.exists() {
            total_downloaded += expected_size;
            let _ = app.emit(
                "tts-download-progress",
                DownloadProgress {
                    downloaded: total_downloaded,
                    total: TOTAL_MODEL_SIZE,
                    file_name: file_name.to_string(),
                    percent: (total_downloaded as f64 / TOTAL_MODEL_SIZE as f64) * 100.0,
                },
            );
            continue;
        }

        // Clean up any partial download from previous attempt
        let _ = fs::remove_file(&temp_path);

        let url = format!("{}/{}", HUGGINGFACE_BASE_URL, url_path);
        log::info!("Downloading TTS model: {}", file_name);

        let response = client
            .get(&url)
            .send()
            .await
            .map_err(|e| format!("Failed to download {}: {}", file_name, e))?;

        if !response.status().is_success() {
            return Err(format!(
                "Failed to download {}: HTTP {}",
                file_name,
                response.status()
            ));
        }

        let mut file = File::create(&temp_path)
            .map_err(|e| format!("Failed to create file {}: {}", file_name, e))?;

        let mut stream = response.bytes_stream();
        let mut file_downloaded: u64 = 0;

        while let Some(chunk) = stream.next().await {
            let chunk = chunk.map_err(|e| format!("Download error: {}", e))?;
            file.write_all(&chunk)
                .map_err(|e| format!("Write error: {}", e))?;

            file_downloaded += chunk.len() as u64;
            let current_total = total_downloaded + file_downloaded;

            let _ = app.emit(
                "tts-download-progress",
                DownloadProgress {
                    downloaded: current_total,
                    total: TOTAL_MODEL_SIZE,
                    file_name: file_name.to_string(),
                    percent: (current_total as f64 / TOTAL_MODEL_SIZE as f64) * 100.0,
                },
            );
        }

        // Flush and rename temp file to final path
        file.flush()
            .map_err(|e| format!("Failed to flush file {}: {}", file_name, e))?;
        drop(file);
        fs::rename(&temp_path, &file_path)
            .map_err(|e| format!("Failed to finalize {}: {}", file_name, e))?;

        total_downloaded += expected_size;
No atomic write protection: if the download is interrupted between file.flush() and fs::rename(), a partial file may be left in an inconsistent state. Consider using a transaction pattern or storing checksums.
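A possible shape for the hardening suggested above, as a sketch only: sync the temporary file to disk, verify its size, and rename it into place last. The helper name write_then_rename is hypothetical. It takes the whole payload as a slice for brevity, whereas the PR streams chunks, and it assumes expected_size is the exact byte length of the file, which the PR excerpt does not confirm. A checksum comparison could replace the size check but would need a hashing crate.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not part of the PR): writes `data` to `temp_path`,
/// forces it to disk, checks the size, and only then renames it into place.
fn write_then_rename(
    temp_path: &Path,
    final_path: &Path,
    data: &[u8],
    expected_size: u64, // assumption: exact expected byte length
) -> io::Result<()> {
    let mut file = File::create(temp_path)?;
    file.write_all(data)?;
    // sync_all() asks the OS to persist file contents and metadata,
    // so the rename below never exposes data that is still only cached.
    file.sync_all()?;
    drop(file);

    // Reject a truncated or oversized download before it can become
    // the final file; the temp file is removed on mismatch.
    let actual = fs::metadata(temp_path)?.len();
    if actual != expected_size {
        let _ = fs::remove_file(temp_path);
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("size mismatch: expected {}, got {}", expected_size, actual),
        ));
    }

    // The final path only ever appears after a fully verified write.
    fs::rename(temp_path, final_path)
}
```

On POSIX systems a rename within one filesystem is atomic, so under these assumptions a reader would see either no final file or a complete one. An interrupted run leaves only the temporary file, which the existing cleanup step at the start of the loop already removes.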
} catch (err) {
  console.error("TTS download failed:", err);
  setStatus("error");
  setError(err instanceof Error ? err.message : "Failed to download TTS models");

  // Clean up listener on error
  if (unlistenRef.current) {
    unlistenRef.current();
    unlistenRef.current = null;
  }
Event listener cleanup is not guaranteed on error: if invoke("tts_download_models") throws, unlisten() is called, but unlistenRef.current may already have been set to null in the catch block. Consider using try/finally.
// Use Web Audio API instead of HTMLAudioElement to avoid hijacking media controls
const audioContext = new AudioContext();
const arrayBuffer = await audioBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);

// Store context and source for stop functionality
audioContextRef.current = audioContext;
sourceNodeRef.current = source;

source.onended = () => {
  if (sourceNodeRef.current !== source) {
    return;
  }
  setIsPlaying(false);
  setCurrentPlayingId(null);

  if (audioUrlRef.current === audioUrl) {
    URL.revokeObjectURL(audioUrlRef.current);
    audioUrlRef.current = null;
  }
  audioContext.close();
  audioContextRef.current = null;
  sourceNodeRef.current = null;
};

source.start(0);
Audio context cleanup may fail silently: audioContext.close() at line 257 can fail if the context is already closed (the promise it returns rejects), but the call sits inside the source.onended callback, where errors aren't caught.
const processedText = preprocessTextForTTS(text);
if (!processedText) {
  setIsPlaying(false);
  setCurrentPlayingId(null);
  return;
}
An empty preprocessing result is silently ignored: if preprocessTextForTTS(text) returns an empty string, the playback state is set and then immediately unset, with no feedback to the user.
Additional Comments (1)
Summary
Adds text-to-speech functionality to the Maple desktop app using Supertonic, a fast on-device TTS engine.
Features
Technical Details
Testing
Notes
~/Library/Application Support/cloud.opensecret.maple/tts_models/ on macOS
Summary by CodeRabbit
New Features
Documentation
Chores