-
Notifications
You must be signed in to change notification settings - Fork 6
feat: add text-to-speech audio decryption support #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Handle TTS responses with content_base64 and content_type fields - Decode base64 audio data to binary format - Preserve correct MIME types for audio playback - Add integration test for Kokoro TTS model 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughImplements decryption-aware handling for non-SSE JSON responses in Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant ai as ai.ts (fetch wrapper)
participant Server
Client->>ai: fetch()
ai->>Server: HTTP request
Server-->>ai: HTTP response (JSON / SSE / TTS)
alt SSE stream
note right of ai #E6F7FF: SSE decryption flow (unchanged)
ai-->>Client: Streamed events
else Non-SSE
ai->>ai: read response text
alt JSON with encrypted field
ai->>ai: decrypt using sessionKey
alt Decrypted JSON has content_base64 & content_type
ai->>ai: base64 -> Uint8Array
note right of ai #F0FFF4: Build binary Response\nremove invalid headers
ai-->>Client: Binary Response (preserve status/headers)
else Other decrypted JSON/text
ai-->>Client: Response with decrypted text (preserve status/headers)
end
else Not JSON / no encryption
ai-->>Client: Response with original text (preserve status/headers)
end
note right of ai #FFF8E6: Log parse/decrypt errors
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Comment |
Deploying opensecret-sdk with
|
| Latest commit: |
9d16355
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://efb31496.opensecret-sdk.pages.dev |
| Branch Preview URL: | https://feat-tts-audio-support.opensecret-sdk.pages.dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR adds text-to-speech (TTS) audio support to the OpenSecret SDK by extending the existing custom fetch implementation to handle encrypted audio responses. The changes modify the createCustomFetch function in src/lib/ai.ts to detect TTS responses containing content_base64 and content_type fields, decrypt the encrypted JSON payload, decode the base64 audio data to binary format, and return it as a proper Response object with the correct MIME type.
The implementation fits seamlessly into the existing architecture by extending the response handling logic that already processes encrypted streaming text responses. When a response contains encrypted JSON with TTS fields, the new code path decodes the audio data and constructs a binary Response that can be consumed by the OpenAI client's audio APIs. This maintains the end-to-end encryption model that's central to OpenSecret's security approach while enabling audio functionality.
A comprehensive integration test was added to validate the functionality using the Kokoro TTS model. The test creates an OpenAI client with the custom fetch function, makes a TTS request, and validates that the returned audio data has the correct format and size characteristics.
Confidence score: 3/5
- This PR introduces complex audio handling logic that could have edge cases with different audio formats or malformed responses
- Score reflects concerns about error handling in the base64 decoding logic and the test's reliance on console logging rather than proper assertions
- Pay close attention to the audio decoding implementation in
src/lib/ai.tsand consider adding more robust error handling for malformed base64 data
2 files reviewed, 1 comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
src/lib/ai.ts (4)
146-151: Tighten base64→bytes conversion.Slightly faster and simpler; avoids the manual loop.
Apply:
- const binaryString = atob(decryptedData.content_base64); - const bytes = new Uint8Array(binaryString.length); - for (let i = 0; i < binaryString.length; i++) { - bytes[i] = binaryString.charCodeAt(i); - } + const binaryString = atob(decryptedData.content_base64); + const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));
168-174: Sanitize headers when re-wrapping decrypted text.Preserve headers but strip encoding/length, which may be stale after re-wrapping.
Apply:
- return new Response(decrypted, { - headers: response.headers, - status: response.status, - statusText: response.statusText - }); + { + const headersOut = new Headers(response.headers); + headersOut.delete('content-encoding'); + headersOut.delete('content-length'); + headersOut.delete('transfer-encoding'); + return new Response(decrypted, { + headers: headersOut, + status: response.status, + statusText: response.statusText + }); + }
175-179: Clarify catch message and sanitize headers for the pass-through path.The log is misleading if decryption failed. Also strip stale encoding/length in the pass-through Response.
Apply:
- } catch (e) { - // If it's not JSON or doesn't have encrypted field, return original response - console.log("Response is not encrypted JSON, returning as-is"); - } + } catch (e) { + // Non-JSON, missing 'encrypted', or decryption failed — return response as-is + console.log("Non-JSON, missing 'encrypted', or decryption failed; returning as-is"); + } - // Return the original response text as a new Response - return new Response(responseText, { - headers: response.headers, - status: response.status, - statusText: response.statusText - }); + // Return the original response text as a new Response (with sanitized headers) + { + const headersOut = new Headers(response.headers); + headersOut.delete('content-encoding'); + headersOut.delete('content-length'); + headersOut.delete('transfer-encoding'); + return new Response(responseText, { + headers: headersOut, + status: response.status, + statusText: response.statusText + }); + }Also applies to: 181-185
141-164: Harden TTS payload detection.Guard against non-string types and empty values to avoid exceptions from
atob.Apply:
- if (decryptedData.content_base64 && decryptedData.content_type) { + if (typeof decryptedData.content_base64 === 'string' && decryptedData.content_base64.length > 0 + && typeof decryptedData.content_type === 'string' && decryptedData.content_type.length > 0) {src/lib/test/integration/ai.test.ts (3)
141-146: Also assert the response Content-Type is audio.Verifies that the binary re-wrap preserved/overrode MIME type as intended.
Apply:
const buffer = Buffer.from(await response.arrayBuffer()); // Verify we got back audio data expect(buffer.length).toBeGreaterThan(0); + + // And that it reports an audio content-type + const contentType = response.headers.get("content-type") || ""; + expect(contentType.startsWith("audio/")).toBe(true);
146-155: Reduce noisy debug logs or guard behind an env flag.Keeps CI output clean while preserving local diagnostics.
Apply:
- console.log(`TTS response size: ${buffer.length} bytes`); - - // Log the first 20 bytes to understand the format - const first20Bytes = buffer.slice(0, 20).toString('hex'); - console.log(`First 20 bytes (hex): ${first20Bytes}`); - - // Also log as string to see if it's JSON or text - const first100Chars = buffer.slice(0, 100).toString('utf-8'); - console.log(`First 100 chars (string): ${first100Chars}`); + if (process.env.DEBUG_TTS) { + console.log(`TTS response size: ${buffer.length} bytes`); + const first20Bytes = buffer.slice(0, 20).toString('hex'); + console.log(`First 20 bytes (hex): ${first20Bytes}`); + const first100Chars = buffer.slice(0, 100).toString('utf-8'); + console.log(`First 100 chars (string): ${first100Chars}`); + }
156-165: OK to keep heuristic MP3 checks; consider asserting on any audio signature if server varies.Non-blocking: your warning is fine given upstream variability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lib/ai.ts(1 hunks)src/lib/test/integration/ai.test.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lib/ai.ts (1)
src/lib/encryption.ts (1)
decryptMessage(20-34)
src/lib/test/integration/ai.test.ts (1)
src/lib/ai.ts (1)
createCustomFetch(9-191)
🔇 Additional comments (1)
src/lib/test/integration/ai.test.ts (1)
118-129: LGTM: correct setup of client with custom fetch and identity encoding.The test correctly exercises the new decryption path with the custom fetch wrapper.
- Headers object doesn't have enumerable properties, so spreading doesn't work - Use new Headers(response.headers) to properly clone headers - Remove invalid headers (content-encoding, content-length, transfer-encoding) for decoded response 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This review covers only the changes made since the last review, not the entire PR.
The latest changes implement the previously suggested header handling improvements for TTS audio responses. The code now properly clones the response headers using new Headers(response.headers) instead of object spreading, which was causing all headers to be dropped. The implementation correctly sets the audio content-type and removes invalid headers (content-encoding, content-length, transfer-encoding) that are no longer applicable after base64 decoding.
The TTS functionality works by detecting encrypted responses containing content_base64 and content_type fields, decoding the base64 audio data to binary using atob() and Uint8Array, then returning a proper Response object that browsers can handle for audio playback. This extends the existing createCustomFetch function in ai.ts to seamlessly support audio responses from TTS models while maintaining full compatibility with existing text-based AI responses.
The changes integrate well with OpenSecret's encryption architecture, where TTS responses are encrypted server-side and decrypted client-side, following the same security model as text responses but with additional handling for binary audio data and MIME type preservation.
Confidence score: 4/5
- This PR is safe to merge with minimal risk after addressing the MPEG sync detection
- Score reflects solid implementation of header handling and TTS support with one minor validation issue
- Pay close attention to the MPEG sync detection logic which may produce false positives
1 file reviewed, 1 comment
- Wrap atob() in try-catch to handle malformed base64 gracefully - Throw descriptive error if base64 decoding fails - Prevents uncaught exceptions from invalid audio data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. This update addresses all the previously identified issues with the text-to-speech audio decryption support implementation.
The developer has implemented comprehensive fixes to the TTS audio handling logic in src/lib/ai.ts. The key improvements include: wrapping the atob() base64 decoding operation in proper try-catch error handling to prevent crashes from malformed data, fixing the header propagation issue by properly cloning the response headers using new Headers(response.headers) and sanitizing hop-by-hop headers like content-encoding, content-length, and transfer-encoding, adding content-type checking to avoid corrupting non-JSON binary responses by early-returning when the response isn't JSON, and correcting the MPEG audio format detection logic to properly check for specific sync word values (fffa or fffb) instead of the overly broad startsWith('fff') pattern.
These changes ensure that the custom fetch function can safely handle TTS responses while maintaining backward compatibility with existing streaming and non-streaming text responses. The implementation now properly preserves response headers, handles edge cases gracefully, and avoids corrupting binary data from non-JSON endpoints.
Confidence score: 4/5
- This PR is safe to merge with minimal risk after addressing the previous review feedback
- Score reflects that all major issues from previous reviews have been addressed with appropriate error handling and edge case management
- Pay close attention to the audio format detection logic and header sanitization implementation
1 file reviewed, no comments
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests