feat: add text-to-speech audio decryption support #48

AnthonyRonning · 2025-09-13T02:10:01Z

Handle TTS responses with content_base64 and content_type fields
Decode base64 audio data to binary format
Preserve correct MIME types for audio playback
Add integration test for Kokoro TTS model

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Non-streaming API responses now support automatic decryption and are returned with preserved status/headers; TTS responses are returned as playable binary audio with correct content-type.
Bug Fixes
- Improved handling and logging for non-JSON and decryption failures to reduce unexpected errors.
Tests
- Added an integration test validating text-to-speech audio output and basic format heuristics.

- Handle TTS responses with content_base64 and content_type fields - Decode base64 audio data to binary format - Preserve correct MIME types for audio playback - Add integration test for Kokoro TTS model 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2025-09-13T02:10:07Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Implements decryption-aware handling for non-SSE JSON responses in src/lib/ai.ts, decoding TTS base64 payloads into binary Responses and preserving status/headers while keeping SSE decryption unchanged. Adds an integration test for kokoro text-to-speech that heuristically verifies MP3-like output.

Changes

Cohort / File(s)	Summary
Decryption and TTS response handling `src/lib/ai.ts`	Replaces early raw Response return with logic to read response text, parse JSON to detect encrypted payloads, decrypt using `sessionKey`, and re-wrap Responses. If decrypted payload contains `content_base64` and `content_type`, decodes to binary Uint8Array and returns a binary Response with correct `content-type`, preserving status/headers and removing invalid headers. Adds error handling and logging. SSE decryption flow unchanged.
Integration test for TTS (kokoro) `src/lib/test/integration/ai.test.ts`	Adds a test invoking `audio.speech.create` with model `"kokoro"` and voice `"af_sky"`, requesting `"mp3"`. Converts response to Buffer, asserts non-empty, logs size/sample, and heuristically checks for MP3 indicators (ID3 or MPEG sync); warns on atypical headers without failing.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant ai as ai.ts (fetch wrapper)
  participant Server

  Client->>ai: fetch()
  ai->>Server: HTTP request
  Server-->>ai: HTTP response (JSON / SSE / TTS)

  alt SSE stream
    note right of ai #E6F7FF: SSE decryption flow (unchanged)
    ai-->>Client: Streamed events
  else Non-SSE
    ai->>ai: read response text
    alt JSON with encrypted field
      ai->>ai: decrypt using sessionKey
      alt Decrypted JSON has content_base64 & content_type
        ai->>ai: base64 -> Uint8Array
        note right of ai #F0FFF4: Build binary Response\nremove invalid headers
        ai-->>Client: Binary Response (preserve status/headers)
      else Other decrypted JSON/text
        ai-->>Client: Response with decrypted text (preserve status/headers)
      end
    else Not JSON / no encryption
      ai-->>Client: Response with original text (preserve status/headers)
    end
    note right of ai #FFF8E6: Log parse/decrypt errors
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble bytes beneath the moonlit tree,
Unwrap the crypt, set secret packets free.
A whisper turns to song—mp3 delight—🎵
Kokoro’s voice takes wing in velvet night.
Headers snug, streams hum, all bits aligned—
A happy hop through code, securely signed.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title concisely and accurately summarizes the primary change: adding text-to-speech audio decryption support. It is short, specific, and directly matches the PR objectives and code changes (TTS decryption, base64 audio decoding, and MIME preservation). It avoids unnecessary detail or noise and uses a conventional "feat:" prefix appropriate for a feature addition.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/tts-audio-support

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0d81b1 and 9d16355.

📒 Files selected for processing (1)

src/lib/ai.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/ai.ts

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cloudflare-workers-and-pages · 2025-09-13T02:10:56Z

Deploying opensecret-sdk with Cloudflare Pages

Latest commit:	`9d16355`
Status:	✅ Deploy successful!
Preview URL:	https://efb31496.opensecret-sdk.pages.dev
Branch Preview URL:	https://feat-tts-audio-support.opensecret-sdk.pages.dev

View logs

greptile-apps

Greptile Summary

This PR adds text-to-speech (TTS) audio support to the OpenSecret SDK by extending the existing custom fetch implementation to handle encrypted audio responses. The changes modify the createCustomFetch function in src/lib/ai.ts to detect TTS responses containing content_base64 and content_type fields, decrypt the encrypted JSON payload, decode the base64 audio data to binary format, and return it as a proper Response object with the correct MIME type.

The implementation fits seamlessly into the existing architecture by extending the response handling logic that already processes encrypted streaming text responses. When a response contains encrypted JSON with TTS fields, the new code path decodes the audio data and constructs a binary Response that can be consumed by the OpenAI client's audio APIs. This maintains the end-to-end encryption model that's central to OpenSecret's security approach while enabling audio functionality.

A comprehensive integration test was added to validate the functionality using the Kokoro TTS model. The test creates an OpenAI client with the custom fetch function, makes a TTS request, and validates that the returned audio data has the correct format and size characteristics.

Confidence score: 3/5

This PR introduces complex audio handling logic that could have edge cases with different audio formats or malformed responses
Score reflects concerns about error handling in the base64 decoding logic and the test's reliance on console logging rather than proper assertions
Pay close attention to the audio decoding implementation in src/lib/ai.ts and consider adding more robust error handling for malformed base64 data

_{2 files reviewed, 1 comment}

_{Edit Code Review Bot Settings | Greptile}

src/lib/test/integration/ai.test.ts

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (7)

src/lib/ai.ts (4)

146-151: Tighten base64→bytes conversion.

Slightly faster and simpler; avoids the manual loop.

Apply:

-              const binaryString = atob(decryptedData.content_base64);
-              const bytes = new Uint8Array(binaryString.length);
-              for (let i = 0; i < binaryString.length; i++) {
-                bytes[i] = binaryString.charCodeAt(i);
-              }
+              const binaryString = atob(decryptedData.content_base64);
+              const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));

168-174: Sanitize headers when re-wrapping decrypted text.

Preserve headers but strip encoding/length, which may be stale after re-wrapping.

Apply:

-          return new Response(decrypted, {
-            headers: response.headers,
-            status: response.status,
-            statusText: response.statusText
-          });
+          {
+            const headersOut = new Headers(response.headers);
+            headersOut.delete('content-encoding');
+            headersOut.delete('content-length');
+            headersOut.delete('transfer-encoding');
+            return new Response(decrypted, {
+              headers: headersOut,
+              status: response.status,
+              statusText: response.statusText
+            });
+          }

175-179: Clarify catch message and sanitize headers for the pass-through path.

The log is misleading if decryption failed. Also strip stale encoding/length in the pass-through Response.

Apply:

-      } catch (e) {
-        // If it's not JSON or doesn't have encrypted field, return original response
-        console.log("Response is not encrypted JSON, returning as-is");
-      }
+      } catch (e) {
+        // Non-JSON, missing 'encrypted', or decryption failed — return response as-is
+        console.log("Non-JSON, missing 'encrypted', or decryption failed; returning as-is");
+      }
 
-      // Return the original response text as a new Response
-      return new Response(responseText, {
-        headers: response.headers,
-        status: response.status,
-        statusText: response.statusText
-      });
+      // Return the original response text as a new Response (with sanitized headers)
+      {
+        const headersOut = new Headers(response.headers);
+        headersOut.delete('content-encoding');
+        headersOut.delete('content-length');
+        headersOut.delete('transfer-encoding');
+        return new Response(responseText, {
+          headers: headersOut,
+          status: response.status,
+          statusText: response.statusText
+        });
+      }

Also applies to: 181-185

141-164: Harden TTS payload detection.

Guard against non-string types and empty values to avoid exceptions from atob.

Apply:

-            if (decryptedData.content_base64 && decryptedData.content_type) {
+            if (typeof decryptedData.content_base64 === 'string' && decryptedData.content_base64.length > 0
+                && typeof decryptedData.content_type === 'string' && decryptedData.content_type.length > 0) {

src/lib/test/integration/ai.test.ts (3)

141-146: Also assert the response Content-Type is audio.

Verifies that the binary re-wrap preserved/overrode MIME type as intended.

Apply:

   const buffer = Buffer.from(await response.arrayBuffer());
   
   // Verify we got back audio data
   expect(buffer.length).toBeGreaterThan(0);
+
+  // And that it reports an audio content-type
+  const contentType = response.headers.get("content-type") || "";
+  expect(contentType.startsWith("audio/")).toBe(true);

146-155: Reduce noisy debug logs or guard behind an env flag.

Keeps CI output clean while preserving local diagnostics.

Apply:

-  console.log(`TTS response size: ${buffer.length} bytes`);
-  
-  // Log the first 20 bytes to understand the format
-  const first20Bytes = buffer.slice(0, 20).toString('hex');
-  console.log(`First 20 bytes (hex): ${first20Bytes}`);
-  
-  // Also log as string to see if it's JSON or text
-  const first100Chars = buffer.slice(0, 100).toString('utf-8');
-  console.log(`First 100 chars (string): ${first100Chars}`);
+  if (process.env.DEBUG_TTS) {
+    console.log(`TTS response size: ${buffer.length} bytes`);
+    const first20Bytes = buffer.slice(0, 20).toString('hex');
+    console.log(`First 20 bytes (hex): ${first20Bytes}`);
+    const first100Chars = buffer.slice(0, 100).toString('utf-8');
+    console.log(`First 100 chars (string): ${first100Chars}`);
+  }

156-165: OK to keep heuristic MP3 checks; consider asserting on any audio signature if server varies.

Non-blocking: your warning is fine given upstream variability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c531d81 and f4697fd.

📒 Files selected for processing (2)

src/lib/ai.ts (1 hunks)
src/lib/test/integration/ai.test.ts (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

src/lib/ai.ts (1)

src/lib/encryption.ts (1)

decryptMessage (20-34)

src/lib/test/integration/ai.test.ts (1)

src/lib/ai.ts (1)

createCustomFetch (9-191)

🔇 Additional comments (1)

src/lib/test/integration/ai.test.ts (1)

118-129: LGTM: correct setup of client with custom fetch and identity encoding.

The test correctly exercises the new decryption path with the custom fetch wrapper.

src/lib/ai.ts

- Headers object doesn't have enumerable properties, so spreading doesn't work - Use new Headers(response.headers) to properly clone headers - Remove invalid headers (content-encoding, content-length, transfer-encoding) for decoded response 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

greptile-apps

Greptile Summary

This review covers only the changes made since the last review, not the entire PR.

The latest changes implement the previously suggested header handling improvements for TTS audio responses. The code now properly clones the response headers using new Headers(response.headers) instead of object spreading, which was causing all headers to be dropped. The implementation correctly sets the audio content-type and removes invalid headers (content-encoding, content-length, transfer-encoding) that are no longer applicable after base64 decoding.

The TTS functionality works by detecting encrypted responses containing content_base64 and content_type fields, decoding the base64 audio data to binary using atob() and Uint8Array, then returning a proper Response object that browsers can handle for audio playback. This extends the existing createCustomFetch function in ai.ts to seamlessly support audio responses from TTS models while maintaining full compatibility with existing text-based AI responses.

The changes integrate well with OpenSecret's encryption architecture, where TTS responses are encrypted server-side and decrypted client-side, following the same security model as text responses but with additional handling for binary audio data and MIME type preservation.

Confidence score: 4/5

This PR is safe to merge with minimal risk after addressing the MPEG sync detection
Score reflects solid implementation of header handling and TTS support with one minor validation issue
Pay close attention to the MPEG sync detection logic which may produce false positives

_{1 file reviewed, 1 comment}

_{Edit Code Review Bot Settings | Greptile}

src/lib/ai.ts

- Wrap atob() in try-catch to handle malformed base64 gracefully - Throw descriptive error if base64 decoding fails - Prevents uncaught exceptions from invalid audio data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

greptile-apps

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. This update addresses all the previously identified issues with the text-to-speech audio decryption support implementation.

The developer has implemented comprehensive fixes to the TTS audio handling logic in src/lib/ai.ts. The key improvements include: wrapping the atob() base64 decoding operation in proper try-catch error handling to prevent crashes from malformed data, fixing the header propagation issue by properly cloning the response headers using new Headers(response.headers) and sanitizing hop-by-hop headers like content-encoding, content-length, and transfer-encoding, adding content-type checking to avoid corrupting non-JSON binary responses by early-returning when the response isn't JSON, and correcting the MPEG audio format detection logic to properly check for specific sync word values (fffa or fffb) instead of the overly broad startsWith('fff') pattern.

These changes ensure that the custom fetch function can safely handle TTS responses while maintaining backward compatibility with existing streaming and non-streaming text responses. The implementation now properly preserves response headers, handles edge cases gracefully, and avoids corrupting binary data from non-JSON endpoints.

Confidence score: 4/5

This PR is safe to merge with minimal risk after addressing the previous review feedback
Score reflects that all major issues from previous reviews have been addressed with appropriate error handling and edge case management
Pay close attention to the audio format detection logic and header sanitization implementation

_{1 file reviewed, no comments}

_{Edit Code Review Bot Settings | Greptile}

greptile-apps bot reviewed Sep 13, 2025

View reviewed changes

src/lib/test/integration/ai.test.ts Show resolved Hide resolved

coderabbitai bot reviewed Sep 13, 2025

View reviewed changes

src/lib/ai.ts Show resolved Hide resolved

src/lib/ai.ts Show resolved Hide resolved

greptile-apps bot reviewed Sep 13, 2025

View reviewed changes

src/lib/ai.ts Outdated Show resolved Hide resolved

greptile-apps bot reviewed Sep 13, 2025

View reviewed changes

AnthonyRonning merged commit 1911291 into master Sep 13, 2025
7 of 8 checks passed

AnthonyRonning deleted the feat/tts-audio-support branch September 13, 2025 02:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add text-to-speech audio decryption support #48

feat: add text-to-speech audio decryption support #48

Uh oh!

AnthonyRonning commented Sep 13, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 13, 2025 •

edited

Loading

Other AI code review bot(s) detected

Uh oh!

cloudflare-workers-and-pages bot commented Sep 13, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: add text-to-speech audio decryption support #48

feat: add text-to-speech audio decryption support #48

Uh oh!

Conversation

AnthonyRonning commented Sep 13, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

cloudflare-workers-and-pages bot commented Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying opensecret-sdk with Cloudflare Pages

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 3/5

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 4/5

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 4/5

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AnthonyRonning commented Sep 13, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 13, 2025 •

edited

Loading

cloudflare-workers-and-pages bot commented Sep 13, 2025 •

edited

Loading