
Conversation


@AnthonyRonning AnthonyRonning commented Sep 13, 2025

  • Add transcribeAudio function in api.ts with full encryption support
  • Export transcribeAudio from OpenSecret context in main.tsx
  • Support optional parameters: model, language, prompt, response_format, temperature
  • Add test that chains TTS → Whisper to verify functionality
  • Uses whisper-large-v3 model (routes to Tinfoil's whisper-large-v3-turbo)
  • Supports all common audio formats (MP3, WAV, MP4, M4A, FLAC, OGG, WEBM)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added audio transcription via Whisper: upload a File/Blob and receive text (default model provided).
    • Exposed transcribeAudio on the app context so components can call transcription directly.
    • Supports optional API-key override for transcription requests.
  • Documentation

    • Added usage guidance, parameter details, defaults, supported formats, and error notes for transcribeAudio.
  • Tests

    • Added an integration test validating end-to-end audio transcription.


coderabbitai bot commented Sep 13, 2025

Walkthrough

Adds Whisper transcription support: new request/response types and a transcribeAudio function that base64-encodes audio and POSTs it to /v1/audio/transcriptions using an authenticated call (JWT or API key). Exposes transcribeAudio via the OpenSecret context and adds an integration test that generates TTS audio and asserts the transcription.
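
For orientation, a minimal usage sketch of the new context method; the `useOpenSecret` hook name and import path are assumptions, not confirmed by this PR:

```ts
// Minimal usage sketch. The `useOpenSecret` hook name and import path are
// assumptions about how the SDK exposes its context; adjust to the real API.
import { useOpenSecret } from "@opensecret/react";

function useTranscription() {
  const os = useOpenSecret();

  return async (file: File) => {
    // Only `file` is required; the options mirror the documented parameters.
    const { text } = await os.transcribeAudio(file, {
      model: "whisper-large-v3", // default when omitted
      language: "en",            // optional hint
      temperature: 0             // optional, 0–1
    });
    return text;
  };
}
```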

Changes

Cohort / File(s) | Summary

Transcription API (src/lib/api.ts)
Adds WhisperTranscriptionRequest and WhisperTranscriptionResponse types and exports transcribeAudio(file: File | Blob, options?): Promise<WhisperTranscriptionResponse>, which base64-encodes the audio, infers filename and content type, and POSTs to /v1/audio/transcriptions.

Context Exposure (src/lib/main.tsx)
Extends OpenSecretContextType with transcribeAudio, wires the context default and provider value to delegate to api.transcribeAudio, and has the provider wrapper inject apiKey when present. Also updates the fetchModels wrappers to accept an optional apiKey.

Integration Test (src/lib/test/integration/ai.test.ts)
Adds an integration test that imports transcribeAudio, generates MP3 audio via TTS, converts it to a File, calls transcribeAudio with whisper-large-v3, and asserts the returned transcript contains "hello". Uses existing test helpers and a @ts-expect-error for an unsupported Kokoro voice.
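
For concreteness, a condensed sketch of that TTS → Whisper chain; the import path and the TTS plumbing are assumed, while the transcribeAudio call and the "hello" assertion mirror the test description:

```ts
// Condensed sketch of the round trip the integration test performs. The
// import path and how the TTS audio is obtained are assumptions.
import { transcribeAudio } from "../../api"; // path assumed

async function ttsWhisperRoundTrip(ttsResponse: Response): Promise<string> {
  // TTS returns MP3 bytes; wrap them as a File for the transcription API.
  const bytes = await ttsResponse.arrayBuffer();
  const audioFile = new File([new Blob([bytes], { type: "audio/mpeg" })],
    "tts_output.mp3", { type: "audio/mpeg" });

  const { text } = await transcribeAudio(audioFile, { model: "whisper-large-v3" });
  if (!text.toLowerCase().includes("hello")) {
    throw new Error(`Expected "hello" in transcript, got: ${text}`);
  }
  return text;
}
```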

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor UI as Component
  participant Ctx as OpenSecretContext
  participant API as api.transcribeAudio
  participant Auth as openAiAuthenticatedApiCall
  participant OAI as POST /v1/audio/transcriptions

  UI->>Ctx: transcribeAudio(file, options)
  Ctx->>API: delegate (inject apiKey if present)
  API->>API: encode file → base64\ninfer filename & content_type\nbuild WhisperTranscriptionRequest
  alt apiKey provided
    API->>Auth: call with apiKey override
  else JWT flow
    API->>Auth: call with JWT
  end
  Auth->>OAI: POST payload
  OAI-->>Auth: { text }
  Auth-->>API: { text }
  API-->>Ctx: { text }
  Ctx-->>UI: { text }

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I hummed to the code in a whispering breeze,
Turned waves into words with nimble ease.
Context now speaks, “transcribe on the go!”
Keys or tokens—either will flow.
Tests sing “hello” — bunny’s ears aglow. 🐇🎧

Pre-merge checks

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled.
Title Check | ✅ Passed | The title clearly and concisely summarizes the primary change (adding Whisper transcription support to the TypeScript SDK) and uses a conventional "feat:" prefix. It directly reflects the changes in api.ts and main.tsx and is neither vague nor misleading; a teammate scanning history will understand the intent without needing file lists.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which clears the required 80.00% threshold.


cloudflare-workers-and-pages bot commented Sep 13, 2025

Deploying opensecret-sdk with Cloudflare Pages

Latest commit: 05ed085
Status: ✅  Deploy successful!
Preview URL: https://45230d66.opensecret-sdk.pages.dev
Branch Preview URL: https://feat-whisper-transcription.opensecret-sdk.pages.dev



@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR adds Whisper-based audio transcription capabilities to the OpenSecret TypeScript SDK, expanding the existing AI integration features. The implementation introduces a new transcribeAudio function in api.ts that follows the established SDK patterns by using the openAiAuthenticatedApiCall helper for encrypted API communication, supporting both JWT token and API key authentication methods.

The transcription function converts audio files (File or Blob objects) to base64 format for secure transmission, similar to the existing document upload functionality. It supports all standard Whisper API parameters including model selection (defaulting to whisper-large-v3), language hints, prompts, response format configuration, and temperature controls. The function handles common audio formats including MP3, WAV, MP4, M4A, FLAC, OGG, and WEBM.
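
As a sketch of that conversion (the actual helper in api.ts may be implemented differently), the general technique looks like this:

```ts
// General File/Blob → base64 technique, shown as a sketch; the real helper
// in src/lib/api.ts may differ.
async function blobToBase64(file: File | Blob): Promise<string> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  let binary = "";
  const CHUNK = 0x8000; // encode in chunks to avoid call-stack limits
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}
```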

The functionality is properly integrated into the OpenSecret React context through main.tsx, where the transcribeAudio method is added to the OpenSecretContextType interface and exposed through the provider. This follows the same architectural pattern used for other SDK features like AI chat and TTS, ensuring consistency across the codebase.

A comprehensive integration test validates the feature through a TTS → Whisper transcription chain, generating audio using the Kokoro TTS model and then transcribing it back to verify the complete audio processing pipeline works correctly within the SDK's encrypted environment.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it follows established patterns and adds new functionality without modifying existing code paths
  • Score reflects adherence to existing architectural patterns, comprehensive testing, and proper integration with the SDK's encryption and authentication systems
  • No files require special attention as the implementation consistently follows the codebase's established conventions

3 files reviewed, no comments



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
src/lib/api.ts (4)

1494-1499: Docs claim defaults for response_format/temperature, but code doesn’t set them

Either set the defaults in code or adjust the JSDoc to say “server defaults”. I suggest setting them explicitly for predictability.

Apply:

   const requestData: WhisperTranscriptionRequest = {
     file: base64Data,
     filename,
     content_type: contentType,
-    model: options?.model || "whisper-large-v3",
-    ...(options?.language && { language: options.language }),
-    ...(options?.prompt && { prompt: options.prompt }),
-    ...(options?.response_format && { response_format: options.response_format }),
-    ...(options?.temperature !== undefined && { temperature: options.temperature })
+    model: options?.model ?? "whisper-large-v3",
+    ...(options?.language && { language: options.language }),
+    ...(options?.prompt && { prompt: options.prompt }),
+    response_format: options?.response_format ?? "json",
+    temperature: options?.temperature ?? 0.0
   };

Also applies to: 1539-1548


1536-1538: Don’t default content_type to audio/mpeg—derive or fall back to octet-stream

Defaulting to MP3 can be wrong for WAV/OGG/WEBM etc. Derive from filename if type is missing, else use application/octet-stream.

Apply:

-  const filename = file instanceof File ? file.name : "audio";
-  const contentType = file.type || "audio/mpeg";
+  const filename = file instanceof File ? file.name : "audio";
+  const contentType =
+    file.type ||
+    (filename.toLowerCase().endsWith(".mp3") ? "audio/mpeg" :
+     filename.toLowerCase().endsWith(".wav") ? "audio/wav" :
+     filename.toLowerCase().endsWith(".mp4") ? "audio/mp4" :
+     filename.toLowerCase().endsWith(".m4a") ? "audio/m4a" :
+     filename.toLowerCase().endsWith(".flac") ? "audio/flac" :
+     filename.toLowerCase().endsWith(".ogg") ? "audio/ogg" :
+     filename.toLowerCase().endsWith(".webm") ? "audio/webm" :
+     "application/octet-stream");

1531-1534: Guardrails: add size limit and temperature bounds check

Reading large blobs then base64-encoding doubles memory. Add a reasonable size cap (e.g., 25MB) and clamp temperature to [0,1] to prevent server-side errors.

Apply:

 export async function transcribeAudio(
   file: File | Blob,
   options?: {
@@
 ): Promise<WhisperTranscriptionResponse> {
+  // Basic validation
+  const MAX_AUDIO_BYTES = 25 * 1024 * 1024; // 25MB
+  if (file.size > MAX_AUDIO_BYTES) {
+    throw new Error(`Audio exceeds ${Math.round(MAX_AUDIO_BYTES / (1024*1024))}MB limit`);
+  }
+  if (options?.temperature !== undefined) {
+    if (Number.isNaN(options.temperature)) throw new Error("temperature must be a number");
+    const t = Math.min(1, Math.max(0, options.temperature));
+    if (t !== options.temperature) {
+      options = { ...options, temperature: t };
+    }
+  }

1465-1474: Stronger typing for response_format

Constrain response_format to known values to catch mistakes at compile time.

Apply:

+export type WhisperResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";
 export type WhisperTranscriptionRequest = {
   file: string; // Base64-encoded audio file data
   filename: string;
   content_type: string;
   model: string;
   language?: string;
   prompt?: string;
-  response_format?: string;
+  response_format?: WhisperResponseFormat;
   temperature?: number;
 };
src/lib/test/integration/ai.test.ts (2)

168-215: End-to-end TTS→Whisper test is solid; consider trimming logs and simplifying Blob/File creation

  • The many console logs can add noise/flakiness in CI; consider gating behind an env var.
  • Minor simplification: avoid Buffer; build Blob/File directly from ArrayBuffer.

Apply:

-  const audioBuffer = Buffer.from(await ttsResponse.arrayBuffer());
-  console.log(`Generated audio size: ${audioBuffer.length} bytes`);
-  
-  // Step 2: Create a Blob from the audio buffer
-  const audioBlob = new Blob([audioBuffer], { type: "audio/mpeg" });
+  const ab = await ttsResponse.arrayBuffer();
+  // console.debug(`Generated audio size: ${ab.byteLength} bytes`);
+  const audioBlob = new Blob([ab], { type: "audio/mpeg" });
   const audioFile = new File([audioBlob], "tts_output.mp3", { type: "audio/mpeg" });

168-215: Consider an API-key-only path test

Context claims API key can be used instead of JWT for /v1/*; add a test that clears tokens, sets an API key, and calls transcription to ensure parity with JWT flows.

I can draft an additional integration test that sets a temporary API key and verifies transcribeAudio without JWT. Want me to open a follow-up PR?
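
A rough shape such a test could take; the localStorage clearing, the env-var name, and the makeTtsAudio helper are all assumptions about the test harness, and only transcribeAudio with its options.apiKey override comes from this PR:

```ts
// Rough shape of the suggested API-key-only test. The token-clearing step,
// env-var name, and `makeTtsAudio` helper are hypothetical.
test("transcribeAudio works with an API key and no JWT", async () => {
  window.localStorage.clear(); // assumption: drops any stored JWT session

  const apiKey = process.env.VITE_TEST_API_KEY; // hypothetical env var
  if (!apiKey) return; // skip when no key is configured

  const audioFile = new File([await makeTtsAudio()], "hello.mp3", {
    type: "audio/mpeg"
  });
  const result = await transcribeAudio(audioFile, {
    model: "whisper-large-v3",
    apiKey // takes the API-key path instead of JWT
  });
  expect(result.text.toLowerCase()).toContain("hello");
});
```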

src/lib/main.tsx (1)

644-673: Context docs: note automatic API key usage

The provider currently claims “When set, this will be used instead of JWT for /v1/* endpoints.” Make it explicit that context.transcribeAudio uses the in-memory apiKey automatically (no need to pass options.apiKey), or implement that behavior (see next comment).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1368e3 and 15fd195.

📒 Files selected for processing (3)
  • src/lib/api.ts (1 hunks)
  • src/lib/main.tsx (3 hunks)
  • src/lib/test/integration/ai.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lib/test/integration/ai.test.ts (2)
src/lib/ai.ts (1)
  • createCustomFetch (9-201)
src/lib/api.ts (1)
  • transcribeAudio (1519-1558)
src/lib/api.ts (1)
src/lib/encryptedApi.ts (1)
  • openAiAuthenticatedApiCall (80-110)
🔇 Additional comments (1)
src/lib/main.tsx (1)

740-742: Default context mapping LGTM

Exposes deleteApiKey and transcribeAudio; once the provider wiring above is applied, behavior will align with the API key docs.

Both functions now automatically use the context's apiKey when available:
- fetchModels() now passes context.apiKey to api.fetchModels
- transcribeAudio() merges context.apiKey with any provided options
This ensures consistency with how API keys are handled across the SDK

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The developer has implemented the previously suggested fix to properly wire the context.apiKey into the transcribeAudio function calls. The changes modify both the default context and provider context in main.tsx to ensure consistent API key injection behavior.

The implementation creates wrapper functions for both fetchModels and transcribeAudio in the default context that pass undefined as the API key parameter, maintaining backward compatibility. In the provider context, these functions are wrapped to automatically inject the apiKey from the context state when available. For transcribeAudio, the wrapper spreads any existing options and adds the apiKey property, while for fetchModels, it simply passes the apiKey directly.

This change aligns the transcribeAudio function with the existing pattern used by other OpenAI-compatible endpoints in the SDK, ensuring that when users set an API key via setApiKey(), it gets automatically applied to transcription requests without requiring manual parameter passing.
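
Paraphrased as code (shapes assumed from the description above, not copied from main.tsx), the provider wiring looks roughly like:

```ts
// Sketch of the provider-side wrappers described above; exact shapes in
// main.tsx may differ.
const value: OpenSecretContextType = {
  // ...other context members...

  // fetchModels: forward the in-memory API key directly.
  fetchModels: () => api.fetchModels(apiKey ?? undefined),

  // transcribeAudio: spread caller options, then resolve the key so a
  // caller-supplied options.apiKey wins over the context key.
  transcribeAudio: (file, options) =>
    api.transcribeAudio(file, { ...options, apiKey: options?.apiKey ?? apiKey })
};
```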

Confidence score: 4/5

  • This change properly addresses the previous review feedback and implements the API key injection correctly
  • The implementation follows the established pattern used by other API functions in the codebase
  • The wrapper functions maintain backward compatibility while adding the needed functionality

1 file reviewed, no comments



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/lib/main.tsx (2)

740-741: Prefer direct alias for default context to avoid unnecessary wrapper

No behavior change; simpler and avoids re-creating a closure.

-  transcribeAudio: (file, options) => api.transcribeAudio(file, options)
+  transcribeAudio: api.transcribeAudio

644-673: Clarify auth precedence in docs

Add a note that a caller-supplied options.apiKey (when provided) takes precedence over context.apiKey, otherwise context.apiKey is used, else JWT.

  /**
   * Transcribes audio using the Whisper API
@@
   * - temperature: Sampling temperature between 0 and 1 (default: 0.0)
   * 
+  * Authentication precedence:
+  * - If options.apiKey is provided, it is used.
+  * - Else if context.apiKey is set, it is used.
+  * - Else the active JWT session is used.
+  *
   * Supported audio formats: MP3, WAV, MP4, M4A, FLAC, OGG, WEBM
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd195 and 05ed085.

📒 Files selected for processing (1)
  • src/lib/main.tsx (3 hunks)
🔇 Additional comments (2)
src/lib/main.tsx (2)

1135-1135: Nice: fetchModels correctly injects context apiKey

Wrapper aligns with the JWT-or-API-key behavior elsewhere. LGTM.


734-734: Default context fetchModels pass-through looks good

Passing undefined keeps behavior sane outside the provider. LGTM.

@AnthonyRonning AnthonyRonning merged commit ff44b64 into master Sep 13, 2025
7 of 9 checks passed
@AnthonyRonning AnthonyRonning deleted the feat/whisper-transcription branch September 13, 2025 05:15