feat: add Whisper transcription API to TypeScript SDK #49
- Add transcribeAudio function in api.ts with full encryption support
- Export transcribeAudio from OpenSecret context in main.tsx
- Support optional parameters: model, language, prompt, response_format, temperature
- Add test that chains TTS → Whisper to verify functionality
- Uses whisper-large-v3 model (routes to Tinfoil's whisper-large-v3-turbo)
- Supports all common audio formats (MP3, WAV, MP4, M4A, FLAC, OGG, WEBM)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
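For a quick sense of the developer-facing surface, here is a minimal usage sketch. The package name `@opensecret/react` and the `useOpenSecret` hook are assumptions based on typical React SDK conventions, not verified exports of this repo:

```tsx
import { useOpenSecret } from "@opensecret/react"; // assumed package/hook names

// Illustrative component: transcribe a user-selected audio file.
function TranscribeButton({ file }: { file: File }) {
  const os = useOpenSecret();

  async function handleClick() {
    // All options are optional; model defaults to "whisper-large-v3" per this PR.
    const { text } = await os.transcribeAudio(file, {
      language: "en",   // optional language hint
      temperature: 0.0  // optional sampling temperature
    });
    console.log("Transcription:", text);
  }

  return <button onClick={handleClick}>Transcribe</button>;
}
```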
Walkthrough

Adds Whisper transcription support: new request/response types and a transcribeAudio function that base64-encodes audio and POSTs to /v1/audio/transcriptions using an authenticated call (JWT or API key). Exposes transcribeAudio via the OpenSecret context and adds an integration test that generates TTS audio and asserts the transcription.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor UI as Component
    participant Ctx as OpenSecretContext
    participant API as api.transcribeAudio
    participant Auth as openAiAuthenticatedApiCall
    participant OAI as POST /v1/audio/transcriptions
    UI->>Ctx: transcribeAudio(file, options)
    Ctx->>API: delegate (inject apiKey if present)
    API->>API: encode file → base64\ninfer filename & content_type\nbuild WhisperTranscriptionRequest
    alt apiKey provided
        API->>Auth: call with apiKey override
    else JWT flow
        API->>Auth: call with JWT
    end
    Auth->>OAI: POST payload
    OAI-->>Auth: { text }
    Auth-->>API: { text }
    API-->>Ctx: { text }
    Ctx-->>UI: { text }
```
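For reference, these are the request/response shapes implied by the walkthrough, with field names taken from the review diffs further down; a sketch, not the SDK's verbatim source:

```typescript
// Shapes implied by the review diffs below; a sketch, not verbatim SDK source.
export type WhisperTranscriptionRequest = {
  file: string;      // Base64-encoded audio file data
  filename: string;
  content_type: string;
  model: string;     // e.g. "whisper-large-v3"
  language?: string;
  prompt?: string;
  response_format?: string;
  temperature?: number;
};

export type WhisperTranscriptionResponse = {
  text: string; // the { text } payload shown in the sequence diagram
};
```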
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Deploying opensecret-sdk with Cloudflare Pages

| Latest commit: | 05ed085 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://45230d66.opensecret-sdk.pages.dev |
| Branch Preview URL: | https://feat-whisper-transcription.opensecret-sdk.pages.dev |
Greptile Summary
This PR adds Whisper-based audio transcription capabilities to the OpenSecret TypeScript SDK, expanding the existing AI integration features. The implementation introduces a new transcribeAudio function in api.ts that follows the established SDK patterns by using the openAiAuthenticatedApiCall helper for encrypted API communication, supporting both JWT token and API key authentication methods.
The transcription function converts audio files (File or Blob objects) to base64 format for secure transmission, similar to the existing document upload functionality. It supports all standard Whisper API parameters including model selection (defaulting to whisper-large-v3), language hints, prompts, response format configuration, and temperature controls. The function handles common audio formats including MP3, WAV, MP4, M4A, FLAC, OGG, and WEBM.
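As a sketch of that encoding step (the helper name and exact implementation are illustrative; the SDK's actual code may differ):

```typescript
// Illustrative base64 encoding of a File/Blob in the browser;
// the SDK's actual helper may differ.
async function encodeBlobToBase64(file: File | Blob): Promise<string> {
  const buffer = await file.arrayBuffer();
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // base64 string placed in the request's `file` field
}
```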
The functionality is properly integrated into the OpenSecret React context through main.tsx, where the transcribeAudio method is added to the OpenSecretContextType interface and exposed through the provider. This follows the same architectural pattern used for other SDK features like AI chat and TTS, ensuring consistency across the codebase.
A comprehensive integration test validates the feature through a TTS → Whisper transcription chain, generating audio using the Kokoro TTS model and then transcribing it back to verify the complete audio processing pipeline works correctly within the SDK's encrypted environment.
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it follows established patterns and adds new functionality without modifying existing code paths
- Score reflects adherence to existing architectural patterns, comprehensive testing, and proper integration with the SDK's encryption and authentication systems
- No files require special attention as the implementation consistently follows the codebase's established conventions
3 files reviewed, no comments
Actionable comments posted: 1
🧹 Nitpick comments (7)
src/lib/api.ts (4)
1494-1499: Docs claim defaults for response_format/temperature, but the code doesn't set them

Either set the defaults in code or adjust the JSDoc to say "server defaults". I suggest setting them explicitly for predictability.
Apply:
```diff
 const requestData: WhisperTranscriptionRequest = {
   file: base64Data,
   filename,
   content_type: contentType,
-  model: options?.model || "whisper-large-v3",
-  ...(options?.language && { language: options.language }),
-  ...(options?.prompt && { prompt: options.prompt }),
-  ...(options?.response_format && { response_format: options.response_format }),
-  ...(options?.temperature !== undefined && { temperature: options.temperature })
+  model: options?.model ?? "whisper-large-v3",
+  ...(options?.language && { language: options.language }),
+  ...(options?.prompt && { prompt: options.prompt }),
+  response_format: options?.response_format ?? "json",
+  temperature: options?.temperature ?? 0.0
 };
```

Also applies to: 1539-1548
1536-1538: Don't default content_type to audio/mpeg; derive it or fall back to octet-stream

Defaulting to MP3 can be wrong for WAV/OGG/WEBM etc. Derive from the filename if the type is missing, else use application/octet-stream.
Apply:
```diff
-  const filename = file instanceof File ? file.name : "audio";
-  const contentType = file.type || "audio/mpeg";
+  const filename = file instanceof File ? file.name : "audio";
+  const contentType =
+    file.type ||
+    (filename.toLowerCase().endsWith(".mp3") ? "audio/mpeg" :
+     filename.toLowerCase().endsWith(".wav") ? "audio/wav" :
+     filename.toLowerCase().endsWith(".mp4") ? "audio/mp4" :
+     filename.toLowerCase().endsWith(".m4a") ? "audio/m4a" :
+     filename.toLowerCase().endsWith(".flac") ? "audio/flac" :
+     filename.toLowerCase().endsWith(".ogg") ? "audio/ogg" :
+     filename.toLowerCase().endsWith(".webm") ? "audio/webm" :
+     "application/octet-stream");
```
1531-1534: Guardrails: add a size limit and a temperature bounds check

Reading large blobs and then base64-encoding them doubles memory use. Add a reasonable size cap (e.g., 25MB) and clamp temperature to [0,1] to prevent server-side errors.
Apply:
```diff
 export async function transcribeAudio(
   file: File | Blob,
   options?: {
@@
 ): Promise<WhisperTranscriptionResponse> {
+  // Basic validation
+  const MAX_AUDIO_BYTES = 25 * 1024 * 1024; // 25MB
+  if (file.size > MAX_AUDIO_BYTES) {
+    throw new Error(`Audio exceeds ${Math.round(MAX_AUDIO_BYTES / (1024 * 1024))}MB limit`);
+  }
+  if (options?.temperature !== undefined) {
+    if (Number.isNaN(options.temperature)) throw new Error("temperature must be a number");
+    const t = Math.min(1, Math.max(0, options.temperature));
+    if (t !== options.temperature) {
+      options = { ...options, temperature: t };
+    }
+  }
```
1465-1474: Stronger typing for response_format

Constrain response_format to known values to catch mistakes at compile time.
Apply:
```diff
+export type WhisperResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";
 export type WhisperTranscriptionRequest = {
   file: string; // Base64-encoded audio file data
   filename: string;
   content_type: string;
   model: string;
   language?: string;
   prompt?: string;
-  response_format?: string;
+  response_format?: WhisperResponseFormat;
   temperature?: number;
 };
```

src/lib/test/integration/ai.test.ts (2)
168-215: End-to-end TTS→Whisper test is solid; consider trimming logs and simplifying Blob/File creation
- The many console logs can add noise/flakiness in CI; consider gating behind an env var.
- Minor simplification: avoid Buffer; build Blob/File directly from ArrayBuffer.
Apply:
```diff
-  const audioBuffer = Buffer.from(await ttsResponse.arrayBuffer());
-  console.log(`Generated audio size: ${audioBuffer.length} bytes`);
-
-  // Step 2: Create a Blob from the audio buffer
-  const audioBlob = new Blob([audioBuffer], { type: "audio/mpeg" });
+  const ab = await ttsResponse.arrayBuffer();
+  // console.debug(`Generated audio size: ${ab.byteLength} bytes`);
+  const audioBlob = new Blob([ab], { type: "audio/mpeg" });
   const audioFile = new File([audioBlob], "tts_output.mp3", { type: "audio/mpeg" });
```
168-215: Consider an API-key-only path test

The context claims an API key can be used instead of JWT for /v1/* endpoints; add a test that clears tokens, sets an API key, and calls transcription to ensure parity with the JWT flow.
I can draft an additional integration test that sets a temporary API key and verifies transcribeAudio without JWT. Want me to open a follow-up PR?
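For concreteness, such a test might look roughly like this under the repo's Vitest setup; the import path, fixture URL, env var, and token-clearing step are all hypothetical:

```typescript
import { expect, test } from "vitest";
import { transcribeAudio } from "../../api"; // path assumed; adjust to the repo layout

// Illustrative API-key-only test; the fixture URL, env var, and
// token-clearing step are hypothetical, not repo fixtures.
test("transcribeAudio works with an API key and no JWT session", async () => {
  window.localStorage.clear(); // assumption: JWT tokens live in localStorage

  const bytes = await fetch("/fixtures/sample.mp3").then((r) => r.arrayBuffer());
  const audioFile = new File([bytes], "sample.mp3", { type: "audio/mpeg" });

  const result = await transcribeAudio(audioFile, {
    apiKey: process.env.VITE_TEST_API_KEY // hypothetical env var
  });

  expect(result.text.length).toBeGreaterThan(0);
});
```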
src/lib/main.tsx (1)
644-673: Context docs: note automatic API key usage

The provider currently claims "When set, this will be used instead of JWT for /v1/* endpoints." Make it explicit that context.transcribeAudio uses the in-memory apiKey automatically (no need to pass options.apiKey), or implement that behavior (see the next comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lib/api.ts (1 hunk)
- src/lib/main.tsx (3 hunks)
- src/lib/test/integration/ai.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lib/test/integration/ai.test.ts (2)
- src/lib/ai.ts (1): createCustomFetch (9-201)
- src/lib/api.ts (1): transcribeAudio (1519-1558)

src/lib/api.ts (1)
- src/lib/encryptedApi.ts (1): openAiAuthenticatedApiCall (80-110)
🔇 Additional comments (1)
src/lib/main.tsx (1)
740-742: Default context mapping LGTM

Exposes deleteApiKey and transcribeAudio; once the provider wiring above is applied, behavior will align with the API key docs.
Both functions now automatically use the context's apiKey when available:

- fetchModels() now passes context.apiKey to api.fetchModels
- transcribeAudio() merges context.apiKey with any provided options

This ensures consistency with how API keys are handled across the SDK.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The developer has implemented the previously suggested fix to properly wire the context.apiKey into the transcribeAudio function calls. The changes modify both the default context and provider context in main.tsx to ensure consistent API key injection behavior.
The implementation creates wrapper functions for both fetchModels and transcribeAudio in the default context that pass undefined as the API key parameter, maintaining backward compatibility. In the provider context, these functions are wrapped to automatically inject the apiKey from the context state when available. For transcribeAudio, the wrapper spreads any existing options and adds the apiKey property, while for fetchModels, it simply passes the apiKey directly.
This change aligns the transcribeAudio function with the existing pattern used by other OpenAI-compatible endpoints in the SDK, ensuring that when users set an API key via setApiKey(), it gets automatically applied to transcription requests without requiring manual parameter passing.
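A minimal sketch of that wrapper pattern, assuming an `apiKey` state variable in the provider and caller-supplied keys taking precedence (per the docs suggestion in the later review comment); not the verbatim main.tsx code:

```tsx
// Illustrative provider wiring (not verbatim main.tsx); assumes an
// `apiKey` state variable and caller-supplied keys taking precedence.
const value: OpenSecretContextType = {
  ...defaultContext,
  fetchModels: () => api.fetchModels(apiKey ?? undefined),
  transcribeAudio: (file, options) =>
    api.transcribeAudio(file, {
      ...options,
      apiKey: options?.apiKey ?? apiKey ?? undefined
    })
};
```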
Confidence score: 4/5
- This change properly addresses the previous review feedback and implements the API key injection correctly
- The implementation follows the established pattern used by other API functions in the codebase
- The wrapper functions maintain backward compatibility while adding the needed functionality
1 file reviewed, no comments
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/lib/main.tsx (2)
740-741: Prefer a direct alias for the default context to avoid an unnecessary wrapper

No behavior change; simpler and avoids re-creating a closure.
```diff
-  transcribeAudio: (file, options) => api.transcribeAudio(file, options)
+  transcribeAudio: api.transcribeAudio
```
644-673: Clarify auth precedence in docs

Add a note that a caller-supplied options.apiKey (when provided) takes precedence over context.apiKey; otherwise context.apiKey is used, else JWT.

```diff
 /**
  * Transcribes audio using the Whisper API
@@
  * - temperature: Sampling temperature between 0 and 1 (default: 0.0)
  *
+ * Authentication precedence:
+ * - If options.apiKey is provided, it is used.
+ * - Else if context.apiKey is set, it is used.
+ * - Else the active JWT session is used.
+ *
  * Supported audio formats: MP3, WAV, MP4, M4A, FLAC, OGG, WEBM
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/lib/main.tsx (3 hunks)
🔇 Additional comments (2)
src/lib/main.tsx (2)
1135: Nice: fetchModels correctly injects context apiKey

The wrapper aligns with the JWT-or-API-key behavior elsewhere. LGTM.
734: Default context fetchModels pass-through looks good

Passing undefined keeps behavior sane outside the provider. LGTM.
🤖 Generated with Claude Code
Summary by CodeRabbit

- New Features
- Documentation
- Tests