feat: add Whisper transcription API to TypeScript SDK #49
- Add transcribeAudio function in api.ts with full encryption support
- Export transcribeAudio from OpenSecret context in main.tsx
- Support optional parameters: model, language, prompt, response_format, temperature
- Add test that chains TTS → Whisper to verify functionality
- Uses whisper-large-v3 model (routes to Tinfoil's whisper-large-v3-turbo)
- Supports all common audio formats (MP3, WAV, MP4, M4A, FLAC, OGG, WEBM)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
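For a quick sense of the developer-facing surface, here is a minimal usage sketch. The package name `@opensecret/react` and the `useOpenSecret` hook are assumptions based on typical React SDK conventions, not verified exports of this repo:

```tsx
import { useOpenSecret } from "@opensecret/react"; // assumed package/hook names

// Illustrative component: transcribe a user-selected audio file.
function TranscribeButton({ file }: { file: File }) {
  const os = useOpenSecret();

  async function handleClick() {
    // All options are optional; model defaults to "whisper-large-v3" per this PR.
    const { text } = await os.transcribeAudio(file, {
      language: "en",   // optional language hint
      temperature: 0.0  // optional sampling temperature
    });
    console.log("Transcription:", text);
  }

  return <button onClick={handleClick}>Transcribe</button>;
}
```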
Walkthrough

Adds Whisper transcription support: new request/response types and a transcribeAudio function that base64-encodes audio and POSTs to /v1/audio/transcriptions using an authenticated call (JWT or API key). Exposes transcribeAudio via the OpenSecret context and adds an integration test that generates TTS audio and asserts the transcription.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor UI as Component
    participant Ctx as OpenSecretContext
    participant API as api.transcribeAudio
    participant Auth as openAiAuthenticatedApiCall
    participant OAI as POST /v1/audio/transcriptions
    UI->>Ctx: transcribeAudio(file, options)
    Ctx->>API: delegate (inject apiKey if present)
    API->>API: encode file → base64\ninfer filename & content_type\nbuild WhisperTranscriptionRequest
    alt apiKey provided
        API->>Auth: call with apiKey override
    else JWT flow
        API->>Auth: call with JWT
    end
    Auth->>OAI: POST payload
    OAI-->>Auth: { text }
    Auth-->>API: { text }
    API-->>Ctx: { text }
    Ctx-->>UI: { text }
```
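For reference, these are the request/response shapes implied by the walkthrough, with field names taken from the review diffs further down; a sketch, not the SDK's verbatim source:

```typescript
// Shapes implied by the review diffs below; a sketch, not verbatim SDK source.
export type WhisperTranscriptionRequest = {
  file: string;      // Base64-encoded audio file data
  filename: string;
  content_type: string;
  model: string;     // e.g. "whisper-large-v3"
  language?: string;
  prompt?: string;
  response_format?: string;
  temperature?: number;
};

export type WhisperTranscriptionResponse = {
  text: string; // the { text } payload shown in the sequence diagram
};
```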
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Deploying opensecret-sdk with Cloudflare Pages

| Latest commit: | 05ed085 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://45230d66.opensecret-sdk.pages.dev |
| Branch Preview URL: | https://feat-whisper-transcription.opensecret-sdk.pages.dev |
Greptile Summary
This PR adds Whisper-based audio transcription capabilities to the OpenSecret TypeScript SDK, expanding the existing AI integration features. The implementation introduces a new transcribeAudio function in api.ts that follows the established SDK patterns by using the openAiAuthenticatedApiCall helper for encrypted API communication, supporting both JWT token and API key authentication methods.
The transcription function converts audio files (File or Blob objects) to base64 format for secure transmission, similar to the existing document upload functionality. It supports all standard Whisper API parameters including model selection (defaulting to whisper-large-v3), language hints, prompts, response format configuration, and temperature controls. The function handles common audio formats including MP3, WAV, MP4, M4A, FLAC, OGG, and WEBM.
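As a sketch of that encoding step (the helper name and exact implementation are illustrative; the SDK's actual code may differ):

```typescript
// Illustrative base64 encoding of a File/Blob in the browser;
// the SDK's actual helper may differ.
async function encodeBlobToBase64(file: File | Blob): Promise<string> {
  const buffer = await file.arrayBuffer();
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // base64 string placed in the request's `file` field
}
```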
The functionality is properly integrated into the OpenSecret React context through main.tsx, where the transcribeAudio method is added to the OpenSecretContextType interface and exposed through the provider. This follows the same architectural pattern used for other SDK features like AI chat and TTS, ensuring consistency across the codebase.
A comprehensive integration test validates the feature through a TTS → Whisper transcription chain, generating audio using the Kokoro TTS model and then transcribing it back to verify the complete audio processing pipeline works correctly within the SDK's encrypted environment.
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it follows established patterns and adds new functionality without modifying existing code paths
- Score reflects adherence to existing architectural patterns, comprehensive testing, and proper integration with the SDK's encryption and authentication systems
- No files require special attention as the implementation consistently follows the codebase's established conventions
3 files reviewed, no comments
Actionable comments posted: 1
🧹 Nitpick comments (7)
src/lib/api.ts (4)
1494-1499: Docs claim defaults for response_format/temperature, but the code doesn't set them

Either set the defaults in code or adjust the JSDoc to say "server defaults". I suggest setting them explicitly for predictability.
Apply:
```diff
 const requestData: WhisperTranscriptionRequest = {
   file: base64Data,
   filename,
   content_type: contentType,
-  model: options?.model || "whisper-large-v3",
-  ...(options?.language && { language: options.language }),
-  ...(options?.prompt && { prompt: options.prompt }),
-  ...(options?.response_format && { response_format: options.response_format }),
-  ...(options?.temperature !== undefined && { temperature: options.temperature })
+  model: options?.model ?? "whisper-large-v3",
+  ...(options?.language && { language: options.language }),
+  ...(options?.prompt && { prompt: options.prompt }),
+  response_format: options?.response_format ?? "json",
+  temperature: options?.temperature ?? 0.0
 };
```

Also applies to: 1539-1548
1536-1538: Don't default content_type to audio/mpeg; derive it or fall back to octet-stream

Defaulting to MP3 can be wrong for WAV/OGG/WEBM etc. Derive from the filename if the type is missing, else use application/octet-stream.
Apply:
```diff
-  const filename = file instanceof File ? file.name : "audio";
-  const contentType = file.type || "audio/mpeg";
+  const filename = file instanceof File ? file.name : "audio";
+  const contentType =
+    file.type ||
+    (filename.toLowerCase().endsWith(".mp3") ? "audio/mpeg" :
+     filename.toLowerCase().endsWith(".wav") ? "audio/wav" :
+     filename.toLowerCase().endsWith(".mp4") ? "audio/mp4" :
+     filename.toLowerCase().endsWith(".m4a") ? "audio/m4a" :
+     filename.toLowerCase().endsWith(".flac") ? "audio/flac" :
+     filename.toLowerCase().endsWith(".ogg") ? "audio/ogg" :
+     filename.toLowerCase().endsWith(".webm") ? "audio/webm" :
+     "application/octet-stream");
```
1531-1534: Guardrails: add a size limit and a temperature bounds check

Reading large blobs and then base64-encoding them doubles memory use. Add a reasonable size cap (e.g., 25MB) and clamp temperature to [0,1] to prevent server-side errors.
Apply:
```diff
 export async function transcribeAudio(
   file: File | Blob,
   options?: {
@@
 ): Promise<WhisperTranscriptionResponse> {
+  // Basic validation
+  const MAX_AUDIO_BYTES = 25 * 1024 * 1024; // 25MB
+  if (file.size > MAX_AUDIO_BYTES) {
+    throw new Error(`Audio exceeds ${Math.round(MAX_AUDIO_BYTES / (1024 * 1024))}MB limit`);
+  }
+  if (options?.temperature !== undefined) {
+    if (Number.isNaN(options.temperature)) throw new Error("temperature must be a number");
+    const t = Math.min(1, Math.max(0, options.temperature));
+    if (t !== options.temperature) {
+      options = { ...options, temperature: t };
+    }
+  }
```
1465-1474: Stronger typing for response_format

Constrain response_format to known values to catch mistakes at compile time.
Apply:
```diff
+export type WhisperResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";
 export type WhisperTranscriptionRequest = {
   file: string; // Base64-encoded audio file data
   filename: string;
   content_type: string;
   model: string;
   language?: string;
   prompt?: string;
-  response_format?: string;
+  response_format?: WhisperResponseFormat;
   temperature?: number;
 };
```

src/lib/test/integration/ai.test.ts (2)
168-215: End-to-end TTS→Whisper test is solid; consider trimming logs and simplifying Blob/File creation
- The many console logs can add noise/flakiness in CI; consider gating behind an env var.
- Minor simplification: avoid Buffer; build Blob/File directly from ArrayBuffer.
Apply:
```diff
-  const audioBuffer = Buffer.from(await ttsResponse.arrayBuffer());
-  console.log(`Generated audio size: ${audioBuffer.length} bytes`);
-
-  // Step 2: Create a Blob from the audio buffer
-  const audioBlob = new Blob([audioBuffer], { type: "audio/mpeg" });
+  const ab = await ttsResponse.arrayBuffer();
+  // console.debug(`Generated audio size: ${ab.byteLength} bytes`);
+  const audioBlob = new Blob([ab], { type: "audio/mpeg" });
   const audioFile = new File([audioBlob], "tts_output.mp3", { type: "audio/mpeg" });
```
168-215: Consider an API-key-only path test

The context claims an API key can be used instead of JWT for /v1/* endpoints; add a test that clears tokens, sets an API key, and calls transcription to ensure parity with the JWT flow.
I can draft an additional integration test that sets a temporary API key and verifies transcribeAudio without JWT. Want me to open a follow-up PR?
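For concreteness, such a test might look roughly like this under the repo's Vitest setup; the import path, fixture URL, env var, and token-clearing step are all hypothetical:

```typescript
import { expect, test } from "vitest";
import { transcribeAudio } from "../../api"; // path assumed; adjust to the repo layout

// Illustrative API-key-only test; the fixture URL, env var, and
// token-clearing step are hypothetical, not repo fixtures.
test("transcribeAudio works with an API key and no JWT session", async () => {
  window.localStorage.clear(); // assumption: JWT tokens live in localStorage

  const bytes = await fetch("/fixtures/sample.mp3").then((r) => r.arrayBuffer());
  const audioFile = new File([bytes], "sample.mp3", { type: "audio/mpeg" });

  const result = await transcribeAudio(audioFile, {
    apiKey: process.env.VITE_TEST_API_KEY // hypothetical env var
  });

  expect(result.text.length).toBeGreaterThan(0);
});
```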
src/lib/main.tsx (1)
644-673: Context docs: note automatic API key usage

The provider currently claims "When set, this will be used instead of JWT for /v1/* endpoints." Make it explicit that context.transcribeAudio uses the in-memory apiKey automatically (no need to pass options.apiKey), or implement that behavior (see the next comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lib/api.ts (1 hunk)
- src/lib/main.tsx (3 hunks)
- src/lib/test/integration/ai.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lib/test/integration/ai.test.ts (2)
- src/lib/ai.ts (1): createCustomFetch (9-201)
- src/lib/api.ts (1): transcribeAudio (1519-1558)

src/lib/api.ts (1)
- src/lib/encryptedApi.ts (1): openAiAuthenticatedApiCall (80-110)
🔇 Additional comments (1)
src/lib/main.tsx (1)
740-742: Default context mapping LGTM

Exposes deleteApiKey and transcribeAudio; once the provider wiring above is applied, behavior will align with the API key docs.
Both functions now automatically use the context's apiKey when available:

- fetchModels() now passes context.apiKey to api.fetchModels
- transcribeAudio() merges context.apiKey with any provided options

This ensures consistency with how API keys are handled across the SDK.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The developer has implemented the previously suggested fix to properly wire the context.apiKey into the transcribeAudio function calls. The changes modify both the default context and provider context in main.tsx to ensure consistent API key injection behavior.
The implementation creates wrapper functions for both fetchModels and transcribeAudio in the default context that pass undefined as the API key parameter, maintaining backward compatibility. In the provider context, these functions are wrapped to automatically inject the apiKey from the context state when available. For transcribeAudio, the wrapper spreads any existing options and adds the apiKey property, while for fetchModels, it simply passes the apiKey directly.
This change aligns the transcribeAudio function with the existing pattern used by other OpenAI-compatible endpoints in the SDK, ensuring that when users set an API key via setApiKey(), it gets automatically applied to transcription requests without requiring manual parameter passing.
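A minimal sketch of that wrapper pattern, assuming an `apiKey` state variable in the provider and caller-supplied keys taking precedence (per the docs suggestion in the later review comment); not the verbatim main.tsx code:

```tsx
// Illustrative provider wiring (not verbatim main.tsx); assumes an
// `apiKey` state variable and caller-supplied keys taking precedence.
const value: OpenSecretContextType = {
  ...defaultContext,
  fetchModels: () => api.fetchModels(apiKey ?? undefined),
  transcribeAudio: (file, options) =>
    api.transcribeAudio(file, {
      ...options,
      apiKey: options?.apiKey ?? apiKey ?? undefined
    })
};
```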
Confidence score: 4/5
- This change properly addresses the previous review feedback and implements the API key injection correctly
- The implementation follows the established pattern used by other API functions in the codebase
- The wrapper functions maintain backward compatibility while adding the needed functionality
1 file reviewed, no comments
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/lib/main.tsx (2)
740-741: Prefer a direct alias for the default context to avoid an unnecessary wrapper

No behavior change; simpler and avoids re-creating a closure.
```diff
-  transcribeAudio: (file, options) => api.transcribeAudio(file, options)
+  transcribeAudio: api.transcribeAudio
```
644-673: Clarify auth precedence in docs

Add a note that a caller-supplied options.apiKey (when provided) takes precedence over context.apiKey; otherwise context.apiKey is used, else JWT.

```diff
 /**
  * Transcribes audio using the Whisper API
@@
  * - temperature: Sampling temperature between 0 and 1 (default: 0.0)
  *
+ * Authentication precedence:
+ * - If options.apiKey is provided, it is used.
+ * - Else if context.apiKey is set, it is used.
+ * - Else the active JWT session is used.
+ *
  * Supported audio formats: MP3, WAV, MP4, M4A, FLAC, OGG, WEBM
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/lib/main.tsx (3 hunks)
🔇 Additional comments (2)
src/lib/main.tsx (2)
1135: Nice: fetchModels correctly injects context apiKey

The wrapper aligns with the JWT-or-API-key behavior elsewhere. LGTM.
734: Default context fetchModels pass-through looks good

Passing undefined keeps behavior sane outside the provider. LGTM.
🤖 Generated with Claude Code
Summary by CodeRabbit

- New Features
- Documentation
- Tests