feat:Add language_detection_options to TranscriptOptionalParams #120
Walkthrough
Adds language_detection_options to TranscriptOptionalParams in src/libs/AssemblyAI/openapi.yaml, introducing expected_languages (array of strings) and fallback_language (string, default "auto"), with additionalProperties: false. The existing language_detection boolean remains unchanged.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Client
participant API as Transcription API
participant LD as Language Detector
participant ASR as ASR Engine
Note over C,API: New optional payload: language_detection_options { expected_languages[], fallback_language }
C->>API: Create transcript (language_detection, language_detection_options)
API->>LD: Detect language (expected_languages, fallback_language)
alt Detected in expected_languages
LD-->>API: Detected language (in-list)
else Not in expected_languages
LD-->>API: Use fallback_language (or "auto")
end
API->>ASR: Transcribe with selected language
ASR-->>API: Transcription result
API-->>C: Transcript response
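The branch in the diagram can be sketched in Python. This is an illustration only, not the service's implementation: the function name `select_language`, the `scores` mapping of language code to detector confidence, the handling of an empty `expected_languages` list, and the 0.0 score for unscored languages are all assumptions made for the example.

```python
from typing import Mapping, Sequence


def select_language(
    scores: Mapping[str, float],
    expected_languages: Sequence[str] = (),
    fallback_language: str = "auto",
) -> str:
    """Pick a transcription language from hypothetical detector scores."""
    if not scores:
        raise ValueError("scores must contain at least one language")

    # Top-scoring language overall: the "detected" language.
    detected = max(scores, key=scores.__getitem__)

    # Assumption: with no expected list, the detected language is used as-is.
    if not expected_languages or detected in expected_languages:
        return detected

    # Detected language is outside the expected list: apply the fallback.
    if fallback_language != "auto":
        return fallback_language

    # "auto": choose the expected language with the highest confidence.
    # Languages the detector did not score count as 0.0 (assumption).
    return max(expected_languages, key=lambda code: scores.get(code, 0.0))


scores = {"pt": 0.55, "es": 0.30, "en": 0.15}
print(select_language(scores, ["en", "es"]))        # es
print(select_language(scores, ["en", "es"], "en"))  # en
print(select_language(scores, ["pt", "en"]))        # pt
```

The explicit fallback is returned without checking that it appears in `expected_languages`; whether the real API enforces that is not stated in this PR.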
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/libs/AssemblyAI/openapi.yaml (3)
1257-1258: Tighten label wording
The x-label reads like a sentence. Use a concise label for consistency with neighboring fields.
- x-label: Specify options for Automatic Language Detection.
+ x-label: Language detection options
1256-1275: Add examples and clarify precedence with language_detection
- Add an example showing `language_detection: true` with `language_detection_options` populated.
- Clarify how `language_code`, `language_detection`, and `language_detection_options` interact (e.g., precedence if both `language_code` and detection are provided).
I can draft example payloads and a short precedence note if helpful.
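As a rough idea of what such an example could look like, the Python snippet below builds a request body with `language_detection: true` and `language_detection_options` populated. The `audio_url` field, its placeholder URL, and the chosen language codes are assumptions for illustration; the snippet makes no claim about how `language_code` interacts with detection.

```python
import json

# Hypothetical request body; the audio URL is a placeholder.
payload = {
    "audio_url": "https://example.com/audio.mp3",
    "language_detection": True,
    "language_detection_options": {
        # Languages the caller expects to hear in the audio.
        "expected_languages": ["en", "es"],
        # "auto" is the default proposed in this PR.
        "fallback_language": "auto",
    },
}

print(json.dumps(payload, indent=2))
```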
1256-1275: Optional: add schema-level guards (OpenAPI 3.1 JSON Schema if/then)
To prevent misuse:
- If `language_detection` is false, disallow `language_detection_options`.
- If `language_detection` is true and `expected_languages` is empty, consider requiring either a non-empty `expected_languages` or a non-`"auto"` `fallback_language`.
Here's a snippet to add under `TranscriptOptionalParams` (outside this hunk):
# at the same level as properties:
allOf:
  - if:
      properties:
        language_detection:
          const: false
    then:
      not:
        required: [language_detection_options]
  # optionally require expected_languages when auto fallback is used
  # (tune to product behavior)
  - if:
      properties:
        language_detection:
          const: true
        language_detection_options:
          properties:
            fallback_language:
              enum: ["auto"]
    then:
      properties:
        language_detection_options:
          required: [expected_languages]
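For readers who prefer to see the two guards as plain code, here is a minimal Python sketch of the same checks applied to a request dictionary. The function name `check_language_detection` and the error messages are hypothetical. The sketch follows the prose rules above rather than the exact JSON Schema semantics: only an explicit `language_detection: false` triggers the first guard, and a missing `fallback_language` is treated as the default `"auto"`.

```python
from typing import Any, List, Mapping


def check_language_detection(params: Mapping[str, Any]) -> List[str]:
    """Return a list of guard violations (empty when the params pass)."""
    errors: List[str] = []
    options = params.get("language_detection_options")
    if options is None:
        # Nothing to guard when no options are supplied.
        return errors

    detection = params.get("language_detection")
    if detection is False:
        # Guard 1: options make no sense with detection switched off.
        errors.append(
            "language_detection_options requires language_detection: true"
        )
    elif detection is True:
        # Guard 2: "auto" fallback needs a non-empty expected list.
        fallback = options.get("fallback_language", "auto")
        if fallback == "auto" and not options.get("expected_languages"):
            errors.append(
                'expected_languages must be non-empty when '
                'fallback_language is "auto"'
            )
    return errors


print(check_language_detection({
    "language_detection": False,
    "language_detection_options": {"expected_languages": ["en"]},
}))
```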
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- src/libs/AssemblyAI/Generated/AssemblyAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
- src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParams.g.cs is excluded by !**/generated/**
- src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.Json.g.cs is excluded by !**/generated/**
- src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
- src/libs/AssemblyAI/openapi.yaml (1 hunks)
expected_languages:
  x-label: Minimum speakers expected
  description: List of languages expected in the audio file.
  type: array
  objects:
    x-label: language
    type: string
🛠️ Refactor suggestion
Fix invalid OpenAPI keyword and align types with TranscriptLanguageCode
- Replace non-standard "objects" with "items" (breaks validation/codegen).
- Correct mislabeled x-label and item typing; allow either known codes or custom strings like `language_code` does.
  expected_languages:
-   x-label: Minimum speakers expected
-   description: List of languages expected in the audio file.
-   type: array
-   objects:
-     x-label: language
-     type: string
+   x-label: Expected languages
+   description: List of expected language codes (see TranscriptLanguageCode).
+   type: array
+   items:
+     x-label: Language
+     anyOf:
+       - $ref: "#/components/schemas/TranscriptLanguageCode"
+       - type: string
+   x-ts-type: Array<LiteralUnion<TranscriptLanguageCode, string>>
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1262 to 1268, the schema uses
the non-standard keyword "objects" and has a mislabeled x-label/type; replace
"objects" with "items", change the x-label to "language" (or remove if
redundant), and make items a string type that matches the TranscriptLanguageCode
behavior (i.e., allow known language codes but accept custom strings — implement
as type: string with the same enum/ref as TranscriptLanguageCode if available or
leave unconstrained string to allow custom codes). Ensure description remains
"List of languages expected in the audio file."
x-label: Fallback language
description: |
  If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
type: string
default: "auto"
🛠️ Refactor suggestion
Make fallback_language schema consistent; fix doc that suggests an array
- The description instructs to set ["auto"] (array) but the field is a string. Use "auto".
- Also model the type as a union of "auto" or a language code (or custom string), mirroring `language_code`.
  fallback_language:
    x-label: Fallback language
-   description: |
-     If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-   type: string
-   default: "auto"
+   description: |
+     If the detected language is not in `expected_languages`, use this fallback.
+     Set to `"auto"` to let the model choose the fallback from `expected_languages` with the highest confidence.
+   anyOf:
+     - enum: ["auto"]
+     - $ref: "#/components/schemas/TranscriptLanguageCode"
+     - type: string
+   default: "auto"
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1270 to 1275, the
fallback_language entry is inconsistent: the description suggests an array
["auto"] while the schema types as string; update the description to reference
"auto" (not ["auto"]) and change the schema to mirror language_code by modeling
a union that allows the literal "auto" or a language code/custom string (e.g.,
enum/oneOf or pattern-based string) so the field accepts either the special
"auto" token or a language identifier; ensure the default remains "auto" and
update any wording to clarify behavior when "auto" is used.
Pull request was closed