feat:Add strict language_detection_options to TranscriptOptionalParams #123
Walkthrough
Adds a new strict language_detection_options object to TranscriptOptionalParams in the OpenAPI spec, introducing expected_languages (array of strings) and fallback_language (string, default "auto"), without modifying other existing fields.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant API as Transcription API
participant LD as Language Detector
participant Model as Transcription Model
rect rgb(235, 245, 255)
note right of Client: Request includes language_detection_options
Client->>API: Create transcript (audio, language_detection_options)
API->>LD: Detect language(s)
alt Detected in expected_languages
LD-->>API: detected_language
else Not in expected_languages
note right of LD: Use fallback_language (or "auto")
LD-->>API: fallback_language
end
API->>Model: Transcribe with chosen language
Model-->>API: Transcript result
API-->>Client: Transcript response
end
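The selection step in the diagram above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the service's implementation: the function name `choose_language` and the idea that the detector exposes per-language confidence scores as a mapping are assumptions made for the example.

```python
from typing import Mapping, Sequence


def choose_language(
    confidences: Mapping[str, float],
    expected_languages: Sequence[str],
    fallback_language: str = "auto",
) -> str:
    """Pick the transcription language as described in the sequence diagram.

    `confidences` is a hypothetical detector output: language code -> score.
    """
    if not confidences:
        raise ValueError("detector returned no candidate languages")

    # Language the detector is most confident about.
    detected = max(confidences, key=lambda code: confidences[code])
    if detected in expected_languages:
        return detected

    if fallback_language == "auto":
        # "auto": take the expected language with the highest confidence score.
        if not expected_languages:
            raise ValueError('"auto" needs a non-empty expected_languages list')
        return max(expected_languages, key=lambda code: confidences.get(code, 0.0))

    # An explicit fallback language code is used as-is.
    return fallback_language


scores = {"fr": 0.6, "es": 0.3, "en_us": 0.1}
print(choose_language(scores, ["en_us", "es"]))           # "auto" fallback
print(choose_language(scores, ["en_us", "es"], "en_us"))  # explicit fallback
```

With the sample scores, the detected language `fr` is not expected, so the first call falls back to the best-scoring expected language and the second call returns the explicit fallback.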
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/libs/AssemblyAI/openapi.yaml (2)
1256-1261: Document interplay with language_detection and precedence vs language_code.
Clarify whether language_detection_options is ignored unless language_detection=true, and what happens if language_code is also set. Consider adding an if/then constraint in 3.1 (JSON Schema) to enforce valid combinations.
Do you want me to add an if/then schema so options are only allowed when language_detection is true?
1256-1275: Optional: add an example under TranscriptOptionalParams showing language_detection_options.
Helps SDKs and users.
I can add a minimal example using expected_languages: ["en_us","es"] and fallback_language: "auto".
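For readers who want to see what such an example would carry, here is a rough Python sketch that builds a request body with the values mentioned above. The helper name `build_transcript_request`, the placeholder audio URL, and sending `language_detection: true` alongside the options are assumptions; the PR does not yet document how the two fields interact.

```python
import json
from typing import Any, Dict, Sequence


def build_transcript_request(
    audio_url: str,
    expected_languages: Sequence[str],
    fallback_language: str = "auto",
) -> Dict[str, Any]:
    """Build a transcript request body that uses language_detection_options."""
    return {
        "audio_url": audio_url,
        # Assumption: the options only take effect when detection is enabled.
        "language_detection": True,
        "language_detection_options": {
            "expected_languages": list(expected_languages),
            # A plain string, not an array such as ["auto"].
            "fallback_language": fallback_language,
        },
    }


payload = build_transcript_request(
    "https://example.com/audio.mp3",  # placeholder URL
    ["en_us", "es"],
)
print(json.dumps(payload, indent=2))
```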
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
src/libs/AssemblyAI/Generated/AssemblyAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParams.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.Json.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
src/libs/AssemblyAI/openapi.yaml (1 hunks)
expected_languages:
  x-label: Minimum speakers expected
  description: List of languages expected in the audio file.
  type: array
  objects:
    x-label: language
    type: string
fallback_language:
🛠️ Refactor suggestion
Fix invalid OpenAPI: use items (not objects) and correct label/schema for expected_languages.
Array schemas must use items. Current use of objects will break validation/SDK generation. Also the x-label is incorrect.
Apply:
- expected_languages:
-   x-label: Minimum speakers expected
-   description: List of languages expected in the audio file.
-   type: array
-   objects:
-     x-label: language
-     type: string
+ expected_languages:
+   x-label: Expected languages
+   description: List of languages expected in the audio file.
+   type: array
+   items:
+     anyOf:
+       - $ref: "#/components/schemas/TranscriptLanguageCode"
+       - type: string
+     x-ts-type: LiteralUnion<TranscriptLanguageCode, string>
+   minItems: 1
+   uniqueItems: true
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1262 to 1269, the
expected_languages array schema is invalid: it uses "objects" and misplaces
x-label. Replace "objects" with "items", move the per-item x-label (if needed)
under items, and ensure items:type is string; keep the array-level
x-label/description/type as-is. Concretely, define expected_languages with type:
array, an optional x-label for the array, description, and an items block with
type: string and any per-item x-label/schema.
  x-label: Fallback language
  description: |
    If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
  type: string
  default: "auto"
🛠️ Refactor suggestion
Align fallback_language description with type and constrain values.
Doc says specify ["auto"] (array) but the field is a string. Use "auto" (no brackets) and constrain to either a language code or the literal "auto".
- fallback_language:
-   x-label: Fallback language
-   description: |
-     If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-   type: string
-   default: "auto"
+ fallback_language:
+   x-label: Fallback language
+   description: |
+     If the detected language is not in `expected_languages`, this value is used. Set to "auto" to choose the highest-confidence language from `expected_languages`. Requires `expected_languages` to be non-empty when "auto" is used.
+   anyOf:
+     - $ref: "#/components/schemas/TranscriptLanguageCode"
+     - type: string
+       enum: ["auto"]
+   default: "auto"
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1270-1275, the description
incorrectly refers to ["auto"] (an array) while the field is defined as a
string; update the description to say use "auto" (a string) and then add a
constraint so the value must be either the literal "auto" or a valid language
code (e.g., BCP-47); implement that constraint via an enum of allowed literals
or a regex/pattern that permits "auto" or language codes and adjust the
description to reflect the allowed values.
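To make the proposed constraints concrete, the following Python sketch checks a `language_detection_options` object on the client side. It is an illustration only: the function name and the `LANGUAGE_CODE` pattern are hypothetical stand-ins, and the real set of accepted codes is whatever the `TranscriptLanguageCode` schema defines.

```python
import re
from typing import Any, List, Mapping

# Hypothetical pattern for codes such as "es" or "en_us"; the authoritative
# list lives in the TranscriptLanguageCode schema, not in this regex.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}(_[a-z]{2})?")


def validate_language_detection_options(options: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    errors: List[str] = []

    expected = options.get("expected_languages", [])
    if not isinstance(expected, list) or not all(isinstance(c, str) for c in expected):
        errors.append("expected_languages must be an array of strings")
        expected = []
    elif len(set(expected)) != len(expected):
        errors.append("expected_languages must not contain duplicates")

    fallback = options.get("fallback_language", "auto")
    if not isinstance(fallback, str):
        errors.append('fallback_language must be a string such as "auto", not an array')
    elif fallback == "auto":
        if not expected:
            errors.append('fallback_language "auto" requires a non-empty expected_languages')
    elif not LANGUAGE_CODE.fullmatch(fallback):
        errors.append('fallback_language must be "auto" or a language code')

    return errors


print(validate_language_detection_options(
    {"expected_languages": ["en_us", "es"], "fallback_language": ["auto"]}
))
```

Passing `["auto"]` as an array, as the original description suggested, is reported as an error here, which is the mismatch the review comment points out.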
Summary by CodeRabbit