@HavenDV HavenDV commented Aug 27, 2025

Summary by CodeRabbit

  • New Features
    • Added configurable Automatic Language Detection options for transcripts.
    • Specify expected languages to improve detection accuracy.
    • Choose a fallback language when the detected language isn’t in the expected list; defaults to auto-selecting the highest-confidence language.
    • Backward-compatible: existing language detection behavior remains unchanged if options are not set.

coderabbitai bot commented Aug 27, 2025

Walkthrough

Adds language_detection_options to TranscriptOptionalParams in src/libs/AssemblyAI/openapi.yaml, introducing expected_languages (array of strings) and fallback_language (string, default "auto"), with additionalProperties: false. Existing language_detection boolean remains unchanged.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| AssemblyAI OpenAPI schema (src/libs/AssemblyAI/openapi.yaml) | Added TranscriptOptionalParams.language_detection_options object with constrained schema: expected_languages: string[] and fallback_language: string (default "auto"); no change to existing language_detection field. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant API as Transcription API
    participant LD as Language Detector
    participant ASR as ASR Engine

    Note over C,API: New optional payload: language_detection_options { expected_languages[], fallback_language }

    C->>API: Create transcript (language_detection, language_detection_options)
    API->>LD: Detect language (expected_languages, fallback_language)
    alt Detected in expected_languages
        LD-->>API: Detected language (in-list)
    else Not in expected_languages
        LD-->>API: Use fallback_language (or "auto")
    end
    API->>ASR: Transcribe with selected language
    ASR-->>API: Transcription result
    API-->>C: Transcript response
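The two branches in the diagram can be sketched in a few lines of Python. This is only an illustration of the flow as described in this PR, not the service's actual implementation: the `confidences` mapping, the function name, and the handling of an empty expected list are all assumptions.

```python
from typing import Mapping, Sequence


def select_language(
    confidences: Mapping[str, float],
    expected_languages: Sequence[str] = (),
    fallback_language: str = "auto",
) -> str:
    """Pick a transcription language following the diagram's two branches.

    `confidences` is a hypothetical map of language code -> detector score.
    """
    if not confidences:
        raise ValueError("confidences must not be empty")

    # The detected language is the one with the highest score.
    detected = max(confidences, key=lambda lang: confidences[lang])

    # Assumption: an empty expected list means detection is unrestricted.
    if not expected_languages or detected in expected_languages:
        return detected

    # Detected language is outside the expected list: use the fallback.
    if fallback_language != "auto":
        return fallback_language

    # "auto": choose the expected language with the highest score
    # (languages the detector did not score count as 0.0).
    return max(expected_languages, key=lambda lang: confidences.get(lang, 0.0))
```

For example, with scores `{"pt": 0.7, "en": 0.2, "es": 0.1}` and `expected_languages=["en", "es"]`, the detected language `pt` is not in the list, so the `"auto"` fallback picks `en`.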

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I wiggle my ears at options anew,
Expected tongues in a tidy queue—
If voices wander, no need to stew,
A fallback hop will carry us through.
With "auto" winds and clouds of clue,
The transcript fields bloom fresh with dew. 🐇✨

@HavenDV HavenDV enabled auto-merge (squash) August 27, 2025 01:26
@coderabbitai coderabbitai bot changed the title from "feat:@coderabbitai" to "feat:Add language_detection_options to TranscriptOptionalParams" Aug 27, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/libs/AssemblyAI/openapi.yaml (3)

1257-1258: Tighten label wording

The x-label reads like a sentence. Use a concise label for consistency with neighboring fields.

-          x-label: Specify options for Automatic Language Detection.
+          x-label: Language detection options

1256-1275: Add examples and clarify precedence with language_detection

  • Add an example showing language_detection: true with language_detection_options populated.
  • Clarify how language_code, language_detection, and language_detection_options interact (e.g., precedence if both language_code and detection are provided).

I can draft example payloads and a short precedence note if helpful.
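As a rough illustration of the first bullet, a request body with `language_detection: true` and populated options might be built like this. The helper name and the `audio_url` field are assumptions not taken from this PR, and the sketch makes no claim about precedence with `language_code`.

```python
import json


def build_transcript_request(audio_url, expected_languages, fallback_language="auto"):
    """Build a hypothetical request body that enables language detection options."""
    return {
        "audio_url": audio_url,  # assumed field name, not part of this PR
        "language_detection": True,
        "language_detection_options": {
            "expected_languages": list(expected_languages),
            # A plain string, not a list: "auto" rather than ["auto"].
            "fallback_language": fallback_language,
        },
    }


payload = build_transcript_request("https://example.com/audio.mp3", ["en", "es"])
print(json.dumps(payload, indent=2))
```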


1256-1275: Optional: add schema-level guards (OpenAPI 3.1 JSON Schema if/then)

To prevent misuse:

  • If language_detection is false, disallow language_detection_options.
  • If language_detection is true and expected_languages is empty, consider requiring either a non-empty expected_languages or a non-"auto" fallback_language.

Here’s a snippet to add under TranscriptOptionalParams (outside this hunk):

# at the same level as properties:
allOf:
  - if:
      # `required` keeps the guard from matching when the field is absent
      required: [language_detection]
      properties:
        language_detection:
          const: false
    then:
      not:
        required: [language_detection_options]
  # optionally require expected_languages when auto fallback is used
  # (tune to product behavior)
  - if:
      required: [language_detection]
      properties:
        language_detection:
          const: true
        language_detection_options:
          properties:
            fallback_language:
              enum: ["auto"]
    then:
      properties:
        language_detection_options:
          required: [expected_languages]
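
The same two rules can also be checked client-side before a request is sent. The following is a minimal sketch under the assumption that the rules are exactly the two bullets above; the function name and messages are hypothetical, and the product may well choose different behavior.

```python
def check_detection_guards(params: dict) -> list:
    """Return a list of problems found in the transcript parameters."""
    problems = []
    options = params.get("language_detection_options")
    if options is None:
        return problems  # nothing to guard when no options are supplied

    detection = params.get("language_detection")
    if detection is False:
        # Rule 1: options make no sense when detection is switched off.
        problems.append("language_detection_options requires language_detection: true")
    elif detection is True:
        # Rule 2: with an "auto" fallback (the default), the list must not be empty.
        fallback = options.get("fallback_language", "auto")
        if fallback == "auto" and not options.get("expected_languages"):
            problems.append("expected_languages must be non-empty when fallback_language is 'auto'")
    return problems
```

A caller could reject the request when the returned list is non-empty, or only log the problems while the rules are still being tuned.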
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c531b33 and 179c912.

⛔ Files ignored due to path filters (4)
  • src/libs/AssemblyAI/Generated/AssemblyAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParams.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.Json.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
  • src/libs/AssemblyAI/openapi.yaml (1 hunks)

Comment on lines +1262 to +1268
expected_languages:
  x-label: Minimum speakers expected
  description: List of languages expected in the audio file.
  type: array
  objects:
    x-label: language
    type: string
🛠️ Refactor suggestion

⚠️ Potential issue

Fix invalid OpenAPI keyword and align types with TranscriptLanguageCode

  • Replace non-standard "objects" with "items" (breaks validation/codegen).
  • Correct mislabeled x-label and item typing; allow either known codes or custom strings like language_code does.
             expected_languages:
-              x-label: Minimum speakers expected
-              description: List of languages expected in the audio file.
-              type: array
-              objects:
-                x-label: language
-                type: string
+              x-label: Expected languages
+              description: List of expected language codes (see TranscriptLanguageCode).
+              type: array
+              items:
+                x-label: Language
+                anyOf:
+                  - $ref: "#/components/schemas/TranscriptLanguageCode"
+                  - type: string
+              x-ts-type: Array<LiteralUnion<TranscriptLanguageCode, string>>
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1262 to 1268, the schema uses
the non-standard keyword "objects" and has a mislabeled x-label/type; replace
"objects" with "items", change the x-label to "language" (or remove if
redundant), and make items a string type that matches the TranscriptLanguageCode
behavior (i.e., allow known language codes but accept custom strings — implement
as type: string with the same enum/ref as TranscriptLanguageCode if available or
leave unconstrained string to allow custom codes). Ensure description remains
"List of languages expected in the audio file."
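
A typo such as `objects` in place of `items` can be caught with a small lint pass over the parsed spec. The sketch below assumes the schema is already loaded into plain Python dicts and lists; the function name is hypothetical, and it only flags array schemas that lack `items`, which is how this particular mistake would surface.

```python
def find_arrays_without_items(schema, path="#"):
    """Return JSON-pointer-like paths of array schemas that have no `items`."""
    found = []
    if isinstance(schema, dict):
        if schema.get("type") == "array" and "items" not in schema:
            found.append(path)
        # Recurse into every nested value, whatever the keyword is called.
        for key, value in schema.items():
            found.extend(find_arrays_without_items(value, f"{path}/{key}"))
    elif isinstance(schema, list):
        for index, value in enumerate(schema):
            found.extend(find_arrays_without_items(value, f"{path}/{index}"))
    return found


# The schema as submitted in this PR, reduced to the relevant part.
submitted = {
    "properties": {
        "expected_languages": {"type": "array", "objects": {"type": "string"}}
    }
}
print(find_arrays_without_items(submitted))
```

An array without `items` is not invalid in JSON Schema by itself, so a check like this is best treated as a warning rather than a hard failure.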

Comment on lines +1270 to +1275
x-label: Fallback language
description: |
  If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
type: string
default: "auto"

🛠️ Refactor suggestion

⚠️ Potential issue

Make fallback_language schema consistent; fix doc that suggests an array

  • The description tells users to specify ["auto"] (an array), but the field is typed as a string. Use "auto" instead.
  • Also model the type as a union of "auto" and a language code (or custom string), mirroring language_code.
             fallback_language:
               x-label: Fallback language
-              description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-              type: string
-              default: "auto"
+              description: |
+                If the detected language is not in `expected_languages`, use this fallback.
+                Set to `"auto"` to let the model choose the fallback from `expected_languages` with the highest confidence.
+              anyOf:
+                - enum: ["auto"]
+                - $ref: "#/components/schemas/TranscriptLanguageCode"
+                - type: string
+              default: "auto"
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1270 to 1275, the
fallback_language entry is inconsistent: the description suggests an array
["auto"] while the schema types as string; update the description to reference
"auto" (not ["auto"]) and change the schema to mirror language_code by modeling
a union that allows the literal "auto" or a language code/custom string (e.g.,
enum/oneOf or pattern-based string) so the field accepts either the special
"auto" token or a language identifier; ensure the default remains "auto" and
update any wording to clarify behavior when "auto" is used.
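
The string-versus-array mismatch is easy to trip over on the client side, so a small guard may help. This is a hypothetical helper, not part of the generated SDK; it only enforces that the value is a non-empty string and does not check it against the known language codes.

```python
def validate_fallback_language(value: object) -> str:
    """Accept "auto" or any non-empty language code string; reject everything else."""
    if isinstance(value, str) and value:
        return value
    raise TypeError(
        f"fallback_language must be a non-empty string such as 'auto', got {value!r}"
    )
```

Passing `["auto"]`, as the original description suggests, raises `TypeError`, while `"auto"` and `"en"` are returned unchanged.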

@HavenDV HavenDV closed this Aug 27, 2025
auto-merge was automatically disabled August 27, 2025 11:19

Pull request was closed
