From 2b4fec3c8f2a943dec2d336055b1229b37efc9c8 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Wed, 28 Jan 2026 01:51:28 +0000
Subject: [PATCH 1/2] Move Quickstart section to top of multilingual streaming page

Co-Authored-By: Dan Ince
---
 .../universal-streaming/multilingual.mdx | 164 +++++++++---------
 1 file changed, 82 insertions(+), 82 deletions(-)

diff --git a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
index f2a2b4ed..0aea2a9c 100644
--- a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
+++ b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
@@ -9,88 +9,6 @@ description: "Transcribe audio in multiple languages"
 Multilingual streaming allows you to transcribe audio streams in multiple languages.

-## Configuration
-
-To utilize multilingual streaming, you need to include `"speech_model":"universal-streaming-multilingual"` as a query parameter in the WebSocket URL.
-
-## Supported languages
-
-Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.
-
-## Language detection
-
-The multilingual streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
-
-### Configuration
-
-To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:
-
-```
-wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
-```
-
-### Output format
-
-When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:
-
-- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
-- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection
-
-  The `language_code` and `language_confidence` fields only appear when either:
-  - The `utterance` field is non-empty and contains a complete utterance - The
-  `end_of_turn` field is `true`
-
-### Example response
-
-Here's an example Turn message with language detection enabled, showing Spanish being detected:
-
-```json
-{
-  "turn_order": 1,
-  "turn_is_formatted": false,
-  "end_of_turn": false,
-  "transcript": "Buenos",
-  "end_of_turn_confidence": 0.991195,
-  "words": [
-    {
-      "start": 29920,
-      "end": 30080,
-      "text": "Buenos",
-      "confidence": 0.979445,
-      "word_is_final": true
-    },
-    {
-      "start": 30320,
-      "end": 30400,
-      "text": "días",
-      "confidence": 0.774696,
-      "word_is_final": false
-    }
-  ],
-  "utterance": "Buenos días.",
-  "language_code": "es",
-  "language_confidence": 0.999997,
-  "type": "Turn"
-}
-```
-
-In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.
-
-## Understanding formatting
-
-The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.
-
-  While the API still returns the `turn_is_formatted` parameter to maintain
-  interface consistency with other streaming models, the multilingual model
-  doesn't perform additional formatting operations. All transcripts from the
-  multilingual model are already formatted as they're generated.
-
-In the future, this built-in formatting capability will be extended to our English-only streaming model as well.
-
 ## Quickstart
@@ -738,3 +656,85 @@ run();
+
+## Configuration
+
+To utilize multilingual streaming, you need to include `"speech_model":"universal-streaming-multilingual"` as a query parameter in the WebSocket URL.
+
+## Supported languages
+
+Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.
+
+## Language detection
+
+The multilingual streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
+
+### Configuration
+
+To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:
+
+```
+wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
+```
+
+### Output format
+
+When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:
+
+- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
+- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection
+
+  The `language_code` and `language_confidence` fields only appear when either:
+  - The `utterance` field is non-empty and contains a complete utterance - The
+  `end_of_turn` field is `true`
+
+### Example response
+
+Here's an example Turn message with language detection enabled, showing Spanish being detected:
+
+```json
+{
+  "turn_order": 1,
+  "turn_is_formatted": false,
+  "end_of_turn": false,
+  "transcript": "Buenos",
+  "end_of_turn_confidence": 0.991195,
+  "words": [
+    {
+      "start": 29920,
+      "end": 30080,
+      "text": "Buenos",
+      "confidence": 0.979445,
+      "word_is_final": true
+    },
+    {
+      "start": 30320,
+      "end": 30400,
+      "text": "días",
+      "confidence": 0.774696,
+      "word_is_final": false
+    }
+  ],
+  "utterance": "Buenos días.",
+  "language_code": "es",
+  "language_confidence": 0.999997,
+  "type": "Turn"
+}
+```
+
+In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.
+
+## Understanding formatting
+
+The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.
+
+  While the API still returns the `turn_is_formatted` parameter to maintain
+  interface consistency with other streaming models, the multilingual model
+  doesn't perform additional formatting operations. All transcripts from the
+  multilingual model are already formatted as they're generated.
+
+In the future, this built-in formatting capability will be extended to our English-only streaming model as well.

From 8635f32132e858d9dd42f0075412b157b9a54fe5 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Wed, 28 Jan 2026 01:59:49 +0000
Subject: [PATCH 2/2] Move Configuration section above Quickstart

Co-Authored-By: Dan Ince
---
 .../universal-streaming/multilingual.mdx | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
index 0aea2a9c..59980bbd 100644
--- a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
+++ b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
@@ -9,6 +9,14 @@ description: "Transcribe audio in multiple languages"
 Multilingual streaming allows you to transcribe audio streams in multiple languages.

+## Configuration
+
+To utilize multilingual streaming, you need to include `"speech_model":"universal-streaming-multilingual"` as a query parameter in the WebSocket URL.
+
+## Supported languages
+
+Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.
+
 ## Quickstart
@@ -657,14 +665,6 @@ run();
-## Configuration
-
-To utilize multilingual streaming, you need to include `"speech_model":"universal-streaming-multilingual"` as a query parameter in the WebSocket URL.
-
-## Supported languages
-
-Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.
-
 ## Language detection

 The multilingual streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
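
For reviewers checking the relocated Configuration and Language detection text, here is a minimal Python sketch of what those sections describe: selecting the multilingual model and enabling language detection through URL query parameters, then reading the two extra fields from a Turn message. The endpoint, parameter names, and field names come from the documentation text in this patch; the helper names `build_streaming_url` and `detected_language` are hypothetical and not part of any SDK.

```python
import json
from urllib.parse import urlencode

# Endpoint as quoted in the documentation text moved by this patch.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate: int = 16000, language_detection: bool = True) -> str:
    """Hypothetical helper: assemble the WebSocket URL with its query parameters."""
    params = {
        "sample_rate": sample_rate,
        "speech_model": "universal-streaming-multilingual",
    }
    if language_detection:
        params["language_detection"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


def detected_language(raw_message: str):
    """Hypothetical helper: return (language_code, language_confidence), or None.

    Per the docs, the two fields are only present on Turn messages that carry
    a complete utterance or have end_of_turn set to true.
    """
    message = json.loads(raw_message)
    if message.get("type") != "Turn":
        return None
    code = message.get("language_code")
    if code is None:
        return None
    return code, message.get("language_confidence")
```

Treating a missing `language_code` as "no detection yet" rather than as an error matches the documented behavior that partial turns omit both fields.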