diff --git a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
index f2a2b4ed..59980bbd 100644
--- a/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
+++ b/fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
@@ -17,80 +17,6 @@ To utilize multilingual streaming, you need to include `"speech_model":"universa
 Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.
-## Language detection
-
-The multilingual streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
-
-### Configuration
-
-To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:
-
-```
-wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
-```
-
-### Output format
-
-When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:
-
-- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
-- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection
-
-
-  The `language_code` and `language_confidence` fields only appear when either:
-  - The `utterance` field is non-empty and contains a complete utterance - The
-  `end_of_turn` field is `true`
-
-
-### Example response
-
-Here's an example Turn message with language detection enabled, showing Spanish being detected:
-
-```json
-{
-  "turn_order": 1,
-  "turn_is_formatted": false,
-  "end_of_turn": false,
-  "transcript": "Buenos",
-  "end_of_turn_confidence": 0.991195,
-  "words": [
-    {
-      "start": 29920,
-      "end": 30080,
-      "text": "Buenos",
-      "confidence": 0.979445,
-      "word_is_final": true
-    },
-    {
-      "start": 30320,
-      "end": 30400,
-      "text": "días",
-      "confidence": 0.774696,
-      "word_is_final": false
-    }
-  ],
-  "utterance": "Buenos días.",
-  "language_code": "es",
-  "language_confidence": 0.999997,
-  "type": "Turn"
-}
-```
-
-In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.
-
-## Understanding formatting
-
-The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.
-
-
-  While the API still returns the `turn_is_formatted` parameter to maintain
-  interface consistency with other streaming models, the multilingual model
-  doesn't perform additional formatting operations. All transcripts from the
-  multilingual model are already formatted as they're generated.
-
-
-In the future, this built-in formatting capability will be extended to our English-only streaming model as well.
-
 ## Quickstart
@@ -738,3 +664,77 @@ run();
+
+## Language detection
+
+The multilingual streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
+
+### Configuration
+
+To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:
+
+```
+wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
+```
+
+### Output format
+
+When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:
+
+- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
+- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection
+
+
+  The `language_code` and `language_confidence` fields only appear when either:
+  - The `utterance` field is non-empty and contains a complete utterance - The
+  `end_of_turn` field is `true`
+
+
+### Example response
+
+Here's an example Turn message with language detection enabled, showing Spanish being detected:
+
+```json
+{
+  "turn_order": 1,
+  "turn_is_formatted": false,
+  "end_of_turn": false,
+  "transcript": "Buenos",
+  "end_of_turn_confidence": 0.991195,
+  "words": [
+    {
+      "start": 29920,
+      "end": 30080,
+      "text": "Buenos",
+      "confidence": 0.979445,
+      "word_is_final": true
+    },
+    {
+      "start": 30320,
+      "end": 30400,
+      "text": "días",
+      "confidence": 0.774696,
+      "word_is_final": false
+    }
+  ],
+  "utterance": "Buenos días.",
+  "language_code": "es",
+  "language_confidence": 0.999997,
+  "type": "Turn"
+}
+```
+
+In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.
+
+## Understanding formatting
+
+The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.
+
+
+  While the API still returns the `turn_is_formatted` parameter to maintain
+  interface consistency with other streaming models, the multilingual model
+  doesn't perform additional formatting operations. All transcripts from the
+  multilingual model are already formatted as they're generated.
+
+
+In the future, this built-in formatting capability will be extended to our English-only streaming model as well.
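
Reviewer note: the relocated "Language detection" section describes the query parameter and the two optional Turn fields but shows no client-side handling. The following is a minimal sketch, not part of this change, of how a client might build the URL from the section and read the fields defensively. The helper names `build_streaming_url` and `extract_language` are hypothetical; only the URL, the query parameters, and the JSON field names are taken from the documented text.

```python
import json
from urllib.parse import urlencode

# Base endpoint as it appears in the "Configuration" section of the page.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate=16000):
    """Build the WebSocket URL with language detection enabled.

    Hypothetical helper: the parameter names and values mirror the
    example URL in the docs; nothing else is assumed about the API.
    """
    params = {
        "sample_rate": sample_rate,
        "speech_model": "universal-streaming-multilingual",
        "language_detection": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_language(raw_message):
    """Return (language_code, language_confidence) from a Turn message.

    Returns None for non-Turn messages and for Turn messages that do not
    carry the language fields, since the docs say those fields only
    appear with a complete utterance or when end_of_turn is true.
    """
    message = json.loads(raw_message)
    if message.get("type") != "Turn":
        return None
    code = message.get("language_code")
    if code is None:
        return None
    return code, message.get("language_confidence")
```

Checking for the presence of `language_code`, rather than re-deriving the "complete utterance or `end_of_turn`" condition on the client, keeps the sketch independent of how the server decides an utterance is complete.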