148 changes: 74 additions & 74 deletions fern/pages/02-speech-to-text/universal-streaming/multilingual.mdx
@@ -17,80 +17,6 @@ To utilize multilingual streaming, you need to include `"speech_model":"universa

Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.

## Language detection

The multilingual streaming model supports automatic language detection, which lets you identify the language being spoken in real time. When enabled, the model returns the detected language code and a confidence score with each complete utterance and each final turn.

### Configuration

To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:

```
wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
```

### Output format

When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:

- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection

<Note>
The `language_code` and `language_confidence` fields only appear when either:
- The `utterance` field is non-empty and contains a complete utterance
- The `end_of_turn` field is `true`
</Note>

### Example response

Here's an example Turn message with language detection enabled, showing Spanish being detected:

```json
{
  "turn_order": 1,
  "turn_is_formatted": false,
  "end_of_turn": false,
  "transcript": "Buenos",
  "end_of_turn_confidence": 0.991195,
  "words": [
    {
      "start": 29920,
      "end": 30080,
      "text": "Buenos",
      "confidence": 0.979445,
      "word_is_final": true
    },
    {
      "start": 30320,
      "end": 30400,
      "text": "días",
      "confidence": 0.774696,
      "word_is_final": false
    }
  ],
  "utterance": "Buenos días.",
  "language_code": "es",
  "language_confidence": 0.999997,
  "type": "Turn"
}
```

In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.

## Understanding formatting

The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.

<Note>
While the API still returns the `turn_is_formatted` parameter to maintain
interface consistency with other streaming models, the multilingual model
doesn't perform additional formatting operations. All transcripts from the
multilingual model are already formatted as they're generated.
</Note>

In the future, this built-in formatting capability will be extended to our English-only streaming model as well.

## Quickstart

<Tabs>
@@ -738,3 +664,77 @@ run();
</Tab>

</Tabs>

## Language detection

The multilingual streaming model supports automatic language detection, which lets you identify the language being spoken in real time. When enabled, the model returns the detected language code and a confidence score with each complete utterance and each final turn.

### Configuration

To enable language detection, include `language_detection=true` as a query parameter in the WebSocket URL:

```
wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true
```
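
As a minimal sketch, the URL above can also be assembled from its query parameters instead of being hard-coded. The helper name `build_streaming_url` is hypothetical and not part of any SDK; the endpoint and parameter names are taken from the URL shown above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL shown above.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate: int = 16000, language_detection: bool = True) -> str:
    """Hypothetical helper: assemble the WebSocket URL from its query parameters."""
    params = {
        "sample_rate": sample_rate,
        "speech_model": "universal-streaming-multilingual",
    }
    # Only add the flag when detection is requested; this sketch makes no
    # assumption about how the API treats an explicit "false" value.
    if language_detection:
        params["language_detection"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_streaming_url())
```

Building the query string with `urlencode` keeps the parameters in one place and avoids manual `&` concatenation mistakes.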

### Output format

When language detection is enabled, each Turn message (with either a **complete utterance** or `end_of_turn: true`) will include two additional fields:

- `language_code`: The language code of the detected language (e.g., `"es"` for Spanish, `"fr"` for French)
- `language_confidence`: A confidence score between 0 and 1 indicating how confident the model is in the language detection

<Note>
The `language_code` and `language_confidence` fields only appear when either:
- The `utterance` field is non-empty and contains a complete utterance
- The `end_of_turn` field is `true`
</Note>
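
Because the two fields are absent from partial Turn messages, client code should treat them as optional. The following is a sketch only, assuming each WebSocket message has already been decoded into a Python `dict`; `extract_language` is a hypothetical helper name, not an SDK function.

```python
from typing import Optional, Tuple


def extract_language(message: dict) -> Optional[Tuple[str, float]]:
    """Return (language_code, language_confidence), or None when the message has no detection result."""
    if message.get("type") != "Turn":
        return None
    code = message.get("language_code")
    confidence = message.get("language_confidence")
    # Partial turns (no complete utterance, end_of_turn false) omit both fields.
    if code is None or confidence is None:
        return None
    return code, float(confidence)


partial_turn = {"type": "Turn", "end_of_turn": False, "utterance": "", "transcript": "Buenos"}
complete_turn = {
    "type": "Turn",
    "end_of_turn": False,
    "utterance": "Buenos días.",
    "language_code": "es",
    "language_confidence": 0.999997,
}

print(extract_language(partial_turn))
print(extract_language(complete_turn))
```

Returning `None` instead of raising lets a message loop skip partial turns without special-casing them.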

### Example response

Here's an example Turn message with language detection enabled, showing Spanish being detected:

```json
{
  "turn_order": 1,
  "turn_is_formatted": false,
  "end_of_turn": false,
  "transcript": "Buenos",
  "end_of_turn_confidence": 0.991195,
  "words": [
    {
      "start": 29920,
      "end": 30080,
      "text": "Buenos",
      "confidence": 0.979445,
      "word_is_final": true
    },
    {
      "start": 30320,
      "end": 30400,
      "text": "días",
      "confidence": 0.774696,
      "word_is_final": false
    }
  ],
  "utterance": "Buenos días.",
  "language_code": "es",
  "language_confidence": 0.999997,
  "type": "Turn"
}
```

In this example, the model detected Spanish (`"es"`) with a confidence of `0.999997`.
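
One way an application might act on the confidence score is to accept a detected language only above a cutoff. The sketch below parses an abbreviated copy of the message above; the `MIN_CONFIDENCE` value and the `describe_language` helper are illustrative assumptions, not recommendations from the API.

```python
import json

# Hypothetical cutoff chosen for illustration; tune it for your own application.
MIN_CONFIDENCE = 0.9

# Abbreviated copy of the example Turn message, keeping only the fields used here.
raw_message = (
    '{"type": "Turn", "end_of_turn": false, "utterance": "Buenos días.", '
    '"language_code": "es", "language_confidence": 0.999997}'
)


def describe_language(raw: str, min_confidence: float = MIN_CONFIDENCE) -> str:
    """Summarize the detection result carried by one raw JSON message."""
    message = json.loads(raw)
    code = message.get("language_code")
    confidence = message.get("language_confidence")
    if code is None or confidence is None:
        return "no language reported"
    if confidence < min_confidence:
        return f"uncertain ({code}, {confidence:.6f})"
    return f"{code} ({confidence:.6f})"


print(describe_language(raw_message))
```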

## Understanding formatting

The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you'll receive properly formatted text without requiring any additional post-processing.

<Note>
While the API still returns the `turn_is_formatted` parameter to maintain
interface consistency with other streaming models, the multilingual model
doesn't perform additional formatting operations. All transcripts from the
multilingual model are already formatted as they're generated.
</Note>
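
In practice, this means a client for the multilingual model does not need to wait for `turn_is_formatted` to become `true` before using a transcript. A minimal sketch, assuming messages are already decoded into dictionaries and using the hypothetical helper name `final_transcript`:

```python
from typing import Optional


def final_transcript(message: dict) -> Optional[str]:
    """Return the transcript of a finished turn, ignoring turn_is_formatted."""
    if message.get("type") != "Turn":
        return None
    # The multilingual model emits already-formatted text, so end_of_turn
    # alone marks a transcript as final in this sketch.
    if not message.get("end_of_turn"):
        return None
    return message.get("transcript", "")


finished = {
    "type": "Turn",
    "end_of_turn": True,
    "turn_is_formatted": False,
    "transcript": "Buenos días.",
}
print(final_transcript(finished))
```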

In the future, this built-in formatting capability will be extended to our English-only streaming model as well.