2 changes: 1 addition & 1 deletion docs/deployments/container/cpu-speech-to-text.mdx
@@ -266,7 +266,7 @@ The parameters are:

- `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)

- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/#operating-points) you want to prewarm
- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/languages#operating-points) you want to prewarm

- `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
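
A hedged sketch of how these parameters fit together (the endpoint path, port, and use of `fetch` below are illustrative assumptions, not part of the documented API):

```typescript
// Illustrative only: pre-warm two enhanced-mode CPU engine instances.
// The URL below is an assumption; consult the container API reference
// for the actual prewarm route.
const prewarmConfig = {
  processor: "cpu",            // "cpu" or "gpu"
  operating_point: "enhanced", // "standard" or "enhanced"
  prewarm_connections: 2,      // must not exceed SM_MAX_CONCURRENT_CONNECTIONS
};

await fetch("http://localhost:8080/v1/prewarm", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(prewarmConfig),
});
```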

2 changes: 1 addition & 1 deletion docs/deployments/container/gpu-speech-to-text.mdx
@@ -107,7 +107,7 @@ Once the GPU Server is running, follow the [Instructions for Linking a CPU Conta

### Running only one operating point

[Operating Points](/speech-to-text/#operating-points-1) represent different levels of model complexity.
[Operating Points](/speech-to-text/languages#operating-points) represent different levels of model complexity.
To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the
`SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.

4 changes: 2 additions & 2 deletions docs/deployments/container/gpu-translation.mdx
@@ -207,6 +207,6 @@ If one or more of the target languages are not supported for the source language
}
```

Please note, this behaviour is different when using our [SaaS Deployment](/speech-to-text/features/translation#unsupported-target-language).
Please note, this behaviour is different when using our SaaS Deployment.

For all other errors, please see [documentation here](/speech-to-text/features/translation#batch-error-responses)
For all other errors, please see our documentation.
2 changes: 1 addition & 1 deletion docs/deployments/kubernetes/index.mdx
@@ -33,4 +33,4 @@ Using Helm, customers can customize deployments through configurable values, man

Speechmatics Kubernetes deployment supports the following applications:
- [Realtime](/speech-to-text/realtime/quickstart): Stream audio from an input device or file and receive real-time transcription updates as audio is processed.
- [Voice Agent – Flow](/voice-agents-flow): A Voice Agent API that enables responsive, real-time speech-to-speech interactions in your applications.
- [Voice Agent – Flow](/voice-agents/flow): A Voice Agent API that enables responsive, real-time speech-to-speech interactions in your applications.
2 changes: 1 addition & 1 deletion docs/speech-to-text/batch/input.mdx
@@ -13,7 +13,7 @@ import batchSchema from "!openapi-schema-loader!@site/spec/batch.yaml";
:::info
This page documents audio inputs for transcription by **REST API** (a.k.a. Batch SaaS).
* For Realtime transcription, see the [Realtime Transcription input](/speech-to-text/realtime/input).
* For Flow Voice AI, see the [Flow Voice AI supported formats and limits](/voice-agents-flow/supported-formats-and-limits).
* For Flow Voice AI, see the [Flow Voice AI supported formats and limits](/voice-agents/flow/supported-formats-and-limits).
:::

## Supported file types
2 changes: 1 addition & 1 deletion docs/speech-to-text/batch/language-identification.mdx
@@ -375,7 +375,7 @@ This error is available when checking the [job details](//api-ref/batch/get-job-

### Errors when used with translation

It is not possible to translate between all language pairs. When `auto` language is used, this can mean some translation target languages will not be available. See the full list of [Supported Language Pairs](/speech-to-text/features/translation#supported-translation-pairs).
It is not possible to translate between all language pairs. When `auto` language is used, this can mean some translation target languages will not be available. See the full list of [Supported Language Pairs](/speech-to-text/features/translation#languages).

These errors are available when getting the [job transcript](/api-ref/batch/get-the-transcript-for-a-transcription-job):

2 changes: 1 addition & 1 deletion docs/speech-to-text/features/audio-filtering.mdx
@@ -73,6 +73,6 @@ To obtain volume labelling without filtering any audio, supply an empty config o

Once the audio is in a raw format (16kHz 16bit mono), it is split into 0.01s chunks. For each chunk, the root mean square amplitude of the signal is calculated, and scaled to the range `0 - 100`. If the volume is less than the supplied cut-off, the chunk will be replaced with silence.
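
A minimal sketch of that per-chunk calculation, assuming the 0 - 100 scaling is relative to the 16-bit full-scale amplitude (the engine's exact scaling is not documented here):

```typescript
// Sketch: per-chunk volume for 16 kHz, 16-bit mono audio.
// A 0.01 s chunk at 16 kHz is 160 samples; scaling to 0-100 against
// the 16-bit full scale (32768) is an assumption.
const SAMPLE_RATE = 16_000;
const CHUNK_SAMPLES = SAMPLE_RATE / 100; // 0.01 s per chunk

function chunkVolumes(samples: Int16Array): number[] {
  const volumes: number[] = [];
  for (let start = 0; start < samples.length; start += CHUNK_SAMPLES) {
    const chunk = samples.subarray(start, start + CHUNK_SAMPLES);
    let sumSquares = 0;
    for (const s of chunk) sumSquares += s * s;
    const rms = Math.sqrt(sumSquares / chunk.length); // root mean square amplitude
    volumes.push((rms / 32768) * 100);                // scale to 0 - 100
  }
  return volumes;
}
```

Chunks whose volume falls below the configured cut-off are the ones replaced with silence.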

To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [Enhanced Operating Point](/speech-to-text/#operating-points-1), which is more robust against inadvertent damage to the audio.
To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [Enhanced Operating Point](/speech-to-text/languages#operating-points), which is more robust against inadvertent damage to the audio.

The word volume calculation takes the start and end times of words, and applies a weighted average of the volumes of each audio chunk which make up the word. The weighting attempts to ignore areas of silence within long words, and provide a better match with the volume classification a human listener would make.
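
A hedged sketch of that averaging step; the actual weighting function is unpublished, so ignoring near-silent chunks is an assumed stand-in:

```typescript
// Sketch: word volume as a weighted average of its chunk volumes,
// down-weighting (here: ignoring) near-silent chunks within long words.
// The silence-floor value is an assumption.
function wordVolume(chunkVols: number[], silenceFloor = 5): number {
  const voiced = chunkVols.filter((v) => v > silenceFloor);
  if (voiced.length === 0) return 0;
  return voiced.reduce((sum, v) => sum + v, 0) / voiced.length;
}
```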
3 changes: 1 addition & 2 deletions docs/speech-to-text/features/feature-discovery.mdx
@@ -24,6 +24,5 @@ The feature discovery endpoint will include an object with the following propert
- `languages` - Includes a list of supported ISO language codes
- `locales` - Includes any languages with a supported [Output Locale](/speech-to-text/formatting#output-locale)
- `domains` - Includes any languages with a supported [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text)
- `translation` - Includes all supported [translation pairs](/speech-to-text/features/translation#supported-translation-pairs)
- `translation` - Includes all [supported translation pairs](/speech-to-text/features/translation#languages)
- `languageid` - List of languages supported by [Language Identification](/speech-to-text/batch/language-identification)

2 changes: 1 addition & 1 deletion docs/speech-to-text/features/translation.mdx
@@ -60,7 +60,7 @@ You can configure up to five translation languages at a time.

## Batch output

The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same [ISO language codes](/speech-to-text/languages#languages) as for transcription).
The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same [ISO language codes](/speech-to-text/languages) as for transcription).
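
For orientation, the added property might look roughly like this; the per-segment field names are assumptions, and the schema rendered below is authoritative:

```typescript
// Rough, assumed shape of the `translations` property in the batch JSON,
// keyed by target-language ISO code.
type Translations = {
  [targetLanguage: string]: Array<{
    content: string;      // translated text (field name assumed)
    start_time?: number;  // assumed
    end_time?: number;    // assumed
  }>;
};
```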

<SchemaNode schema={transcriptResponseSchema.definitions.RetrieveTranscriptResponse} />

2 changes: 1 addition & 1 deletion docs/speech-to-text/formatting.mdx
@@ -398,7 +398,7 @@ This configuration:
The `sensitivity` parameter accepts values from 0 to 1. Higher values produce more punctuation in the output.
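
As a hedged illustration, assuming the standard `transcription_config` layout with a `punctuation_overrides` block, a higher-sensitivity request might look like:

```typescript
// Illustrative config requesting more punctuation; the placement of
// `punctuation_overrides` here is an assumption.
const config = {
  type: "transcription",
  transcription_config: {
    language: "en",
    punctuation_overrides: {
      sensitivity: 0.8, // 0-1; higher values produce more punctuation
    },
  },
};
```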

:::warning
Disabling punctuation may slightly reduce speaker diarization accuracy. See the [speaker diarization and punctuation](/speech-to-text/features/diarization#speaker-diarization-and-punctuation) section for details.
Disabling punctuation may slightly reduce speaker diarization accuracy. See the [speaker diarization and punctuation](/speech-to-text/features/diarization#diarization-and-punctuation) section for details.
:::

## Next steps
2 changes: 1 addition & 1 deletion docs/speech-to-text/realtime/input.mdx
@@ -14,7 +14,7 @@ import realtimeSchema from "!asyncapi-schema-loader!@site/spec/realtime.yaml"
:::info
This page is about the **Real-time transcription API** (websocket).
* For information on Batch SaaS, see the [Batch SaaS input](/speech-to-text/batch/input).
* For information on Flow Voice AI, see the [Flow Voice AI input](/voice-agents-flow/supported-formats-and-limits).
* For information on Flow Voice AI, see the [Flow Voice AI input](/voice-agents/flow/supported-formats-and-limits).
:::

## Supported input audio formats
49 changes: 0 additions & 49 deletions docs/voice-agents-flow/sidebar.ts

This file was deleted.

File renamed without changes.
@@ -20,7 +20,7 @@ For more details, refer to [StartConversation API reference](/api-ref/flow-voice

### Function calling

[Function Calling](/voice-agents-flow/features/function-calling) allows you to connect Flow to external tools and systems. This unlocks Flow's ability to act in the real-world and better serve the needs of your users.
[Function Calling](/voice-agents/flow/features/function-calling) allows you to connect Flow to external tools and systems. This unlocks Flow's ability to act in the real-world and better serve the needs of your users.

This could involve real-time information such as opening/closing times, validation services for authentication, or action APIs that control a fast-food system while a customer places a drive-thru order.

@@ -31,7 +31,7 @@ You might want to control ongoing conversation based on what's spoken by the use

#### Steering the conversation

[Application Inputs](/voice-agents-flow/features/application-inputs) allow you to steer the conversation by adding helpful updates & information asynchronously to Flow
[Application Inputs](/voice-agents/flow/features/application-inputs) allow you to steer the conversation by adding helpful updates & information asynchronously to Flow.

### Managing call recordings and transcripts

61 changes: 61 additions & 0 deletions docs/voice-agents/flow/sidebar.ts
@@ -0,0 +1,61 @@
export default {
type: "category",
label: "Flow",
collapsible: true,
collapsed: true,
items: [
{
type: "doc",
label: "Overview",
id: "voice-agents/flow/index",
},
{
type: "category",
label:"Features",
collapsible: true,
collapsed: true,
items: [
{
type: "doc",
id: "voice-agents/flow/features/application-inputs",
},
{
type: "doc",
id: "voice-agents/flow/features/function-calling",
},
{
type: "doc",
id: "voice-agents/flow/features/webrtc-livekit",
},
],
},
{
type: "category",
label:"Guides",
collapsible: true,
collapsed: true,
items: [
{
type: "doc",
id: "voice-agents/flow/guides/nextjs-guide",
},
{
type: "doc",
id: "voice-agents/flow/guides/react-native",
},
],
},
{
type: "doc",
id: "voice-agents/flow/setup",
},
{
type: "doc",
id: "voice-agents/flow/supported-formats-and-limits",
},
{
type: "doc",
id: "voice-agents/flow/supported-languages",
},
],
} as const;
3 changes: 3 additions & 0 deletions docs/voice-agents/sidebar.ts
@@ -1,3 +1,5 @@
import voiceAgentsFlowSidebar from "./flow/sidebar";

export default {
type: "category",
label: "Voice agents",
@@ -14,5 +16,6 @@ export default {
id: "voice-agents/features",
label: "Features",
},
voiceAgentsFlowSidebar,
],
} as const;
32 changes: 32 additions & 0 deletions scripts/redirects/redirects.json
@@ -22,5 +22,37 @@
{
"source": "/speech-to-text/realtime/realtime-speaker-identification",
"destination": "/speech-to-text/realtime/speaker-identification"
},
{
"source": "/voice-agents-flow/features/application-inputs",
"destination": "/voice-agents/flow/features/application-inputs"
},
{
"source": "/voice-agents-flow/setup",
"destination": "/voice-agents/flow/setup"
},
{
"source": "/voice-agents-flow/features/function-calling",
"destination": "/voice-agents/flow/features/function-calling"
},
{
"source": "/voice-agents-flow",
"destination": "/voice-agents/flow"
},
{
"source": "/voice-agents-flow/supported-languages",
"destination": "/voice-agents/flow/supported-languages"
},
{
"source": "/voice-agents-flow/features/webrtc-livekit",
"destination": "/voice-agents/flow/features/webrtc-livekit"
},
{
"source": "/voice-agents-flow/guides/nextjs-guide",
"destination": "/voice-agents/flow/guides/nextjs-guide"
},
{
"source": "/voice-agents-flow/guides/react-native",
"destination": "/voice-agents/flow/guides/react-native"
}
]
2 changes: 0 additions & 2 deletions sidebars.ts
@@ -3,7 +3,6 @@ import deploymentsSidebar from "./docs/deployments/sidebar";
import gettingStartedSidebar from "./docs/get-started/sidebar";
import speechToTextSidebar from "./docs/speech-to-text/sidebar";
import textToSpeechSidebar from "./docs/text-to-speech/sidebar";
import voiceAgentsFlowSidebar from "./docs/voice-agents-flow/sidebar";
import integrationsAndSDKSidebar from "./docs/integrations-and-sdks/sidebar";
import voiceAgentsSidebar from "./docs/voice-agents/sidebar";

@@ -14,7 +13,6 @@ export default {
voiceAgentsSidebar,
textToSpeechSidebar,
integrationsAndSDKSidebar,
voiceAgentsFlowSidebar,
deploymentsSidebar,
{
type: "category",
2 changes: 1 addition & 1 deletion spec/flow-api.yaml
@@ -707,7 +707,7 @@ components:
type: string
# description: The id of the agent or persona to use during the conversation.
description: |
Required in the the `StartConversation` message in the Flow API. Generated from the [Speechmatics Portal](https://portal.speechmatics.com/). This maps to the [language supported](/voice-agents-flow/supported-languages), agent's prompt, LLM, TTS voice, & custom dictionary. These can be customised by creating or modifying agents in the Portal.
Required in the `StartConversation` message in the Flow API. Generated from the [Speechmatics Portal](https://portal.speechmatics.com/). This maps to the [language supported](/voice-agents/flow/supported-languages), agent's prompt, LLM, TTS voice, & custom dictionary. These can be customised by creating or modifying agents in the Portal.
template_variables:
type: object
additionalProperties: