diff --git a/docs/deployments/container/cpu-speech-to-text.mdx b/docs/deployments/container/cpu-speech-to-text.mdx index 317e4c73..60b63656 100644 --- a/docs/deployments/container/cpu-speech-to-text.mdx +++ b/docs/deployments/container/cpu-speech-to-text.mdx @@ -266,7 +266,7 @@ The parameters are: - `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text) -- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/#operating-points) you want to prewarm +- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/languages#operating-points) you want to prewarm - `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start. diff --git a/docs/deployments/container/gpu-speech-to-text.mdx b/docs/deployments/container/gpu-speech-to-text.mdx index 3889a466..815ff5c8 100644 --- a/docs/deployments/container/gpu-speech-to-text.mdx +++ b/docs/deployments/container/gpu-speech-to-text.mdx @@ -107,7 +107,7 @@ Once the GPU Server is running, follow the [Instructions for Linking a CPU Conta ### Running only one operating point -[Operating Points](/speech-to-text/#operating-points-1) represent different levels of model complexity. +[Operating Points](/speech-to-text/languages#operating-points) represent different levels of model complexity. To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the `SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`. 
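The prewarming constraint described in the hunk above — the sum of `prewarm_connections` across entries must not exceed `SM_MAX_CONCURRENT_CONNECTIONS` — can be sketched as a small check. This is purely illustrative: the container validates its own configuration, and the helper name and dict shape here are assumptions, not part of the product API.

```python
# Illustrative check of the documented prewarming rules; the container
# enforces these itself -- this merely encodes the constraints in the docs.

VALID_PROCESSORS = {"cpu", "gpu"}
VALID_OPERATING_POINTS = {"standard", "enhanced"}

def validate_prewarm(entries, max_concurrent_connections):
    """Validate prewarm entries against the documented rules.

    Each entry is a dict with `processor`, `operating_point`, and
    `prewarm_connections`. Returns the total prewarmed connections.
    """
    for entry in entries:
        if entry["processor"] not in VALID_PROCESSORS:
            raise ValueError(f"processor must be one of {VALID_PROCESSORS}")
        if entry["operating_point"] not in VALID_OPERATING_POINTS:
            raise ValueError(f"operating_point must be one of {VALID_OPERATING_POINTS}")
    total = sum(entry["prewarm_connections"] for entry in entries)
    # The total cannot be greater than SM_MAX_CONCURRENT_CONNECTIONS.
    if total > max_concurrent_connections:
        raise ValueError(
            f"total prewarm_connections ({total}) exceeds "
            f"SM_MAX_CONCURRENT_CONNECTIONS ({max_concurrent_connections})"
        )
    return total
```

Recall that after pre-warming completes, this total does not limit the types of connections the engine can start.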
diff --git a/docs/deployments/container/gpu-translation.mdx b/docs/deployments/container/gpu-translation.mdx index 7cba9017..ced1601a 100644 --- a/docs/deployments/container/gpu-translation.mdx +++ b/docs/deployments/container/gpu-translation.mdx @@ -207,6 +207,6 @@ If one or more of the target languages are not supported for the source language } ``` -Please note, this behaviour is different when using our [SaaS Deployment](/speech-to-text/features/translation#unsupported-target-language). +Please note, this behaviour is different when using our SaaS Deployment. -For all other errors, please see [documentation here](/speech-to-text/features/translation#batch-error-responses) +For all other errors, please see our documentation. diff --git a/docs/deployments/kubernetes/index.mdx b/docs/deployments/kubernetes/index.mdx index a6e20e62..a90dcc7a 100644 --- a/docs/deployments/kubernetes/index.mdx +++ b/docs/deployments/kubernetes/index.mdx @@ -33,4 +33,4 @@ Using Helm, customers can customize deployments through configurable values, man Speechmatics Kubernetes deployment supports the following applications: - [Realtime](/speech-to-text/realtime/quickstart): Stream audio from an input device or file and receive real-time transcription updates as audio is processed. - - [Voice Agent – Flow](/voice-agents-flow): A Voice Agent API that enables responsive, real-time speech-to-speech interactions in your applications. +- [Voice Agent – Flow](/voice-agents/flow): A Voice Agent API that enables responsive, real-time speech-to-speech interactions in your applications. \ No newline at end of file diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx index fbfa6f5f..cc77e31f 100644 --- a/docs/speech-to-text/batch/input.mdx +++ b/docs/speech-to-text/batch/input.mdx @@ -13,7 +13,7 @@ import batchSchema from "!openapi-schema-loader!@site/spec/batch.yaml"; :::info This page documents audio inputs for transcription by **REST API** (a.k.a. Batch SaaS). 
* For Realtime transcription, see the [Realtime Transcription input](/speech-to-text/realtime/input). -* For Flow Voice AI, see the [Flow Voice AI supported formats and limits](/voice-agents-flow/supported-formats-and-limits). +* For Flow Voice AI, see the [Flow Voice AI supported formats and limits](/voice-agents/flow/supported-formats-and-limits). ::: ## Supported file types diff --git a/docs/speech-to-text/batch/language-identification.mdx b/docs/speech-to-text/batch/language-identification.mdx index 8022aba3..7407e024 100644 --- a/docs/speech-to-text/batch/language-identification.mdx +++ b/docs/speech-to-text/batch/language-identification.mdx @@ -375,7 +375,7 @@ This error is available when checking the [job details](//api-ref/batch/get-job- ### Errors when used with translation -It is not possible to translate between all language pairs. When `auto` language is used, this can mean some translation target languages will not be available. See the full list of [Supported Language Pairs](/speech-to-text/features/translation#supported-translation-pairs). +It is not possible to translate between all language pairs. When `auto` language is used, this can mean some translation target languages will not be available. See the full list of [Supported Language Pairs](/speech-to-text/features/translation#languages). These errors are available when getting the [job transcript](/api-ref/batch/get-the-transcript-for-a-transcription-job): diff --git a/docs/speech-to-text/features/audio-filtering.mdx b/docs/speech-to-text/features/audio-filtering.mdx index 5bb1b18e..5855df09 100644 --- a/docs/speech-to-text/features/audio-filtering.mdx +++ b/docs/speech-to-text/features/audio-filtering.mdx @@ -73,6 +73,6 @@ To obtain volume labelling without filtering any audio, supply an empty config o Once the audio is in a raw format (16kHz 16bit mono), it is split into 0.01s chunks. 
For each chunk, the root mean square amplitude of the signal is calculated, and scaled to the range `0 - 100`. If the volume is less than the supplied cut-off, the chunk will be replaced with silence. -To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [Enhanced Operating Point](/speech-to-text/#operating-points-1), which is more robust against inadvertent damage to the audio. +To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [Enhanced Operating Point](/speech-to-text/languages#operating-points), which is more robust against inadvertent damage to the audio. The word volume calculation takes the start and end times of words, and applies a weighted average of the volumes of each audio chunk which make up the word. The weighting attempts to ignore areas of silence within long words, and provide a better match with the volume classification a human listener would make.
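The volume computation described above — 0.01s chunks, root mean square amplitude scaled to `0 - 100`, chunks below the cut-off replaced with silence — can be sketched roughly as follows. Linear scaling against 16-bit full scale is an assumption for illustration; the engine's exact scaling is not specified here.

```python
import math

SAMPLE_RATE = 16000          # raw format: 16kHz 16bit mono
CHUNK = SAMPLE_RATE // 100   # 0.01 s => 160 samples

def chunk_volumes(samples):
    """Per-chunk RMS amplitude scaled to 0-100 (linear 16-bit scaling assumed)."""
    vols = []
    for i in range(0, len(samples), CHUNK):
        chunk = samples[i:i + CHUNK]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        vols.append(100 * rms / 32768)
    return vols

def apply_cutoff(samples, cutoff):
    """Replace chunks whose volume is below the cut-off with silence."""
    out = list(samples)
    for idx, vol in enumerate(chunk_volumes(samples)):
        if vol < cutoff:
            start = idx * CHUNK
            out[start:start + CHUNK] = [0] * min(CHUNK, len(samples) - start)
    return out
```

The per-word volume label would then be a weighted average over the chunks spanned by the word's start and end times, as the paragraph above describes.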
diff --git a/docs/speech-to-text/features/feature-discovery.mdx b/docs/speech-to-text/features/feature-discovery.mdx index 09daf484..f6fb3bfb 100644 --- a/docs/speech-to-text/features/feature-discovery.mdx +++ b/docs/speech-to-text/features/feature-discovery.mdx @@ -24,6 +24,5 @@ The feature discovery endpoint will include an object with the following propert - `languages` - Includes a list of supported ISO language codes - `locales` - Includes any languages with a supported [Output Locale](/speech-to-text/formatting#output-locale) - `domains` - Includes any languages with a supported [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text) - - `translation` - Includes all supported [translation pairs](/speech-to-text/features/translation#supported-translation-pairs) + - `translation` - Includes all [supported translation pairs](/speech-to-text/features/translation#languages) - `languageid` - List of languages supported by [Language Identification](/speech-to-text/batch/language-identification) - diff --git a/docs/speech-to-text/features/translation.mdx b/docs/speech-to-text/features/translation.mdx index 9b3e0aad..59656215 100644 --- a/docs/speech-to-text/features/translation.mdx +++ b/docs/speech-to-text/features/translation.mdx @@ -60,7 +60,7 @@ You can configure up to five translation languages at a time. ## Batch output -The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same [ISO language codes](/speech-to-text/languages#languages) as for transcription). +The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same [ISO language codes](/speech-to-text/languages) as for transcription). 
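As a rough sketch of consuming the `translations` property described above, assuming each target-language ISO code maps to a list of segments with a `content` field (the field names below are illustrative, not a schema reference):

```python
# Hypothetical shape of a batch response's `translations` property,
# keyed by target-language ISO code. Segment fields are assumptions.
response = {
    "translations": {
        "de": [{"content": "Hallo Welt", "start_time": 0.0, "end_time": 1.2}],
        "fr": [{"content": "Bonjour le monde", "start_time": 0.0, "end_time": 1.2}],
    }
}

def translated_text(response, lang):
    """Join the translated segments for one requested target language."""
    return " ".join(seg["content"] for seg in response["translations"].get(lang, []))
```

A language that was not requested (or not supported for the source language) would simply be absent from the map.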
diff --git a/docs/speech-to-text/formatting.mdx b/docs/speech-to-text/formatting.mdx index e2842912..71bcfaa4 100644 --- a/docs/speech-to-text/formatting.mdx +++ b/docs/speech-to-text/formatting.mdx @@ -398,7 +398,7 @@ This configuration: The `sensitivity` parameter accepts values from 0 to 1. Higher values produce more punctuation in the output. :::warning -Disabling punctuation may slightly reduce speaker diarization accuracy. See the [speaker diarization and punctuation](/speech-to-text/features/diarization#speaker-diarization-and-punctuation) section for details. +Disabling punctuation may slightly reduce speaker diarization accuracy. See the [speaker diarization and punctuation](/speech-to-text/features/diarization#diarization-and-punctuation) section for details. ::: ## Next steps diff --git a/docs/speech-to-text/realtime/input.mdx b/docs/speech-to-text/realtime/input.mdx index 06b2ce15..904869dc 100644 --- a/docs/speech-to-text/realtime/input.mdx +++ b/docs/speech-to-text/realtime/input.mdx @@ -14,7 +14,7 @@ import realtimeSchema from "!asyncapi-schema-loader!@site/spec/realtime.yaml" :::info This page is about the **Real-time transcription API** (websocket). * For information on Batch SaaS, see the [Batch SaaS input](/speech-to-text/batch/input). -* For information on Flow Voice AI, see the [Flow Voice AI input](/voice-agents-flow/supported-formats-and-limits). +* For information on Flow Voice AI, see the [Flow Voice AI input](/voice-agents/flow/supported-formats-and-limits). 
::: ## Supported input audio formats diff --git a/docs/voice-agents-flow/sidebar.ts b/docs/voice-agents-flow/sidebar.ts deleted file mode 100644 index 418ef968..00000000 --- a/docs/voice-agents-flow/sidebar.ts +++ /dev/null @@ -1,49 +0,0 @@ -export default { - type: "category", - label: "Voice agents – Flow", - collapsible: false, - collapsed: false, - items: [ - { - type: "doc", - id: "voice-agents-flow/index", - }, - { - type: "category", - label: "Features", - items: [ - { - type: "autogenerated", - dirName: "voice-agents-flow/features", - }, - ], - }, - { - type: "category", - label: "Guides", - items: [ - { - type: "autogenerated", - dirName: "voice-agents-flow/guides", - }, - { - type: "doc", - id: "guides/projects", - }, - ], - }, - { - type: "doc", - id: "voice-agents-flow/setup", - }, - { - type: "doc", - id: "voice-agents-flow/supported-formats-and-limits", - }, - - { - type: "doc", - id: "voice-agents-flow/supported-languages", - }, - ], -} as const; diff --git a/docs/voice-agents-flow/features/application-inputs.mdx b/docs/voice-agents/flow/features/application-inputs.mdx similarity index 100% rename from docs/voice-agents-flow/features/application-inputs.mdx rename to docs/voice-agents/flow/features/application-inputs.mdx diff --git a/docs/voice-agents-flow/features/assets/function-calling.py b/docs/voice-agents/flow/features/assets/function-calling.py similarity index 100% rename from docs/voice-agents-flow/features/assets/function-calling.py rename to docs/voice-agents/flow/features/assets/function-calling.py diff --git a/docs/voice-agents-flow/features/assets/livekit-poc.html b/docs/voice-agents/flow/features/assets/livekit-poc.html similarity index 100% rename from docs/voice-agents-flow/features/assets/livekit-poc.html rename to docs/voice-agents/flow/features/assets/livekit-poc.html diff --git a/docs/voice-agents-flow/features/function-calling.mdx b/docs/voice-agents/flow/features/function-calling.mdx similarity index 100% rename from 
docs/voice-agents-flow/features/function-calling.mdx rename to docs/voice-agents/flow/features/function-calling.mdx diff --git a/docs/voice-agents-flow/features/webrtc-livekit.mdx b/docs/voice-agents/flow/features/webrtc-livekit.mdx similarity index 100% rename from docs/voice-agents-flow/features/webrtc-livekit.mdx rename to docs/voice-agents/flow/features/webrtc-livekit.mdx diff --git a/docs/voice-agents-flow/guides/nextjs-guide.mdx b/docs/voice-agents/flow/guides/nextjs-guide.mdx similarity index 100% rename from docs/voice-agents-flow/guides/nextjs-guide.mdx rename to docs/voice-agents/flow/guides/nextjs-guide.mdx diff --git a/docs/voice-agents-flow/guides/react-native.mdx b/docs/voice-agents/flow/guides/react-native.mdx similarity index 100% rename from docs/voice-agents-flow/guides/react-native.mdx rename to docs/voice-agents/flow/guides/react-native.mdx diff --git a/docs/voice-agents-flow/index.md b/docs/voice-agents/flow/index.md similarity index 100% rename from docs/voice-agents-flow/index.md rename to docs/voice-agents/flow/index.md diff --git a/docs/voice-agents-flow/setup.mdx b/docs/voice-agents/flow/setup.mdx similarity index 95% rename from docs/voice-agents-flow/setup.mdx rename to docs/voice-agents/flow/setup.mdx index 9d07e642..8531e27d 100644 --- a/docs/voice-agents-flow/setup.mdx +++ b/docs/voice-agents/flow/setup.mdx @@ -20,7 +20,7 @@ For more details, refer to [StartConversation API reference](/api-ref/flow-voice ### Function calling -[Function Calling](/voice-agents-flow/features/function-calling) allows you to connect Flow to external tools and systems. This unlocks Flow's ability to act in the real-world and better serve the needs of your users. +[Function Calling](/voice-agents/flow/features/function-calling) allows you to connect Flow to external tools and systems. This unlocks Flow's ability to act in the real world and better serve the needs of your users.
This could involve needing real-time information such as opening/closing times or validation services for authentication or action APIs that control a fast food system while placing a drive-thru order. @@ -31,7 +31,7 @@ You might want to control ongoing conversation based on what's spoken by the user #### Steering the conversation -[Application Inputs](/voice-agents-flow/features/application-inputs) allow you to steer the conversation by adding helpful updates & information asynchronously to Flow +[Application Inputs](/voice-agents/flow/features/application-inputs) allow you to steer the conversation by adding helpful updates & information asynchronously to Flow. ### Managing call recordings and transcripts diff --git a/docs/voice-agents/flow/sidebar.ts b/docs/voice-agents/flow/sidebar.ts new file mode 100644 index 00000000..b6f69577 --- /dev/null +++ b/docs/voice-agents/flow/sidebar.ts @@ -0,0 +1,61 @@ +export default { + type: "category", + label: "Flow", + collapsible: true, + collapsed: true, + items: [ + { + type: "doc", + label: "Overview", + id: "voice-agents/flow/index", + }, + { + type: "category", + label: "Features", + collapsible: true, + collapsed: true, + items: [ + { + type: "doc", + id: "voice-agents/flow/features/application-inputs", + }, + { + type: "doc", + id: "voice-agents/flow/features/function-calling", + }, + { + type: "doc", + id: "voice-agents/flow/features/webrtc-livekit", + }, + ], + }, + { + type: "category", + label: "Guides", + collapsible: true, + collapsed: true, + items: [ + { + type: "doc", + id: "voice-agents/flow/guides/nextjs-guide", + }, + { + type: "doc", + id: "voice-agents/flow/guides/react-native", + }, + ], + }, + { + type: "doc", + id: "voice-agents/flow/setup", + }, + { + type: "doc", + id: "voice-agents/flow/supported-formats-and-limits", + }, + { + type: "doc", + id: "voice-agents/flow/supported-languages", + }, + ], +} as const; \ No newline at end of file diff --git
a/docs/voice-agents-flow/supported-formats-and-limits.mdx b/docs/voice-agents/flow/supported-formats-and-limits.mdx similarity index 100% rename from docs/voice-agents-flow/supported-formats-and-limits.mdx rename to docs/voice-agents/flow/supported-formats-and-limits.mdx diff --git a/docs/voice-agents-flow/supported-languages.mdx b/docs/voice-agents/flow/supported-languages.mdx similarity index 100% rename from docs/voice-agents-flow/supported-languages.mdx rename to docs/voice-agents/flow/supported-languages.mdx diff --git a/docs/voice-agents/sidebar.ts b/docs/voice-agents/sidebar.ts index 412e29bc..f14bba42 100644 --- a/docs/voice-agents/sidebar.ts +++ b/docs/voice-agents/sidebar.ts @@ -1,3 +1,5 @@ +import voiceAgentsFlowSidebar from "./flow/sidebar"; + export default { type: "category", label: "Voice agents", @@ -14,5 +16,6 @@ export default { id: "voice-agents/features", label: "Features", }, + voiceAgentsFlowSidebar, ], } as const; \ No newline at end of file diff --git a/scripts/redirects/redirects.json b/scripts/redirects/redirects.json index 9481173c..40a720f2 100644 --- a/scripts/redirects/redirects.json +++ b/scripts/redirects/redirects.json @@ -22,5 +22,37 @@ { "source": "/speech-to-text/realtime/realtime-speaker-identification", "destination": "/speech-to-text/realtime/speaker-identification" + }, + { + "source": "/voice-agents-flow/features/application-inputs", + "destination": "/voice-agents/flow/features/application-inputs" + }, + { + "source": "/voice-agents-flow/setup", + "destination": "/voice-agents/flow/setup" + }, + { + "source": "/voice-agents-flow/features/function-calling", + "destination": "/voice-agents/flow/features/function-calling" + }, + { + "source": "/voice-agents-flow", + "destination": "/voice-agents/flow" + }, + { + "source": "/voice-agents-flow/supported-languages", + "destination": "/voice-agents/flow/supported-languages" + }, + { + "source": "/voice-agents-flow/features/webrtc-livekit", + "destination":
"/voice-agents/flow/features/webrtc-livekit" + }, + { + "source": "/voice-agents-flow/guides/nextjs-guide", + "destination": "/voice-agents/flow/guides/nextjs-guide" + }, + { + "source": "/voice-agents-flow/guides/react-native", + "destination": "/voice-agents/flow/guides/react-native" } ] diff --git a/sidebars.ts b/sidebars.ts index 9a81597c..88dcea82 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -3,7 +3,6 @@ import deploymentsSidebar from "./docs/deployments/sidebar"; import gettingStartedSidebar from "./docs/get-started/sidebar"; import speechToTextSidebar from "./docs/speech-to-text/sidebar"; import textToSpeechSidebar from "./docs/text-to-speech/sidebar"; -import voiceAgentsFlowSidebar from "./docs/voice-agents-flow/sidebar"; import integrationsAndSDKSidebar from "./docs/integrations-and-sdks/sidebar"; import voiceAgentsSidebar from "./docs/voice-agents/sidebar"; @@ -14,7 +13,6 @@ export default { voiceAgentsSidebar, textToSpeechSidebar, integrationsAndSDKSidebar, - voiceAgentsFlowSidebar, deploymentsSidebar, { type: "category", diff --git a/spec/flow-api.yaml b/spec/flow-api.yaml index f8e6e671..b5f55428 100644 --- a/spec/flow-api.yaml +++ b/spec/flow-api.yaml @@ -707,7 +707,7 @@ components: type: string # description: The id of the agent or persona to use during the conversation. description: | - Required in the the `StartConversation` message in the Flow API. Generated from the [Speechmatics Portal](https://portal.speechmatics.com/). This maps to the [language supported](/voice-agents-flow/supported-languages), agent's prompt, LLM, TTS voice, & custom dictionary. These can be customised by creating or modifying agents in the Portal. + Required in the `StartConversation` message in the Flow API. Generated from the [Speechmatics Portal](https://portal.speechmatics.com/).
This maps to the [language supported](/voice-agents/flow/supported-languages), agent's prompt, LLM, TTS voice, & custom dictionary. These can be customised by creating or modifying agents in the Portal. template_variables: type: object additionalProperties: diff --git a/vercel.json b/vercel.json index 9786e042..1d018c97 100644 --- a/vercel.json +++ b/vercel.json @@ -31,6 +31,46 @@ "destination": "/speech-to-text/realtime/speaker-identification", "permanent": true }, + { + "source": "/voice-agents-flow/features/application-inputs", + "destination": "/voice-agents/flow/features/application-inputs", + "permanent": true + }, + { + "source": "/voice-agents-flow/setup", + "destination": "/voice-agents/flow/setup", + "permanent": true + }, + { + "source": "/voice-agents-flow/features/function-calling", + "destination": "/voice-agents/flow/features/function-calling", + "permanent": true + }, + { + "source": "/voice-agents-flow", + "destination": "/voice-agents/flow", + "permanent": true + }, + { + "source": "/voice-agents-flow/supported-languages", + "destination": "/voice-agents/flow/supported-languages", + "permanent": true + }, + { + "source": "/voice-agents-flow/features/webrtc-livekit", + "destination": "/voice-agents/flow/features/webrtc-livekit", + "permanent": true + }, + { + "source": "/voice-agents-flow/guides/nextjs-guide", + "destination": "/voice-agents/flow/guides/nextjs-guide", + "permanent": true + }, + { + "source": "/voice-agents-flow/guides/react-native", + "destination": "/voice-agents/flow/guides/react-native", + "permanent": true + }, { "source": "/jobsapi", "destination": "/api-ref/batch/create-a-new-job", "permanent": true }, @@ -163,42 +203,42 @@ }, { "source": "/flow/application-inputs", -
"destination": "/voice-agents-flow/features/application-inputs", + "destination": "/voice-agents/flow/features/application-inputs", "permanent": true }, { "source": "/flow/config", - "destination": "/voice-agents-flow/setup", + "destination": "/voice-agents/flow/setup", "permanent": true }, { "source": "/flow/function-calling", - "destination": "/voice-agents-flow/features/function-calling", + "destination": "/voice-agents/flow/features/function-calling", "permanent": true }, { "source": "/flow/introduction", - "destination": "/voice-agents-flow", + "destination": "/voice-agents/flow", "permanent": true }, { "source": "/flow/languages-supported", - "destination": "/voice-agents-flow/supported-languages", + "destination": "/voice-agents/flow/supported-languages", "permanent": true }, { "source": "/flow/livekit-webrtc", - "destination": "/voice-agents-flow/features/webrtc-livekit", + "destination": "/voice-agents/flow/features/webrtc-livekit", "permanent": true }, { "source": "/flow/nextjs-guide", - "destination": "/voice-agents-flow/guides/nextjs-guide", + "destination": "/voice-agents/flow/guides/nextjs-guide", "permanent": true }, { "source": "/flow/react-native-guide", - "destination": "/voice-agents-flow/guides/react-native", + "destination": "/voice-agents/flow/guides/react-native", "permanent": true }, { @@ -863,12 +903,12 @@ }, { "source": "/flow", - "destination": "/voice-agents-flow", + "destination": "/voice-agents/flow", "permanent": true }, { "source": "/flow/getting-started", - "destination": "/voice-agents-flow", + "destination": "/voice-agents/flow", "permanent": true }, {