18 changes: 14 additions & 4 deletions docs/voice-agents/assets/basic-quickstart.py
```diff
@@ -4,10 +4,17 @@
 from speechmatics.voice import VoiceAgentClient, AgentServerMessageType
 
 async def main():
     """Stream microphone audio to Speechmatics Voice Agent using 'scribe' preset"""
 
+    # Audio configuration
+    SAMPLE_RATE = 16000  # Hz
+    CHUNK_SIZE = 160  # Samples per read
+    PRESET = "scribe"  # Configuration preset
+
     # Create client with preset
     client = VoiceAgentClient(
         api_key=os.getenv("YOUR_API_KEY"),
-        preset="scribe"
+        preset=PRESET
     )
 
     # Handle final segments
@@ -19,17 +26,20 @@ def on_segment(message):
             print(f"{speaker}: {text}")
 
     # Setup microphone
-    mic = Microphone(sample_rate=16000, chunk_size=320)
+    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
     if not mic.start():
         print("Error: Microphone not available")
         return
 
-    # Connect and stream
+    # Connect to the Voice agent
     await client.connect()
 
+    # Stream microphone audio (interruptible using keyboard)
     try:
         while True:
-            audio_chunk = await mic.read(320)
+            audio_chunk = await mic.read(CHUNK_SIZE)
             if not audio_chunk:
                 break  # Microphone stopped producing data
             await client.send_audio(audio_chunk)
     except KeyboardInterrupt:
         pass
```
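This change replaces the hard-coded `320` with `SAMPLE_RATE` and `CHUNK_SIZE` constants. As a quick check of what those constants mean, the sketch below converts a chunk size in samples into milliseconds and bytes. The helper names are hypothetical, and the byte count assumes 16-bit mono PCM, which the diff does not state.

```python
def chunk_duration_ms(samples: int, sample_rate_hz: int) -> float:
    """Length of one audio chunk in milliseconds."""
    return samples * 1000 / sample_rate_hz


def chunk_size_bytes(samples: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of one chunk of PCM audio in bytes (defaults assume 16-bit mono)."""
    return samples * bytes_per_sample * channels


SAMPLE_RATE = 16000  # Hz, as in the updated quickstart
CHUNK_SIZE = 160  # samples per read, as in the updated quickstart

print(chunk_duration_ms(CHUNK_SIZE, SAMPLE_RATE), "ms per chunk")  # 10.0 ms per chunk
print(chunk_size_bytes(CHUNK_SIZE), "bytes per chunk")  # 320 bytes per chunk
```

Since the diff describes `CHUNK_SIZE` as samples per read, each `send_audio` call carries 10 ms of audio at 16 kHz (320 bytes if the samples are 16-bit mono).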
9 changes: 9 additions & 0 deletions docs/voice-agents/assets/config-overlays.py
@@ -0,0 +1,9 @@
```python
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8
    )
)
```
10 changes: 10 additions & 0 deletions docs/voice-agents/assets/config-serialization.py
@@ -0,0 +1,10 @@
```python
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Export preset to JSON
config_json = VoiceAgentConfigPreset.SCRIBE().to_json()

# Load from JSON
config = VoiceAgentConfig.from_json(config_json)

# Or create from JSON string
config = VoiceAgentConfig.from_json('{"language": "en", "enable_diarization": true}')
```
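One way to use this serialization is to keep a tuned configuration in a file (for example under version control) and reload it at startup. The sketch below uses only the standard library to write and read the JSON string that `to_json()` produces; the helper names and file name are hypothetical, and the SDK call appears only in a comment.

```python
import json
import tempfile
from pathlib import Path


def save_config_json(config_json: str, path: Path) -> None:
    """Validate a JSON config string and write it to `path`."""
    json.loads(config_json)  # raises json.JSONDecodeError if the string is malformed
    path.write_text(config_json, encoding="utf-8")


def load_config_json(path: Path) -> str:
    """Read a JSON config string from `path`, validating it before returning."""
    text = path.read_text(encoding="utf-8")
    json.loads(text)
    return text


if __name__ == "__main__":
    # Same JSON string as in config-serialization.py
    config_json = '{"language": "en", "enable_diarization": true}'
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "voice-config.json"  # hypothetical file name
        save_config_json(config_json, path)
        restored = load_config_json(path)
        print(json.loads(restored))  # {'language': 'en', 'enable_diarization': True}
        # With the SDK installed, the restored string can be passed on:
        # config = VoiceAgentConfig.from_json(restored)
```

Validating before writing means a malformed string never overwrites a previously saved, working configuration.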
119 changes: 95 additions & 24 deletions docs/voice-agents/overview.mdx
@@ -1,17 +1,17 @@
---
description: Learn how to build voice-enabled applications with the Speechmatics Voice SDK
---
import Admonition from '@theme/Admonition';
import CodeBlock from '@theme/CodeBlock';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

import pythonVoiceQuickstart from "./assets/basic-quickstart.py?raw"
import pythonVoicePresets from "./assets/presets.py?raw"
import pythonVoiceCustomConfig from "./assets/custom-config.py?raw"
import pythonVoiceConfigOverlays from "./assets/config-overlays.py?raw"
import pythonVoiceConfigSerialization from "./assets/config-serialization.py?raw"

# Voice SDK overview
The Voice SDK is a Python SDK that builds on our Realtime API to provide additional features optimized for conversational AI:

- **Intelligent segmentation**: groups words into meaningful speech segments per speaker.
- **Turn detection**: automatically detects when speakers finish talking.
@@ -39,7 +39,8 @@ Use the Realtime SDK when:

### 1. Create an API key

[Create a Speechmatics API key in the portal](https://portal.speechmatics.com/settings/api-keys) to access the Voice SDK.
Store your key securely as a managed secret.
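A managed secret is usually exposed to your process as an environment variable. The quickstart reads the key with `os.getenv("SPEECHMATICS_API_KEY")`, which silently returns `None` when the variable is unset. The sketch below is a hypothetical helper (`get_api_key` is not part of the SDK) that fails fast with a clear message instead.

```python
import os


def get_api_key(var_name: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    os.getenv() returns None when the variable is unset; checking up front
    surfaces the problem before any connection is attempted.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Speechmatics API key")
    return key


# Usage with the Voice SDK (as in the quickstart):
# client = VoiceAgentClient(api_key=get_api_key(), preset="scribe")
```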

### 2. Install dependencies

@@ -51,38 +52,108 @@ pip install speechmatics-voice
```bash
pip install speechmatics-voice[smart]
```

### 3. Quickstart

Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:

```python
import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    """Stream microphone audio to Speechmatics Voice Agent using 'scribe' preset"""

    # Audio configuration
    SAMPLE_RATE = 16000  # Hz
    CHUNK_SIZE = 160  # Samples per read
    PRESET = "scribe"  # Configuration preset

    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset=PRESET
    )

    # Print finalised segments of speech with speaker ID
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Setup microphone
    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect to the Voice Agent
    await client.connect()

    # Stream microphone audio (interruptible using keyboard)
    try:
        while True:
            audio_chunk = await mic.read(CHUNK_SIZE)
            if not audio_chunk:
                break  # Microphone stopped producing data
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
```

Before running the example, set the `SPEECHMATICS_API_KEY` environment variable to your API key from the portal.
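The `on_segment` handler in the quickstart prints each segment as it arrives. If you want to keep the transcript instead, one option is to move the formatting into a plain function that can be tested without a microphone or connection. The sketch below assumes the message shape the quickstart handler uses (a `"segments"` list whose items have `"speaker_id"` and `"text"` keys); `format_segments` and `TranscriptLog` are hypothetical names, and the speaker IDs in the example message are made up.

```python
from typing import Any


def format_segments(message: dict[str, Any]) -> list[str]:
    """Turn one ADD_SEGMENT message into "speaker: text" lines.

    Assumes the message shape used by the quickstart handler: a "segments"
    list whose items have "speaker_id" and "text" keys.
    """
    return [f'{seg["speaker_id"]}: {seg["text"]}' for seg in message.get("segments", [])]


class TranscriptLog:
    """Hypothetical helper that keeps finalised lines instead of printing them."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def on_segment(self, message: dict[str, Any]) -> None:
        self.lines.extend(format_segments(message))


# Example message with made-up speaker IDs, in the shape the quickstart expects
log = TranscriptLog()
log.on_segment({"segments": [{"speaker_id": "S1", "text": "Hello."},
                             {"speaker_id": "S2", "text": "Hi there."}]})
print("\n".join(log.lines))
```

Inside `main()`, the method can be registered with `client.on(AgentServerMessageType.ADD_SEGMENT)(log.on_segment)`, which is equivalent to the decorator form used in the quickstart.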

#### Presets: the simplest way to get started
These are purpose-built, optimized configurations, ready to use without further modification:

- `fast`: low latency, fast responses
- `adaptive`: general conversation
- `smart_turn`: complex conversation
- `external`: user handles end of turn
- `scribe`: note-taking
- `captions`: live captioning

To view all available presets:
```python
from speechmatics.voice import VoiceAgentConfigPreset

presets = VoiceAgentConfigPreset.list_presets()
print(presets)
```
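When the preset name comes from user input, a CLI flag, or a config file, it can help to reject typos before creating the client. The sketch below checks a name against the presets listed above. `check_preset` is a hypothetical helper; when the SDK is installed, `VoiceAgentConfigPreset.list_presets()` is the authoritative source, and its result could be passed as `available` only if it is an iterable of name strings, which is an assumption here.

```python
from collections.abc import Iterable

# Preset names from the list above. When the SDK is installed,
# VoiceAgentConfigPreset.list_presets() is the authoritative source.
DOCUMENTED_PRESETS = ("fast", "adaptive", "smart_turn", "external", "scribe", "captions")


def check_preset(name: str, available: Iterable[str] = DOCUMENTED_PRESETS) -> str:
    """Return `name` unchanged if it is a known preset, otherwise raise ValueError."""
    known = set(available)
    if name not in known:
        raise ValueError(f"Unknown preset {name!r}; expected one of {sorted(known)}")
    return name


# Usage (hypothetical): validate a name before passing it to the client
# client = VoiceAgentClient(api_key=..., preset=check_preset(preset_name))
```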

### 4. Custom configurations

For more control, specify a custom configuration, or use a preset as a starting point and customise it with an overlay:

<Tabs>
<TabItem value='voice-custom-config' label='Custom configurations'>
Specify configurations in a `VoiceAgentConfig` object:
<CodeBlock language="python">
{pythonVoiceCustomConfig}
</CodeBlock>
</TabItem>
<TabItem value='voice-custom-config-overlays' label='Preset with a custom overlay'>
Use a preset as a starting point and customise it with an overlay:
<CodeBlock language="python">
{pythonVoiceConfigOverlays}
</CodeBlock>
</TabItem>
</Tabs>
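An overlay is a `VoiceAgentConfig` that contains only the settings you want to change. The sketch below illustrates the idea with plain dictionaries, assuming overlay values simply replace the preset's values. It is not how the Voice SDK implements overlays: the preset values are invented placeholders (not the real `scribe` settings), `apply_overlay` is a hypothetical name, and the SDK's rules for unset or nested fields may differ.

```python
from typing import Any


def apply_overlay(preset: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which every key set in `overlay` overrides `preset`.

    Illustrates the preset-plus-overlay idea only; it is not the Voice SDK's code.
    """
    return {**preset, **overlay}


# Hypothetical preset values, for illustration only (not the real "scribe" settings)
preset = {"language": "en", "max_delay": 1.0, "enable_diarization": True}
# Same overrides as config-overlays.py
overlay = {"language": "es", "max_delay": 0.8}

merged = apply_overlay(preset, overlay)
print(merged)  # {'language': 'es', 'max_delay': 0.8, 'enable_diarization': True}
```

Returning a new dictionary leaves the preset untouched, so the same base can be reused with different overlays.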

Note: If no configuration or preset is provided, the client defaults to the `external` preset.

## FAQ

### Implementation and deployment

<details>
<summary>Can I deploy this in my own environment?</summary>

Yes! The Voice SDK can be used through our managed service or deployed in your own environment. To learn more about on-premises deployment options, [speak to sales](https://www.speechmatics.com/speak-to-sales).
</details>

### Support

<details>
@@ -93,7 +164,7 @@ You can submit feedback, bug reports, or feature requests through the Speechmati

## Next steps

For more information, see the [Voice SDK](https://github.com/speechmatics/speechmatics-python-sdk/tree/main/sdk/voice) on GitHub.

To learn more, check out [the Speechmatics Academy](https://github.com/speechmatics/speechmatics-academy).

110 changes: 0 additions & 110 deletions docs/voice-agents/quickstart.mdx

This file was deleted.