A lightweight, enterprise-ready template to create speech-to-speech voice agents
with natural-sounding voices and seamless telephony integration.
- Overview
- Quick Start
- Features
- Screenshots
- Architecture
- Session Types
- MCP Server
- .NET Aspire Orchestration
- WebSocket Protocol
- Azure Setup Guides
- Deployment
- Testing
- Configuration Reference
- Project Structure
- Troubleshooting
- Resources
Voice Agent C# leverages Azure Voice Live API and Azure Communication Services to deliver personalized self-service experiences with natural-sounding voices.
flowchart TB
subgraph Clients["Client Layer"]
Browser["🌐 Web Browser<br/>(Microphone)"]
Phone["📞 PSTN Phone<br/>(ACS)"]
Avatar["👤 WebRTC<br/>(Avatar Video)"]
end
subgraph App["Voice Agent C# Application"]
direction TB
VA["Voice Assistant<br/>Session"]
VAgent["Voice Agent<br/>Session"]
VAv["Voice Avatar<br/>Session"]
ACS["Incoming Call<br/>Handler"]
Factory["VoiceSessionFactory"]
SDK["VoiceLiveClient SDK"]
VA & VAgent & VAv & ACS --> Factory
Factory --> SDK
end
subgraph Azure["Azure Services"]
VoiceLive["Azure Voice Live API<br/>ASR + LLM + TTS"]
Foundry["Azure AI Foundry<br/>Agents"]
MCP["MCP Server<br/>Tools"]
ACSService["Azure Communication<br/>Services"]
end
Browser -->|"WebSocket"| App
Phone -->|"Media Stream"| ACSService
ACSService -->|"Events"| App
Avatar -->|"WebRTC"| App
SDK --> VoiceLive
SDK --> Foundry
SDK --> MCP
💡 Key Technologies:
- Azure Voice Live API — Unified ASR + LLM + TTS for low-latency speech-to-speech
- Azure Communication Services — PSTN telephony integration with real-time event triggers
- .NET Aspire — Local orchestration with service discovery and telemetry
⚠️ Responsibility Notice: You are responsible for assessing all associated risks and complying with applicable laws. See transparency docs for Voice Live API and ACS.
| Tool | Purpose | Install |
|---|---|---|
| .NET 10 SDK | Application runtime | winget install Microsoft.DotNet.SDK.10 |
| .NET Aspire workload | Local orchestration | dotnet workload install aspire |
| Azure CLI | Azure management | winget install Microsoft.AzureCLI |
| Azure Developer CLI | Deployment | winget install Microsoft.Azd |
# Clone and navigate to the project
git clone https://github.com/congiuluc/voice-agent-csharp.git
cd voice-agent-csharp
# Start all services via Aspire
cd aspire/voice-agent-csharp.AppHost
dotnet run
The Aspire Dashboard opens automatically at https://localhost:17122, showing all services, logs, and traces.
azd auth login
azd up
| Feature | Description |
|---|---|
| 🎙️ Voice Assistant | Direct GPT model conversations with customizable instructions |
| 🤖 Voice Agent | Azure AI Foundry Agents for managed AI experiences |
| 👤 Voice Avatar | Real-time talking avatar with WebRTC video streaming |
| 📞 ACS Integration | PSTN phone call handling via Azure Communication Services |
| 🔧 MCP Tools | Extensible Model Context Protocol server |
| 🌐 Web Client | Browser-based testing with microphone support |
| 🚀 .NET Aspire | Service discovery, health checks, and OpenTelemetry |
| Category | Features |
|---|---|
| Audio Processing | Real-time ASR/LLM/TTS, server-side VAD, echo cancellation, noise reduction |
| Multi-modal | Audio, text, and video (avatar) support |
| Integration | MCP tool calling, native function calls, Foundry Agents |
| Security | Managed Identity, login protection, PBKDF2 password hashing |
| Observability | Application Insights, Serilog, OpenTelemetry, health endpoints |
| UI | Razor Pages, dark/light theme toggle |
| Field | Value |
|---|---|
| Username | admin |
| Password | Pa$$w0rd! |
📝 Change Password
1. Generate a hash:
.\generate-password-hash.ps1 -Password "YourNewPassword"
2. Update src/appsettings.json:
{
"Security": {
"Authentication": {
"Username": "admin",
"PasswordHash": "YOUR_GENERATED_HASH_HERE"
}
}
}
Or use environment variables (recommended for production):
- Security__Authentication__Username
- Security__Authentication__PasswordHash
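The feature list mentions PBKDF2 password hashing. As a rough illustration of what generating and verifying such a hash can look like in C#, here is a minimal sketch. The `PasswordHasher` class, the `iterations.salt.key` storage format, and the iteration count are assumptions made for this example. The project's `generate-password-hash.ps1` script may use a different format, so hashes from this sketch are not necessarily interchangeable with it.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not the project's actual implementation.
public static class PasswordHasher
{
    private const int SaltSize = 16;        // bytes of random salt (assumed)
    private const int KeySize = 32;         // bytes of derived key (assumed)
    private const int Iterations = 100_000; // PBKDF2 work factor (assumed)

    // Produces "iterations.base64(salt).base64(key)".
    public static string Hash(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] key = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA256, KeySize);
        return $"{Iterations}.{Convert.ToBase64String(salt)}.{Convert.ToBase64String(key)}";
    }

    // Re-derives the key with the stored salt and iteration count,
    // then compares in constant time.
    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split('.');
        if (parts.Length != 3 || !int.TryParse(parts[0], out int iterations) || iterations <= 0)
        {
            return false;
        }

        byte[] salt = Convert.FromBase64String(parts[1]);
        byte[] expected = Convert.FromBase64String(parts[2]);
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, iterations, HashAlgorithmName.SHA256, expected.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }
}
```

Storing the iteration count next to the salt lets the work factor be raised later without invalidating existing hashes.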
┌─────────────────────────────────────────────────────────────────────────────┐
│ Client Layer │
├─────────────────┬─────────────────┬─────────────────────────────────────────┤
│ Web Browser │ PSTN Phone │ WebRTC (Avatar) │
│ (Microphone) │ (ACS) │ │
└────────┬────────┴────────┬────────┴────────┬────────────────────────────────┘
│ WebSocket │ Media Stream │ WebRTC
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Voice Agent C# Application │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Voice │ │ Voice │ │ Voice │ │ Incoming Call │ │
│ │ Assistant │ │ Agent │ │ Avatar │ │ Handler (ACS) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └───────┬────────┘ │
│ └─────────────────┴─────────────────┴──────────────────┘ │
│ VoiceSessionFactory → VoiceLiveClient SDK │
└────────────────────────────────────────┬────────────────────────────────────┘
│
┌───────────────────────────────┼───────────────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌───────────────────────┐ ┌─────────────────┐
│ MCP Server │ │ Azure Voice Live API │ │ Azure AI Foundry│
│ ├─ Weather │ │ • Speech Recognition │ │ • Agent Runtime │
│ └─ DateTime │ │ • GPT Models │ │ • MCP Tools │
│ │ │ • Neural TTS │ │ • Knowledge Base│
└─────────────────┘ └───────────────────────┘ └─────────────────┘
The application supports four distinct session types, each optimized for different use cases:
Direct GPT model conversations with full customization.
| Feature | Support |
|---|---|
| Custom model selection | ✅ gpt-4o, gpt-4o-mini, etc. |
| Custom instructions | ✅ System prompt |
| MCP tool integration | ✅ |
| VAD, echo cancellation | ✅ |
Use Cases: Customer service chatbots, FAQ assistants, general voice interfaces
Configuration Example
{
"AzureVoiceLive": {
"Model": "gpt-4o-mini",
"Voice": "en-US-AvaNeural",
"Locale": "en-US"
}
}
Integration with Azure AI Foundry Agents for managed experiences.
| Feature | Support |
|---|---|
| Foundry Agent integration | ✅ |
| Agent-managed tools | ✅ |
| Project-based selection | ✅ |
| Token-based auth | ✅ |
Use Cases: Enterprise voice agents with knowledge bases, multi-turn flows, complex business logic
Configuration Example
{
"AzureVoiceLive": {
"Endpoint": "https://your-foundry.openai.azure.com/",
"FoundryAgentId": "asst_xxxxx",
"FoundryProjectName": "your-project"
}
}
Real-time talking avatar with WebRTC video streaming.
| Feature | Support |
|---|---|
| WebRTC video | ✅ |
| Multiple characters | ✅ lisa, harry, etc. |
| Configurable styles | ✅ casual-sitting, etc. |
| H.264 codec | ✅ |
Use Cases: Virtual assistants, interactive kiosks, accessible interfaces
Configuration Example
{
"AzureVoiceLive": {
"AvatarCharacter": "lisa",
"AvatarStyle": "casual-sitting",
"AvatarVideoWidth": 1920,
"AvatarVideoHeight": 1080,
"AvatarVideoBitrate": 2000,
"AvatarCodec": "H264"
}
}
Azure Communication Services integration for PSTN telephony.
| Endpoint | Description |
|---|---|
| POST /acs/incomingcall | Event Grid webhook |
| POST /acs/callbacks/{contextId} | Call automation callbacks |
| WS /acs/ws | Media streaming WebSocket |
Use Cases: Call center automation, IVR replacement, phone-based customer service
The solution includes a Model Context Protocol (MCP) server providing extensible tools.
| Tool | Description | Endpoint |
|---|---|---|
| DateTime | Get current date/time with timezone support | GET /api/tools/datetime?timezone={tz} |
| Weather | Get weather via Open-Meteo API | GET /api/tools/weather?location={loc} |
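The tool endpoints in the table above are plain HTTP GET routes, so they can be exercised without an MCP client. The sketch below shows one way to build the request URL and fetch the raw response. `McpToolClient` and its method names are hypothetical, and the response body is returned as an unparsed string because its schema is not documented here.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for calling the REST tool endpoints listed above.
public static class McpToolClient
{
    // Builds e.g. http://localhost:5001/api/tools/datetime?timezone=Europe%2FRome
    public static Uri BuildToolUri(Uri baseUrl, string tool, string parameterName, string value)
    {
        // Escape the value so characters such as '/' or spaces survive the query string.
        string query = $"{parameterName}={Uri.EscapeDataString(value)}";
        return new Uri(baseUrl, $"/api/tools/{tool}?{query}");
    }

    // Sends the GET request and returns the raw response body.
    public static async Task<string> CallToolAsync(HttpClient http, Uri toolUri)
    {
        using HttpResponseMessage response = await http.GetAsync(toolUri);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

For example, `BuildToolUri(new Uri("http://localhost:5001"), "weather", "location", "Rome")` targets the Weather tool, assuming the MCP server is running on the URL used in the configuration in this section.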
{
"McpServer": {
"Url": "http://localhost:5001",
"Label": "voice-agent-mcp",
"Enabled": true,
"AllowedTools": ""
}
}
📝 Adding Custom Tools
1. Create a tool class in mcp/Tools/:
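using ModelContextProtocol.Server;
using System.ComponentModel;
using System.Threading.Tasks;

namespace mcpServer.Tools
{
    [McpServerToolType]
    public class MyCustomTools
    {
        // The Description text is published as the tool's description.
        [McpServerTool]
        [Description("My custom tool description")]
        public Task<string> MyTool(
            [Description("Parameter description")] string parameter)
        {
            // Nothing is awaited here, so return a completed task
            // instead of marking the method async.
            return Task.FromResult("Result");
        }
    }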
}
2. Register in Program.cs:
builder.Services.AddScoped<MyCustomTools>();
| Endpoint | Description |
|---|---|
| GET /health | Health check |
| GET /openapi/v1.json | OpenAPI specification |
| GET /scalar/v1 | Scalar API documentation |
.NET Aspire provides local development orchestration with service discovery, health checks, and distributed tracing.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Aspire AppHost (Orchestrator) │
├─────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ MCP Server │◄───────►│ Web Frontend │ │
│ │ • /health │ │ • /health │ │
│ │ • /api/tools/* │ │ • WebSocket │ │
│ │ • MCP protocol │ │ • Razor Pages │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
│ Service Defaults: OpenTelemetry • Health checks • Service discovery │
└─────────────────────────────────────────────────────────────────────────────┘
│
Aspire Dashboard (https://localhost:17122)
• Real-time logs • Distributed traces • Metrics
cd aspire/voice-agent-csharp.AppHost
dotnet run # Uses HTTPS profile (default)
dotnet run --launch-profile http # Uses HTTP profile
🔧 Troubleshooting Aspire
| Issue | Solution |
|---|---|
| Dashboard not opening | Check launchBrowser: true in launchSettings.json |
| Service not starting | Check health endpoint returns 200 OK |
| Missing traces | Verify OTEL_EXPORTER_OTLP_ENDPOINT is set |
| Project not found | Ensure ProjectReference paths are correct |
| Endpoint | Handler | Description |
|---|---|---|
| WS /voice/ws | Voice streaming | Base64 encoded audio |
| WS /web/ws | Web client | Raw PCM16 audio |
| WS /avatar/ws | Avatar | WebRTC video support |
| WS /acs/ws | ACS | Media streaming |
📨 Message Types
Config Message (sent at connection start):
{
"kind": "Config",
"sessionType": "Assistant",
"voiceModel": "gpt-4o",
"voice": "en-US-AvaNeural",
"locale": "en-US",
"welcomeMessage": "Hello! How can I help you?",
"voiceModelInstructions": "You are a helpful assistant..."
}
Text Message:
{ "kind": "Message", "text": "What's the weather like?" }
Audio Data:
{ "kind": "AudioData", "audioData": { "data": "base64-pcm16", "silent": false } }
Transcription:
{ "kind": "Transcription", "text": "Hello!", "role": "user" }
Session Event:
{ "kind": "SessionEvent", "event": "SessionConnected", "payload": {...} }
Error:
{ "kind": "Error", "message": "Failed to initialize session" }
📊 Connection Flow Diagram
Web Client Voice Agent Server Voice Live API
│ │ │
│ 1. WebSocket Connect │ │
│─────────────────────────────>│ │
│ 2. Config Message │ │
│─────────────────────────────>│ │
│ │ 3. Create VoiceLiveClient│
│ │─────────────────────────>│
│ │ 4. session.created │
│ 5. SessionConnected │<─────────────────────────│
│<─────────────────────────────│ │
│ 6. Audio Data (PCM16) │ │
│─────────────────────────────>│ 7. SendInputAudioAsync │
│ │─────────────────────────>│
│ 8. Transcription (user) │ transcription │
│<─────────────────────────────│<─────────────────────────│
│ 9. Audio Delta │ response.audio.delta │
│<─────────────────────────────│<─────────────────────────│
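To make the message shapes above concrete, the following sketch builds the Config and AudioData payloads and reads the `kind` discriminator from an incoming message, using only System.Text.Json. `VoiceMessages` and its method names are hypothetical. The field names are taken from the JSON examples in this section, and only a subset of the Config fields is shown.

```csharp
using System;
using System.Text.Json;

// Hypothetical client-side helpers for the JSON messages shown above.
public static class VoiceMessages
{
    // Step 2 of the flow: the Config message sent right after connecting.
    public static string Config(string sessionType, string voiceModel, string voice, string locale) =>
        JsonSerializer.Serialize(new
        {
            kind = "Config",
            sessionType,
            voiceModel,
            voice,
            locale
        });

    // Step 6 of the flow: PCM16 bytes wrapped as base64 in an AudioData message.
    public static string AudioData(byte[] pcm16, bool silent = false) =>
        JsonSerializer.Serialize(new
        {
            kind = "AudioData",
            audioData = new { data = Convert.ToBase64String(pcm16), silent }
        });

    // Returns the "kind" of an incoming message, or an empty string if it is absent.
    public static string ReadKind(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("kind", out JsonElement kind)
            ? kind.GetString() ?? string.Empty
            : string.Empty;
    }
}
```

The resulting strings could be sent as text frames with `ClientWebSocket.SendAsync`, and `ReadKind` gives a simple way to route received frames (Transcription, SessionEvent, Error) to the right handler.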
- Azure subscription with permissions to create resources
- Recommended regions: swedencentral, eastus2, westus2
| Service | Description | Pricing |
|---|---|---|
| Azure Speech Voice Live | Speech-to-speech interactions | View |
| Azure Communication Services | Call workflows | View |
| Azure Container Apps | App hosting | View |
| Azure Container Registry | Container images | View |
| Azure Key Vault | Secrets management | View |
| Azure AI Foundry | Agent hosting | View |
💡 Use the Azure pricing calculator to estimate costs.
- Azure AI Foundry project with a deployed GPT model
- Azure AI User RBAC role at project scope
1. Navigate to Azure AI Foundry:
   - Open Azure Portal → Resource Group → AI Project → Launch studio
2. Create Agent:
   - Select Agents → + New agent
3. Configure:
   - Name: e.g., "Voice Customer Service Agent"
   - Instructions: Define behavior and personality
   - Model: Select deployed model (e.g., gpt-4o-mini)
   - Tools: Add Code Interpreter, File Search, or custom functions
4. Get Agent ID:
   - Copy the Agent ID (asst_xxxxx) after saving
5. Update Configuration:
{
  "AzureVoiceLive": {
    "Endpoint": "https://your-project.services.ai.azure.com/",
    "FoundryAgentId": "asst_xxxxxxxxxxxxx",
    "FoundryProjectName": "your-project-name"
  }
}
- Paid Azure subscription (not trial/free credits)
- Azure Communication Services resource
- Billing address in supported region
1. Navigate: Azure Portal → Communication Services → Phone numbers
2. Search: Click Get → Select:
   - Country/Region
   - Number Type: Toll-free or Local
   - Use case: A2P (Application to Person)
   - Calling: ✅ Make calls, ✅ Receive calls
3. Purchase: Select a number → Add to cart → Buy now
4. Verify: Number appears after provisioning (a few minutes)
📝 Numbers are held for 16 minutes during selection. Monthly charges apply.
1. Create Subscription:
   - Azure Portal → Communication Services → Events → + Event Subscription
2. Configure:
   | Field | Value |
   |---|---|
   | Name | incoming-call-subscription |
   | Event Types | ✅ Incoming Call |
   | Endpoint Type | Web Hook |
   | Endpoint | https://<your-app>/acs/incomingcall |
3. Retry Policy (recommended):
   - Max Delivery Attempts: 2
   - Event TTL: 1 minute
4. Verify: Event Grid sends validation event → App responds → Status shows Active
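The verification step relies on the Event Grid webhook handshake: with the Event Grid event schema, the service posts a `Microsoft.EventGrid.SubscriptionValidationEvent` containing a `validationCode`, and the endpoint has to echo it back as `validationResponse`. The sketch below shows that exchange using only System.Text.Json. `EventGridValidation` is a hypothetical name, and the project's actual `/acs/incomingcall` handler may be implemented differently, for example with the Azure SDK event types.

```csharp
using System.Text.Json;

// Hypothetical helper illustrating the Event Grid subscription validation handshake.
public static class EventGridValidation
{
    private const string ValidationEventType =
        "Microsoft.EventGrid.SubscriptionValidationEvent";

    // Returns true and the JSON body to send back (HTTP 200) when the request
    // is a validation event; returns false for any other payload.
    public static bool TryBuildResponse(string requestBody, out string responseJson)
    {
        responseJson = string.Empty;
        using JsonDocument doc = JsonDocument.Parse(requestBody);

        // The Event Grid schema delivers events as a JSON array.
        if (doc.RootElement.ValueKind != JsonValueKind.Array)
        {
            return false;
        }

        foreach (JsonElement evt in doc.RootElement.EnumerateArray())
        {
            if (evt.ValueKind == JsonValueKind.Object
                && evt.TryGetProperty("eventType", out JsonElement type)
                && type.GetString() == ValidationEventType
                && evt.TryGetProperty("data", out JsonElement data)
                && data.ValueKind == JsonValueKind.Object
                && data.TryGetProperty("validationCode", out JsonElement code))
            {
                responseJson = JsonSerializer.Serialize(
                    new { validationResponse = code.GetString() });
                return true;
            }
        }

        return false;
    }
}
```

Payloads that are not validation events, such as an incoming call notification, fall through with `false` and can be passed on to the regular call-handling logic.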
# 1. Login
azd auth login
# 2. Provision and deploy
azd up
# → Provide environment name (e.g., "voice-agent-prod")
# → Select subscription and location (swedencentral recommended)
# 3. Subsequent deployments
azd deploy
| Resource | Description |
|---|---|
| Resource Group | rg-{environmentName}-{suffix} |
| User Assigned Identity | App authentication |
| Azure AI Services | Voice Live API + GPT models |
| Communication Services | Phone call integration |
| Key Vault | Secure secrets storage |
| Container Registry | Container images |
| Container Apps | Main app + MCP server |
| Log Analytics | Centralized logging |
| Application Insights | Telemetry and monitoring |
- Navigate to application URL
- Select session type: Voice Assistant, Voice Agent, or Voice Avatar
- Click Start Talking → Speak → Click Stop Conversation
⚠️ Web client is for testing purposes only.
- Set up Event Grid webhook
- Purchase phone number
- Dial the ACS phone number to connect
Option 1: .NET Aspire (Recommended)
cd aspire/voice-agent-csharp.AppHost
dotnet run
Option 2: Manual Start
# Terminal 1: MCP Server
cd mcp && dotnet run
# Terminal 2: Main App
cd src && dotnet run
Access at https://localhost:5001
{
"AzureVoiceLive": {
"ApiKey": "",
"Endpoint": "",
"SpeechRegion": "westeurope",
"Model": "gpt-4o",
"Voice": "en-US-AvaNeural",
"Locale": "en-US",
"UseDefaultAzureCredential": true,
"AvatarCharacter": "lisa",
"AvatarStyle": "casual-sitting"
},
"AzureCommunicationServices": {
"Endpoint": "",
"DevTunnel": ""
},
"McpServer": {
"Url": "http://localhost:5001",
"Label": "voice-agent-mcp",
"Enabled": true
},
"ApplicationInsights": {
"ConnectionString": ""
}
}
| Setting | Default | Description |
|---|---|---|
| Model | gpt-4o | GPT model name |
| Voice | en-US-AvaNeural | Azure Neural TTS voice |
| Locale | en-US | Language/locale code |
| AvatarCharacter | lisa | Avatar character |
| AvatarStyle | casual-sitting | Avatar style |
📖 See Azure Neural TTS voices for full voice list.
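As an illustration of how the defaults in the table behave when a setting is omitted, here is a minimal sketch of an options class whose property initializers carry those defaults. `AzureVoiceLiveOptions` and `OptionsLoader` are hypothetical names. An ASP.NET Core app would more typically bind the section through `IConfiguration` and the options pattern, and System.Text.Json is used here only to keep the example self-contained.

```csharp
using System.Text.Json;

// Hypothetical options type; defaults mirror the table above.
public sealed class AzureVoiceLiveOptions
{
    public string Model { get; set; } = "gpt-4o";
    public string Voice { get; set; } = "en-US-AvaNeural";
    public string Locale { get; set; } = "en-US";
    public string AvatarCharacter { get; set; } = "lisa";
    public string AvatarStyle { get; set; } = "casual-sitting";
}

public static class OptionsLoader
{
    // Reads the "AzureVoiceLive" section from an appsettings-style JSON string.
    // Settings missing from the JSON keep their default values, and unknown
    // settings (ApiKey, Endpoint, ...) are ignored by this sketch.
    public static AzureVoiceLiveOptions Load(string appSettingsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(appSettingsJson);
        if (!doc.RootElement.TryGetProperty("AzureVoiceLive", out JsonElement section))
        {
            return new AzureVoiceLiveOptions();
        }

        return section.Deserialize<AzureVoiceLiveOptions>() ?? new AzureVoiceLiveOptions();
    }
}
```

With this shape, a file that sets only `"Model": "gpt-4o-mini"` still resolves `Voice` to `en-US-AvaNeural` and `Locale` to `en-US`.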
voice-agent-csharp/
├── 📄 azure.yaml # Azure Developer CLI config
├── 📄 Dockerfile # Main app container
├── 📄 voice-agent-csharp.sln # Solution file
│
├── 📁 aspire/ # .NET Aspire orchestration
│ ├── voice-agent-csharp.AppHost/
│ │ └── AppHost.cs # Aspire entry point
│ └── voice-agent-csharp.ServiceDefaults/
│ └── Extensions.cs # Shared defaults (OpenTelemetry, health)
│
├── 📁 src/ # Main application
│ ├── Program.cs # Entry point
│ ├── Features/
│ │ ├── IncomingCall/ # ACS call handling
│ │ ├── Shared/ # Common components (VoiceSessionBase, MCP)
│ │ ├── VoiceAgent/ # Foundry agent integration
│ │ ├── VoiceAssistant/ # Direct model sessions
│ │ └── VoiceAvatar/ # Avatar with WebRTC
│ ├── Pages/ # Razor Pages UI
│ └── wwwroot/ # Static assets
│
├── 📁 mcp/ # MCP Tools Server
│ ├── Program.cs
│ └── Tools/
│ ├── DateTimeTools.cs
│ └── WeatherTools.cs
│
└── 📁 infra/ # Infrastructure as Code (Bicep)
├── main.bicep
└── modules/
| Issue | Solution |
|---|---|
| ACS webhook validation fails | Ensure endpoint ends with /acs/incomingcall |
| No audio in browser | Check microphone permissions and HTTPS |
| MCP tools not working | Verify MCP server is running and URL is correct |
| Avatar not rendering | Check WebRTC/ICE connectivity |
| Dashboard not opening | Verify launchBrowser: true in launchSettings.json |
# Remove all Azure resources
azd down
# Redeploy to different region
Remove-Item -Recurse -Force .azure
azd up
| Resource | Link |
|---|---|
| Voice Live API Overview | 📖 Learn |
| Azure Speech Services | 📖 Learn |
| ACS Call Automation | 📖 Learn |
| Model Context Protocol | 📖 Docs |
This project welcomes contributions! Please read our Contributing Guide for details.
This project is licensed under the MIT License - see LICENSE for details.
Trademarks: This project may contain trademarks or logos subject to Microsoft's Trademark & Brand Guidelines.



