Production-ready voice agent deployed on AWS with comprehensive observability via LangSmith and OpenTelemetry.
The demonstration above shows real-time tracing of voice agent interactions in LangSmith, capturing STT transcriptions, LLM inference, TTS synthesis, and conversation turn metrics.
- Overview
- Architecture
- Project Structure
- Technology Stack
- Prerequisites
- Local Development
- AWS Deployment
- LangSmith Tracing
- API Reference
- Configuration
- Troubleshooting
- Contributing
- License
This project implements a real-time voice agent built with Pipecat that functions as an AWS instructor. Users engage in natural voice conversations to learn about AWS services, prepare for certifications, and receive hands-on guidance.
The system is designed for production deployment on AWS infrastructure using Terraform, with a React frontend for user interaction and comprehensive observability through LangSmith tracing. The voice agent uses SmallWebRTCTransport for peer-to-peer WebRTC connections, eliminating dependency on third-party WebRTC providers.
- Real-time voice interaction with sub-second latency
- AWS Bedrock Claude 3.5 Haiku for natural language understanding
- OpenAI Whisper for speech-to-text transcription
- OpenAI TTS for text-to-speech synthesis
- Native WebRTC transport using Pipecat SmallWebRTCTransport
- AWS Cognito for user authentication
- ECS Fargate for serverless container deployment
- S3 and CloudFront for frontend hosting
- LangSmith integration for end-to-end tracing and observability
AWS Cloud
+--------------------------------------------------------------------------------+
| |
| +------------------+ +------------------+ +------------------------+ |
| | CloudFront | | API Gateway | | Cognito User Pool | |
| | + S3 Bucket | | (HTTP API) | | (Authentication) | |
| | (Frontend) | | | | | |
| +--------+---------+ +--------+---------+ +-----------+------------+ |
| | | | |
| | v | |
| | +------------------+ | |
| | | Application | | |
| | | Load Balancer | | |
| | +--------+---------+ | |
| | | | |
| | v | |
| | +----------------------------------+ | |
| | | ECS Fargate | | |
| | | +----------------------------+ | | |
| +-------------->| | Voice Agent Container | | | |
| | | - FastAPI Server |<+-+ |
| | | - WebRTC Signaling | | |
| | | - Pipecat Pipeline | | |
| | | - LangSmith Tracing | | |
| | +----------------------------+ | |
| +----------------------------------+ |
| | |
| v |
| +----------------------------------+ |
| | External Services | |
| | - AWS Bedrock (Claude) | |
| | - OpenAI (STT/TTS) | |
| | - LangSmith (Tracing) | |
| +----------------------------------+ |
+--------------------------------------------------------------------------------+
+-------------+ +-------------+ +-------------+ +-------------+
| WebRTC | | OpenAI | | AWS | | OpenAI |
| Input | --> | Whisper | --> | Bedrock | --> | TTS |
| (Audio) | | (STT) | | Claude | | |
+-------------+ +-------------+ +-------------+ +-------------+
| | |
v v v
+--------------------------------------------------+
| LangSmith OpenTelemetry |
| Tracing |
+--------------------------------------------------+
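The flow in the diagram can be sketched in plain Python as a chain of stages that each report to an observer. This only illustrates the data flow: the `fake_*` functions and `run_turn` are hypothetical stubs, not Pipecat's API, and no real STT, LLM, or TTS calls are made. The stage names are borrowed from the span names used for tracing.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the real services; each just transforms its input.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript

def fake_llm(transcript: str) -> str:
    return f"You asked: {transcript}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is audio

def run_turn(audio: bytes, observe: Callable[[str, object], None]) -> bytes:
    """Run one conversation turn, notifying the observer after each stage."""
    transcript = fake_stt(audio)
    observe("stt_transcription", transcript)
    reply = fake_llm(transcript)
    observe("llm_generation", reply)
    speech = fake_tts(reply)
    observe("tts_synthesis", speech)
    return speech

# Collect (stage name, payload) pairs in the order the stages ran.
events: List[Tuple[str, object]] = []
audio_out = run_turn(b"what is S3", lambda name, payload: events.append((name, payload)))
print([name for name, _ in events])
```

Passing the observer in as a callback keeps the stages unaware of where their events go, which is roughly the role a tracing observer plays alongside the pipeline.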
voice-agent-tracing/
├── bot.py # Legacy Daily.co voice agent
├── bot_webrtc.py # Voice agent with SmallWebRTCTransport
├── server.py # Legacy Daily.co server
├── server_webrtc.py # FastAPI server with WebRTC signaling
├── tracing_observer.py # Custom LangSmith tracing observer
├── requirements.txt # Python dependencies
├── Dockerfile # Backend container definition
├── docker-compose.yml # Local development orchestration
├── deploy.sh # Deployment automation script
│
├── frontend/ # React frontend application
│ ├── src/
│ │ ├── App.tsx # Main application with routing
│ │ ├── components/
│ │ │ ├── auth/
│ │ │ │ └── AuthLogin.tsx # Authentication forms
│ │ │ └── chat/
│ │ │ └── VoiceAgent.tsx # WebRTC voice client
│ │ └── pages/
│ │ └── ChatPage.tsx # Chat interface
│ ├── package.json # Node.js dependencies
│ ├── vite.config.ts # Vite configuration
│ └── Dockerfile # Frontend container definition
│
└── infrastructure/ # Terraform IaC
├── main.tf # Provider configuration
├── variables.tf # Input variables
├── outputs.tf # Output values
├── cognito.tf # Cognito User Pool resources
├── networking.tf # VPC, subnets, security groups, ALB
├── ecs.tf # ECS cluster, task definition, service
├── iam.tf # IAM roles and policies
├── secrets.tf # Secrets Manager configuration
├── frontend.tf # S3 bucket, CloudFront distribution
└── terraform.tfvars # Variable values
| Component | Technology | Purpose |
|---|---|---|
| LLM | AWS Bedrock Claude 3.5 Haiku | Natural language understanding and response generation |
| STT | OpenAI Whisper API | Speech-to-text transcription |
| TTS | OpenAI TTS | Text-to-speech synthesis |
| Transport | Pipecat SmallWebRTCTransport | Peer-to-peer WebRTC connections |
| Backend Framework | FastAPI | HTTP API and WebRTC signaling server |
| Frontend Framework | React + TypeScript | User interface |
| Styling | Tailwind CSS | UI styling |
| Build Tool | Vite | Frontend build and development |
| Authentication | AWS Cognito | User authentication and authorization |
| Container Orchestration | AWS ECS Fargate | Serverless container deployment |
| CDN | AWS CloudFront | Frontend content delivery |
| Storage | AWS S3 | Static asset hosting |
| Secrets | AWS Secrets Manager | API key management |
| IaC | Terraform | Infrastructure provisioning |
| Tracing | LangSmith + OpenTelemetry | Observability and monitoring |
- Python 3.11 or higher
- Node.js 20 or higher
- Docker and Docker Compose
- Terraform 1.0 or higher
- AWS CLI v2 configured with appropriate credentials
| Service | Purpose | Obtain From |
|---|---|---|
| OpenAI | STT (Whisper) and TTS | https://platform.openai.com |
| AWS | Bedrock Claude access | AWS Console with Bedrock model access enabled |
| LangSmith | Tracing and observability | https://smith.langchain.com |
The AWS credentials used for deployment require the following permissions:
- EC2, VPC, and networking resources
- ECS cluster and service management
- ECR repository management
- Cognito User Pool management
- S3 bucket management
- CloudFront distribution management
- Secrets Manager access
- IAM role and policy management
- Bedrock model invocation
# Clone the repository
git clone https://github.com/ihatesea69/monitoring-voice-agent-langsmith-aws.git
cd monitoring-voice-agent-langsmith-aws
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env with your API keys
# Start the backend server
python server_webrtc.py
The backend server will start on http://localhost:7860.
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
The frontend development server will start on http://localhost:5173 and automatically proxy API requests to the backend.
For running both services together:
docker-compose up
This starts the backend on port 7860 and the frontend on port 5173.
Edit infrastructure/terraform.tfvars:
project_name = "voice-agent"
environment = "dev"
aws_region = "us-east-1"
ecs_cpu = 512
ecs_memory = 1024
cd infrastructure
# Initialize Terraform
terraform init
# Preview changes
terraform plan
# Apply infrastructure
terraform apply
Terraform creates the following resources:
- VPC with two public subnets
- Internet Gateway and route tables
- Security groups for ALB and ECS tasks
- Application Load Balancer
- ECS Cluster with Fargate capacity provider
- ECR repository for container images
- Cognito User Pool and App Client
- S3 bucket for frontend assets
- CloudFront distribution
- Secrets Manager secrets for API keys
- IAM roles with Bedrock access
# Store OpenAI API key
aws secretsmanager put-secret-value \
--secret-id voice-agent-dev-openai-api-key \
--secret-string "sk-your-openai-api-key"
# Store LangSmith API key
aws secretsmanager put-secret-value \
--secret-id voice-agent-dev-langsmith-api-key \
--secret-string "lsv2_your-langsmith-api-key"
# Get ECR login command from Terraform output
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
$(terraform output -raw ecr_repository_url | cut -d'/' -f1)
# Build the image
docker build -t $(terraform output -raw ecr_repository_url):latest .
# Push to ECR
docker push $(terraform output -raw ecr_repository_url):latest
# Force ECS service update
aws ecs update-service \
--cluster voice-agent-dev-cluster \
--service voice-agent-dev-backend \
--force-new-deployment
cd frontend
# Create production environment file
cat > .env << EOF
VITE_API_URL=$(cd ../infrastructure && terraform output -raw backend_url)
VITE_COGNITO_USER_POOL_ID=$(cd ../infrastructure && terraform output -raw cognito_user_pool_id)
VITE_COGNITO_CLIENT_ID=$(cd ../infrastructure && terraform output -raw cognito_user_pool_client_id)
VITE_COGNITO_REGION=us-east-1
EOF
# Build production bundle
npm run build
# Sync to S3
aws s3 sync dist/ s3://$(cd ../infrastructure && terraform output -raw frontend_bucket)/ --delete
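# Invalidate CloudFront cache (optional)
# Note: Items[0] selects the first distribution returned for the account;
# substitute the correct distribution ID if the account has more than one.
aws cloudfront create-invalidation \
--distribution-id $(aws cloudfront list-distributions --query "DistributionList.Items[0].Id" --output text) \
--paths "/*"
POOL_ID=$(cd infrastructure && terraform output -raw cognito_user_pool_id)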
# Create user
aws cognito-idp admin-create-user \
--user-pool-id $POOL_ID \
--username test@example.com \
--user-attributes Name=email,Value=test@example.com \
--message-action SUPPRESS
# Set permanent password
aws cognito-idp admin-set-user-password \
--user-pool-id $POOL_ID \
--username test@example.com \
--password "SecurePassword123!" \
--permanent
The system implements comprehensive tracing using OpenTelemetry exported to LangSmith.
| Span Name | Type | Captured Attributes |
|---|---|---|
| stt_transcription | LLM | transcript, word_count, character_count |
| llm_generation | LLM | prompt, completion, token usage, latency |
| tts_synthesis | LLM | input_text, character_count, voice, latency |
| conversation_turn | Chain | user_message, assistant_response, total_latency |
| voice_agent_session | Chain | conversation_id, session duration |
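As a rough illustration of the attributes in the table, the sketch below builds the attribute dictionary for an `stt_transcription` span. The function name and the whitespace-based word count are assumptions made for this example; the project's tracing_observer.py may compute these values differently.

```python
def stt_span_attributes(transcript: str) -> dict:
    """Build the attributes listed for the stt_transcription span."""
    return {
        "transcript": transcript,
        # Assumption: words are whitespace-separated tokens.
        "word_count": len(transcript.split()),
        "character_count": len(transcript),
    }

attrs = stt_span_attributes("What is Amazon S3?")
print(attrs)
```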
- Navigate to https://smith.langchain.com
- Select the project specified in the LANGSMITH_PROJECT environment variable
- Traces appear in real time as conversations occur
For each conversation turn, the following data is captured:
- STT transcript and processing latency
- LLM prompt messages and completion response
- Estimated token usage (prompt and completion tokens)
- TTS input text and synthesis latency
- End-to-end turn latency
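Token usage is listed as estimated rather than exact. One common rough heuristic, assumed here and not confirmed as the project's method, is about four characters per token for English text. The sketch below applies that heuristic; `CHARS_PER_TOKEN` and `estimate_tokens` are hypothetical names.

```python
import math

# Assumption: a rough average for English text, not a real tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Approximate the token count of text from its character length."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)

usage = {
    "prompt_tokens": estimate_tokens("Explain S3 storage classes."),
    "completion_tokens": estimate_tokens("S3 offers several storage classes."),
}
print(usage)
```

An estimate like this is cheap enough to compute on every turn, but the numbers should be read as approximate and not used for billing.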
GET /health
Response:
{
"status": "healthy",
"service": "voice-agent"
}
POST /offer
Content-Type: application/json
{
"sdp": "<SDP offer string>",
"type": "offer"
}
Response:
{
"sdp": "<SDP answer string>",
"type": "answer"
}
GET /api/status
Response:
{
"active_connections": 1,
"tracing_enabled": true,
"region": "us-east-1"
}
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | Yes | - | OpenAI API key for STT and TTS |
| AWS_REGION | No | us-east-1 | AWS region for Bedrock |
| AWS_ACCESS_KEY_ID | No* | - | AWS access key (uses IAM role on ECS) |
| AWS_SECRET_ACCESS_KEY | No* | - | AWS secret key (uses IAM role on ECS) |
| LANGSMITH_API_KEY | No | - | LangSmith API key for tracing |
| LANGSMITH_PROJECT | No | aws-voice-agent | LangSmith project name |
| HOST | No | 0.0.0.0 | Server bind address |
| PORT | No | 7860 | Server port |
| COGNITO_USER_POOL_ID | No | - | Cognito pool ID for auth |
| COGNITO_CLIENT_ID | No | - | Cognito client ID |
*Required for local development; not needed when running on ECS with an IAM role.
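A minimal sketch of how the backend variables above could be read with their documented defaults. The `Settings` class and `load_settings` function are hypothetical and are not taken from the project's code; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    aws_region: str
    langsmith_api_key: Optional[str]
    langsmith_project: str
    host: str
    port: int

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings, applying the defaults from the table."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # OPENAI_API_KEY is the only variable marked as required.
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        aws_region=env.get("AWS_REGION", "us-east-1"),
        langsmith_api_key=env.get("LANGSMITH_API_KEY"),  # optional: no key, no tracing
        langsmith_project=env.get("LANGSMITH_PROJECT", "aws-voice-agent"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )

# Example with an explicit mapping instead of the real environment.
settings = load_settings({"OPENAI_API_KEY": "sk-example"})
print(settings.port, settings.langsmith_project)
```

Accepting the environment as a mapping keeps the loader easy to exercise without touching real environment variables.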
| Variable | Required | Description |
|---|---|---|
| VITE_API_URL | No | Backend API URL (empty for same-origin) |
| VITE_COGNITO_USER_POOL_ID | No | Cognito User Pool ID |
| VITE_COGNITO_CLIENT_ID | No | Cognito App Client ID |
| VITE_COGNITO_REGION | No | Cognito region |
If WebRTC connections fail to establish:
- Verify the backend is accessible from the browser
- Check browser console for WebRTC errors
- For production deployments across networks, configure STUN/TURN servers
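One quick way to verify that the backend is reachable is to call the documented GET /health endpoint from a script. The sketch below uses only the Python standard library; the helper names are hypothetical, and the expected response shape is taken from the API reference.

```python
import json
import urllib.request

def health_url(base_url: str) -> str:
    """Build the /health URL from a backend base URL."""
    return base_url.rstrip("/") + "/health"

def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body reports status 'healthy'."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        # Invalid JSON, or JSON that is not an object.
        return False

def check_backend(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the backend answered as healthy."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # covers URLError, timeouts, and refused connections
        return False
```

For local development, `check_backend("http://localhost:7860")` should return `True` once the server is running. A `False` result points at networking or startup problems rather than at WebRTC itself.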
- Verify LANGSMITH_API_KEY is set correctly
- Check that the API key has write access to the project
- Review backend logs for OpenTelemetry export errors
- Check CloudWatch logs at /ecs/voice-agent-dev-backend
- Verify secrets are stored in Secrets Manager
- Ensure the Docker image was pushed successfully to ECR
- Verify the ECS task role has Bedrock permissions
- Ensure Claude model access is enabled in AWS Bedrock console
- Confirm the model ID matches an available model in the region
Contributions are welcome. To contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit changes (git commit -m 'Add improvement')
- Push to branch (git push origin feature/improvement)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for details.
Project Link: https://github.com/ihatesea69/monitoring-voice-agent-langsmith-aws
- Pipecat - Voice agent framework
- LangSmith - LLM observability platform
- AWS Bedrock - Managed LLM service
- OpenAI - Whisper and TTS services
- Terraform - Infrastructure as code
