ihatesea69/Monitoring-Voice-Agent-with-LangSmith-Build-on-AWS

AWS Voice Agent with LangSmith Tracing

Production-ready voice agent deployed on AWS with comprehensive observability via LangSmith and OpenTelemetry.


Demo

[Image: LangSmith Tracing Demo]

The demonstration above shows real-time tracing of voice agent interactions in LangSmith, capturing STT transcriptions, LLM inference, TTS synthesis, and conversation turn metrics.


Table of Contents

  1. Overview
  2. Architecture
  3. Project Structure
  4. Technology Stack
  5. Prerequisites
  6. Local Development
  7. AWS Deployment
  8. LangSmith Tracing
  9. API Reference
  10. Configuration
  11. Troubleshooting
  12. Contributing
  13. License

Overview

This project implements a real-time voice agent built with Pipecat that functions as an AWS instructor. Users engage in natural voice conversations to learn about AWS services, prepare for certifications, and receive hands-on guidance.

The system is designed for production deployment on AWS infrastructure using Terraform, with a React frontend for user interaction and comprehensive observability through LangSmith tracing. The voice agent uses SmallWebRTCTransport for peer-to-peer WebRTC connections, eliminating dependency on third-party WebRTC providers.

Key Features

  • Real-time voice interaction with sub-second latency
  • AWS Bedrock Claude 3.5 Haiku for natural language understanding
  • OpenAI Whisper for speech-to-text transcription
  • OpenAI TTS for text-to-speech synthesis
  • Native WebRTC transport using Pipecat SmallWebRTCTransport
  • AWS Cognito for user authentication
  • ECS Fargate for serverless container deployment
  • S3 and CloudFront for frontend hosting
  • LangSmith integration for end-to-end tracing and observability

Architecture

AWS Deployment Architecture

                                 AWS Cloud
+--------------------------------------------------------------------------------+
|                                                                                |
|  +------------------+     +------------------+     +------------------------+  |
|  |   CloudFront     |     |   API Gateway    |     |   Cognito User Pool    |  |
|  |   + S3 Bucket    |     |   (HTTP API)     |     |   (Authentication)     |  |
|  |   (Frontend)     |     |                  |     |                        |  |
|  +--------+---------+     +--------+---------+     +-----------+------------+  |
|           |                        |                           |               |
|           |                        v                           |               |
|           |               +------------------+                 |               |
|           |               |   Application    |                 |               |
|           |               |   Load Balancer  |                 |               |
|           |               +--------+---------+                 |               |
|           |                        |                           |               |
|           |                        v                           |               |
|           |               +----------------------------------+ |               |
|           |               |   ECS Fargate                    | |               |
|           |               |   +----------------------------+ | |               |
|           +-------------->|   |  Voice Agent Container     | | |               |
|                           |   |  - FastAPI Server          |<+-+               |
|                           |   |  - WebRTC Signaling        | |                 |
|                           |   |  - Pipecat Pipeline        | |                 |
|                           |   |  - LangSmith Tracing       | |                 |
|                           |   +----------------------------+ |                 |
|                           +----------------------------------+                 |
|                                        |                                       |
|                                        v                                       |
|                           +----------------------------------+                 |
|                           |   External Services              |                 |
|                           |   - AWS Bedrock (Claude)         |                 |
|                           |   - OpenAI (STT/TTS)             |                 |
|                           |   - LangSmith (Tracing)          |                 |
|                           +----------------------------------+                 |
+--------------------------------------------------------------------------------+

Voice Pipeline Architecture

+-------------+     +-------------+     +-------------+     +-------------+
|   WebRTC    |     |   OpenAI    |     |    AWS      |     |   OpenAI    |
|   Input     | --> |   Whisper   | --> |   Bedrock   | --> |    TTS      |
|   (Audio)   |     |   (STT)     |     |   Claude    |     |             |
+-------------+     +-------------+     +-------------+     +-------------+
                           |                   |                   |
                           v                   v                   v
                    +--------------------------------------------------+
                    |            LangSmith OpenTelemetry               |
                    |                   Tracing                        |
                    +--------------------------------------------------+
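The flow in the diagram above can be sketched as a chain of stages, each timed separately so that per-stage and end-to-end latency can be reported. This is a minimal illustration using only the standard library; the stage functions are stubs standing in for Whisper, Bedrock Claude, and OpenAI TTS, and `TurnTrace` and `run_turn` are hypothetical names, not part of Pipecat or of this repository.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnTrace:
    """Latency record for one conversation turn (hypothetical structure)."""
    stage_latency_s: dict[str, float] = field(default_factory=dict)

    @property
    def total_latency_s(self) -> float:
        # End-to-end latency is the sum of the individual stage latencies.
        return sum(self.stage_latency_s.values())


def run_turn(audio: bytes, stages: list[tuple[str, Callable]], trace: TurnTrace):
    """Pass data through each stage in order, timing every stage."""
    data = audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        trace.stage_latency_s[name] = time.perf_counter() - start
    return data


# Stub stages; the real services are network calls with very different latency.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


STAGES = [
    ("stt_transcription", fake_stt),
    ("llm_generation", fake_llm),
    ("tts_synthesis", fake_tts),
]

trace = TurnTrace()
reply_audio = run_turn(b"what is S3?", STAGES, trace)
```

In a real deployment each stage would also emit a tracing span; here the `TurnTrace` dictionary only plays that role for illustration.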

Project Structure

voice-agent-tracing/
├── bot.py                     # Legacy Daily.co voice agent
├── bot_webrtc.py              # Voice agent with SmallWebRTCTransport
├── server.py                  # Legacy Daily.co server
├── server_webrtc.py           # FastAPI server with WebRTC signaling
├── tracing_observer.py        # Custom LangSmith tracing observer
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Backend container definition
├── docker-compose.yml         # Local development orchestration
├── deploy.sh                  # Deployment automation script
│
├── frontend/                  # React frontend application
│   ├── src/
│   │   ├── App.tsx            # Main application with routing
│   │   ├── components/
│   │   │   ├── auth/
│   │   │   │   └── AuthLogin.tsx    # Authentication forms
│   │   │   └── chat/
│   │   │       └── VoiceAgent.tsx   # WebRTC voice client
│   │   └── pages/
│   │       └── ChatPage.tsx         # Chat interface
│   ├── package.json           # Node.js dependencies
│   ├── vite.config.ts         # Vite configuration
│   └── Dockerfile             # Frontend container definition
│
└── infrastructure/            # Terraform IaC
    ├── main.tf                # Provider configuration
    ├── variables.tf           # Input variables
    ├── outputs.tf             # Output values
    ├── cognito.tf             # Cognito User Pool resources
    ├── networking.tf          # VPC, subnets, security groups, ALB
    ├── ecs.tf                 # ECS cluster, task definition, service
    ├── iam.tf                 # IAM roles and policies
    ├── secrets.tf             # Secrets Manager configuration
    ├── frontend.tf            # S3 bucket, CloudFront distribution
    └── terraform.tfvars       # Variable values

Technology Stack

Component                Technology                    Purpose
-----------------------  ----------------------------  ------------------------------------------------------
LLM                      AWS Bedrock Claude 3.5 Haiku  Natural language understanding and response generation
STT                      OpenAI Whisper API            Speech-to-text transcription
TTS                      OpenAI TTS                    Text-to-speech synthesis
Transport                Pipecat SmallWebRTCTransport  Peer-to-peer WebRTC connections
Backend Framework        FastAPI                       HTTP API and WebRTC signaling server
Frontend Framework       React + TypeScript            User interface
Styling                  Tailwind CSS                  UI styling
Build Tool               Vite                          Frontend build and development
Authentication           AWS Cognito                   User authentication and authorization
Container Orchestration  AWS ECS Fargate               Serverless container deployment
CDN                      AWS CloudFront                Frontend content delivery
Storage                  AWS S3                        Static asset hosting
Secrets                  AWS Secrets Manager           API key management
IaC                      Terraform                     Infrastructure provisioning
Tracing                  LangSmith + OpenTelemetry     Observability and monitoring

Prerequisites

Required Software

  • Python 3.11 or higher
  • Node.js 20 or higher
  • Docker and Docker Compose
  • Terraform 1.0 or higher
  • AWS CLI v2 configured with appropriate credentials

Required API Keys

Service      Purpose                    Obtain From
-----------  -------------------------  ---------------------------------------------
OpenAI       STT (Whisper) and TTS      https://platform.openai.com
AWS Bedrock  Claude access              AWS Console with Bedrock model access enabled
LangSmith    Tracing and observability  https://smith.langchain.com

AWS Permissions

The AWS credentials used for deployment require the following permissions:

  • EC2, VPC, and networking resources
  • ECS cluster and service management
  • ECR repository management
  • Cognito User Pool management
  • S3 bucket management
  • CloudFront distribution management
  • Secrets Manager access
  • IAM role and policy management
  • Bedrock model invocation

Local Development

Backend Setup

# Clone the repository
git clone https://github.com/ihatesea69/monitoring-voice-agent-langsmith-aws.git
cd monitoring-voice-agent-langsmith-aws

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your API keys

# Start the backend server
python server_webrtc.py

The backend server will start on http://localhost:7860.

Frontend Setup

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

The frontend development server will start on http://localhost:5173 and automatically proxy API requests to the backend.

Docker Compose Development

For running both services together:

docker-compose up

This starts the backend on port 7860 and the frontend on port 5173.


AWS Deployment

Step 1: Configure Terraform Variables

Edit infrastructure/terraform.tfvars:

project_name = "voice-agent"
environment  = "dev"
aws_region   = "us-east-1"
ecs_cpu      = 512
ecs_memory   = 1024
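Fargate accepts only certain CPU and memory pairings, so it can help to check the two values before running Terraform. The sketch below assumes that `ecs_cpu` and `ecs_memory` are passed straight through to the task definition as CPU units and MiB; the repository's Terraform is not shown here, so treat that mapping as an assumption. The helper names are hypothetical.

```python
def valid_memory_mib(cpu_units: int) -> list[int]:
    """Return the memory sizes (MiB) Fargate accepts for a CPU setting."""
    if cpu_units == 256:
        return [512, 1024, 2048]
    # cpu_units -> (minimum MiB, maximum MiB, increment MiB)
    ranges = {
        512: (1024, 4096, 1024),
        1024: (2048, 8192, 1024),
        2048: (4096, 16384, 1024),
        4096: (8192, 30720, 1024),
        8192: (16384, 61440, 4096),
        16384: (32768, 122880, 8192),
    }
    if cpu_units not in ranges:
        return []
    low, high, step = ranges[cpu_units]
    return list(range(low, high + 1, step))


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """Check whether a CPU/memory pair is a supported Fargate task size."""
    return memory_mib in valid_memory_mib(cpu_units)


# The values from terraform.tfvars above: 512 CPU units with 1024 MiB.
print(is_valid_fargate_size(512, 1024))
```

The values in the example file (512 and 1024) form a supported pair; raising `ecs_cpu` without raising `ecs_memory` may not.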

Step 2: Initialize and Apply Terraform

cd infrastructure

# Initialize Terraform
terraform init

# Preview changes
terraform plan

# Apply infrastructure
terraform apply

Terraform creates the following resources:

  • VPC with two public subnets
  • Internet Gateway and route tables
  • Security groups for ALB and ECS tasks
  • Application Load Balancer
  • ECS Cluster with Fargate capacity provider
  • ECR repository for container images
  • Cognito User Pool and App Client
  • S3 bucket for frontend assets
  • CloudFront distribution
  • Secrets Manager secrets for API keys
  • IAM roles with Bedrock access

Step 3: Store API Keys in Secrets Manager

# Store OpenAI API key
aws secretsmanager put-secret-value \
    --secret-id voice-agent-dev-openai-api-key \
    --secret-string "sk-your-openai-api-key"

# Store LangSmith API key
aws secretsmanager put-secret-value \
    --secret-id voice-agent-dev-langsmith-api-key \
    --secret-string "lsv2_your-langsmith-api-key"

Step 4: Build and Push Docker Image

# Run from the repository root so the Docker build context contains the Dockerfile
ECR_URL=$(cd infrastructure && terraform output -raw ecr_repository_url)

# Authenticate Docker with the ECR registry (the host part of the repository URL)
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin "${ECR_URL%%/*}"

# Build the image
docker build -t "${ECR_URL}:latest" .

# Push to ECR
docker push "${ECR_URL}:latest"

# Force ECS service update
aws ecs update-service \
    --cluster voice-agent-dev-cluster \
    --service voice-agent-dev-backend \
    --force-new-deployment
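The secret IDs, cluster name, and service name used in Steps 3 and 4 appear to follow a `<project_name>-<environment>-<suffix>` pattern built from the values in terraform.tfvars. That pattern is inferred from the commands in this README, not confirmed against the Terraform files, so the helper below is only a sketch for working out which names to substitute if you change `project_name` or `environment`. The function name and dictionary keys are hypothetical.

```python
def resource_names(project_name: str, environment: str) -> dict[str, str]:
    """Derive resource names, assuming a <project>-<environment>-<suffix> pattern."""
    prefix = f"{project_name}-{environment}"
    return {
        "openai_secret": f"{prefix}-openai-api-key",
        "langsmith_secret": f"{prefix}-langsmith-api-key",
        "ecs_cluster": f"{prefix}-cluster",
        "ecs_service": f"{prefix}-backend",
    }


# With the example tfvars values this reproduces the names used in the commands.
names = resource_names("voice-agent", "dev")
for key, value in names.items():
    print(f"{key}: {value}")
```

If the names in your account differ, the Terraform outputs and the AWS console are the authoritative source.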

Step 5: Deploy Frontend

# Run from the repository root
cd frontend

# Create production environment file
cat > .env << EOF
VITE_API_URL=$(cd ../infrastructure && terraform output -raw backend_url)
VITE_COGNITO_USER_POOL_ID=$(cd ../infrastructure && terraform output -raw cognito_user_pool_id)
VITE_COGNITO_CLIENT_ID=$(cd ../infrastructure && terraform output -raw cognito_user_pool_client_id)
VITE_COGNITO_REGION=us-east-1
EOF

# Build production bundle
npm run build

# Sync to S3
aws s3 sync dist/ s3://$(cd ../infrastructure && terraform output -raw frontend_bucket)/ --delete

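# Invalidate CloudFront cache (optional)
# Note: Items[0] selects the first distribution in the account. If the account
# has more than one, replace the subshell with this project's distribution ID.
aws cloudfront create-invalidation \
    --distribution-id $(aws cloudfront list-distributions --query "DistributionList.Items[0].Id" --output text) \
    --paths "/*"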

Step 6: Create Test User

# Run from the repository root
POOL_ID=$(cd infrastructure && terraform output -raw cognito_user_pool_id)

# Create user
aws cognito-idp admin-create-user \
    --user-pool-id $POOL_ID \
    --username test@example.com \
    --user-attributes Name=email,Value=test@example.com \
    --message-action SUPPRESS

# Set permanent password
aws cognito-idp admin-set-user-password \
    --user-pool-id $POOL_ID \
    --username test@example.com \
    --password "SecurePassword123!" \
    --permanent

LangSmith Tracing

The system implements comprehensive tracing using OpenTelemetry exported to LangSmith.

Traced Components

Span Name            Type   Captured Attributes
-------------------  -----  ----------------------------------------------
stt_transcription    LLM    transcript, word_count, character_count
llm_generation       LLM    prompt, completion, token usage, latency
tts_synthesis        LLM    input_text, character_count, voice, latency
conversation_turn    Chain  user_message, assistant_response, total_latency
voice_agent_session  Chain  conversation_id, session duration

Viewing Traces

  1. Navigate to https://smith.langchain.com
  2. Select the project named in the LANGSMITH_PROJECT environment variable
  3. Traces appear in real time as conversations occur

Trace Data Captured

For each conversation turn, the following data is captured:

  • STT transcript and processing latency
  • LLM prompt messages and completion response
  • Estimated token usage (prompt and completion tokens)
  • TTS input text and synthesis latency
  • End-to-end turn latency
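Token usage is described as estimated, and this README does not show how the estimate is made. One common approach is a characters-per-token rule of thumb, sketched below. The ratio of four characters per token is an assumption that is only roughly right for English text, and the function names are hypothetical; this is not the repository's estimator.

```python
import math

CHARS_PER_TOKEN = 4  # Assumed rule of thumb for English text, not a real tokenizer.


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_usage(prompt_messages: list[str], completion: str) -> dict[str, int]:
    """Estimate prompt, completion, and total tokens for one LLM call."""
    prompt_tokens = sum(estimate_tokens(message) for message in prompt_messages)
    completion_tokens = estimate_tokens(completion)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


usage = estimate_usage(["What is S3?"], "Object storage.")
print(usage)
```

An estimate like this is useful for spotting trends in a trace view, but billing-grade numbers would need the usage figures reported by the model provider.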

API Reference

Health Check

GET /health

Response:

{
  "status": "healthy",
  "service": "voice-agent"
}
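A small client for this endpoint can be written with the standard library alone. The sketch below assumes the backend is reachable at http://localhost:7860, as in the local development setup, and treats any network error as unhealthy; `is_healthy` and `check_health` are hypothetical helper names.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body reports status 'healthy'."""
    try:
        payload = json.loads(body)
    except ValueError:  # Covers invalid JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the service looks healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError.
        return False
```

Calling `check_health()` in a loop with a short sleep is one way to wait for the backend to come up before starting the frontend.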

WebRTC Signaling

POST /offer
Content-Type: application/json

{
  "sdp": "<SDP offer string>",
  "type": "offer"
}

Response:

{
  "sdp": "<SDP answer string>",
  "type": "answer"
}
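For testing the signaling endpoint outside the browser, the request and response shapes above can be built and checked as follows. In practice the SDP offer comes from a WebRTC stack such as the browser's RTCPeerConnection; the placeholder SDP string and the helper names here are hypothetical.

```python
import json
import urllib.request


def build_offer_request(base_url: str, sdp: str) -> urllib.request.Request:
    """Build the POST /offer request carrying an SDP offer as JSON."""
    body = json.dumps({"sdp": sdp, "type": "offer"}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/offer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(body: bytes) -> str:
    """Extract the SDP answer string, rejecting anything that is not an answer."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("response is not a JSON object")
    if payload.get("type") != "answer" or not isinstance(payload.get("sdp"), str):
        raise ValueError("response is not an SDP answer")
    return payload["sdp"]


# Build a request without sending it; "v=0" is a placeholder, not a usable SDP.
request = build_offer_request("http://localhost:7860", "v=0")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and passing the response body to `parse_answer` would complete the exchange.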

Server Status

GET /api/status

Response:

{
  "active_connections": 1,
  "tracing_enabled": true,
  "region": "us-east-1"
}

Configuration

Environment Variables

Variable               Required  Default          Description
---------------------  --------  ---------------  --------------------------------------
OPENAI_API_KEY         Yes       -                OpenAI API key for STT and TTS
AWS_REGION             No        us-east-1        AWS region for Bedrock
AWS_ACCESS_KEY_ID      No*       -                AWS access key (uses IAM role on ECS)
AWS_SECRET_ACCESS_KEY  No*       -                AWS secret key (uses IAM role on ECS)
LANGSMITH_API_KEY      No        -                LangSmith API key for tracing
LANGSMITH_PROJECT      No        aws-voice-agent  LangSmith project name
HOST                   No        0.0.0.0          Server bind address
PORT                   No        7860             Server port
COGNITO_USER_POOL_ID   No        -                Cognito pool ID for auth
COGNITO_CLIENT_ID      No        -                Cognito client ID

*Required for local development, not needed when running on ECS with IAM role.
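The defaults in the table can be applied in one place when the server starts. The sketch below is one possible way to do that with the standard library; it is not the repository's actual configuration code. The `Settings` class, `load_settings` function, and the rule that tracing is enabled whenever `LANGSMITH_API_KEY` is set are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Backend settings resolved from environment variables (hypothetical)."""
    openai_api_key: str
    aws_region: str
    langsmith_api_key: Optional[str]
    langsmith_project: str
    host: str
    port: int

    @property
    def tracing_enabled(self) -> bool:
        # Assumption: tracing is on whenever a LangSmith key is present.
        return self.langsmith_api_key is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, applying the documented defaults."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        aws_region=env.get("AWS_REGION", "us-east-1"),
        langsmith_api_key=env.get("LANGSMITH_API_KEY") or None,
        langsmith_project=env.get("LANGSMITH_PROJECT", "aws-voice-agent"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )


# Example with only the required variable set; everything else uses defaults.
settings = load_settings({"OPENAI_API_KEY": "sk-test"})
print(settings.host, settings.port, settings.tracing_enabled)
```

Failing fast on the one required variable keeps a misconfigured container from starting and then erroring on the first STT call.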

Frontend Environment Variables

Variable                   Required  Description
-------------------------  --------  ---------------------------------------
VITE_API_URL               No        Backend API URL (empty for same-origin)
VITE_COGNITO_USER_POOL_ID  No        Cognito User Pool ID
VITE_COGNITO_CLIENT_ID     No        Cognito App Client ID
VITE_COGNITO_REGION        No        Cognito region

Troubleshooting

WebRTC Connection Fails

If WebRTC connections fail to establish:

  1. Verify the backend is accessible from the browser
  2. Check browser console for WebRTC errors
  3. For production deployments across networks, configure STUN/TURN servers

No Traces in LangSmith

  1. Verify LANGSMITH_API_KEY is set correctly
  2. Check that the API key has write access to the project
  3. Review backend logs for OpenTelemetry export errors

ECS Task Fails to Start

  1. Check CloudWatch logs at /ecs/voice-agent-dev-backend
  2. Verify secrets are stored in Secrets Manager
  3. Ensure the Docker image was pushed successfully to ECR

Bedrock Access Denied

  1. Verify the ECS task role has Bedrock permissions
  2. Ensure Claude model access is enabled in AWS Bedrock console
  3. Confirm the model ID matches an available model in the region

Contributing

Contributions are welcome. To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Commit changes (git commit -m 'Add improvement')
  4. Push to branch (git push origin feature/improvement)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for details.


Contact

Project Link: https://github.com/ihatesea69/monitoring-voice-agent-langsmith-aws


Acknowledgments

About

This repository shows how to build a voice agent with Pipecat, run it on AWS with an AWS-hosted LLM, and monitor it with LangSmith.
