Multi-tenant voice AI platform with real-time conversation capabilities
Live Demo: https://voice-agent.cameronobrien.dev
A production-ready voice AI platform deployed on Fly.io that provides intelligent phone conversations using Twilio Media Streams, speech-to-text (Deepgram), AI reasoning (Groq), and text-to-speech (Cartesia). Features auto-scaling, multi-tenant architecture, and webhook integrations.
Try the live demo at https://voice-agent.cameronobrien.dev
On that page, click to reveal the demo phone number, then call it to hear the AI voice agent in action. The agent demonstrates natural conversation, information collection, and real-time responses.
- Node.js + Express - Web server and API
- WebSockets (ws) - Real-time bidirectional communication
- Deepgram SDK - Speech-to-text transcription
- Groq SDK - Fast LLM inference (Llama models)
- Google Gemini AI - Alternative LLM with auto-fallback
- Cartesia - Ultra-low latency text-to-speech
- Neon PostgreSQL - Multi-tenant data storage
- Fly.io - Serverless deployment with auto-scaling
- Docker - Containerized deployment
- Real-Time Voice Conversations - Low-latency voice AI interactions
- Multi-Tenant Architecture - Support multiple clients with isolated data
- Auto-Scaling - Fly.io machines spin up on demand (0→1 scaling)
- AI Provider Auto-Switching - Automatic fallback between Groq and Gemini
- Speech-to-Text - Deepgram real-time transcription
- Text-to-Speech - Cartesia ultra-fast voice synthesis
- WebSocket Streaming - Real-time audio streaming
- Webhook Integration - Push call data to external systems
- Metrics API - Track usage and performance
- Custom Prompts - Configurable AI agent personalities
- Database Persistence - Store conversations and metadata
- POST /voice/stream - WebSocket endpoint for voice streaming
- POST /voice/conversation - Start new voice conversation
- GET /metrics - Get platform usage metrics (requires API key)
- GET /health - Health check endpoint
- Outbound webhook to WEBHOOK_URL on call completion
- Node.js 18+
- Fly.io account and CLI installed
- API keys for:
- Deepgram (speech-to-text)
- Groq (LLM)
- Google Gemini (LLM backup)
- Cartesia (text-to-speech)
- Neon database
# Install dependencies
npm install
# Configure environment
cp .env.example .env
# Fill in all API keys
# Run database extensions
psql $DATABASE_URL < db-schema-extensions.sql
# Start development server
npm run dev
Server runs on http://localhost:8080
- npm start - Start production server
- npm run dev - Start development server with hot reload
- npm test - Run tests (not yet implemented)
See .env.example for required configuration:
- PORT - Server port (default: 8080)
- NODE_ENV - Environment (development/production)
- FLY_STREAM_URL - Your WebSocket stream URL (e.g., wss://your-app.fly.dev/stream)
- CORS_ORIGINS - Comma-separated list of allowed origins for admin API
- DATABASE_URL - Neon PostgreSQL connection string
- DEEPGRAM_API_KEY - Deepgram API key for speech-to-text
- GROQ_API_KEY - Groq API key for LLM (primary)
- GOOGLE_API_KEY - Google Gemini API key (fallback)
- CARTESIA_API_KEY - Cartesia API key for text-to-speech
- WEBHOOK_URL - External webhook URL for call data
- WEBHOOK_SECRET - Secret for webhook verification
- ADMIN_API_KEY - API key for accessing admin endpoints
- LLM_PROVIDER - Force LLM provider (auto, groq, or gemini)
- BLOCKED_NUMBER - Phone number to block from connecting
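A startup check can catch a missing key before the first call arrives. The following is a minimal sketch, not the project's actual loader: `loadConfig` is a hypothetical helper, and both the set of variables treated as required and the `auto` default for LLM_PROVIDER are assumptions.

```javascript
// Hypothetical startup check. Variable names come from the list above;
// which ones are mandatory is an assumption.
const REQUIRED_ENV = [
  'DATABASE_URL',
  'DEEPGRAM_API_KEY',
  'GROQ_API_KEY',
  'GOOGLE_API_KEY',
  'CARTESIA_API_KEY',
  'ADMIN_API_KEY',
];

function loadConfig(env = process.env) {
  // Collect every missing name so one error reports them all.
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Assumed default: "auto" when LLM_PROVIDER is unset.
  const llmProvider = env.LLM_PROVIDER || 'auto';
  if (!['auto', 'groq', 'gemini'].includes(llmProvider)) {
    throw new Error(`LLM_PROVIDER must be auto, groq, or gemini (got "${llmProvider}")`);
  }

  return {
    port: Number(env.PORT) || 8080,
    llmProvider,
    // CORS_ORIGINS is comma-separated; trim entries and drop empty ones.
    corsOrigins: (env.CORS_ORIGINS || '')
      .split(',')
      .map((origin) => origin.trim())
      .filter(Boolean),
  };
}
```

Calling `loadConfig()` once at the top of the server entry point would make a misconfigured deployment fail fast instead of failing on the first call.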
├── src/ # Source code
│ ├── api/ # API routes
│ ├── db/ # Database client and queries
│ ├── prompts/ # AI agent prompts
│ ├── services/ # Core services
│ │ ├── deepgram.js # Speech-to-text
│ │ ├── groq.js # Groq LLM
│ │ ├── gemini.js # Gemini LLM
│ │ └── cartesia.js # Text-to-speech
│ ├── utils/ # Utilities
│ └── server.js # Express server
├── scripts/ # Deployment scripts
├── public/ # Static assets
├── db-schema-extensions.sql # Database setup
├── Dockerfile # Container definition
├── fly.toml # Fly.io configuration
└── package.json # Dependencies
# Install Fly.io CLI
curl -L https://fly.io/install.sh | sh
# Login to Fly.io
fly auth login
# Launch app (creates fly.toml with your app name)
fly launch
# Set secrets (do NOT put in fly.toml)
fly secrets set DATABASE_URL="postgresql://..."
fly secrets set DEEPGRAM_API_KEY="..."
fly secrets set GROQ_API_KEY="..."
fly secrets set GOOGLE_API_KEY="..."
fly secrets set CARTESIA_API_KEY="..."
fly secrets set FLY_STREAM_URL="wss://your-app-name.fly.dev/stream"
fly secrets set CORS_ORIGINS="https://your-domain.com"
fly secrets set ADMIN_API_KEY="your-random-32-char-hex"
fly secrets set WEBHOOK_URL="https://your-domain.com/api/webhooks/call-data"
fly secrets set WEBHOOK_SECRET="your-webhook-secret"
# Deploy
fly deploy
When deploying your own instance, you'll need to customize:
- App Name: Run fly launch to generate your own app name, or edit fly.toml: app = 'your-app-name'
- Stream URL: Set FLY_STREAM_URL to your app's WebSocket endpoint: fly secrets set FLY_STREAM_URL="wss://your-app-name.fly.dev/stream"
- CORS Origins: Set allowed origins for your admin dashboard: fly secrets set CORS_ORIGINS="https://your-admin-domain.com,https://your-app.com"
- Twilio Webhook: Point your Twilio number's webhook to: https://your-app-name.fly.dev/api/twilio/router
# Deploy new version
fly deploy
# View logs
fly logs
# Check status
fly status
# Scale machines
fly scale count 1 # scale-to-zero when idle comes from the auto-stop/auto-start settings in fly.toml, not from a count of 0
The fly.toml configures:
- Auto-Scaling: Machines spin down to 0 when idle, start on first request
- Region: sjc (San Jose, California) - adjust for your users
- Resources: 1GB RAM, 1 shared CPU
- HTTPS: Force HTTPS on all requests
- Port: Internal port 8080
Each client/tenant has:
- Isolated database records
- Custom AI prompts
- Separate webhook endpoints
- Individual usage tracking
Tenant identification via:
- API key authentication
- Request headers
- Database tenant_id column
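As an illustration of how these three pieces could fit together, here is a small sketch. Everything named in it is hypothetical: the `x-api-key` header, the `findTenantByApiKey` lookup, and the assumption that the conversations table carries the tenant_id column.

```javascript
// Resolve a tenant from request headers.
// Assumptions: the key arrives in an "x-api-key" header, and
// findTenantByApiKey is an injected async lookup against the tenants table.
async function resolveTenant(headers, findTenantByApiKey) {
  const apiKey = headers['x-api-key'];
  if (!apiKey) return null;
  const tenant = await findTenantByApiKey(apiKey);
  return tenant ? tenant.tenant_id : null;
}

// Build a parameterized query scoped to one tenant, so rows belonging
// to other tenants are never returned.
function conversationsQuery(tenantId) {
  return {
    text: 'SELECT * FROM conversations WHERE tenant_id = $1',
    values: [tenantId],
  };
}
```

Passing the tenant ID as a query parameter rather than interpolating it into the SQL string keeps the isolation check in one place and avoids injection issues.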
- Client connects via WebSocket
- Audio streaming - Client sends audio chunks
- Speech-to-Text - Deepgram transcribes in real-time
- AI Processing - Groq/Gemini generates response
- Text-to-Speech - Cartesia synthesizes voice
- Audio streaming - Server sends audio chunks back
- Webhook - Call data posted to WEBHOOK_URL
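The middle of this flow (transcribe, generate, synthesize) can be sketched as a single function. This is a simplified illustration, not the server's actual code: `handleUtterance` and the injected `stages` functions are hypothetical, and a real low-latency pipeline would stream between stages rather than await each one in full.

```javascript
// One conversational turn, with each provider injected as an async function
// so the flow can be exercised without network access.
async function handleUtterance(audioChunk, stages) {
  const transcript = await stages.transcribe(audioChunk); // Deepgram in this stack
  if (!transcript) return null; // nothing recognized, so no reply is produced

  const reply = await stages.generate(transcript); // Groq or Gemini
  const audio = await stages.synthesize(reply); // Cartesia
  return { transcript, reply, audio };
}
```

Injecting the stages also makes it easy to time each step separately when investigating latency.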
Set LLM_PROVIDER=auto to fail over automatically:
- Try Groq (fastest, lowest latency)
- If Groq fails/rate-limited → switch to Gemini
- Log provider switches for monitoring
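The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/services/: `generateWithFallback`, `callGroq`, and `callGemini` are hypothetical names, and any thrown error (including a rate-limit error) is treated as a reason to switch.

```javascript
// mode mirrors LLM_PROVIDER: "auto", "groq", or "gemini".
// callGroq and callGemini are injected async functions (hypothetical).
async function generateWithFallback(messages, options) {
  const { callGroq, callGemini, mode = 'auto', log = console.warn } = options;

  // Forced modes skip the fallback entirely.
  if (mode === 'groq') return { provider: 'groq', text: await callGroq(messages) };
  if (mode === 'gemini') return { provider: 'gemini', text: await callGemini(messages) };

  try {
    // Try Groq first.
    return { provider: 'groq', text: await callGroq(messages) };
  } catch (err) {
    // Log the switch for monitoring, then fall back to Gemini.
    log(`LLM provider switch: groq -> gemini (${err.message})`);
    return { provider: 'gemini', text: await callGemini(messages) };
  }
}
```

Returning the provider name alongside the text lets the caller record which provider served the call, for example in the ai_provider field of the call data.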
Run database extensions:
psql $DATABASE_URL < db-schema-extensions.sql
Creates:
- conversations table - Store conversation history
- tenants table - Multi-tenant configuration
- call_logs table - Call metadata and metrics
- Database extensions for performance
curl https://your-app.fly.dev/health
curl -H "Authorization: Bearer YOUR_METRICS_API_KEY" \
  https://your-app.fly.dev/metrics
Returns:
- Total calls
- Average call duration
- AI provider usage
- Error rates
- Latency metrics
fly logs
On call completion, sends POST to WEBHOOK_URL:
{
  "call_id": "uuid",
  "duration": 120,
  "transcript": "...",
  "ai_provider": "groq",
  "timestamp": "2025-11-18T12:00:00Z",
  "metadata": { ... }
}
Includes HMAC signature for verification using WEBHOOK_SECRET.
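On the receiving side, verification could look like the sketch below. The README does not state the hash algorithm, the encoding, or the header that carries the signature, so this assumes a hex-encoded HMAC-SHA256 over the raw request body; `signPayload` and `verifySignature` are hypothetical helper names.

```javascript
const crypto = require('node:crypto');

// Assumption: the signature is hex-encoded HMAC-SHA256 of the raw body.
function signPayload(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

function verifySignature(rawBody, signatureHex, secret) {
  const expected = Buffer.from(signPayload(rawBody, secret), 'hex');
  const received = Buffer.from(String(signatureHex), 'hex');
  // timingSafeEqual throws when lengths differ, so check length first.
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}
```

Verify against the raw body bytes as received, before any JSON parsing, because re-serializing the parsed object can change whitespace or key order and invalidate the signature.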
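Use WebSocket client to test:
// Node.js 18 has no global WebSocket, so use the ws package; browsers provide it natively
const WebSocket = require('ws');
const ws = new WebSocket('ws://localhost:8080/voice/stream');
ws.onopen = () => {
  // Send audio chunks. Placeholder bytes only: replace with real audio
  // in whatever format the server expects.
  const audioBuffer = Buffer.alloc(320);
  ws.send(audioBuffer);
};
ws.onmessage = (event) => {
  // Receive synthesized audio
  const audio = event.data;
  console.log(`received ${audio.length} bytes`);
};
Edit src/prompts/ to customize AI agent personality: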
- System prompt
- Conversation context
- Response style
- Domain knowledge
- Response Latency: ~600-800ms (user stops speaking → AI audio starts)
- Auto-Scaling: 0→1 in <5 seconds
- Concurrent Calls: No fixed application limit (scales horizontally with machine count)
- Uptime: 99.9% (Fly.io SLA)
Latency Breakdown (measured in production):
- Speech-to-Text (Deepgram): ~50-100ms
- LLM Inference (Groq): ~400-500ms
- Text-to-Speech TTFB (Cartesia): ~100-300ms
- Network overhead: ~50-100ms
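Summing the per-stage ranges gives a quick sanity check on the headline figure. The sketch below only adds up the numbers listed above; the stage key names are illustrative.

```javascript
// [best case, worst case] in milliseconds, copied from the breakdown above.
const stagesMs = {
  speechToText: [50, 100],
  llmInference: [400, 500],
  textToSpeechTtfb: [100, 300],
  network: [50, 100],
};

// Add the lower bounds and the upper bounds separately.
function totalRange(stages) {
  return Object.values(stages).reduce(
    ([low, high], [stageLow, stageHigh]) => [low + stageLow, high + stageHigh],
    [0, 0],
  );
}

console.log(totalRange(stagesMs)); // [ 600, 1000 ]
```

The lower bounds add up to 600ms and the upper bounds to 1000ms, so the ~600-800ms response latency presumably describes typical calls in which not every stage hits its worst case at once.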
- HTTPS Only - Force HTTPS on all requests
- API Key Auth - Metrics endpoint requires authentication
- Webhook Signing - HMAC verification for webhooks
- Database Isolation - Multi-tenant data separation
- Secret Management - Fly.io secrets (not in code)
Approximate Fly.io costs (auto-scaling):
- Idle: $0/month (0 machines running)
- Active: ~$0.02/hour per machine (1GB RAM)
- Egress: $0.02/GB
Voice AI API costs:
- Deepgram: ~$0.0043/minute
- Groq: Free tier available
- Gemini: Pay-per-use pricing
- Cartesia: ~$0.05/1K characters
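For a rough per-call figure, the listed rates can be combined as in the sketch below. It is an estimate under stated assumptions: `estimateCallCost` is a hypothetical helper, one machine is assumed to be busy for the whole call, and LLM usage, telephony charges, and egress are left out.

```javascript
// Rates copied from the lists above (approximate, in USD).
const RATES = {
  deepgramPerMinute: 0.0043,
  cartesiaPer1kChars: 0.05,
  flyPerMachineHour: 0.02,
};

// minutes: call length; ttsChars: characters of agent speech synthesized.
function estimateCallCost({ minutes, ttsChars }) {
  const stt = minutes * RATES.deepgramPerMinute;
  const tts = (ttsChars / 1000) * RATES.cartesiaPer1kChars;
  const compute = (minutes / 60) * RATES.flyPerMachineHour;
  return { stt, tts, compute, total: stt + tts + compute };
}

// A 2-minute call with 1,000 characters of synthesized speech.
const example = estimateCallCost({ minutes: 2, ttsChars: 1000 });
console.log(example.total.toFixed(4)); // 0.0593
```

At these rates text-to-speech dominates the estimate, so shorter agent responses reduce cost as well as latency.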
- Check Fly.io logs: fly logs
- Verify all secrets are set: fly secrets list
- Check database connectivity
- Verify WebSocket connection
- Check Deepgram API key
- Ensure audio format is compatible
- Switch to Groq (faster than Gemini)
- Move Fly.io region closer to users
- Optimize prompts for shorter responses
MIT