A modern, full-stack voice agent system featuring real-time WebRTC communication, intelligent conversation routing, and high-quality text-to-speech. Meet Alex, your AI voice assistant powered by LangGraph, LiveKit, FastAPI, and ElevenLabs.
- Real-time WebRTC: Low-latency voice communication through the browser
- Speech Recognition: Automatic speech-to-text conversion
- ElevenLabs TTS: Premium text-to-speech with natural voice output
- Audio Visualization: Dynamic audio level indicators with dark mode
- LangGraph Integration: Sophisticated conversation flow management
- Weather Assistance: Dedicated weather information and advice capabilities
- General Conversation: Helpful responses for various topics and questions
- Smart Routing: Automatic classification and routing of user queries
- Centralized Prompts: Consistent personality and capabilities across all interactions
- FastAPI Backend: High-performance Python API server
- Next.js Frontend: Modern React-based user interface with TypeScript
- LiveKit Cloud: Scalable WebRTC signaling and media relay
- Automated UX: Streamlined room creation and connection flow
- Python 3.11+ (for LangGraph compatibility)
- Node.js 18+ (for Next.js frontend)
- LiveKit Cloud account (or local LiveKit server)
- API Keys: OpenAI, ElevenLabs
```bash
cd python
pip install -e .

# or using Poetry
poetry install
```
Create a .env file in the python/ directory:
```bash
# LiveKit Configuration
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud

# AI Services
OPENAI_API_KEY=your_openai_api_key
ELEVEN_API_KEY=your_elevenlabs_api_key

# Optional: LangSmith Tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=LanggraphLivekit
LANGSMITH_TRACING=true
```
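Since a missing key usually surfaces only as a confusing runtime error, a fail-fast startup check can save debugging time. The sketch below is illustrative (the `missing_env` helper is not part of this repo) and uses only the variable names from the `.env` example above:

```python
import os

# Required variables, taken from the .env example above
REQUIRED = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "ELEVEN_API_KEY",
)

def missing_env(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Example: fail fast at startup
# missing = missing_env()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```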
Option A: Using Make (Recommended)
```bash
cd python

# Start LangGraph agent server
make start-agent

# In another terminal: start the pipeline worker
make start-pipeline

# In another terminal: start the FastAPI server
make start-webrtc
```
Option B: Manual Start
```bash
# Terminal 1: LangGraph server
cd python
langgraph dev --no-browser

# Terminal 2: pipeline worker
cd python
python llm/pipeline.py dev

# Terminal 3: FastAPI server
cd python
python webrtc_server.py
```
```bash
cd frontend
pnpm install
pnpm run dev
```
Access the frontend at http://localhost:3000.
- Frontend: http://localhost:3000 (Next.js UI)
- FastAPI Server: http://localhost:8000 (room management, tokens)
- LangGraph Server: http://localhost:2024 (agent logic)
- Pipeline Worker: background voice processing
- LiveKit: WebRTC signaling (via cloud)
- Open the app in your browser at http://localhost:3000
- Click the microphone button to connect
- When active, speak into your microphone
- Audio bars show input levels
- Alex will respond via voice
Try asking:
- "Hi Alex, what's your name?"
- "What can you help me with today?"
- "What's the weather like in New York?"
- "Tell me about the weather patterns in summer"
- "Can you give me some general advice?"
- Audio visualization with responsive bars
- Dark mode with gradient styling
- Connection status display
- Responsive design for desktop and mobile
- Automated room creation and joining
The FastAPI backend (http://localhost:8000) exposes the following endpoints:
```http
POST /create-room
Content-Type: application/json

{
  "room_name": "voice-chat-abc123"
}
```

```http
POST /token
Content-Type: application/json

{
  "room_name": "voice-chat-abc123",
  "participant_name": "user-xyz789"
}
```

```http
POST /start-agent
Content-Type: application/json

{
  "room_name": "voice-chat-abc123"
}
```
Note: The frontend handles these calls automatically.
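For manual testing or a custom client, the same flow can be driven from Python. This is a stdlib-only sketch (the `BASE` URL, helper names, and call order are taken from the examples above; response shapes are not assumed):

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def _post(path, payload):
    """POST a JSON payload to the FastAPI server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def connect_flow(room_name, participant_name):
    """Build the three requests in the order the frontend sends them."""
    return [
        ("/create-room", {"room_name": room_name}),
        ("/token", {"room_name": room_name, "participant_name": participant_name}),
        ("/start-agent", {"room_name": room_name}),
    ]

# With the server running:
# for path, payload in connect_flow("voice-chat-abc123", "user-xyz789"):
#     print(path, _post(path, payload))
```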
Frontend (Next.js + TypeScript):
- VoiceAgentClean.tsx: Main component
- LiveKit client for audio
- Tailwind CSS for styling
- Audio context for real-time visualization
Backend (Python):
- webrtc_server.py: FastAPI server
- llm/pipeline.py: Voice processing worker
- llm/agent.py: LangGraph agent logic
- llm/prompt.py: Centralized prompts
- langgraph_livekit_agents/: Custom LangGraph adapters
User → LiveKit (STT) → LangGraph Agent → ElevenLabs (TTS) → User
- User speaks into the browser
- LiveKit captures and transcribes speech
- LangGraph routes and processes the query
- ElevenLabs generates voice output
- LiveKit streams response audio back to the user
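The five steps above can be sketched as a chain of stub functions. This is illustrative only: the real STT, agent, and TTS stages are handled by LiveKit plugins and LangGraph, and the stub bodies here are placeholders showing the shape of the hand-off:

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for LiveKit speech-to-text."""
    return "what's the weather in new york"

def run_agent(text: str) -> str:
    """Stand-in for the LangGraph agent (routing + response)."""
    return f"Here's the weather info for: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for ElevenLabs text-to-speech."""
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    """User audio in, agent audio out."""
    return synthesize(run_agent(transcribe(audio)))
```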
LangGraph State Machine:
```
Supervisor → Weather Node (weather queries)
           → Other Node   (general conversation)
```
Capabilities:
- Weather advice and suggestions
- General chat and help
- Automatic routing based on intent
- Unified, consistent personality through prompts
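The supervisor's decision can be illustrated with a minimal sketch. A keyword heuristic stands in here for the actual classifier (which the agent presumably implements with an LLM call in llm/agent.py); the `route` function and `WEATHER_TERMS` list are hypothetical:

```python
# Keywords used by this sketch's stand-in classifier
WEATHER_TERMS = ("weather", "temperature", "rain", "forecast", "sunny")

def route(query: str) -> str:
    """Return the node a query would be routed to: 'weather' or 'other'."""
    q = query.lower()
    if any(term in q for term in WEATHER_TERMS):
        return "weather"
    return "other"
```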
- Ensure the FastAPI server is running
- Check the LiveKit credentials in .env
- Ensure WebRTC is not blocked by a firewall
- Check that the pipeline worker is running
- Verify LangGraph server is active on port 2024
- Confirm your OpenAI and ElevenLabs keys are set
- Check browser microphone permissions
- Make sure speakers/headphones are working
- Try switching to Chrome or Edge
- Use Python 3.11+
- Ensure dependencies are installed correctly
- Activate your virtual environment
- Check terminal logs for backend errors
- Use browser console to catch frontend issues
- Manually test API endpoints using curl or Postman
- Make sure all required services are running
- Real weather API integration
- Additional agent tools (calendar, notes, etc.)
- Support for multiple voices
- Multi-language support
- Mobile app with voice chat
- Voice command/wake word support
Set a specific ElevenLabs voice in VoiceAssistant:
```python
from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(voice_id="YourVoiceID")
```
Modify Alex's behavior by updating the instructions in VoiceAssistant:
```python
super().__init__(
    instructions="""Your custom instructions here...""",
    # ...
)
```
To switch to Deepgram:
```python
from livekit.plugins import deepgram

stt = deepgram.STT()
```
- FastAPI runs with auto-reload
- Code changes automatically restart the server
To test basic functionality, open http://localhost:8000/static/index.html in your browser.
Steps for production setup:
- Use a production LiveKit server
- Configure proper CORS settings
- Separate environment configs
- Set up logging and monitoring
- Run behind Gunicorn with Uvicorn ASGI workers (FastAPI is an ASGI app, not WSGI)

```bash
gunicorn webrtc_server:app -w 4 -k uvicorn.workers.UvicornWorker
```