A modern, full-stack voice agent system featuring real-time WebRTC communication, intelligent conversation routing, and high-quality text-to-speech. Meet Alex, your AI voice assistant powered by LangGraph, LiveKit, FastAPI, and ElevenLabs.
- Real-time WebRTC: Low-latency voice communication through the browser
- Speech Recognition: Automatic speech-to-text conversion
- ElevenLabs TTS: Premium text-to-speech with natural voice output
- Audio Visualization: Dynamic audio level indicators with dark mode
- LangGraph Integration: Sophisticated conversation flow management
- Weather Assistance: Dedicated weather information and advice capabilities
- General Conversation: Helpful responses for various topics and questions
- Smart Routing: Automatic classification and routing of user queries
- Centralized Prompts: Consistent personality and capabilities across all interactions
- FastAPI Backend: High-performance Python API server
- Next.js Frontend: Modern React-based user interface with TypeScript
- LiveKit Cloud: Scalable WebRTC signaling and media relay
- Automated UX: Streamlined room creation and connection flow
- Python 3.11+ (for LangGraph compatibility)
- Node.js 18+ (for Next.js frontend)
- LiveKit Cloud account (or local LiveKit server)
- API Keys: OpenAI, ElevenLabs
```bash
cd python
pip install -e .

# or using Poetry
poetry install
```
Create a .env file in the python/ directory:
```bash
# LiveKit Configuration
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud

# AI Services
OPENAI_API_KEY=your_openai_api_key
ELEVEN_API_KEY=your_elevenlabs_api_key

# Optional: LangSmith Tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=LanggraphLivekit
LANGSMITH_TRACING=true
```
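Since a missing key usually surfaces only as a confusing runtime error, a fail-fast startup check can save debugging time. The sketch below is illustrative (the `missing_env` helper is not part of this repo) and uses only the variable names from the `.env` example above:

```python
import os

# Required variables, taken from the .env example above
REQUIRED = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "ELEVEN_API_KEY",
)

def missing_env(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Example: fail fast at startup
# missing = missing_env()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```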
Option A: Using Make (Recommended)
```bash
cd python

# Start LangGraph agent server
make start-agent

# In another terminal: start the pipeline worker
make start-pipeline

# In another terminal: start the FastAPI server
make start-webrtc
```
Option B: Manual Start
```bash
# Terminal 1: LangGraph server
cd python
langgraph dev --no-browser

# Terminal 2: pipeline worker
cd python
python llm/pipeline.py dev

# Terminal 3: FastAPI server
cd python
python webrtc_server.py
```
```bash
cd frontend
pnpm install
pnpm run dev
```
Access the frontend at http://localhost:3000.
- Frontend: http://localhost:3000 (Next.js UI)
- FastAPI Server: http://localhost:8000 (room management, tokens)
- LangGraph Server: http://localhost:2024 (agent logic)
- Pipeline Worker: background voice processing
- LiveKit: WebRTC signaling (via cloud)
- Open the app in your browser at http://localhost:3000
- Click the microphone button to connect
- When active, speak into your microphone
- Audio bars show input levels
- Alex will respond via voice
Try asking:
- "Hi Alex, what's your name?"
- "What can you help me with today?"
- "What's the weather like in New York?"
- "Tell me about the weather patterns in summer"
- "Can you give me some general advice?"
- Audio visualization with responsive bars
- Dark mode with gradient styling
- Connection status display
- Responsive design for desktop and mobile
- Automated room creation and joining
The FastAPI backend (http://localhost:8000) exposes the following endpoints:
```http
POST /create-room
Content-Type: application/json

{
  "room_name": "voice-chat-abc123"
}
```

```http
POST /token
Content-Type: application/json

{
  "room_name": "voice-chat-abc123",
  "participant_name": "user-xyz789"
}
```

```http
POST /start-agent
Content-Type: application/json

{
  "room_name": "voice-chat-abc123"
}
```
Note: The frontend handles these calls automatically.
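For manual testing or a custom client, the same flow can be driven from Python. This is a stdlib-only sketch (the `BASE` URL, helper names, and call order are taken from the examples above; response shapes are not assumed):

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def _post(path, payload):
    """POST a JSON payload to the FastAPI server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def connect_flow(room_name, participant_name):
    """Build the three requests in the order the frontend sends them."""
    return [
        ("/create-room", {"room_name": room_name}),
        ("/token", {"room_name": room_name, "participant_name": participant_name}),
        ("/start-agent", {"room_name": room_name}),
    ]

# With the server running:
# for path, payload in connect_flow("voice-chat-abc123", "user-xyz789"):
#     print(path, _post(path, payload))
```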
Frontend (Next.js + TypeScript):
- VoiceAgentClean.tsx: Main component
- LiveKit client for audio
- Tailwind CSS for styling
- Audio context for real-time visualization
Backend (Python):
- webrtc_server.py: FastAPI server
- llm/pipeline.py: Voice processing worker
- llm/agent.py: LangGraph agent logic
- llm/prompt.py: Centralized prompts
- langgraph_livekit_agents/: Custom LangGraph adapters
User → LiveKit (STT) → LangGraph Agent → ElevenLabs (TTS) → User
- User speaks into the browser
- LiveKit captures and transcribes speech
- LangGraph routes and processes the query
- ElevenLabs generates voice output
- LiveKit streams response audio back to the user
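The five steps above can be sketched as a chain of stub functions. This is illustrative only: the real STT, agent, and TTS stages are handled by LiveKit plugins and LangGraph, and the stub bodies here are placeholders showing the shape of the hand-off:

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for LiveKit speech-to-text."""
    return "what's the weather in new york"

def run_agent(text: str) -> str:
    """Stand-in for the LangGraph agent (routing + response)."""
    return f"Here's the weather info for: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for ElevenLabs text-to-speech."""
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    """User audio in, agent audio out."""
    return synthesize(run_agent(transcribe(audio)))
```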
LangGraph State Machine:
```
Supervisor → Weather Node (weather queries)
           → Other Node   (general conversation)
```
Capabilities:
- Weather advice and suggestions
- General chat and help
- Automatic routing based on intent
- Unified, consistent personality through prompts
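The supervisor's decision can be illustrated with a minimal sketch. A keyword heuristic stands in here for the actual classifier (which the agent presumably implements with an LLM call in llm/agent.py); the `route` function and `WEATHER_TERMS` list are hypothetical:

```python
# Keywords used by this sketch's stand-in classifier
WEATHER_TERMS = ("weather", "temperature", "rain", "forecast", "sunny")

def route(query: str) -> str:
    """Return the node a query would be routed to: 'weather' or 'other'."""
    q = query.lower()
    if any(term in q for term in WEATHER_TERMS):
        return "weather"
    return "other"
```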
- Ensure the FastAPI server is running
- Check the LiveKit credentials in .env
- Ensure WebRTC is not blocked by a firewall
- Check that the pipeline worker is running
- Verify LangGraph server is active on port 2024
- Confirm your OpenAI and ElevenLabs keys are set
- Check browser microphone permissions
- Make sure speakers/headphones are working
- Try switching to Chrome or Edge
- Use Python 3.11+
- Ensure dependencies are installed correctly
- Activate your virtual environment
- Check terminal logs for backend errors
- Use browser console to catch frontend issues
- Manually test API endpoints using curl or Postman
- Make sure all required services are running
- Real weather API integration
- Additional agent tools (calendar, notes, etc.)
- Support for multiple voices
- Multi-language support
- Mobile app with voice chat
- Voice command/wake word support
Set a specific ElevenLabs voice in VoiceAssistant:
```python
from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(voice_id="YourVoiceID")
```
Modify Alex's behavior by updating the instructions in VoiceAssistant:
```python
super().__init__(
    instructions="""Your custom instructions here...""",
    # ...
)
```
To switch to Deepgram:
```python
from livekit.plugins import deepgram

stt = deepgram.STT()
```
- FastAPI runs with auto-reload
- Code changes automatically restart the server
To test basic functionality, open http://localhost:8000/static/index.html in your browser.
Steps for production setup:
- Use a production LiveKit server
- Configure proper CORS settings
- Separate environment configs
- Set up logging and monitoring
- Run behind Gunicorn with Uvicorn ASGI workers (FastAPI is an ASGI app, not WSGI)

```bash
gunicorn webrtc_server:app -w 4 -k uvicorn.workers.UvicornWorker
```