A real-time voice assistant powered by LiveKit, with switchable STT providers (Deepgram or Groq Whisper), Groq LLM, and ElevenLabs TTS.
                        VOICE PIPELINE
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌─────────────────┐    │
│  │   User   │───▶│   LiveKit    │───▶│  Python Agent   │    │
│  │ Browser  │◀───│    Cloud     │◀───│  (OSA Worker)   │    │
│  └──────────┘    └──────────────┘    └─────────────────┘    │
│       │                                       │             │
│       │          ┌────────────────────────────┘             │
│       │          │                                          │
│       │          ▼                                          │
│       │    ┌────────────┐                                   │
│       └───▶│ Go Backend │                                   │
│            └────────────┘                                   │
│                  │                                          │
│     ┌────────────┼───────────────┐                          │
│     ▼            ▼               ▼                          │
│  ┌──────┐   ┌──────────┐   ┌───────────┐                    │
│  │ Groq │   │ Deepgram │   │ ElevenLabs│                    │
│  │ LLM  │   │   STT    │   │    TTS    │                    │
│  └──────┘   └──────────┘   └───────────┘                    │
│                  OR                                         │
│             ┌──────────┐                                    │
│             │   Groq   │                                    │
│             │ Whisper  │                                    │
│             └──────────┘                                    │
└─────────────────────────────────────────────────────────────┘
- Dual STT Support: Switch between Deepgram and Groq Whisper STT in the UI
- Real-time Voice: Sub-second latency voice conversations
- Live Transcripts: See both user and agent transcripts in real-time
- Source Indicator: UI shows which STT provider is active
- Personality: OSA has a warm, enthusiastic personality with emotions
- Auto-cleanup: Rooms automatically close when users disconnect
git clone https://github.com/robertohluna/LiveKitVoiceAgent.git
cd LiveKitVoiceAgent
cp .env.example .env
# Edit .env with your API keys

# Terminal 1: Go Backend
cd backend && go run ./cmd/server
# Terminal 2: Python Agents (both)
cd agent
source venv/bin/activate
python agent.py dev &
python agent_groq.py dev &
# Terminal 3: Frontend
cd frontend && npm install && npm run dev

- Open http://localhost:5173
- Select STT provider (Deepgram or Groq Whisper)
- Click Connect
- Start talking!
| Feature | Deepgram STT | Groq Whisper |
|---|---|---|
| Agent File | agent.py | agent_groq.py |
| STT Provider | Deepgram Nova | Groq Whisper |
| LLM Provider | Groq (via Go Backend) | Groq (via Go Backend) |
| Latency | ~200-400ms | ~300-500ms |
| Accuracy | Excellent | Very Good |
| Cost | Pay per minute | Included with Groq |
Both agents use the same:
- LLM: Groq llama-3.3-70b-versatile (via Go Backend)
- TTS: ElevenLabs
- VAD: Silero
When connected, the console shows which STT is active:
[DEEPGRAM] user: Hello there
[DEEPGRAM] agent: Oh that's exciting, it's great to meet you!
or
[GROQ-WHISPER] user: Hello there
[GROQ-WHISPER] agent: Oh that's exciting, it's great to meet you!
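
The prefix format above can be reproduced with a small helper. This is a minimal sketch: the function name `format_transcript_line` and the provider keys in `STT_TAGS` are hypothetical and are not taken from the agents' code.

```python
# Hypothetical mapping from a provider key to the console tag shown above.
STT_TAGS = {"deepgram": "DEEPGRAM", "groq": "GROQ-WHISPER"}


def format_transcript_line(provider: str, role: str, text: str) -> str:
    """Return a console line such as '[DEEPGRAM] user: Hello there'."""
    tag = STT_TAGS[provider]  # raises KeyError for an unknown provider
    return f"[{tag}] {role}: {text}"


print(format_transcript_line("deepgram", "user", "Hello there"))
```

Keeping the tag in one lookup table means a new STT provider only needs one extra entry.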
# LiveKit (required)
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
# AI Services (required)
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=optional_voice_id

LiveKitVoiceAgent/
├── frontend/                    # Svelte frontend
│   ├── src/lib/
│   │   ├── livekit.ts           # LiveKit client wrapper
│   │   └── components/
│   │       └── VoiceAgent.svelte
│   └── package.json
│
├── backend/                     # Go backend
│   ├── cmd/server/main.go       # Entry point
│   └── internal/
│       ├── handler/             # HTTP handlers
│       ├── groq/                # Groq API client
│       └── config/              # Environment config
│
├── agent/                       # Python agents
│   ├── agent.py                 # Deepgram STT agent
│   ├── agent_groq.py            # Groq Whisper STT agent
│   └── requirements.txt
│
├── docs/                        # Documentation
│   ├── API.md                   # API reference
│   └── TROUBLESHOOTING.md       # Common issues
│
├── .env.example                 # Environment template
└── README.md
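
The `.env.example` template holds the variables listed earlier. A startup check for the required ones might look like the sketch below. The helper name `missing_env_vars` is hypothetical, and the code assumes the values are already in the process environment (for example, loaded with python-dotenv, which is among the agent dependencies).

```python
import os

# Variables the configuration section marks as required.
# ELEVENLABS_VOICE_ID is optional and therefore not checked.
REQUIRED_VARS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "GROQ_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing required variables:", ", ".join(missing))
```

Running a check like this before starting the agents turns a vague connection failure into a clear list of missing keys.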
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /api/token | POST | Get LiveKit room token + dispatch agent |
| /api/room/delete | POST | Delete room (cleanup) |
| /api/chat | POST | Send message to Groq LLM |
POST /api/token
{
  "room_name": "voice-abc123",
  "participant_name": "user",
  "agent_name": "deepgram-agent" // or "groq-agent"
}

- Frontend: User selects STT provider from toggle
- Token Request: Frontend sends agent_name to backend
- Agent Dispatch: Backend dispatches specific agent via LiveKit API
- Agent Filter: Each agent only accepts jobs with its name
- Connection: Only the selected agent joins the room
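
The token request step can be sketched from a Python client as follows. The field names and agent names come from the request example above. The helper names, the `base_url` parameter, and the assumption that the backend replies with a JSON body are illustrative only, since the response shape is not documented here.

```python
import json
import urllib.request

# Agent names taken from the /api/token request example.
AGENT_NAMES = {"deepgram": "deepgram-agent", "groq": "groq-agent"}


def build_token_request(provider, room_name, participant_name="user"):
    """Build the JSON body for POST /api/token for the chosen STT provider."""
    return {
        "room_name": room_name,
        "participant_name": participant_name,
        "agent_name": AGENT_NAMES[provider],
    }


def request_token(base_url, body):
    """POST the body to /api/token and return the decoded JSON response.

    Assumes the backend answers with JSON; the exact fields are not shown
    in this README.
    """
    req = urllib.request.Request(
        f"{base_url}/api/token",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `request_token("http://localhost:8080", build_token_request("groq", "voice-abc123"))` would ask the backend to dispatch the Groq Whisper agent, assuming the backend runs locally on port 8080.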
# In agent.py
from livekit.agents import JobRequest

async def request_fnc(req: JobRequest):
    # Only accept jobs dispatched to this agent's name
    if req.agent_name != "deepgram-agent":
        await req.reject()  # Reject if not for us
        return
    await req.accept()

Both agents use a custom GoBackendLLM class that:
- Converts LiveKit chat context to messages
- Calls the Go backend /api/chat endpoint
- Sends the transcript to the frontend via data channel
- Returns the response to TTS for speech synthesis
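
The first step, converting chat context to messages, might look roughly like the sketch below. It is an illustration under assumptions: the input is simplified to `(role, text)` pairs instead of a LiveKit chat context, and the `role`/`content` message shape is the common chat-completions convention, not a confirmed schema for `/api/chat`.

```python
def convert_context(items):
    """Turn (role, text) pairs into a list of role/content message dicts."""
    messages = []
    for role, text in items:
        text = text.strip()
        if text:  # skip empty turns so the backend never sees blank messages
            messages.append({"role": role, "content": text})
    return messages


history = [("user", "Hello there"), ("assistant", "  "), ("user", "How are you?")]
print(convert_context(history))
```

Dropping empty turns here is a design choice of this sketch; the real `_convert_context` may behave differently.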
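from livekit.agents import llm

class GoBackendLLM(llm.LLM):
    def chat(self, *, chat_ctx, **kwargs):
        # Convert the LiveKit chat context into a plain message list
        messages = self._convert_context(chat_ctx)
        # GoBackendLLMStream is defined elsewhere in the agent (not shown here)
        return GoBackendLLMStream(messages)

Agent Response → GoBackendLLM._run() → Callback → publish_data()
        ↓
Frontend receives {"type": "transcript", "role": "agent",
                   "text": "...", "source": "deepgram"}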
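
The transcript message shown above can be encoded and decoded as plain JSON bytes. This is a minimal sketch: the field names come from the example message, while the helper names `encode_transcript` and `decode_transcript` are hypothetical.

```python
import json


def encode_transcript(role, text, source):
    """Serialize a transcript message to UTF-8 JSON bytes for the data channel."""
    payload = {"type": "transcript", "role": role, "text": text, "source": source}
    return json.dumps(payload).encode("utf-8")


def decode_transcript(data):
    """Return the message dict, or None if the payload is not a transcript."""
    try:
        msg = json.loads(data)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return None
    if isinstance(msg, dict) and msg.get("type") == "transcript":
        return msg
    return None
```

Checking the `type` field on the receiving side lets other message kinds share the same data channel without being mistaken for transcripts.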
- Check agent logs for "registered worker" message
- Verify Go backend is running on :8080
- Check API keys in .env
- Check browser microphone permissions
- Ensure ElevenLabs voice ID is valid
- Check agent logs for TTS errors
- Agents now filter by name, should not happen
- If stuck, restart both agents
- Make sure both agents are running
- Check agent dispatch logs in Go backend
livekit-agents>=1.3.11
livekit-plugins-deepgram
livekit-plugins-groq
livekit-plugins-elevenlabs
livekit-plugins-silero
aiohttp
python-dotenv
github.com/livekit/protocol
github.com/livekit/server-sdk-go
github.com/joho/godotenv
livekit-client
svelte
tailwindcss
MIT