
Miosa-osa/LiveKitVoiceAgent


OSA - Open Speech Assistant

A real-time voice assistant powered by LiveKit, with switchable STT providers (Deepgram or Groq Whisper), Groq LLM, and ElevenLabs TTS.

Architecture

                           VOICE PIPELINE
    ┌────────────────────────────────────────────────────────────┐
    │                                                            │
    │  ┌──────────┐    ┌──────────────┐    ┌─────────────────┐   │
    │  │  User    │───▶│   LiveKit    │───▶│  Python Agent   │   │
    │  │ Browser  │◀───│    Cloud     │◀───│  (OSA Worker)   │   │
    │  └──────────┘    └──────────────┘    └─────────────────┘   │
    │       │                                      │             │
    │       │            ┌─────────────────────────┘             │
    │       │            │                                       │
    │       │            ▼                                       │
    │       │     ┌────────────┐                                 │
    │       └────▶│ Go Backend │                                 │
    │             └────────────┘                                 │
    │                    │                                       │
    │     ┌──────────────┼──────────────┐                        │
    │     ▼              ▼              ▼                        │
    │  ┌──────┐    ┌──────────┐   ┌───────────┐                  │
    │  │ Groq │    │ Deepgram │   │ ElevenLabs│                  │
    │  │ LLM  │    │   STT    │   │    TTS    │                  │
    │  └──────┘    └──────────┘   └───────────┘                  │
    │                 OR                                         │
    │             ┌──────────┐                                   │
    │             │  Groq    │                                   │
    │             │ Whisper  │                                   │
    │             └──────────┘                                   │
    └────────────────────────────────────────────────────────────┘

Features

  • Dual STT Support: Switch between Deepgram and Groq Whisper STT in the UI
  • Real-time Voice: Sub-second latency voice conversations
  • Live Transcripts: See both user and agent transcripts in real-time
  • Source Indicator: UI shows which STT provider is active
  • Personality: OSA has a warm, enthusiastic, emotionally expressive personality
  • Auto-cleanup: Rooms automatically close when users disconnect

Quick Start

1. Clone and Configure

git clone https://github.com/robertohluna/LiveKitVoiceAgent.git
cd LiveKitVoiceAgent
cp .env.example .env
# Edit .env with your API keys

2. Start All Services

# Terminal 1: Go Backend
cd backend && go run ./cmd/server

# Terminal 2: Python Agents (both)
cd agent
python3 -m venv venv             # first run only
source venv/bin/activate
pip install -r requirements.txt  # first run only
python agent.py dev &
python agent_groq.py dev &

# Terminal 3: Frontend
cd frontend && npm install && npm run dev

3. Use

  1. Open http://localhost:5173
  2. Select STT provider (Deepgram or Groq Whisper)
  3. Click Connect
  4. Start talking!

Agent Configurations

Two STT Options

Feature        Deepgram STT            Groq Whisper
Agent File     agent.py                agent_groq.py
STT Provider   Deepgram Nova           Groq Whisper
LLM Provider   Groq (via Go Backend)   Groq (via Go Backend)
Latency        ~200-400 ms             ~300-500 ms
Accuracy       Excellent               Very Good
Cost           Pay per minute          Billed to your Groq account

Both agents use the same:

  • LLM: Groq llama-3.3-70b-versatile (via Go Backend)
  • TTS: ElevenLabs
  • VAD: Silero

Console Output

When connected, the console prefixes each transcript line with the active STT provider:

[DEEPGRAM] user: Hello there
[DEEPGRAM] agent: Oh that's exciting, it's great to meet you!

or

[GROQ-WHISPER] user: Hello there
[GROQ-WHISPER] agent: Oh that's exciting, it's great to meet you!
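The tag is a per-agent prefix on each transcript line. A minimal sketch of such a formatter, assuming a hypothetical `format_transcript` helper and source ids `deepgram` and `groq` (the latter is a guess; only `deepgram` appears in the transcript payload documented further down). The agents' real logging code is not shown in this README.

```python
# Assumed mapping from a source id to the console tag; "groq" is hypothetical.
STT_TAGS = {"deepgram": "DEEPGRAM", "groq": "GROQ-WHISPER"}


def format_transcript(source: str, role: str, text: str) -> str:
    """Return a console line such as '[DEEPGRAM] user: Hello there'."""
    tag = STT_TAGS.get(source, source.upper())  # unknown ids fall back to upper case
    return f"[{tag}] {role}: {text}"


if __name__ == "__main__":
    print(format_transcript("deepgram", "user", "Hello there"))
```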

Environment Variables

# LiveKit (required)
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud

# AI Services (required)
GROQ_API_KEY=your_groq_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=optional_voice_id
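A missing or placeholder key is one of the causes listed under Troubleshooting. A minimal pre-flight check, assuming `.env` uses plain `KEY=value` lines as in the template above; `parse_env` and `missing_keys` are hypothetical helpers, and python-dotenv (already a dependency) is the more complete parser for real use.

```python
from pathlib import Path

# Keys listed as required above; ELEVENLABS_VOICE_ID is optional.
REQUIRED_KEYS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "GROQ_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
)
# Substrings of the template placeholders (your_key, your-project, ...).
PLACEHOLDER_MARKERS = ("your_", "your-project")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still template placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or any(marker in value for marker in PLACEHOLDER_MARKERS):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print(".env not found; copy .env.example first")
    else:
        problems = missing_keys(parse_env(env_file.read_text()))
        print("OK" if not problems else f"Missing or placeholder: {problems}")
```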

Project Structure

LiveKitVoiceAgent/
├── frontend/                 # Svelte frontend
│   ├── src/lib/
│   │   ├── livekit.ts       # LiveKit client wrapper
│   │   └── components/
│   │       └── VoiceAgent.svelte
│   └── package.json
│
├── backend/                  # Go backend
│   ├── cmd/server/main.go   # Entry point
│   └── internal/
│       ├── handler/         # HTTP handlers
│       ├── groq/            # Groq API client
│       └── config/          # Environment config
│
├── agent/                    # Python agents
│   ├── agent.py             # Deepgram STT agent
│   ├── agent_groq.py        # Groq Whisper STT agent
│   └── requirements.txt
│
├── docs/                     # Documentation
│   ├── API.md               # API reference
│   └── TROUBLESHOOTING.md   # Common issues
│
├── .env.example             # Environment template
└── README.md

API Endpoints

Endpoint           Method   Description
/health            GET      Health check
/api/token         POST     Get LiveKit room token + dispatch agent
/api/room/delete   POST     Delete room (cleanup)
/api/chat          POST     Send message to Groq LLM

Token Request

POST /api/token
{
  "room_name": "voice-abc123",
  "participant_name": "user",
  "agent_name": "deepgram-agent"
}

Set "agent_name" to "groq-agent" to dispatch the Groq Whisper agent instead.
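For scripting or testing the backend without the Svelte UI, the request can be built with Python's standard library. This sketch assumes the backend listens on `http://localhost:8080` (the port mentioned under Troubleshooting) and that the UI's provider ids are `deepgram` and `groq`; `new_room_name`, `build_token_request` and `request_token` are hypothetical helpers. The response schema is not documented here, so `request_token` returns the decoded JSON unchanged.

```python
import json
import secrets
import urllib.request

BACKEND_URL = "http://localhost:8080"  # assumption: default Go backend address
# Assumed UI provider ids mapped to the agent names the backend dispatches.
AGENT_NAMES = {"deepgram": "deepgram-agent", "groq": "groq-agent"}


def new_room_name() -> str:
    """Generate a room name in the 'voice-<hex>' style of the example."""
    return f"voice-{secrets.token_hex(3)}"


def build_token_request(room_name: str, stt: str,
                        participant_name: str = "user") -> urllib.request.Request:
    """Build (but do not send) a POST /api/token request for the chosen STT."""
    if stt not in AGENT_NAMES:
        raise ValueError(f"unknown STT provider: {stt!r}")
    body = json.dumps({
        "room_name": room_name,
        "participant_name": participant_name,
        "agent_name": AGENT_NAMES[stt],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BACKEND_URL}/api/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_token(req: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `request_token(build_token_request(new_room_name(), "groq"))` would ask for a token and have `groq-agent` dispatched.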

How Agent Switching Works

  1. Frontend: User selects STT provider from toggle
  2. Token Request: Frontend sends agent_name to backend
  3. Agent Dispatch: Backend dispatches specific agent via LiveKit API
  4. Agent Filter: Each agent only accepts jobs with its name
  5. Connection: Only the selected agent joins the room
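
# In agent.py
from livekit.agents import JobRequest


async def request_fnc(req: JobRequest) -> None:
    # Accept only jobs dispatched to this worker's agent name
    if req.agent_name != "deepgram-agent":
        await req.reject()  # Not for us; the other agent handles it
        return
    await req.accept()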
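The five steps above can be simulated without LiveKit to see why only the selected agent joins. This is a self-contained model, not the LiveKit API: `FakeJobRequest` stands in for `JobRequest`, and `make_request_fnc` and `dispatch` are hypothetical. It models the worst case in which every running worker is offered the job; LiveKit's explicit dispatch is designed to route by registered agent name, so the name check acts as a second safeguard.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeJobRequest:
    """Stand-in for livekit.agents.JobRequest that records accept/reject calls."""
    agent_name: str
    accepted_by: list[str] = field(default_factory=list)
    rejected_by: list[str] = field(default_factory=list)


def make_request_fnc(my_name: str):
    """Build a name filter like the one in agent.py for a worker named my_name."""
    async def request_fnc(req: FakeJobRequest) -> None:
        if req.agent_name != my_name:
            req.rejected_by.append(my_name)  # not for us
            return
        req.accepted_by.append(my_name)
    return request_fnc


async def dispatch(agent_name: str, workers: dict) -> FakeJobRequest:
    """Offer one job to every running worker (worst case: broadcast)."""
    req = FakeJobRequest(agent_name)
    for request_fnc in workers.values():
        await request_fnc(req)
    return req


if __name__ == "__main__":
    workers = {n: make_request_fnc(n) for n in ("deepgram-agent", "groq-agent")}
    job = asyncio.run(dispatch("groq-agent", workers))
    print("joined:", job.accepted_by)
```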

Technical Details

Custom LLM Integration

Both agents use a custom GoBackendLLM class that:

  1. Converts LiveKit chat context to messages
  2. Calls Go backend /api/chat endpoint
  3. Sends transcript to frontend via data channel
  4. Returns response to TTS for speech synthesis
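
from livekit.agents import llm

class GoBackendLLM(llm.LLM):
    def chat(self, *, chat_ctx, **kwargs):
        # Flatten LiveKit's chat context into plain role/content messages
        messages = self._convert_context(chat_ctx)
        # GoBackendLLMStream (not shown) calls /api/chat and hands the reply to TTS
        return GoBackendLLMStream(messages)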
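The README does not show `_convert_context`. A stand-alone sketch of that step, assuming `/api/chat` accepts an OpenAI-style `{"messages": [{"role": ..., "content": ...}]}` body (check docs/API.md for the real schema). `ChatItem`, `convert_context` and `chat_request_body` are hypothetical; `ChatItem` replaces LiveKit's chat items so the sketch runs without the SDK, and the role mapping is likewise an assumption.

```python
import json
from dataclasses import dataclass

# Assumed role mapping; LiveKit 1.x chat contexts may also carry "developer" items.
ROLE_MAP = {"developer": "system", "system": "system", "user": "user", "assistant": "assistant"}


@dataclass
class ChatItem:
    """Hypothetical stand-in for one message in LiveKit's chat context."""
    role: str
    text: str


def convert_context(items: list[ChatItem]) -> list[dict[str, str]]:
    """Map chat items to role/content dicts, skipping unknown roles and empty text."""
    messages = []
    for item in items:
        role = ROLE_MAP.get(item.role)
        if role is None or not item.text.strip():
            continue
        messages.append({"role": role, "content": item.text})
    return messages


def chat_request_body(items: list[ChatItem]) -> bytes:
    """Serialize the converted messages as the assumed /api/chat JSON body."""
    return json.dumps({"messages": convert_context(items)}).encode("utf-8")
```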

Transcript Flow

Agent Response → GoBackendLLM._run() → Callback → publish_data()
                      ↓
              Frontend receives {"type": "transcript", "role": "agent",
                                "text": "...", "source": "deepgram"}

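The transcript message is small JSON, so encoding and validating it takes only a few lines. A sketch using the field names shown above; `encode_transcript` and `decode_transcript` are hypothetical helpers. In the agent, the resulting bytes would be passed to `publish_data()`, and the Svelte frontend would apply equivalent checks in TypeScript.

```python
import json
from typing import Optional

VALID_ROLES = {"user", "agent"}


def encode_transcript(role: str, text: str, source: str) -> bytes:
    """Build the transcript message as a UTF-8 JSON payload for the data channel."""
    if role not in VALID_ROLES:
        raise ValueError(f"unexpected role: {role!r}")
    message = {"type": "transcript", "role": role, "text": text, "source": source}
    return json.dumps(message).encode("utf-8")


def decode_transcript(payload: bytes) -> Optional[dict]:
    """Return the message if it is a well-formed transcript, otherwise None."""
    try:
        message = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not JSON: ignore
    if not isinstance(message, dict) or message.get("type") != "transcript":
        return None  # other data-channel traffic is ignored
    if message.get("role") not in VALID_ROLES or not isinstance(message.get("text"), str):
        return None
    return message
```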
Troubleshooting

Agent not responding

  • Check agent logs for "registered worker" message
  • Verify Go backend is running on :8080
  • Check API keys in .env
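To rule out the backend quickly, a stdlib probe of `/health` can run before the agents start. It assumes the default `http://localhost:8080` address and that the endpoint answers HTTP 200 when healthy (the response body is not documented here, so only the status is checked); `backend_healthy` is a hypothetical helper. Proxy settings are bypassed because the backend is local.

```python
import urllib.request


def backend_healthy(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    # Empty ProxyHandler: talk to the local backend directly, ignoring HTTP(S)_PROXY
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts all subclass OSError
        return False


if __name__ == "__main__":
    print("backend up" if backend_healthy() else "backend not reachable on :8080")
```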

No audio

  • Check browser microphone permissions
  • Ensure ElevenLabs voice ID is valid
  • Check agent logs for TTS errors

Duplicate agents

  • Agents filter jobs by name, so this should no longer happen
  • If stuck, restart both agents

Wrong agent connecting

  • Make sure both agents are running
  • Check agent dispatch logs in Go backend

Dependencies

Python

livekit-agents>=1.3.11
livekit-plugins-deepgram
livekit-plugins-groq
livekit-plugins-elevenlabs
livekit-plugins-silero
aiohttp
python-dotenv

Go

github.com/livekit/protocol
github.com/livekit/server-sdk-go
github.com/joho/godotenv

Frontend

livekit-client
svelte
tailwindcss

License

MIT
