A real-time voice conversation agent using Deepgram's Voice Agent API with OpenRouter LLMs. Built with FastAPI on the backend and the browser's Web Audio API on the frontend, with audio streamed over WebSockets to keep latency low.
- 🎙️ Real-time voice conversation with AI using Deepgram's Voice Agent API
- 🤖 Multiple AI models from OpenRouter (Claude, GPT-4, Gemini, Llama, and more)
- 🌐 Modern web interface with live transcript display
- ⚡ Ultra-low latency streaming architecture
- 🎨 Beautiful, responsive UI with real-time status indicators
- 🔧 Configurable system prompts for custom agent behavior
- Backend: FastAPI with WebSocket support
- Frontend: Vanilla JavaScript with Web Audio API
- STT: Deepgram Nova 3
- LLM: OpenRouter (configurable models)
- TTS: Deepgram Aura 2
- Audio Pipeline: Browser microphone → WebSocket → Deepgram Agent → WebSocket → Browser speakers
- Python 3.11+
- uv package manager
- Deepgram API key
- OpenRouter API key
- Clone the repository (if you haven't already).
- Install dependencies using uv:

      uv sync

- Set up environment variables. Copy sample.env to .env:

      cp sample.env .env

  Then edit .env with your actual keys:

      DEEPGRAM_API_KEY=your_deepgram_api_key_here
      OPENROUTER_API_KEY=your_openrouter_api_key_here

- Start the server:

      uv run python -m src.main

  Or activate the virtual environment first:

      source .venv/bin/activate
      python -m src.main

- Open your browser and navigate to http://localhost:8000
- Select a model from the dropdown menu.
- Click "Connect & Start" and allow microphone access.
- Start talking! The agent will transcribe your speech, process it with the selected LLM, and speak back to you.
The application supports various OpenRouter models including:
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus
- OpenAI: GPT-4o, GPT-4o Mini, GPT-4 Turbo
- Google: Gemini 2.0 Flash, Gemini Pro 1.5
- Meta: Llama 3.3 70B, Llama 3.1 405B
- X.AI: Grok 2
- Mistral: Mistral Large
- Cohere: Command R+
deepgram_voice_agent/
├── src/
│   ├── main.py           # FastAPI application with WebSocket endpoints
│   ├── config.py         # Configuration and settings
│   ├── voice_agent.py    # Deepgram Voice Agent wrapper
│   └── static/
│       ├── index.html    # Frontend UI
│       └── app.js        # Real-time audio handling and WebSocket client
├── tests/
│   └── test_main.py      # Tests
├── .env # Your API keys (not in git)
├── sample.env # Example environment file
├── pyproject.toml # Project dependencies
└── README.md # This file
- Browser captures audio from your microphone using the Web Audio API
- Audio is streamed as PCM 16-bit data over WebSocket to the FastAPI backend
- Backend forwards audio to Deepgram's Voice Agent API
- Deepgram processes the audio pipeline:
  - Listen (STT): Transcribes your speech using Nova 3
  - Think (LLM): Processes with your selected OpenRouter model
  - Speak (TTS): Generates speech using Aura 2
- Agent's audio is streamed back through WebSocket
- Browser plays the audio in real-time
- Transcripts are displayed live on the screen
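
The "PCM 16-bit" step in the flow above is the one most likely to go wrong, so here is a minimal sketch of the conversion. In this project the encoding happens in the browser (app.js), so this Python version is only an illustration of the arithmetic, handy for tests or offline tooling. The function names `float_to_pcm16` and `pcm16_to_float` are hypothetical and not part of the codebase.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    out = bytearray()
    for s in samples:
        # Clamp so that out-of-range input cannot overflow 16 bits.
        s = max(-1.0, min(1.0, s))
        # int16 is asymmetric: negatives scale by 32768, positives by 32767.
        value = int(s * 32768) if s < 0 else int(s * 32767)
        out += struct.pack("<h", value)
    return bytes(out)


def pcm16_to_float(data: bytes) -> list[float]:
    """Convert little-endian signed 16-bit PCM back to floats in [-1.0, 1.0)."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    ints = struct.unpack(f"<{count}h", data[: count * 2])
    return [i / 32768 for i in ints]


if __name__ == "__main__":
    frame = float_to_pcm16([0.0, 0.5, -0.5, 1.0])
    print(len(frame), "bytes for 4 samples")
```

Each sample becomes exactly two bytes, so a 20 ms frame at a 16 kHz sample rate (an assumed rate, see src/config.py for the real one) would be 320 samples, or 640 bytes on the wire.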
- GET / - Serve the web interface
- GET /api/models - Get available OpenRouter models
- GET /api/health - Health check endpoint
- WS /ws/agent - Main WebSocket connection for voice agent communication
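
For scripting against these endpoints, a small standard-library client can be enough. The sketch below assumes the server runs at the default http://localhost:8000 and that /api/health answers with HTTP 200 when the server is up. The response body format is not documented here, so it is not inspected. The helper names `agent_ws_url` and `is_healthy` are hypothetical.

```python
from urllib.error import URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def agent_ws_url(base_url: str) -> str:
    """Derive the WebSocket URL for /ws/agent from the HTTP base URL."""
    parts = urlsplit(base_url)
    # https pages must use wss; plain http maps to ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, "/ws/agent", "", ""))


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /api/health responds with HTTP 200."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    base = "http://localhost:8000"
    print("WebSocket URL:", agent_ws_url(base))
    print("Server healthy:", is_healthy(base))
```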
Edit src/config.py to customize:
- Default models
- Audio settings (sample rate, encoding)
- System prompts
- Available models list
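
To give a feel for the kind of settings listed above, here is one way a config module could be laid out. This is a sketch, not the contents of src/config.py: the `Settings` class, the `load_settings` function, the default values, and the example model IDs are all assumptions. Only the two environment variable names come from sample.env.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # API keys, read from the environment (names match sample.env).
    deepgram_api_key: str
    openrouter_api_key: str
    # Everything below is an assumed default, not the project's real value.
    default_model: str = "openai/gpt-4o-mini"
    available_models: tuple[str, ...] = (
        "openai/gpt-4o-mini",
        "anthropic/claude-3.5-sonnet",
    )
    sample_rate: int = 16000
    encoding: str = "linear16"
    system_prompt: str = "You are a helpful voice assistant."


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("DEEPGRAM_API_KEY", "OPENROUTER_API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        deepgram_api_key=env["DEEPGRAM_API_KEY"],
        openrouter_api_key=env["OPENROUTER_API_KEY"],
    )
```

Passing a plain dict to `load_settings` keeps tests independent of the real environment, and the frozen dataclass prevents accidental mutation while a session is running.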
Run in development mode with auto-reload:

    uv run uvicorn src.main:app --reload --port 8000

- Function calling / MCP tools integration
- Session persistence and conversation history
- Multiple TTS voice options
- Voice activity detection (VAD) controls
- Recording and export capabilities
- Multi-language support
- Check your browser's speaker/audio output settings
- Look for errors in the browser console
- Ensure your Deepgram API key has TTS credits
- Grant microphone permissions in your browser
- Check browser console for getUserMedia errors
- Try using HTTPS (some browsers require it)
- Verify your API keys in .env
- Check that the server is running on the correct port
- Look at server logs for detailed error messages
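
A common cause of connection errors is a .env file that still holds the placeholder values from sample.env. The following sketch checks for that. It uses a deliberately simple KEY=VALUE parser rather than a full dotenv implementation, and the function names `parse_env` and `unset_keys` are hypothetical.

```python
from pathlib import Path

REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENROUTER_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are missing, empty, or still placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        # sample.env placeholders look like "your_..._api_key_here".
        if not value or value.startswith("your_"):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found; copy sample.env to .env first")
    else:
        bad = unset_keys(parse_env(env_path.read_text()))
        print("All keys set" if not bad else "Fix these keys: " + ", ".join(bad))
```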
MIT
Built with:
- Deepgram - Voice Agent API, STT, and TTS
- OpenRouter - LLM API aggregation
- FastAPI - Modern Python web framework