
Deepgram Voice Agent

A real-time voice conversation agent built on Deepgram's Voice Agent API and OpenRouter LLMs. The FastAPI backend and browser frontend are designed around a streaming pipeline to keep end-to-end latency low.

Features

  • 🎙️ Real-time voice conversation with AI using Deepgram's Voice Agent API
  • 🤖 Multiple AI models from OpenRouter (Claude, GPT-4, Gemini, Llama, and more)
  • 🌐 Modern web interface with live transcript display
  • ⚡ Ultra-low latency streaming architecture
  • 🎨 Beautiful, responsive UI with real-time status indicators
  • 🔧 Configurable system prompts for custom agent behavior

Architecture

  • Backend: FastAPI with WebSocket support
  • Frontend: Vanilla JavaScript with Web Audio API
  • STT: Deepgram Nova-3
  • LLM: OpenRouter (configurable models)
  • TTS: Deepgram Aura-2
  • Audio Pipeline: Browser microphone → WebSocket → Deepgram Agent → WebSocket → Browser speakers
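The listen/think/speak stages above are configured in a single settings payload sent to Deepgram when the agent session starts. As a rough sketch (the field names below are illustrative assumptions, not the exact Deepgram Voice Agent schema — see Deepgram's docs and src/voice_agent.py for the real payload):

```python
# Illustrative agent settings sketch. Field names are assumptions,
# not the exact Deepgram Voice Agent schema -- check Deepgram's docs.
AGENT_SETTINGS = {
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "listen": {"model": "nova-3"},                       # STT stage
        "think": {
            "provider": "open_router",                       # LLM stage
            "model": "anthropic/claude-3.5-sonnet",
        },
        "speak": {"model": "aura-2"},                        # TTS stage
    },
}
```

The key design point is that STT, LLM, and TTS are all orchestrated server-side by Deepgram; the backend only has to forward audio in both directions.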

Prerequisites

  • Python 3 (see pyproject.toml for the required version)
  • uv for dependency management
  • A Deepgram API key
  • An OpenRouter API key
  • A modern browser with microphone access

Installation

  1. Clone the repository (if you haven't already)

  2. Install dependencies using uv:

    uv sync
  3. Set up environment variables:

    Copy sample.env to .env and fill in your API keys:

    cp sample.env .env

    Edit .env with your actual keys:

    DEEPGRAM_API_KEY=your_deepgram_api_key_here
    OPENROUTER_API_KEY=your_openrouter_api_key_here
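If you're curious what loading these values looks like, here is a minimal stdlib-only sketch of parsing a .env file; the project itself may rely on python-dotenv or pydantic settings instead, and parse_env/load_env are hypothetical helpers:

```python
import os

def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def load_env(path: str = ".env") -> None:
    """Merge parsed values into os.environ without overwriting existing ones."""
    with open(path) as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)
```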

Usage

  1. Start the server:

    uv run python -m src.main

    Or activate the virtual environment first:

    source .venv/bin/activate
    python -m src.main
  2. Open your browser:

    Navigate to http://localhost:8000

  3. Select a model from the dropdown menu

  4. Click "Connect & Start" and allow microphone access

  5. Start talking! The agent will transcribe your speech, process it with the selected LLM, and speak back to you

Available Models

The application supports various OpenRouter models including:

  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus
  • OpenAI: GPT-4o, GPT-4o Mini, GPT-4 Turbo
  • Google: Gemini 2.0 Flash, Gemini Pro 1.5
  • Meta: Llama 3.3 70B, Llama 3.1 405B
  • X.AI: Grok 2
  • Mistral: Mistral Large
  • Cohere: Command R+

Project Structure

deepgram_voice_agent/
├── src/
│   ├── main.py           # FastAPI application with WebSocket endpoints
│   ├── config.py         # Configuration and settings
│   ├── voice_agent.py    # Deepgram Voice Agent wrapper
│   └── static/
│       ├── index.html    # Frontend UI
│       └── app.js        # Real-time audio handling and WebSocket client
├── tests/
│   └── test_main.py      # Tests
├── .env                  # Your API keys (not in git)
├── sample.env            # Example environment file
├── pyproject.toml        # Project dependencies
└── README.md            # This file

How It Works

  1. Browser captures audio from your microphone using the Web Audio API
  2. Audio is streamed as PCM 16-bit data over WebSocket to the FastAPI backend
  3. Backend forwards audio to Deepgram's Voice Agent API
  4. Deepgram processes the audio pipeline:
    • Listen (STT): Transcribes your speech using Nova-3
    • Think (LLM): Processes with your selected OpenRouter model
    • Speak (TTS): Generates speech using Aura-2
  5. Agent's audio is streamed back through WebSocket
  6. Browser plays the audio in real-time
  7. Transcripts are displayed live on the screen
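In steps 2-5 the backend acts as a bidirectional relay: one task pumps microphone audio from the browser socket to Deepgram, while another pumps the agent's audio back. A simplified stdlib-only sketch, with asyncio queues standing in for the two WebSocket connections (the real code in src/main.py uses actual WebSockets):

```python
import asyncio

async def pump(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Forward audio chunks from src to dst until a None sentinel arrives."""
    while (chunk := await src.get()) is not None:
        await dst.put(chunk)
    await dst.put(None)  # propagate end-of-stream downstream

async def relay(browser_in, deepgram_in, deepgram_out, browser_out) -> None:
    """Run both directions concurrently: mic audio up, agent audio down."""
    await asyncio.gather(
        pump(browser_in, deepgram_in),    # browser mic -> Deepgram agent
        pump(deepgram_out, browser_out),  # agent TTS   -> browser speakers
    )
```

Because the two directions run concurrently, the agent can start speaking while the user's next utterance is still being captured, which is what keeps the conversation feeling real-time.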

API Endpoints

REST Endpoints

  • GET / - Serve the web interface
  • GET /api/models - Get available OpenRouter models
  • GET /api/health - Health check endpoint

WebSocket

  • WS /ws/agent - Main WebSocket connection for voice agent communication

Configuration

Edit src/config.py to customize:

  • Default models
  • Audio settings (sample rate, encoding)
  • System prompts
  • Available models list
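A hedged sketch of what such a config module might look like; the names below (Settings, the defaults, the model list) are illustrative assumptions, not the actual contents of src/config.py:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Illustrative defaults -- the real values live in src/config.py.
    default_model: str = "anthropic/claude-3.5-sonnet"
    sample_rate: int = 16000          # input audio sample rate (Hz)
    encoding: str = "linear16"        # PCM 16-bit little-endian
    system_prompt: str = "You are a helpful voice assistant. Keep replies brief."
    available_models: tuple[str, ...] = (
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4o",
        "google/gemini-2.0-flash-001",
    )

settings = Settings()
```

Keeping the settings in a frozen dataclass makes them easy to import anywhere in the app while preventing accidental mutation at runtime.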

Development

Run in development mode with auto-reload:

uv run uvicorn src.main:app --reload --port 8000

Future Enhancements

  • Function calling / MCP tools integration
  • Session persistence and conversation history
  • Multiple TTS voice options
  • Voice activity detection (VAD) controls
  • Recording and export capabilities
  • Multi-language support

Troubleshooting

No audio from agent

  • Check your browser's speaker/audio output settings
  • Look for errors in the browser console
  • Ensure your Deepgram API key has TTS credits

Microphone not working

  • Grant microphone permissions in your browser
  • Check browser console for getUserMedia errors
  • Try using HTTPS (some browsers require it)

Connection errors

  • Verify your API keys in .env
  • Check that the server is running on the correct port
  • Look at server logs for detailed error messages

License

MIT

Credits

Built with FastAPI, Deepgram's Voice Agent API, and OpenRouter.