Real-Time Voice Agent with RAG (LiveKit + Gemini Live) 🎙️

A real-time, voice-first conversational AI agent using Google Gemini, LiveKit, and a local RAG module for grounded, low-latency responses.

The agent uses Google's Gemini Live API for real-time speech recognition, language understanding, and speech synthesis, with LiveKit handling the low-latency audio transport over WebRTC. A local RAG (Retrieval-Augmented Generation) module grounds the agent's responses in a specific knowledge base, so that answers are drawn from the provided documentation rather than from the model's general knowledge.

For a detailed explanation of the RAG implementation, see RAG_DOCUMENTATION.md.

Architecture Overview

The system is composed of three main parts that run concurrently:

React Frontend (my-voice-app/): A simple web interface that captures microphone audio and streams it to LiveKit. It also plays back the audio stream received from the agent.
Token Server (token_server.py): A lightweight FastAPI server that issues JWTs (JSON Web Tokens) to the frontend, authorizing it to connect to a specific LiveKit room.
Voice Agent (agent.py): A Python worker that connects to the same LiveKit room. It receives the audio stream, forwards it to the Gemini Live API, and executes tools (like RAG lookups) when requested by the model.
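To make the token server's role more concrete, here is a minimal sketch of what a room-join JWT looks like, built with only the Python standard library. It is not the code in token_server.py: the function name `make_room_token`, the default lifetime, and the exact claim layout (an HS256 token with the API key as `iss`, the participant identity as `sub`, and a `video` grant) are assumptions based on how LiveKit access tokens are generally structured. In a real deployment the LiveKit server SDK would normally build the token.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_room_token(api_key: str, api_secret: str, identity: str,
                    room: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT that carries a room-join grant (hypothetical helper)."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,               # issuer: the LiveKit API key
        "sub": identity,              # participant identity
        "nbf": now,                   # not valid before now
        "exp": now + ttl_seconds,     # expiry time
        "video": {"room": room, "roomJoin": True},  # assumed grant layout
    }
    # A JWT is "<header>.<payload>.<signature>", each part base64url-encoded.
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The key point is that the secret never leaves the server: the browser only receives the signed token, which is why the frontend has to ask the token server before it can join a room.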

graph TD
    subgraph Browser
        A[React UI]
    end
    subgraph Local Services
        B[Token Server @ FastAPI]
        C[Voice Agent @ Python]
    end
    subgraph Cloud Services
        D[LiveKit Cloud]
        E[Google Gemini Live API]
    end
    subgraph Data
        F[RAG Module @ FAISS]
        G[ecommerce.json]
    end

    A --"1. GET /getToken"--> B
    B --"2. Returns JWT"--> A
    A --"3. Connect w/ JWT"--> D
    C --"4. Connect w/ API Key"--> D
    D --"5. Bridge Audio Stream"--> C
    C --"6. Stream Audio"--> E
    E --"7. Request Tool Call"--> C
    C --"8. lookup_company_info()"--> F
    F --"9. Search"--> G
    F --"10. Return Context"--> C
    C --"11. Send Context"--> E
    E --"12. Stream Audio Response"--> C
    C --"13. Stream Audio"--> D
    D --"14. Stream to Browser"--> A
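Steps 7 to 11 of the diagram (the model requests a tool call, the agent runs the lookup, and the context goes back to the model) can be sketched as a small dispatcher. This is only an illustration under stated assumptions: `lookup_company_info` is the tool name from the diagram, but `handle_tool_call`, the `TOOLS` registry, the placeholder knowledge base, and the word-overlap scoring are all hypothetical. The real agent searches data/ecommerce.json through FAISS embeddings instead.

```python
from typing import Callable, Dict

# Placeholder entries for illustration only; the real knowledge base is
# data/ecommerce.json, searched via FAISS embeddings.
KNOWLEDGE_BASE = {
    "shipping": "Placeholder shipping text: orders ship in a few days.",
    "returns": "Placeholder returns text: items may be sent back.",
}


def lookup_company_info(query: str) -> str:
    """Return the entry that shares the most words with the query."""
    words = set(query.lower().split())
    best_topic, best_score = None, 0
    for topic, text in KNOWLEDGE_BASE.items():
        score = len(words & set(f"{topic} {text}".lower().split()))
        if score > best_score:
            best_topic, best_score = topic, score
    if best_topic is None:
        return "No matching information found."
    return KNOWLEDGE_BASE[best_topic]


# Registry mapping tool names (as the model would request them) to functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_company_info": lookup_company_info,
}


def handle_tool_call(name: str, arguments: dict) -> dict:
    """Run the requested tool and wrap its output for sending back to the model."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"name": name, "error": f"unknown tool: {name}"}
    return {"name": name, "result": tool(arguments.get("query", ""))}
```

Returning an explicit error for unknown tool names, rather than raising, keeps the voice session alive if the model ever requests a tool the agent does not provide.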

Getting Started

Prerequisites

Python 3.10+
Node.js 18+ and npm 9+
A Google AI Studio API key.
A LiveKit Cloud project.

Installation & Setup

Clone the repository:

git clone https://github.com/Youssef-Ashraf-Dev/Voice-Agent.git
cd Voice-Agent

Install dependencies:

# Set up Python virtual environment
python -m venv .venv
# Activate it (Windows PowerShell)
.\.venv\Scripts\Activate.ps1
# On macOS/Linux, use instead: source .venv/bin/activate

# Install Python packages
pip install -r requirements.txt

# Install frontend packages
cd my-voice-app
npm install
cd ..

Configure Environment Variables: Create a file named .env in the root of the project directory and add your credentials. This file is ignored by Git.

# Get this from Google AI Studio
GOOGLE_API_KEY=AI...

# Get these from your LiveKit Cloud project settings
LIVEKIT_URL=wss://<your-project-name>.livekit.cloud
LIVEKIT_API_KEY=API...
LIVEKIT_API_SECRET=...
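A missing or mistyped variable tends to surface only as a connection failure later on, so a quick check before starting the services can save time. The sketch below is not part of the repository: `find_config_problems` and `REQUIRED_VARS` are hypothetical names, and it assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import List, Mapping

# The four variables listed in the .env example above.
REQUIRED_VARS = (
    "GOOGLE_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def find_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [
        f"{name} is not set"
        for name in REQUIRED_VARS
        if not env.get(name, "").strip()
    ]
    url = env.get("LIVEKIT_URL", "").strip()
    # LiveKit URLs are WebSocket endpoints, so the scheme should be ws or wss.
    if url and not url.startswith(("ws://", "wss://")):
        problems.append("LIVEKIT_URL should start with ws:// or wss://")
    return problems


if __name__ == "__main__":
    for problem in find_config_problems(os.environ):
        print(problem)
```

The function takes a mapping rather than reading `os.environ` directly, which makes it easy to test with a plain dictionary.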

Generate RAG Embeddings: The first time you run the agent, it will automatically generate and cache the embeddings for the knowledge base (data/ecommerce.json). You can also pre-generate them with:
```
python -c "import rag; rag.get_stats()"
```
If you modify `data/ecommerce.json`, you must delete the `embeddings_cache/` directory or run `python -c "import rag; rag.rebuild_cache()"` to force a regeneration.
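Because a stale cache has to be cleared by hand, it can help to detect when the knowledge base has changed. One possible approach, sketched below, is to store a hash of the source file next to the cache and compare it on startup. This is an assumption about how such a check could work, not a description of rag.py: the helper names and the fingerprint file are hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cache_is_stale(data_file: Path, fingerprint_file: Path) -> bool:
    """True if no fingerprint is recorded or the data file has changed since."""
    if not fingerprint_file.exists():
        return True
    return fingerprint_file.read_text().strip() != file_fingerprint(data_file)


def record_fingerprint(data_file: Path, fingerprint_file: Path) -> None:
    """Store the current fingerprint; call this after rebuilding the embeddings."""
    fingerprint_file.parent.mkdir(parents=True, exist_ok=True)
    fingerprint_file.write_text(file_fingerprint(data_file))
```

A caller could run `cache_is_stale(Path("data/ecommerce.json"), Path("embeddings_cache/source.sha256"))` at startup and trigger a rebuild when it returns True; the fingerprint path here is only an example.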

How to Run Locally

The system requires three separate terminal sessions to run correctly.

| Terminal | Command | Purpose |
| --- | --- | --- |
| 1 | `python token_server.py` | Serves the LiveKit authentication token. |
| 2 | `python agent.py dev` | Runs the voice agent worker. |
| 3 | `cd my-voice-app; npm run dev` | Starts the frontend development server. |

Once all three processes are running:

Open your browser to http://localhost:5173.
Click the "Start Voice Chat" button.
Allow microphone access when prompted.
Start speaking. The agent will listen and respond.

Demo Video

Watch the Voice Agent Demo on Google Drive
