A real-time voice interface for Google's Gemini model that lets you hold natural spoken conversations with it. The application uses WebRTC for low-latency audio streaming.
- Real-time voice interaction with Gemini
- Audio visualization with responsive waveform display
- Multiple voice options (Puck, Charon, Kore, Fenrir, Aoede)
- Low-latency response with WebRTC streaming
- Simple and intuitive user interface
- Python 3.8+
- Gemini API key (get one at Google AI Studio)
- Modern web browser with WebRTC support
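The Python requirement above can be checked before installing anything. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and is not part of this repository.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # taken from the requirement listed above


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if a (major, minor, ...) version satisfies the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor; patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit("Python 3.8 or newer is required")
    print("Python version OK")
```

Running the file directly exits with an error message on an interpreter older than 3.8.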
- Clone this repository:
    git clone <repository-url>
    cd gemini-voice-chat
- Install dependencies:
    pip install -r requirements.txt
- Set up your environment variables:
    cp .env.example .env
  Then edit the .env file and add your Gemini API key.
Start the application with:
python app.py
By default, the application will run in the mode specified in your .env file:
- MODE=UI: Launches the Gradio UI
- MODE=PHONE: Uses fastphone mode for mobile compatibility
- Leave blank to run with the uvicorn server
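To illustrate how the `MODE` setting might select a launch path, here is a minimal sketch. The contents of `app.py` are not shown in this README, so the function name `resolve_mode`, the returned labels, the case-insensitive matching, and the decision to reject unrecognized values are all assumptions rather than the application's actual behavior.

```python
import os


def resolve_mode(env=None):
    """Map the MODE variable to one of "ui", "phone", or "server"."""
    if env is None:
        env = os.environ
    # Assumption: surrounding whitespace and letter case are ignored.
    mode = (env.get("MODE") or "").strip().upper()
    if mode == "UI":
        return "ui"       # Gradio UI
    if mode == "PHONE":
        return "phone"    # fastphone mode
    if mode == "":
        return "server"   # blank: run with uvicorn
    # Assumption: unknown values are treated as a configuration error.
    raise ValueError(f"Unsupported MODE value: {mode!r}")
```

Passing a plain dictionary, for example `resolve_mode({"MODE": "PHONE"})`, makes the mapping easy to exercise without touching the real environment.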
- Open your browser and navigate to http://localhost:7860 (default)
- Enter your Gemini API key (if not pre-configured in .env)
- Select your preferred voice
- Click "Start Recording" and speak with Gemini
- Click "Stop Recording" when you're done
The application offers several voice options for Gemini's responses:
- Puck: Default voice
- Charon: Alternative voice option
- Kore: Alternative voice option
- Fenrir: Alternative voice option
- Aoede: Alternative voice option
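If the voice is chosen programmatically rather than through the interface, the name can be validated against the list above. This is a sketch under stated assumptions: `pick_voice` is a hypothetical helper, and the case-insensitive matching is a convenience of this example, not something the application is known to do.

```python
VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")
DEFAULT_VOICE = "Puck"  # listed above as the default


def pick_voice(name=None):
    """Return a canonical voice name, or the default when none is given."""
    if name is None or not name.strip():
        return DEFAULT_VOICE
    wanted = name.strip().lower()
    for voice in VOICES:
        if voice.lower() == wanted:
            return voice
    raise ValueError(f"Unknown voice {name!r}; choose one of {', '.join(VOICES)}")
```

Rejecting unknown names early gives a clearer error than passing an unsupported voice on to the API.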
The application can be configured using environment variables in the .env file:
- GEMINI_API_KEY: Your Google Gemini API key
- MODE: Application mode (UI, PHONE, or blank for uvicorn)
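The two settings above can be read from a `.env`-style file with a few lines of standard-library Python. This is a simplified sketch, not the loader the application actually uses: `parse_env_text` and `load_config` are hypothetical names, and dedicated dotenv parsers handle quoting and escaping rules that this example ignores.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: strip one layer of surrounding quotes, no escapes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env):
    """Return the two documented settings from a mapping of variables."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    return {
        # None means the key was not pre-configured and must be entered in the UI.
        "api_key": api_key or None,
        "mode": env.get("MODE", "").strip(),
    }


sample = "# local settings\nGEMINI_API_KEY=your-key-here\nMODE=UI\n"
config = load_config(parse_env_text(sample))
```

A missing key is returned as `None` rather than raising, since the interface also accepts a key entered at runtime.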
Common issues:
- Connection errors: If you're using a VPN, try disabling it as it might interfere with WebRTC connections.
- Audio not working: Ensure your browser has permission to access your microphone.
- API key errors: Verify that your Gemini API key is valid and correctly entered.
[Specify your license information here]