JARVIS is a sophisticated, highly interactive voice assistant featuring a futuristic 3D visual interface. It is designed to be a "living" digital companion with a distinct personality, capable of real-time, interruptible conversation.
- Immersive 3D UI: A "nebulous sphere" visualizer (built with React Three Fiber) that reacts dynamically to agent states (Listening, Thinking, Speaking).
- Real-time Voice Interaction: Low-latency voice processing over full-duplex WebSocket communication.
- Flexible Conversation Engines: Switch between two backend architectures (selected with CONVERSATION_ENGINE in backend/.env):
  - Gemini Live: Uses Google's Multimodal Live API for an all-in-one, low-latency experience.
  - Deepgram Pipeline: A modular pipeline using Deepgram (STT), Gemini (LLM), and ElevenLabs (TTS).
- Smart "Barge-in": The user can interrupt the agent while it is speaking.
- Rich UI Controls:
  - Device Selector: Choose your preferred microphone input.
  - Chat Panel: View the text transcript of the conversation in real time.
  - Debug Panel: Monitor audio levels and system status.
  - Push-to-Talk: Optional mode for discrete interaction.
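Barge-in typically comes down to running playback and speech detection concurrently, then cancelling playback the moment the user starts talking. The following asyncio sketch illustrates that idea under stated assumptions: `speak()` is a hypothetical stand-in for streaming TTS audio, and the `user_started_speaking` event is assumed to be set by some voice-activity detector. It is not the project's actual implementation.

```python
import asyncio


async def speak(chunks: list[str], sent: list[str]) -> None:
    """Hypothetical TTS playback loop: 'send' one audio chunk at a time."""
    for chunk in chunks:
        sent.append(chunk)         # stand-in for sending audio over the WebSocket
        await asyncio.sleep(0.01)  # simulate real-time pacing between chunks


async def speak_with_barge_in(
    chunks: list[str], user_started_speaking: asyncio.Event
) -> tuple[list[str], bool]:
    """Play chunks until finished or until the user starts speaking.

    Returns the chunks actually sent and whether playback was interrupted.
    """
    sent: list[str] = []
    playback = asyncio.create_task(speak(chunks, sent))
    barge_in = asyncio.create_task(user_started_speaking.wait())
    done, _pending = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and playback not in done
    # Cancel whatever is still running (unfinished playback or the idle waiter).
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    # Let cancellations settle; CancelledError is returned here, not raised.
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return sent, interrupted


async def _demo() -> None:
    user_started_speaking = asyncio.Event()
    chunks = [f"chunk-{i}" for i in range(100)]  # roughly 1 s of simulated speech
    # Simulate the detector firing about 50 ms into playback.
    asyncio.get_running_loop().call_later(0.05, user_started_speaking.set)
    sent, interrupted = await speak_with_barge_in(chunks, user_started_speaking)
    print(f"interrupted={interrupted}, sent {len(sent)} of {len(chunks)} chunks")


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a real server, cancelling playback would likely also require telling the frontend to flush audio it has already buffered; otherwise the user keeps hearing the agent after interrupting it.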
- Frontend: React (Vite), TypeScript, Tailwind CSS, Three.js (React Three Fiber).
- Backend: Python (FastAPI), WebSockets.
- AI Services:
  - Google Gemini (LLM & Audio)
  - Deepgram (STT)
  - ElevenLabs (TTS; optional, used only by the Deepgram Pipeline)
- Node.js (v18+)
- Python (v3.10+)
- API keys (depending on the engine you choose):
  - Google Gemini API key (both engines)
  - Deepgram API key (Deepgram Pipeline only)
  - ElevenLabs API key (Deepgram Pipeline only)
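Because the required keys depend on the chosen engine, a startup check can catch a missing key before the first conversation. This is a minimal sketch, assuming the variable names from the backend .env configuration (CONVERSATION_ENGINE, GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY); the `REQUIRED_KEYS` table and `missing_keys()` helper are hypothetical and not part of the backend.

```python
from collections.abc import Mapping

# Hypothetical table of the keys each engine needs, mirroring the
# prerequisites above. Variable names match the backend .env settings.
REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "gemini_live": ("GOOGLE_API_KEY",),
    "deepgram_pipeline": (
        "GOOGLE_API_KEY",
        "DEEPGRAM_API_KEY",
        "ELEVENLABS_API_KEY",
    ),
}


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return required keys that are unset or empty for the selected engine.

    Raises ValueError if CONVERSATION_ENGINE is missing or unrecognized.
    """
    engine = env.get("CONVERSATION_ENGINE", "")
    if engine not in REQUIRED_KEYS:
        raise ValueError(
            f"Unknown CONVERSATION_ENGINE {engine!r}; "
            f"expected one of {sorted(REQUIRED_KEYS)}"
        )
    return [key for key in REQUIRED_KEYS[engine] if not env.get(key)]


example_env = {"CONVERSATION_ENGINE": "deepgram_pipeline", "GOOGLE_API_KEY": "abc"}
print(missing_keys(example_env))  # ['DEEPGRAM_API_KEY', 'ELEVENLABS_API_KEY']
```

At startup, calling `missing_keys(os.environ)` once the .env file has been loaded, and exiting on a non-empty result, would surface configuration mistakes early.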
```bash
# Clone the repository
git clone <your-repo-url>
cd voice-assistant

# Set up backend secrets
cd backend
cp .env.example .env
```

Edit backend/.env:
Choose your conversation engine and provide the matching keys.

Option A: Gemini Live (recommended for simplicity)

```
CONVERSATION_ENGINE=gemini_live
GOOGLE_API_KEY=your_google_key
```

Option B: Deepgram Pipeline (for modular control)
```
CONVERSATION_ENGINE=deepgram_pipeline
GOOGLE_API_KEY=your_google_key
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
```

Open a terminal for the backend:
```bash
cd backend
python3 -m venv venv             # Create a virtual environment
source venv/bin/activate         # Activate it (Windows: venv\Scripts\activate)
pip install -r requirements.txt  # Install dependencies
uvicorn main:app --reload        # Start the server
```

The backend runs on http://localhost:8000.
Open a new terminal for the frontend:
```bash
cd frontend
npm install  # Install dependencies
npm run dev  # Start the dev server
```

The frontend runs on http://localhost:5173 by default (Vite moves to the next free port if 5173 is taken).
- Open the frontend URL in your browser.
- Click the microphone icon in the top right to select your input device.
- Click the power icon at the bottom center to wake JARVIS.
- Speak to the sphere!
- frontend/: React application (UI/UX).
  - src/components/: UI components (Sphere, Chat, Device Selector).
  - src/hooks/: Custom hooks for audio and WebSockets.
- backend/: FastAPI server.
  - conversation_engines/: Logic for the different AI pipelines.
  - audio_providers/: Interfaces for STT, TTS, and LLM services.
- PRD.md: Product Requirements Document.
- SOUL.md: Agent personality definition.
- RULES.md: Operational constraints.
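The split between conversation_engines/ and audio_providers/ suggests a pipeline assembled from interchangeable STT, LLM, and TTS components, as in the Deepgram Pipeline. This is a minimal sketch of that idea, assuming hypothetical `SpeechToText`, `LanguageModel`, and `TextToSpeech` protocols and a hypothetical `run_turn()` helper. The real interfaces in audio_providers/ may differ, and the fake providers only stand in for Deepgram, Gemini, and ElevenLabs.

```python
import asyncio
from typing import Protocol


class SpeechToText(Protocol):
    async def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    async def reply(self, history: list[str], user_text: str) -> str: ...


class TextToSpeech(Protocol):
    async def synthesize(self, text: str) -> bytes: ...


async def run_turn(
    audio: bytes,
    history: list[str],
    stt: SpeechToText,
    llm: LanguageModel,
    tts: TextToSpeech,
) -> bytes:
    """One conversational turn: audio in -> transcript -> reply -> audio out."""
    user_text = await stt.transcribe(audio)
    reply_text = await llm.reply(history, user_text)
    history.extend([user_text, reply_text])  # running transcript, e.g. for a chat panel
    return await tts.synthesize(reply_text)


# Fake providers (hypothetical) so the sketch runs without any API keys.
class FakeSTT:
    async def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLLM:
    async def reply(self, history: list[str], user_text: str) -> str:
        return f"You said: {user_text}"


class FakeTTS:
    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    transcript: list[str] = []
    out = asyncio.run(run_turn(b"hello", transcript, FakeSTT(), FakeLLM(), FakeTTS()))
    print(out)         # b'You said: hello'
    print(transcript)  # ['hello', 'You said: hello']
```

With this shape, swapping one service means writing a single class with the matching method. The Gemini Live engine, which handles audio end to end, would plausibly bypass such a pipeline rather than plug into it.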
This project is currently in the Alpha phase.