Skip to content

extrasmall0/ai-video-chat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸŽ₯ AI Video Chat

Real-time video chat with AI β€” it can see you and hear you, then talks back.

Built with Groq APIs for blazing-fast inference. Single file server, no frameworks, runs locally.

How It Works

🎀 You speak β†’ Groq Whisper (STT)
πŸ“· Camera frame β†’ Groq Llama 4 Scout (Vision)
       ↓ (parallel)
🧠 Groq Llama 3.3 70B (Conversation) β†’ combines what it heard + saw
       ↓
πŸ”Š edge-tts (Text-to-Speech) β†’ AI speaks back

All processing runs through Groq's API β€” no local GPU needed. Typical round-trip: 2-4 seconds.

Quick Start

1. Get a Groq API Key (free)

Sign up at console.groq.com and create an API key.

2. Install & Run

git clone https://github.com/littleshuai-bot/ai-video-chat.git
cd ai-video-chat

# Set your API key
export GROQ_API_KEY=gsk_your_key_here

# Install dependencies
pip install -r requirements.txt

# Run
python server.py

3. Open in Browser

Go to http://localhost:8765 β†’ allow camera & microphone β†’ click 🎀 to talk.

Configuration

Copy .env.example to .env and customize:

cp .env.example .env
Variable Default Description
GROQ_API_KEY (required) Your Groq API key
AGENT_NAME AI Assistant Name displayed on the AI avatar
USER_NAME You Name displayed on your video
PORT 8765 Server port
LANGUAGE zh STT language code (en, zh, ja, ko, es, fr, etc.)
TTS_VOICE zh-CN-XiaoxiaoNeural edge-tts voice (list voices)
LLM_MODEL llama-3.3-70b-versatile Groq LLM model for conversation
VISION_MODEL meta-llama/llama-4-scout-17b-16e-instruct Groq vision model
AGENT_PERSONA (auto-generated) Custom system prompt override

Language Examples

English:

LANGUAGE=en
TTS_VOICE=en-US-AriaNeural

Chinese:

LANGUAGE=zh
TTS_VOICE=zh-CN-XiaoxiaoNeural

Japanese:

LANGUAGE=ja
TTS_VOICE=ja-JP-NanamiNeural

Requirements

  • Python 3.10+
  • ffmpeg β€” for audio conversion (brew install ffmpeg / apt install ffmpeg)
  • Groq API key β€” free tier at console.groq.com
  • Modern browser with camera & microphone support

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Browser (UI)                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Camera   β”‚              β”‚   AI Avatar      β”‚  β”‚
β”‚  β”‚  (user)   β”‚              β”‚   + Subtitles    β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚        🎀 Record β†’ POST /api/chat (audio+image)  β”‚
β”‚                           ← { text, audio_url }  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Python Server (FastAPI)             β”‚
β”‚                                                  β”‚
β”‚  Audio ──→ [ffmpeg] ──→ Groq Whisper (STT)      β”‚
β”‚  Image ──→ Groq Llama 4 Scout (Vision)    ← parallel
β”‚                    ↓                             β”‚
β”‚  transcript + scene ──→ Groq Llama 3.3 (LLM)   β”‚
β”‚                    ↓                             β”‚
β”‚  reply text ──→ edge-tts (TTS) ──→ MP3          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The frontend is a single HTML file with no build step. The backend is a single Python file with FastAPI.

Features

  • 🎀 Voice Input β€” press to record, release to send
  • πŸ“· Vision β€” AI can see your camera feed
  • πŸ”Š Voice Output β€” AI speaks its replies
  • πŸ’¬ Subtitles β€” typewriter-style text animation
  • ⏱️ Call Timer β€” FaceTime-style UI
  • πŸ“± Responsive β€” works on mobile & desktop
  • 🌍 Multi-language β€” configurable STT language and TTS voice
  • 🎭 Custom Persona β€” fully customizable AI personality

How It's Built

Component Technology Why
STT Groq Whisper Large v3 Turbo Fastest Whisper inference available
Vision Groq Llama 4 Scout Multimodal understanding
LLM Groq Llama 3.3 70B Fast, high-quality conversation
TTS edge-tts Free, many voices, low latency
Server FastAPI + uvicorn Async Python, minimal overhead
Frontend Vanilla HTML/CSS/JS No build step, just works

License

MIT


Built by ExtraSmall ✨

About

πŸŽ₯ Real-time AI video chat with voice + vision. It sees you, hears you, and talks back. Powered by Groq APIs. Single file, no GPU needed.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors