Real-time video chat with AI: it can see you and hear you, then talks back.
Built with Groq APIs for blazing-fast inference. Single-file server, no frameworks, runs locally.
```
🎤 You speak    → Groq Whisper (STT)
📷 Camera frame → Groq Llama 4 Scout (Vision)
        ↓ (parallel)
🧠 Groq Llama 3.3 70B (Conversation) ← combines what it heard + saw
        ↓
🔊 edge-tts (Text-to-Speech) → AI speaks back
```
All processing runs through Groq's API, so no local GPU is needed. Typical round-trip: 2-4 seconds.
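The "(parallel)" step above can be sketched with `asyncio.gather`. This is a minimal illustration, not the project's actual code: `transcribe_audio` and `describe_frame` are hypothetical stand-ins for the real Groq calls, with sleeps simulating network latency.

```python
import asyncio

# Hypothetical stand-ins for the two Groq calls; the real server would
# call the Groq SDK here. The sleeps simulate network latency.
async def transcribe_audio(audio: bytes) -> str:
    await asyncio.sleep(0.1)          # simulated Whisper STT round-trip
    return "hello, what do you see?"

async def describe_frame(image: bytes) -> str:
    await asyncio.sleep(0.1)          # simulated vision-model round-trip
    return "a person waving at the camera"

async def handle_turn(audio: bytes, image: bytes) -> str:
    # STT and vision run concurrently, so this stage costs roughly
    # max(stt, vision) rather than their sum.
    transcript, scene = await asyncio.gather(
        transcribe_audio(audio),
        describe_frame(image),
    )
    return f"heard: {transcript} | saw: {scene}"

print(asyncio.run(handle_turn(b"", b"")))
```

Running STT and vision concurrently is what keeps the round-trip in the 2-4 second range: the conversation model only waits for the slower of the two, not both in sequence.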
Sign up at console.groq.com and create an API key.
```bash
git clone https://github.com/littleshuai-bot/ai-video-chat.git
cd ai-video-chat

# Set your API key
export GROQ_API_KEY=gsk_your_key_here

# Install dependencies
pip install -r requirements.txt

# Run
python server.py
```

Go to http://localhost:8765 → allow camera & microphone → click 🎤 to talk.
Copy `.env.example` to `.env` and customize:

```bash
cp .env.example .env
```

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (required) | Your Groq API key |
| `AGENT_NAME` | `AI Assistant` | Name displayed on the AI avatar |
| `USER_NAME` | `You` | Name displayed on your video |
| `PORT` | `8765` | Server port |
| `LANGUAGE` | `zh` | STT language code (`en`, `zh`, `ja`, `ko`, `es`, `fr`, etc.) |
| `TTS_VOICE` | `zh-CN-XiaoxiaoNeural` | edge-tts voice (list voices) |
| `LLM_MODEL` | `llama-3.3-70b-versatile` | Groq LLM model for conversation |
| `VISION_MODEL` | `meta-llama/llama-4-scout-17b-16e-instruct` | Groq vision model |
| `AGENT_PERSONA` | (auto-generated) | Custom system prompt override |
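One way the server might resolve these variables, sketched here for illustration. `load_config` is a hypothetical helper (not lifted from `server.py`); the defaults are copied from the table above.

```python
import os

# Hypothetical config loader; defaults mirror the table above.
# GROQ_API_KEY is required, everything else falls back to a default.
def load_config() -> dict:
    return {
        "groq_api_key": os.environ["GROQ_API_KEY"],  # KeyError if unset
        "agent_name": os.getenv("AGENT_NAME", "AI Assistant"),
        "user_name": os.getenv("USER_NAME", "You"),
        "port": int(os.getenv("PORT", "8765")),
        "language": os.getenv("LANGUAGE", "zh"),
        "tts_voice": os.getenv("TTS_VOICE", "zh-CN-XiaoxiaoNeural"),
        "llm_model": os.getenv("LLM_MODEL", "llama-3.3-70b-versatile"),
        "vision_model": os.getenv(
            "VISION_MODEL", "meta-llama/llama-4-scout-17b-16e-instruct"
        ),
    }
```

Because everything except the API key has a sensible default, `export GROQ_API_KEY=...` alone is enough for a first run.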
English:
```
LANGUAGE=en
TTS_VOICE=en-US-AriaNeural
```

Chinese:
```
LANGUAGE=zh
TTS_VOICE=zh-CN-XiaoxiaoNeural
```

Japanese:
```
LANGUAGE=ja
TTS_VOICE=ja-JP-NanamiNeural
```

Requirements:

- Python 3.10+
- ffmpeg for audio conversion (`brew install ffmpeg` / `apt install ffmpeg`)
- Groq API key (free tier at console.groq.com)
- Modern browser with camera & microphone support
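A quick preflight check for the list above can be written in a few lines. `check_prereqs` is a hypothetical helper for illustration, not part of the repo:

```python
import os
import shutil
import sys

# Hypothetical preflight check mirroring the requirements list above;
# server.py may or may not perform similar checks itself.
def check_prereqs() -> list[str]:
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (brew/apt install ffmpeg)")
    if not os.getenv("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    return problems

for p in check_prereqs():
    print("warning:", p)
```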
```
┌──────────────────────────────────────────────────┐
│                   Browser (UI)                   │
│   ┌──────────┐      ┌────────────────────┐       │
│   │  Camera  │      │     AI Avatar      │       │
│   │  (user)  │      │    + Subtitles     │       │
│   └──────────┘      └────────────────────┘       │
│   🎤 Record  → POST /api/chat (audio+image)      │
│              ← { text, audio_url }               │
└──────────────────────────────────────────────────┘
                        ↕
┌──────────────────────────────────────────────────┐
│             Python Server (FastAPI)              │
│                                                  │
│  Audio ──→ [ffmpeg] ──→ Groq Whisper (STT)       │
│  Image ──→ Groq Llama 4 Scout (Vision)           │ ← parallel
│                      ↓                           │
│  transcript + scene ──→ Groq Llama 3.3 (LLM)     │
│                      ↓                           │
│  reply text ──→ edge-tts (TTS) ──→ MP3           │
└──────────────────────────────────────────────────┘
```
The frontend is a single HTML file with no build step. The backend is a single Python file with FastAPI.
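The `[ffmpeg]` box deserves a note: browsers typically record WebM/Opus, which gets transcoded before the STT call. A sketch of the command the server might run; the exact flags (16 kHz mono WAV) are an assumption, not read from `server.py`:

```python
import subprocess  # used in the commented invocation below

# Hypothetical helper building the ffmpeg transcode command.
def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    return [
        "ffmpeg", "-y",    # overwrite output if it exists
        "-i", src,         # e.g. recording.webm from the browser
        "-ar", "16000",    # 16 kHz sample rate, common for STT input
        "-ac", "1",        # downmix to mono
        dst,               # e.g. recording.wav for the Whisper call
    ]

# The server would then run it, e.g.:
# subprocess.run(ffmpeg_cmd("recording.webm", "recording.wav"),
#                check=True, capture_output=True)
```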
- 🎤 Voice Input: press to record, release to send
- 📷 Vision: the AI can see your camera feed
- 🔊 Voice Output: the AI speaks its replies
- 💬 Subtitles: typewriter-style text animation
- ⏱️ Call Timer: FaceTime-style UI
- 📱 Responsive: works on mobile & desktop
- 🌐 Multi-language: configurable STT language and TTS voice
- 🎭 Custom Persona: fully customizable AI personality
| Component | Technology | Why |
|---|---|---|
| STT | Groq Whisper Large v3 Turbo | Fastest Whisper inference available |
| Vision | Groq Llama 4 Scout | Multimodal understanding |
| LLM | Groq Llama 3.3 70B | Fast, high-quality conversation |
| TTS | edge-tts | Free, many voices, low latency |
| Server | FastAPI + uvicorn | Async Python, minimal overhead |
| Frontend | Vanilla HTML/CSS/JS | No build step, just works |
MIT
Built by ExtraSmall ✨