Gemini Live Agent Challenge | Category: Live Agents
Live Demo · Architecture
Practicing for technical interviews is one of the most stressful parts of a job search:
- Real mock interviews cost $150-300 per session, which puts them out of reach for most candidates
- Candidates repeat the same mistakes (filler words, bad posture, unstructured answers) without ever knowing it
- Existing AI chatbots are text-only, missing the critical non-verbal dimensions that real interviewers evaluate
InterviewAce is a real-time, multimodal AI interview coach that replicates a genuine Google Meet-style technical interview. A live AI hiring manager ("Coach Ace") sees your body language through the camera, hears your filler words through the microphone, and speaks to you naturally through native voice, all powered by the Gemini Live API and Google ADK.
No text boxes. No typing. Just a real conversation with an AI that actually watches and listens.
| Feature | How It Works |
|---|---|
| Native Audio Voice | Real-time bidirectional audio via Gemini 2.5 Flash Native Audio. Sub-500ms latency. Supports natural interruptions and barge-in. |
| Live Camera Vision | Webcam frames streamed at an adaptive 0.33-1 fps for real-time body language analysis (posture, eye contact, expression, gestures). Bandwidth-adaptive. |
| 13 ADK Background Tools | Silent analysis tools fire automatically during the interview: filler word detection, STAR method evaluation, voice confidence (pace/volume/tone/pauses), body language (gestures/expressions), and dynamic difficulty scaling. |
| Google Search Grounding | ADK's built-in google_search tool prevents hallucination of company interview facts. Expanded grounding for 8+ companies. |
| Company-Specific Styles | Google, Amazon (Leadership Principles), Meta, Apple, Microsoft, Netflix, Airbnb, Stripe, Uber interview question frameworks. |
| Session Report & Transcript | Full performance breakdown with downloadable transcript after every session. |
| Google Meet Replica UI | Pixel-perfect Meet interface with closed captions, volume visualizers, participant panel, chat sidebar, and session timer. |
| Adaptive Performance | Dynamic difficulty scaling based on candidate performance. Bandwidth-adaptive video streaming. Robust interruption handling. |
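The bandwidth-adaptive 0.33-1 fps range in the table can be illustrated with a small calculation. This is a minimal sketch, not the project's actual logic (which lives in the JavaScript frontend): the function name `choose_frame_interval`, the 40 KB frame size, and the 50% video budget are all assumptions made for illustration. Only the 0.33 and 1 fps bounds come from the table.

```python
def choose_frame_interval(
    uplink_kbps: float,
    frame_kb: float = 40.0,    # assumed average JPEG size (hypothetical)
    video_share: float = 0.5,  # assumed share of uplink reserved for video
    min_fps: float = 0.33,     # lower bound quoted in the feature table
    max_fps: float = 1.0,      # upper bound quoted in the feature table
) -> float:
    """Return the number of seconds to wait between webcam frames."""
    # Convert kilobits/s to kilobytes/s, then keep only the video share.
    budget_kb_per_s = (uplink_kbps / 8.0) * video_share
    # Frames per second the budget can sustain, clamped to the allowed range.
    fps = budget_kb_per_s / frame_kb
    fps = max(min_fps, min(max_fps, fps))
    return 1.0 / fps


if __name__ == "__main__":
    for kbps in (100, 512, 10_000):
        print(kbps, "kbps ->", round(choose_frame_interval(kbps), 2), "s between frames")
```

Clamping keeps the audio stream prioritized on slow links while never sending more than one frame per second on fast ones.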
InterviewAce follows the official ADK bidirectional streaming pattern (LiveRequestQueue) for real-time voice + vision interaction.
graph TB
subgraph "Browser (Vanilla JS)"
MIC[Microphone<br/>PCM 16kHz] --> WS
CAM[Camera<br/>JPEG 1fps] --> WS
WS[WebSocket Client] <--> |Audio + Images + JSON| BACKEND
WS --> PLAYER[Audio Player<br/>PCM Playback]
WS --> CC[Closed Captions]
WS --> ANALYTICS[Live Analytics<br/>Sidebar]
end
subgraph "FastAPI Backend (Python)"
BACKEND[WebSocket Server<br/>main.py] --> LRQ[LiveRequestQueue]
LRQ --> RUNNER[ADK Runner]
RUNNER --> SESSION[InMemorySessionService]
end
subgraph "Google ADK Agent"
RUNNER <--> |Bidi Stream| GEMINI[Gemini 2.5 Flash<br/>Native Audio + Vision]
GEMINI --> |Autonomous Tool Calls| TOOLS
end
subgraph "11 Custom Tools (3 Tiers)"
direction LR
subgraph "Tier 1: Core Analysis"
T1[save_session_feedback]
T2[detect_filler_words]
T3[analyze_body_language]
T4[evaluate_star_method]
end
subgraph "Tier 2: Deep Coaching"
T5[analyze_voice_confidence]
T6[get_improvement_tips]
T7[fetch_grounding_data]
end
subgraph "Tier 3: Reporting"
T8[get_session_history]
T9[save_session_recording]
T10[generate_session_report]
end
subgraph "Grounding"
T11[google_search<br/>ADK Built-in]
end
end
TOOLS --> T1 & T2 & T3 & T4 & T5 & T6 & T7 & T8 & T9 & T10 & T11
subgraph "Google Cloud"
CR[Cloud Run<br/>Serverless Container]
GCR[Container Registry]
end
CR --> BACKEND
1. User speaks → Mic captures PCM audio → WebSocket → FastAPI → LiveRequestQueue → Gemini Live API
2. User's camera → JPEG frame (1 fps) → WebSocket → FastAPI → LiveRequestQueue → Gemini Vision
3. Gemini responds → Native audio bytes → WebSocket → Browser AudioPlayer → User hears voice
4. Gemini calls tools → ADK executes silently → Results update sidebar analytics in real time
5. Gemini transcribes → Input/Output transcription → Closed Captions rendered in UI
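Steps 1 and 2 of the flow above imply that the backend must tell audio, image, and text messages apart before forwarding them to the model. The sketch below shows one way that routing could look. The wire format is an assumption (raw binary frames for PCM audio, JSON text frames carrying base64 JPEG data), and `MediaBlob` and `route_message` are hypothetical names, not taken from `main.py`.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass


@dataclass
class MediaBlob:
    """Hypothetical stand-in for a typed media payload sent to the model."""
    mime_type: str
    data: bytes


def route_message(message: bytes | str) -> MediaBlob | str:
    """Classify one incoming WebSocket message (assumed wire format)."""
    # Assumption: binary frames are raw 16 kHz PCM from the microphone.
    if isinstance(message, bytes):
        return MediaBlob("audio/pcm;rate=16000", message)
    # Assumption: text frames are JSON objects with a "type" field.
    payload = json.loads(message)
    if payload.get("type") == "image":
        jpeg = base64.b64decode(payload["data"])
        return MediaBlob(payload.get("mime_type", "image/jpeg"), jpeg)
    # Anything else is treated as plain text for the agent.
    return payload.get("text", "")


if __name__ == "__main__":
    frame = json.dumps({"type": "image", "data": base64.b64encode(b"\xff\xd8").decode()})
    print(route_message(b"\x00\x01"))
    print(route_message(frame))
```

In a real server the returned blob would then be placed on the LiveRequestQueue; that step is omitted here so the example runs without the ADK installed.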
Coach Ace is a single ADK Agent with a carefully engineered persona: a senior hiring manager with 15 years at Google, Meta, Amazon, and Apple. The agent:
- Greets the candidate naturally when they join the meeting
- Generates dynamic interview questions based on the selected role, company style, and difficulty level
- Listens to answers and provides natural, human-like follow-ups ("Okay, and what was the actual outcome there?")
- Silently analyzes the candidate every 2-3 answers using background tool calls:
  - save_session_feedback: Scores confidence, clarity, content, body language (0-100)
  - detect_filler_words: Counts "um", "uh", "like", "you know", etc.
  - analyze_body_language: Rates posture, eye contact, expression from camera frames
  - evaluate_star_method: Checks if the answer followed Situation-Task-Action-Result structure
- Generates a comprehensive session report when the interview ends
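To make the filler-word analysis concrete, here is a minimal sketch of how such a count could be computed from a transcript. It is an illustration only: the function name `count_filler_words` and the regex approach are assumptions, and the real detect_filler_words tool in `tools.py` may work differently. The filler list uses the four examples named above.

```python
from __future__ import annotations

import re

# The four fillers named in the list above; a real tool would likely track more.
FILLERS = ("you know", "um", "uh", "like")


def count_filler_words(transcript: str) -> dict[str, int]:
    """Count whole-word occurrences of each filler in a transcript."""
    text = transcript.lower()
    counts: dict[str, int] = {}
    for filler in FILLERS:
        # \b word boundaries stop "um" from matching inside "umbrella".
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    return counts


if __name__ == "__main__":
    answer = "Um, so I, like, led the team, you know, and uh it worked."
    print(count_filler_words(answer))
```

A plain word match cannot tell the filler "like" from the verb "like", so a production tool would need context from the model rather than a regex alone.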
| Tier | Tools | Purpose |
|---|---|---|
| Tier 1 | save_session_feedback, detect_filler_words, analyze_body_language, evaluate_star_method | Core real-time analysis: fires every 2-3 answers |
| Tier 2 | analyze_voice_confidence, get_improvement_tips, fetch_grounding_data, adjust_difficulty_level | Deeper coaching: voice/tone/pause analysis, targeted tips, verified knowledge base, dynamic scaling |
| Tier 3 | get_session_history, save_session_recording, generate_session_report | Session management and comprehensive reporting |
| Grounding | google_search (ADK built-in) | Prevents hallucination of company-specific interview facts |
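The dynamic scaling handled by adjust_difficulty_level can be pictured as a simple threshold rule over recent scores. The sketch below is hypothetical: the level names, the thresholds of 80 and 50, and the function `adjust_difficulty` are assumptions chosen for illustration, and only the 0-100 score range comes from this README.

```python
from __future__ import annotations

# Assumed difficulty ladder; the project's actual level names may differ.
LEVELS = ("easy", "medium", "hard")


def adjust_difficulty(
    current: str,
    recent_scores: list[int],
    raise_at: int = 80,  # assumed threshold for stepping up
    lower_at: int = 50,  # assumed threshold for stepping down
) -> str:
    """Move one step up or down the ladder based on the average 0-100 score."""
    if not recent_scores:
        return current  # nothing to judge yet
    average = sum(recent_scores) / len(recent_scores)
    index = LEVELS.index(current)
    if average >= raise_at:
        index = min(index + 1, len(LEVELS) - 1)
    elif average < lower_at:
        index = max(index - 1, 0)
    return LEVELS[index]


if __name__ == "__main__":
    print(adjust_difficulty("medium", [85, 90]))  # strong answers
    print(adjust_difficulty("medium", [40, 45]))  # weak answers
```

Moving one step at a time avoids abrupt jumps in question difficulty after a single unusually good or bad answer.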
InterviewAce uses two grounding mechanisms:
- fetch_grounding_data: Returns verified interview coaching knowledge from a curated local database (grounding_data.py) covering STAR method, body language tips, voice delivery, and common mistakes
- google_search (ADK built-in): When the candidate asks about a specific company's interview process, the agent searches for real, current information rather than guessing
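The local half of this grounding setup amounts to a keyed lookup that returns curated text instead of letting the model improvise. The following sketch assumes a plain dictionary and a hypothetical `lookup_grounding` helper. The entries are placeholder text written for this example, not the contents of `grounding_data.py`, although the four topic names follow the list above.

```python
from __future__ import annotations

# Placeholder entries for illustration only; not the project's real data.
GROUNDING_DATA = {
    "star_method": "Structure answers as Situation, Task, Action, Result.",
    "body_language": "Sit upright and keep your eyes near the camera.",
    "voice_delivery": "Keep a steady pace and pause instead of using fillers.",
    "common_mistakes": "Avoid rambling; state the outcome of each story.",
}


def lookup_grounding(topic: str) -> dict:
    """Return curated coaching text for a topic, or list the valid topics."""
    # Normalize "STAR method" -> "star_method" so loose phrasing still matches.
    key = topic.strip().lower().replace(" ", "_")
    if key in GROUNDING_DATA:
        return {"status": "ok", "topic": key, "content": GROUNDING_DATA[key]}
    # Returning the valid topics lets the caller retry rather than guess.
    return {
        "status": "not_found",
        "topic": key,
        "available_topics": sorted(GROUNDING_DATA),
    }


if __name__ == "__main__":
    print(lookup_grounding("STAR method"))
    print(lookup_grounding("salary negotiation"))
```

Returning a structured "not_found" result, rather than an empty string, gives the agent a clear signal to fall back to google_search.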
The frontend is a pixel-perfect Google Meet replica built entirely with vanilla JavaScript:
- 3 Video Tiles: Coach Ace (AI interviewer), Elena (AI notetaker), You (with live webcam feed)
- Volume Visualizer Rings: Animated concentric rings pulse in real time when audio is detected
- Equalizer Bars: 3-bar equalizer animation replaces the mic icon when speaking
- Closed Captions: Real-time CC powered by Gemini's input/output transcription
- Live Analytics Sidebar: Confidence, Clarity, STAR Score, Body Language bars update in real time
- Filler Word Counter: Running count with detected words and coaching tips
- Body Language Indicators: Eye contact, posture, expression ratings with color-coded dots
- STAR Method Badges: S, T, A, R badges light up green as you hit each component
- Session Timer: MM:SS elapsed time counter during active interviews
- Chat Sidebar: In-call messaging panel (Google Meet style)
- People Panel: Shows all 3 participants with mic status
- Meeting Details: Copyable meeting link and session parameters
- Toast Notifications: Contextual notifications for mic/camera/CC toggles
- Feedback Modal: Full score breakdown with performance tier and downloadable transcript
- Python 3.10+
- A Google API Key with Gemini access
# Clone the repository
git clone https://github.com/SameerAliKhan-git/IntyerviewBit.git
cd IntyerviewBit/interviewace
# Create virtual environment
python -m venv venv
# Activate (Windows)
.\venv\Scripts\activate
# Activate (Mac/Linux)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Create a .env file in the interviewace/ directory:

GOOGLE_API_KEY=your_gemini_api_key_here

python app/main.py

Open http://localhost:8080 in your browser.
# Build with Cloud Build
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/interviewace .
# Deploy
gcloud run deploy interviewace \
--image gcr.io/YOUR_PROJECT_ID/interviewace \
--region us-central1 \
--platform managed \
--allow-unauthenticated \
--port 8080 \
--memory 1Gi \
--session-affinity \
--set-env-vars "GOOGLE_API_KEY=YOUR_KEY,GOOGLE_GENAI_USE_VERTEXAI=FALSE"

IntyerviewBit/
├── README.md
├── cloudbuild.yaml                  # Google Cloud Build config
└── interviewace/
    ├── Dockerfile                   # Cloud Run container
    ├── .dockerignore
    ├── .env.example
    ├── requirements.txt
    └── app/
        ├── main.py                  # FastAPI + WebSocket server
        ├── interview_coach_agent/
        │   ├── __init__.py
        │   ├── agent.py             # ADK Agent definition (11 tools)
        │   ├── prompts.py           # Agent persona & instructions
        │   ├── tools.py             # All 10 custom tool implementations
        │   └── grounding_data.py    # Verified coaching knowledge base
        └── static/
            ├── index.html           # Single-page app (Google Meet UI)
            ├── favicon.ico
            ├── css/
            │   └── style.css        # Complete Meet-style CSS
            └── js/
                ├── app.js           # Main application logic
                ├── audio-player.js  # PCM audio playback engine
                ├── audio-recorder.js # Mic capture + PCM encoding
                └── camera.js        # Webcam frame capture (1 fps)
| Technology | Usage |
|---|---|
| Google ADK | Agent orchestration, tool execution, LiveRequestQueue |
| Gemini 2.5 Flash Native Audio | Real-time voice interaction via Live API |
| Gemini Vision | Body language analysis from webcam frames |
| Google Search (ADK built-in) | Grounding to prevent hallucination |
| Google Cloud Run | Serverless container hosting |
| FastAPI | WebSocket server bridging browser ↔ ADK |
| Uvicorn | ASGI server with WebSocket support |
| Vanilla JavaScript | Frontend UI (no framework dependencies) |
| Web Audio API | PCM audio recording and playback |
| MediaDevices API | Camera frame capture |
| Criterion | How InterviewAce Addresses It |
|---|---|
| Beyond the text box | Fully voice-driven. Camera vision. No text input needed at any point. |
| Live API usage | Native bidiGenerateContent streaming via ADK LiveRequestQueue |
| Multimodal input | Audio (PCM 16kHz) + Vision (JPEG 1fps) streamed simultaneously |
| Tool use | 11 tools across 3 tiers, all called autonomously by the model |
| Grounding | google_search (ADK built-in) + fetch_grounding_data (local KB) |
| Google Cloud | Deployed on Cloud Run with Dockerfile + cloudbuild.yaml |
| User experience | Pixel-perfect Google Meet replica with real-time analytics |
Built with ❤️ for the Gemini Live Agent Challenge by Sameer
Powered by Google ADK · Gemini Live API · Google Cloud Run