SameerAliKhan-git/IntyerviewBit


🎯 InterviewAce: Real-Time AI Interview Coach

Gemini Live Agent Challenge | Category: Live Agents 🗣️

🔗 Live Demo · 🏗️ Architecture

Proof of GCP Deployment: GCP_Deploy_Logs.1.mp4

💡 The Problem

Practicing for technical interviews is one of the most stressful parts of a job search:

  • Real mock interviews cost $150–300/session, which puts them out of reach for most candidates
  • Candidates make the same mistakes repeatedly (filler words, bad posture, unstructured answers) without ever knowing it
  • Existing AI chatbots are text-only, missing the critical non-verbal dimensions that real interviewers evaluate

🚀 The Solution

InterviewAce is a real-time, multimodal AI interview coach that replicates a genuine Google Meet-style technical interview. A live AI hiring manager ("Coach Ace") sees your body language through the camera, hears your filler words through the microphone, and speaks to you naturally through native voice, all powered by the Gemini Live API and Google ADK.

No text boxes. No typing. Just a real conversation with an AI that actually watches and listens.


✨ Key Features

| Feature | How It Works |
| --- | --- |
| 🗣️ Native Audio Voice | Real-time bidirectional audio via Gemini 2.5 Flash Native Audio. Sub-500ms latency. Supports natural interruptions and barge-in. |
| 👀 Live Camera Vision | Webcam frames streamed at an adaptive 0.33-1 fps for real-time body language analysis (posture, eye contact, expression, gestures). Bandwidth-adaptive. |
| 📊 13 ADK Background Tools | Silent analysis tools fire automatically during the interview: filler word detection, STAR method evaluation, voice confidence (pace/volume/tone/pauses), body language (gestures/expressions), and dynamic difficulty scaling. |
| 🔍 Google Search Grounding | ADK's built-in google_search tool prevents hallucination of company interview facts. Expanded grounding for 8+ companies. |
| 🏢 Company-Specific Styles | Interview question frameworks for Google, Amazon (Leadership Principles), Meta, Apple, Microsoft, Netflix, Airbnb, Stripe, and Uber. |
| 📝 Session Report & Transcript | Full performance breakdown with downloadable transcript after every session. |
| 🎨 Google Meet Replica UI | Pixel-perfect Meet interface with closed captions, volume visualizers, participant panel, chat sidebar, and session timer. |
| ⚡ Adaptive Performance | Dynamic difficulty scaling based on candidate performance. Bandwidth-adaptive video streaming. Robust interruption handling. |
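
The features table says webcam frames are sent at an adaptive 0.33-1 fps. How InterviewAce measures bandwidth isn't documented here, so this is only an illustration: it assumes a measured uplink in kbit/s, uses hypothetical thresholds (`low_kbps`, `high_kbps`), and maps that linearly onto the stated frame-rate range. It's written in Python to match the backend, although the real capture loop lives in the browser (camera.js).

```python
def frame_interval_seconds(measured_kbps: float,
                           min_fps: float = 0.33,
                           max_fps: float = 1.0,
                           low_kbps: float = 200.0,
                           high_kbps: float = 1000.0) -> float:
    """Return the delay in seconds between webcam frames for a measured uplink.

    At or below low_kbps the camera sends min_fps; at or above high_kbps it
    sends max_fps; in between the frame rate is interpolated linearly.
    The kbit/s thresholds are hypothetical placeholders.
    """
    if measured_kbps <= low_kbps:
        fps = min_fps
    elif measured_kbps >= high_kbps:
        fps = max_fps
    else:
        # Fraction of the way from the low threshold to the high one.
        fraction = (measured_kbps - low_kbps) / (high_kbps - low_kbps)
        fps = min_fps + fraction * (max_fps - min_fps)
    return 1.0 / fps
```

On a fast link (2000 kbit/s) this returns a 1.0 s interval; on a slow one (100 kbit/s) it returns about 3.03 s, i.e. roughly one frame every three seconds.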

πŸ—οΈ Architecture

InterviewAce follows the official ADK bidirectional streaming pattern (LiveRequestQueue) for real-time voice + vision interaction.

graph TB
    subgraph "🖥️ Browser (Vanilla JS)"
        MIC[🎤 Microphone<br/>PCM 16kHz] --> WS
        CAM[📷 Camera<br/>JPEG 1fps] --> WS
        WS[WebSocket Client] <--> |Audio + Images + JSON| BACKEND
        WS --> PLAYER[🔊 Audio Player<br/>PCM Playback]
        WS --> CC[💬 Closed Captions]
        WS --> ANALYTICS[📊 Live Analytics<br/>Sidebar]
    end

    subgraph "⚙️ FastAPI Backend (Python)"
        BACKEND[WebSocket Server<br/>main.py] --> LRQ[LiveRequestQueue]
        LRQ --> RUNNER[ADK Runner]
        RUNNER --> SESSION[InMemorySessionService]
    end

    subgraph "🤖 Google ADK Agent"
        RUNNER <--> |Bidi Stream| GEMINI[Gemini 2.5 Flash<br/>Native Audio + Vision]
        GEMINI --> |Autonomous Tool Calls| TOOLS
    end

    subgraph "🔧 11 Tools (3 Tiers + Grounding)"
        direction LR
        subgraph "Tier 1: Core Analysis"
            T1[save_session_feedback]
            T2[detect_filler_words]
            T3[analyze_body_language]
            T4[evaluate_star_method]
        end
        subgraph "Tier 2: Deep Coaching"
            T5[analyze_voice_confidence]
            T6[get_improvement_tips]
            T7[fetch_grounding_data]
        end
        subgraph "Tier 3: Reporting"
            T8[get_session_history]
            T9[save_session_recording]
            T10[generate_session_report]
        end
        subgraph "Grounding"
            T11[google_search<br/>ADK Built-in]
        end
    end

    TOOLS --> T1 & T2 & T3 & T4 & T5 & T6 & T7 & T8 & T9 & T10 & T11

    subgraph "☁️ Google Cloud"
        CR[Cloud Run<br/>Serverless Container]
        GCR[Container Registry]
    end

    CR --> BACKEND
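
The diagram's browser → FastAPI → LiveRequestQueue → Runner path follows ADK's bidirectional streaming shape: one task pushes client input upstream while another relays agent events downstream, and both run at the same time. The sketch below shows only that concurrency pattern. `asyncio.Queue` stands in for ADK's `LiveRequestQueue` and the hypothetical `fake_agent` stands in for the runner's live event stream, so none of the function names here are ADK APIs.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

_CLOSE = object()  # sentinel: the client has stopped sending


async def upstream(client_messages: AsyncIterator[bytes], queue: asyncio.Queue) -> None:
    """Browser -> agent: push each client message onto the request queue."""
    async for message in client_messages:
        await queue.put(message)
    await queue.put(_CLOSE)


async def fake_agent(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Hypothetical stand-in for the runner's live event stream."""
    while True:
        item = await queue.get()
        if item is _CLOSE:
            return
        yield f"event for {len(item)} bytes"


async def downstream(events: AsyncIterator[str],
                     send_to_client: Callable[[str], Awaitable[None]]) -> None:
    """Agent -> browser: forward each event as soon as it is produced."""
    async for event in events:
        await send_to_client(event)


async def run_session(client_messages: AsyncIterator[bytes]) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    sent: List[str] = []

    async def send_to_client(event: str) -> None:
        sent.append(event)

    # Both directions run concurrently, so replies can stream back
    # while the client is still sending audio.
    await asyncio.gather(
        upstream(client_messages, queue),
        downstream(fake_agent(queue), send_to_client),
    )
    return sent


if __name__ == "__main__":
    async def mic_chunks() -> AsyncIterator[bytes]:
        # 10 ms and 20 ms of silent 16 kHz, 16-bit mono PCM.
        for chunk in (b"\x00" * 320, b"\x00" * 640):
            yield chunk

    print(asyncio.run(run_session(mic_chunks())))
    # ['event for 320 bytes', 'event for 640 bytes']
```

The sentinel lets the downstream loop end cleanly once input stops; a real server would also need to cancel both tasks when the WebSocket disconnects.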

Data Flow

1. User speaks → Mic captures PCM audio → WebSocket → FastAPI → LiveRequestQueue → Gemini Live API
2. User's camera → JPEG frame (adaptive, up to 1 fps) → WebSocket → FastAPI → LiveRequestQueue → Gemini Vision
3. Gemini responds → Native audio bytes → WebSocket → Browser AudioPlayer → User hears voice
4. Gemini calls tools → ADK executes them silently → Results update sidebar analytics in real time
5. Gemini transcribes → Input/Output transcription → Closed Captions rendered in UI
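
Steps 1 and 2 send two kinds of payload over the same WebSocket. The message format isn't documented in this README, so the router below assumes a hypothetical protocol: binary frames are raw 16 kHz PCM audio and text frames are JSON objects like `{"type": "image", "data": "<base64 JPEG>"}`. The audio MIME string mirrors the `audio/pcm;rate=16000` form used in the Gemini Live API documentation.

```python
import base64
import json
from typing import Tuple, Union


def route_client_message(message: Union[bytes, str]) -> Tuple[str, bytes, str]:
    """Classify one WebSocket message from the browser (hypothetical protocol).

    Returns (kind, payload_bytes, mime_type).
    """
    if isinstance(message, bytes):
        # Binary frame: raw little-endian 16-bit PCM from the microphone.
        return "audio", message, "audio/pcm;rate=16000"
    obj = json.loads(message)
    if obj.get("type") == "image":
        # Text frame carrying one base64-encoded JPEG webcam frame.
        return "image", base64.b64decode(obj["data"]), "image/jpeg"
    raise ValueError(f"unknown message type: {obj.get('type')!r}")
```

Each `(kind, payload, mime_type)` triple is what the server would wrap and push onto the request queue. Keeping audio as binary frames avoids base64's roughly 33% size overhead on the highest-volume stream.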

🔧 How the Agent Works

The Agent: Coach Ace

Coach Ace is a single ADK Agent with a carefully engineered persona: a senior hiring manager with 15 years at Google, Meta, Amazon, and Apple. The agent:

  1. Greets the candidate naturally when they join the meeting
  2. Generates dynamic interview questions based on the selected role, company style, and difficulty level
  3. Listens to answers and provides natural, human-like follow-ups ("Okay, and what was the actual outcome there?")
  4. Silently analyzes the candidate every 2-3 answers using background tool calls:
    • save_session_feedback: Scores confidence, clarity, content, and body language (0-100)
    • detect_filler_words: Counts "um", "uh", "like", "you know", etc.
    • analyze_body_language: Rates posture, eye contact, and expression from camera frames
    • evaluate_star_method: Checks whether the answer followed the Situation-Task-Action-Result structure
  5. Generates a comprehensive session report when the interview ends
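
Item 4 says `detect_filler_words` counts "um", "uh", "like", "you know", and so on. The repository's tools.py isn't reproduced here, so this is a hypothetical regex version that takes a transcript string; ADK turns plain Python functions into tools using their signatures and docstrings, and returning a dict is the usual convention. It is deliberately naive: every "like" counts, even the verb.

```python
import re
from typing import Dict

# Fillers named in this README; matching is whole-word and case-insensitive.
FILLERS = ("um", "uh", "like", "you know")


def detect_filler_words(transcript: str) -> Dict[str, object]:
    """Count filler words in a transcript (hypothetical signature)."""
    text = transcript.lower()
    counts: Dict[str, int] = {}
    for filler in FILLERS:
        # \b stops "um" matching inside "umbrella"; re.escape keeps the filler literal.
        hits = len(re.findall(r"\b" + re.escape(filler) + r"\b", text))
        if hits:
            counts[filler] = hits
    return {"total": sum(counts.values()), "counts": counts}
```

For "Um, so I, like, you know, led the project. Uh, I like Python." it reports a total of 5, including both uses of "like".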

Tool Tier Architecture

| Tier | Tools | Purpose |
| --- | --- | --- |
| Tier 1 | save_session_feedback, detect_filler_words, analyze_body_language, evaluate_star_method | Core real-time analysis; fires every 2-3 answers |
| Tier 2 | analyze_voice_confidence, get_improvement_tips, fetch_grounding_data, adjust_difficulty_level | Deeper coaching: voice/tone/pause analysis, targeted tips, verified knowledge base, dynamic scaling |
| Tier 3 | get_session_history, save_session_recording, generate_session_report | Session management and comprehensive reporting |
| Grounding | google_search (ADK built-in) | Prevents hallucination of company-specific interview facts |
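
`adjust_difficulty_level` (Tier 2) scales difficulty to the candidate's performance, but its rules aren't described. A minimal sketch, assuming three hypothetical levels, the 0-100 scores that `save_session_feedback` produces, and made-up thresholds of 80 (step up) and 50 (step down):

```python
from typing import Dict, Sequence

LEVELS = ("easy", "medium", "hard")  # hypothetical level names


def adjust_difficulty_level(current: str, recent_scores: Sequence[int],
                            raise_at: float = 80.0,
                            lower_at: float = 50.0) -> Dict[str, object]:
    """Move one level up or down based on the average of recent 0-100 scores."""
    if current not in LEVELS:
        raise ValueError(f"unknown level: {current!r}")
    if not recent_scores:
        return {"level": current, "average": None}
    average = sum(recent_scores) / len(recent_scores)
    index = LEVELS.index(current)
    if average >= raise_at and index < len(LEVELS) - 1:
        index += 1  # strong answers: harder questions
    elif average <= lower_at and index > 0:
        index -= 1  # struggling: ease off
    return {"level": LEVELS[index], "average": round(average, 1)}
```

With scores of 85 and 90 a medium interview steps up to hard. The gap between the two thresholds acts as a dead band, so averages in the middle leave the level alone instead of flipping it on every small change.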

Grounding & Anti-Hallucination

InterviewAce uses two grounding mechanisms:

  1. fetch_grounding_data: Returns verified interview coaching knowledge from a curated local database (grounding_data.py) covering the STAR method, body language tips, voice delivery, and common mistakes
  2. google_search (ADK built-in): When the candidate asks about a specific company's interview process, the agent searches for real, current information rather than guessing
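
`fetch_grounding_data` returns entries from the curated grounding_data.py. That file's structure isn't shown, so the sketch assumes a hypothetical dict of topic → tips; the tip strings are placeholder text, not the project's knowledge base. Returning an explicit `not_found` status (with the list of known topics) gives the model something concrete to act on instead of guessing, which is the point of grounding.

```python
from typing import Dict, List

# Hypothetical shape for grounding_data.py; the tips are placeholder text.
GROUNDING_DATA: Dict[str, List[str]] = {
    "star_method": [
        "Situation: set the context briefly.",
        "Task: state what you were responsible for.",
        "Action: describe what you did.",
        "Result: state the outcome.",
    ],
    "body_language": ["Look at the camera when making a key point."],
}


def fetch_grounding_data(topic: str) -> Dict[str, object]:
    """Look up vetted coaching content by topic (hypothetical signature)."""
    key = topic.strip().lower().replace(" ", "_")
    tips = GROUNDING_DATA.get(key)
    if tips is None:
        # Tell the model what exists rather than letting it improvise.
        return {"status": "not_found", "topic": topic, "available": sorted(GROUNDING_DATA)}
    return {"status": "ok", "topic": key, "tips": tips}
```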

🎨 UI Features

The frontend is a pixel-perfect Google Meet replica built entirely with vanilla JavaScript:

  • 3 Video Tiles: Coach Ace (AI interviewer), Elena (AI notetaker), You (with live webcam feed)
  • Volume Visualizer Rings: Animated concentric rings pulse in real time when audio is detected
  • Equalizer Bars: A 3-bar equalizer animation replaces the mic icon while a participant is speaking
  • Closed Captions: Real-time CC powered by Gemini's input/output transcription
  • Live Analytics Sidebar: Confidence, Clarity, STAR Score, and Body Language bars update in real time
  • Filler Word Counter: Running count with detected words and coaching tips
  • Body Language Indicators: Eye contact, posture, and expression ratings with color-coded dots
  • STAR Method Badges: S, T, A, R badges light up green as you hit each component
  • Session Timer: MM:SS elapsed time counter during active interviews
  • Chat Sidebar: In-call messaging panel (Google Meet style)
  • People Panel: Shows all 3 participants with mic status
  • Meeting Details: Copyable meeting link and session parameters
  • Toast Notifications: Contextual notifications for mic/camera/CC toggles
  • Feedback Modal: Full score breakdown with performance tier and downloadable transcript
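
The analytics sidebar, filler counter, and STAR badges all update from tool results. The README doesn't document the message the backend sends to the browser, so this sketch assumes a hypothetical JSON envelope `{"type": "analytics", ...}` and hypothetical result keys (`situation`, `task`, `action`, `result` for the STAR check). It clamps scores to the 0-100 range used by `save_session_feedback`.

```python
import json
from typing import Any, Dict

SCORE_KEYS = ("confidence", "clarity", "content", "body_language")
STAR_PARTS = {"S": "situation", "T": "task", "A": "action", "R": "result"}


def analytics_message(tool_name: str, result: Dict[str, Any]) -> str:
    """Serialize one tool result for the live sidebar (hypothetical envelope)."""
    payload: Dict[str, Any] = {"type": "analytics", "tool": tool_name}
    if tool_name == "save_session_feedback":
        # Clamp each score into 0-100 so the sidebar bars never overflow.
        payload["scores"] = {
            key: max(0, min(100, int(result[key]))) for key in SCORE_KEYS if key in result
        }
    elif tool_name == "evaluate_star_method":
        # One boolean per badge; a missing component stays unlit.
        payload["star"] = {letter: bool(result.get(part)) for letter, part in STAR_PARTS.items()}
    else:
        payload["data"] = result
    return json.dumps(payload)
```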

πŸ’» Getting Started

Prerequisites

Installation

# Clone the repository
git clone https://github.com/SameerAliKhan-git/IntyerviewBit.git
cd IntyerviewBit/interviewace

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate
# Activate (Mac/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Environment Variables

Create a .env file in the interviewace/ directory:

GOOGLE_API_KEY=your_gemini_api_key_here
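
How main.py loads this file isn't shown (python-dotenv is a common choice, but requirements.txt isn't reproduced here). If you want a dependency-free way to load it and fail fast on a missing key, here is a minimal sketch; `parse_env`, `load_env_file`, and `require_api_key` are hypothetical helpers, not part of the project.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks, '#' comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values into os.environ without overriding variables already set."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            os.environ.setdefault(key, value)


def require_api_key() -> str:
    """Fail fast if GOOGLE_API_KEY is missing or still the README's placeholder."""
    key = os.environ.get("GOOGLE_API_KEY", "")
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError("Set GOOGLE_API_KEY in interviewace/.env")
    return key
```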

Run Locally

python app/main.py

Open http://localhost:8080 in your browser.
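
The Cloud Run deploy command below passes `--port 8080`; Cloud Run tells the container which port to listen on through the `PORT` environment variable. A small helper (hypothetical, not taken from main.py) lets the same entry point work both locally and on Cloud Run:

```python
import os


def resolve_port(default: int = 8080) -> int:
    """Use Cloud Run's PORT env var when it is a valid port, else fall back to 8080."""
    raw = os.environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        return default
    return port if 0 < port < 65536 else default
```

For example, `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`, assuming main.py exposes a FastAPI instance named `app`. Binding to 0.0.0.0 rather than 127.0.0.1 matters inside a container, where requests don't arrive on loopback.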

Deploy to Google Cloud Run

# Build with Cloud Build
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/interviewace .

# Deploy
gcloud run deploy interviewace \
  --image gcr.io/YOUR_PROJECT_ID/interviewace \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated \
  --port 8080 \
  --memory 1Gi \
  --session-affinity \
  --set-env-vars "GOOGLE_API_KEY=YOUR_KEY,GOOGLE_GENAI_USE_VERTEXAI=FALSE"

πŸ“ Project Structure

IntyerviewBit/
├── README.md
├── cloudbuild.yaml                    # Google Cloud Build config
└── interviewace/
    ├── Dockerfile                     # Cloud Run container
    ├── .dockerignore
    ├── .env.example
    ├── requirements.txt
    └── app/
        ├── main.py                    # FastAPI + WebSocket server
        ├── interview_coach_agent/
        │   ├── __init__.py
        │   ├── agent.py               # ADK Agent definition (11 tools)
        │   ├── prompts.py             # Agent persona & instructions
        │   ├── tools.py               # All 10 custom tool implementations
        │   └── grounding_data.py      # Verified coaching knowledge base
        └── static/
            ├── index.html             # Single-page app (Google Meet UI)
            ├── favicon.ico
            ├── css/
            │   └── style.css          # Complete Meet-style CSS
            └── js/
                ├── app.js             # Main application logic
                ├── audio-player.js    # PCM audio playback engine
                ├── audio-recorder.js  # Mic capture + PCM encoding
                └── camera.js          # Webcam frame capture (1 fps)

πŸ› οΈ Technologies

| Technology | Usage |
| --- | --- |
| Google ADK | Agent orchestration, tool execution, LiveRequestQueue |
| Gemini 2.5 Flash Native Audio | Real-time voice interaction via Live API |
| Gemini Vision | Body language analysis from webcam frames |
| Google Search (ADK built-in) | Grounding to prevent hallucination |
| Google Cloud Run | Serverless container hosting |
| FastAPI | WebSocket server bridging browser ↔ ADK |
| Uvicorn | ASGI server with WebSocket support |
| Vanilla JavaScript | Frontend UI (no framework dependencies) |
| Web Audio API | PCM audio recording and playback |
| MediaDevices API | Camera frame capture |
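
The Web Audio API hands the recorder 32-bit float samples nominally in [-1.0, 1.0], while the pipeline sends 16-bit PCM at 16 kHz. The project's audio-recorder.js does this conversion in the browser and isn't shown here; this Python version only mirrors the usual arithmetic (clip, scale by 32767, pack as little-endian int16) and is an illustration rather than the project's code.

```python
import struct
from typing import Sequence

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    # Clip first so out-of-range input can't overflow int16.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_bytes_per_second(rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Raw audio byte rate for mono 16-bit PCM at the given sample rate."""
    return rate_hz * BYTES_PER_SAMPLE
```

At 16 kHz mono that's 32,000 bytes (256 kbit) of audio per second before WebSocket framing, and a 10 ms chunk of 160 samples is 320 bytes. Scaling both signs by 32767 keeps the mapping symmetric; some encoders use 32768 for negative values to reach the full int16 range.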

πŸ† Hackathon Criteria Alignment

| Criterion | How InterviewAce Addresses It |
| --- | --- |
| Beyond the text box | Fully voice-driven. Camera vision. No text input needed at any point. |
| Live API usage | Native bidiGenerateContent streaming via ADK LiveRequestQueue |
| Multimodal input | Audio (PCM 16kHz) + Vision (JPEG 1fps) streamed simultaneously |
| Tool use | 11 tools across 3 tiers, all called autonomously by the model |
| Grounding | google_search (ADK built-in) + fetch_grounding_data (local KB) |
| Google Cloud | Deployed on Cloud Run with Dockerfile + cloudbuild.yaml |
| User experience | Pixel-perfect Google Meet replica with real-time analytics |

👥 Built by:

Built with ❤️ for the Gemini Live Agent Challenge by Sameer


Powered by Google ADK · Gemini Live API · Google Cloud Run
