aayushjainx/ShopTalk-Agent

🤖 AI-Powered Mobile Phone Shopping Assistant

A sophisticated AI-powered shopping assistant that helps users discover, compare, and learn about mobile phones through natural language conversations. Built with modern web technologies and powered by Google's Gemini AI.

🌟 Live Demo (note: the first request after idle may take a few seconds while the Render service cold-starts)

🎯 Project Overview

This project is an intelligent shopping assistant that leverages AI to provide personalized mobile phone recommendations. Users can interact with the system using natural language queries, and the AI will understand their intent, search through a comprehensive phone database using vector search, and provide detailed comparisons, specifications, and recommendations.

Key Features

  • 🧠 AI-Powered Intent Classification - Understands user queries and categorizes them intelligently
  • 🎯 Intelligent Filter Extraction - Automatically detects brand, minimum price, and maximum price from natural language queries
  • 🔍 Semantic Search - Vector-based search for finding relevant phones with auto-applied filters
  • 💬 Natural Language Processing - Conversational interface for easy interaction
  • 📱 Comprehensive Phone Database - Detailed specifications and comparisons
  • 🔒 Safety-First Approach - Built-in safety filters and content moderation
  • 💾 Session Management - Persistent conversation history
  • ⚡ Real-time Streaming - Live AI responses for better user experience
  • 📊 Context-Aware Responses - Uses conversation history for better recommendations

🛠️ Tech Stack

Frontend

  • React 19.1.1 - Modern UI framework
  • TypeScript - Type-safe development
  • Vite - Fast build tool and development server
  • Axios - HTTP client for API communication
  • React Router - Client-side routing
  • Lucide React - Beautiful icon library

Backend

  • Node.js 22.12.0 - JavaScript runtime
  • Express.js - Web application framework
  • TypeScript - Type-safe development

Database & AI

  • Convex - Real-time database with vector search capabilities
  • Google Gemini 2.0 Flash - AI model for text generation and embeddings
  • Vector Search - Semantic search using AI embeddings

Deployment

  • Render - Cloud platform for both frontend and backend
  • GitHub - Version control and CI/CD

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend API   │    │   Convex DB     │
│   (React)       │◄──►│   (Express)     │◄──►│   (Vector DB)   │
│                 │    │                 │    │                 │
│ • Chat UI       │    │ • Intent        │    │ • Phone Data    │
│ • Session Mgmt  │    │   Classification│    │ • Vector Search │
│ • State Mgmt    │    │ • AI Integration│    │ • Sessions      │
└─────────────────┘    │ • Safety Filters│    └─────────────────┘
                       │ • API Routes    │
                       └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   Gemini AI     │
                       │   (Google)      │
                       │                 │
                       │ • Text Gen      │
                       │ • Embeddings    │
                       │ • Intent Class  │
                       └─────────────────┘

Vector Search Architecture

┌─────────────────────────────────────────────────────────────┐
│                    User Query                               │
│             "Best camera phone under ₹30,000"               │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│              Context Enhancement                            │
│         (Last 10 user messages from conversation)           │
│         Combined: "query + previous context"                │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│              Generate Query Embedding                       │
│            (Gemini text-embedding-004)                      │
│              768 dimensions vector                          │
│              Task type: 'RETRIEVAL_QUERY'                   │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│            CONVEX NATIVE VECTOR SEARCH (ACTION)             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  ctx.vectorSearch('phones', 'by_embedding', {       │    │
│  │    vector: queryEmbedding,                          │    │
│  │    limit: searchLimit * 3,  // 3x for filtering     │    │
│  │    filter: brand ? (q) => q.eq('brand', brand)      │    │
│  │           : undefined                               │    │
│  │  })                                                 │    │
│  │  ANN Algorithm (Approximate Nearest Neighbor)       │    │
│  │  Database-level optimization                        │    │
│  │  Only supports q.eq() for filters                   │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Parallel Document Fetching                         │    │
│  │  await Promise.all(                                 │    │
│  │    phoneIds.map(id =>                               │    │
│  │      ctx.runQuery(api.phones.getPhoneById, {id})    │    │
│  │    )                                                │    │
│  │  )                                                  │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Post-Search Price Range Filtering                  │    │
│  │  phones.filter(p =>                                 │    │
│  │    (!minPrice || p.price >= minPrice) &&            │    │
│  │    (!maxPrice || p.price <= maxPrice)               │    │
│  │  ).slice(0, limit)                                  │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
              ⚡ ~0.5-1 second
              📊 O(log n) similarity search
              🚀 Scales to 1000s of phones
              💡 Returns top N after post-filtering

🔄 Complete User Flow

1. User Input Processing

User Query → Frontend → Backend API → Input Sanitization → Session Management

2. Intent Classification & Safety Check

Message → Gemini Intent Classifier → Safety Validation → Intent Routing

Intent Classification Process:

  • Intelligent Metadata Extraction: Automatically extracts brand, minimum price, and maximum price from user queries during classification
    • Brand Detection: Identifies brand names (Samsung, Apple, OnePlus, etc.) from queries
    • Price Range Extraction: Parses price constraints (under ₹30k, between ₹20k-40k, above ₹50k, etc.)
    • Auto-Applied Filters: Extracted metadata is automatically applied to vector search for more accurate results
  • Safety-First Approach: Checks for unsafe intents before processing
  • Supported Intents:
    • Unsafe: adversarial, off_topic, greeting, offensive, brand_bashing, excessive_length, internal_error
    • Safe: search, compare, details, filter, explain, general
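The safety-first routing implied by these intent lists can be sketched as a small TypeScript helper. The type and function names here are illustrative, not taken from the repository:

```typescript
// Illustrative sketch of the safety-first intent routing described above.
// Intent names come from the README; everything else is assumed naming.
type UnsafeIntent =
  | 'adversarial' | 'off_topic' | 'greeting' | 'offensive'
  | 'brand_bashing' | 'excessive_length' | 'internal_error';
type SafeIntent = 'search' | 'compare' | 'details' | 'filter' | 'explain' | 'general';
type Intent = UnsafeIntent | SafeIntent;

const UNSAFE_INTENTS: ReadonlySet<Intent> = new Set<Intent>([
  'adversarial', 'off_topic', 'greeting', 'offensive',
  'brand_bashing', 'excessive_length', 'internal_error',
]);

// Unsafe intents are checked first, so they short-circuit normal processing.
function routeIntent(intent: Intent): 'safety_response' | 'full_pipeline' {
  return UNSAFE_INTENTS.has(intent) ? 'safety_response' : 'full_pipeline';
}
```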

3. Unsafe Intent Handling

Unsafe Intent → Safety Prompt → Gemini Response → User-Friendly Message

Safety Prompts: Dynamic, intent-specific responses that redirect users to smartphone topics

4. Safe Intent Processing

Safe Intent → Conversation History → Vector Search → Phone Context → AI Response

Detailed Flow:

  1. Conversation History: Retrieves last 30 messages from session
  2. Vector Search: Performs semantic search with enhanced query (up to 100 results)
  3. Phone Context: Formats found phones for AI consumption
  4. System Prompt: Builds intent-specific prompt with phone data
  5. AI Response: Generates streaming response with Gemini using all 30 messages as context

5. Vector Search Process

Query + Context → Embedding Generation → Convex Vector Search (Action) → Phone Results

Technical Details:

  • Query Enhancement: Combines current query with last 10 user messages
  • Embedding Model: Gemini text-embedding-004 (768 dimensions, task: 'RETRIEVAL_QUERY')
  • Search Algorithm: Convex native ANN (Approximate Nearest Neighbor)
  • Implementation: Convex Action (not Query) - allows external API calls
  • Performance: O(log n) complexity, 10-100x faster than manual cosine similarity

6. AI Response Generation & Streaming

Context + Phone Data + User Query → Gemini Streaming → Real-time Response

Streaming Process:

  • Server-Sent Events (SSE): Real-time response delivery
  • Chunk Processing: Processes response chunks as they arrive (sent to frontend immediately)
  • Response Accumulation: Builds full message while streaming
  • Response Validation: Validates complete AI response for safety
  • Session Storage: Stores complete validated response in conversation history

🚀 API Endpoints

Chat Endpoints

  • POST /api/chat/stream - Stream AI response in real-time (Primary endpoint)
    • Method: Server-Sent Events (SSE)
    • Request: { sessionId?, message: string }
    • Response: Streaming chunks with final data
    • Features: Real-time response delivery, conversation context

Session Endpoints

  • GET /api/sessions - Get list of all sessions
  • POST /api/sessions - Create a new session
  • GET /api/sessions/:id - Get session details and history
  • GET /api/sessions/:id/history - Get conversation history

Health & Info

  • GET /health - Health check endpoint
  • GET /api - API information and documentation

API Implementation Details

Streaming Endpoint (POST /api/chat/stream):

  • Headers: Content-Type: text/event-stream, Cache-Control: no-cache, Connection: keep-alive
  • Process:
    1. Sanitizes input and validates request
    2. Gets or creates session, adds user message
    3. Classifies intent using Gemini API
    4. If unsafe: Returns safety response immediately
    5. If safe:
      • Fetches conversation history (30 messages)
      • Builds search context (last 10 user messages)
      • Performs hybrid vector search (up to 100 phones)
      • Builds system prompt with phone context
      • Streams Gemini response in real-time
      • Validates and stores complete response
  • Response Format: data: { type: 'chunk'|'done'|'error', content/data/error }
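A client consuming this endpoint has to turn each `data:` line into an event object before rendering. A minimal parsing sketch; the field names follow the README's description of the response format and are not verified against the source:

```typescript
// Sketch of parsing one SSE line of the form
// data: {"type": "chunk" | "done" | "error", ...}
interface StreamEvent {
  type: 'chunk' | 'done' | 'error';
  content?: string; // present for 'chunk'
  data?: unknown;   // present for 'done'
  error?: string;   // present for 'error'
}

function parseSseLine(line: string): StreamEvent | null {
  if (!line.startsWith('data: ')) return null; // skip comments and blank lines
  try {
    return JSON.parse(line.slice('data: '.length)) as StreamEvent;
  } catch {
    return null; // incomplete JSON; a real client would buffer and retry
  }
}
```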

🔧 Environment Variables

Backend (.env)

NODE_ENV=production
PORT=3001
GEMINI_API_KEY=your_gemini_api_key
GEMINI_API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent
CONVEX_URL=your_convex_url
CONVEX_DEPLOY_KEY=your_convex_deploy_key
FRONTEND_URL=https://shopping-chat-frontend.onrender.com

Frontend (.env)

VITE_API_URL=https://shopping-chat-agent.onrender.com/api
NODE_ENV=production

📦 Installation & Setup

Prerequisites

  • Node.js 22.12.0 or higher
  • npm or yarn
  • Convex account
  • Google Gemini API key

Backend Setup

cd backend
npm install
cp .env.example .env
# Add your environment variables
npm run build
npm start

Frontend Setup

cd frontend
npm install
cp .env.example .env
# Add your environment variables
npm run build
npm run preview

Database Setup

cd backend
npx convex dev
# Follow the setup instructions
npx convex deploy

🧠 AI Integration Details

Intent Classification

The system uses Google Gemini 2.0 Flash to classify user intents with a safety-first approach:

Implementation Details:

  • Safety-First: Checks for unsafe intents before processing safe ones
  • JSON Response: Returns structured intent classification with confidence scores
  • Fallback Handling: Graceful degradation for API failures or parsing errors

Classification Process:

  1. Input Validation - Checks message length (max 1000 chars) and basic validation
  2. Gemini Classification - Sends message to Gemini 2.0 Flash with classification prompt
  3. Response Parsing - Parses JSON response with safety-first intent structure
  4. Metadata Extraction - Automatically extracts shopping filters from query:
    • Brand: Detects brand names (Samsung, Apple, OnePlus, Xiaomi, etc.)
    • Min Price: Extracts minimum price constraints (e.g., "above ₹20k" → minPrice: 20000)
    • Max Price: Extracts maximum price constraints (e.g., "under ₹30k" → maxPrice: 30000)
    • Examples:
      • "Samsung phones under 30k" → {brand: "Samsung", maxPrice: 30000}
      • "iPhones between 50k and 80k" → {brand: "Apple", minPrice: 50000, maxPrice: 80000}
  5. Intent Validation - Ensures intent is in supported list (unsafe checked first)
  6. Safety Check - Determines if intent requires safety handling before processing
  7. Filter Application - Extracted metadata automatically applied to vector search for precise results
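In the project this extraction is done by Gemini as part of the classification prompt, but the shape of the output can be approximated with a plain regex-based sketch. Everything below (brand list, patterns, function names) is an illustrative assumption, not the repository's code:

```typescript
// Regex-based approximation of the metadata extraction described above.
interface SearchFilters {
  brand?: string;
  minPrice?: number;
  maxPrice?: number;
}

const KNOWN_BRANDS = ['Samsung', 'Apple', 'OnePlus', 'Xiaomi'];

// "30k" -> 30000, "1,24,999" -> 124999
function parseAmount(raw: string): number {
  const n = parseFloat(raw.replace(/,/g, ''));
  return /k$/i.test(raw) ? n * 1000 : n;
}

function extractFilters(query: string): SearchFilters {
  const filters: SearchFilters = {};
  const brand = KNOWN_BRANDS.find((b) =>
    new RegExp(`\\b${b}\\b`, 'i').test(query) ||
    (b === 'Apple' && /\biphones?\b/i.test(query)));
  if (brand) filters.brand = brand;

  const num = '₹?\\s*([\\d,.]+k?)';
  const between = query.match(new RegExp(`between\\s+${num}\\s+and\\s+${num}`, 'i'));
  const under = query.match(new RegExp(`(?:under|below)\\s+${num}`, 'i'));
  const above = query.match(new RegExp(`above\\s+${num}`, 'i'));
  if (between) {
    filters.minPrice = parseAmount(between[1]);
    filters.maxPrice = parseAmount(between[2]);
  } else {
    if (under) filters.maxPrice = parseAmount(under[1]);
    if (above) filters.minPrice = parseAmount(above[1]);
  }
  return filters;
}
```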

🔍 Embeddings vs Vector Index/Search - Deep Dive

What is an Embedding?

An embedding is a numerical representation (vector) of text/data that captures its semantic meaning. Think of it as converting words into numbers that a computer can understand and compare mathematically.

In this project:

  • Each phone's specifications are converted into a 768-dimensional vector (array of 768 numbers)
  • Generated using Google Gemini's text-embedding-004 model
  • Stored in the embedding field of each phone document in Convex database
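Comparing embeddings "mathematically" usually means cosine similarity. This minimal sketch uses tiny 2-D vectors in place of the real 768-dimensional ones:

```typescript
// Cosine similarity: 1 = same direction (semantically close),
// 0 = unrelated, -1 = opposite. Real embeddings have 768 dimensions;
// 2-D vectors here are for illustration only.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```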

What is a Vector Index?

A vector index is a special database index (like by_embedding) that enables fast similarity search across embeddings. It's a data structure that organizes vectors for efficient nearest-neighbor queries.

In this project:

  • Created on the phones table using Convex's .vectorIndex() method
  • Named by_embedding and points to the embedding field
  • Uses ANN (Approximate Nearest Neighbor) algorithm for O(log n) search performance

Key Difference: Data vs Search Structure

| Aspect | Embedding | Vector Index |
| --- | --- | --- |
| What is it? | The actual data (768 numbers) | A search structure built on embeddings |
| Stored as | embedding: [0.123, -0.456, ...] in phone document | Index in Convex database schema |
| Purpose | Represents semantic meaning of phone | Enables fast similarity search |
| Created when | During data seeding (one-time per phone) | Defined in schema, built automatically |
| Used for | Comparison and matching | Finding similar phones quickly |
| Example | [0.123, -0.456, 0.789, ...] | by_embedding index with 768 dimensions |

📊 Embedding Fields in This Project

17+ Phone Attributes Converted to Embedding:

When generating embeddings, these fields are combined into a text description:

// Fields used for embedding generation:
{
  brand: "Samsung",              // ✅ Included
  model: "Galaxy S23 Ultra",     // ✅ Included
  price: 124999,                 // ✅ Included (formatted as ₹1,24,999)
  description: "Premium flagship...", // ✅ Included

  specs: {
    ram: "12GB",                 // ✅ Included
    storage: "256GB",            // ✅ Included
    battery: "5000mAh",          // ✅ Included
    processor: "Snapdragon 8 Gen 2", // ✅ Included

    camera: {
      rear: "200MP + 12MP + 10MP + 10MP",  // ✅ Included
      front: "12MP",                        // ✅ Included
      features: ["OIS", "Night Mode", ...] // ✅ Included
    },

    display: {
      size: "6.8 inch",          // ✅ Included
      type: "Dynamic AMOLED 2X", // ✅ Included
      resolution: "1440x3088",   // ✅ Included
      refreshRate: "120Hz"       // ✅ Included
    },

    has5G: true,                 // ✅ Included (as "Yes")
    fastCharging: "45W",         // ✅ Included
    os: "Android 14"             // ✅ Included
  }
}

// NOT included in embedding:
imageUrl: "https://...",         // ❌ Not included (not searchable)
_id: "abc123",                   // ❌ Not included (database ID)
createdAt: 1234567890,           // ❌ Not included (metadata)
updatedAt: 1234567890            // ❌ Not included (metadata)

How Text is Generated for Embedding:

function generatePhoneText(phone) {
	const specs = phone.specs;
	return [
		// 1. Brand & Model
		`${phone.brand} ${phone.model}`,

		// 2. Price (formatted)
		`Price: ₹${phone.price.toLocaleString('en-IN')}`,

		// 3. Description
		phone.description,

		// 4. Core Specs
		`RAM: ${specs.ram}`,
		`Storage: ${specs.storage}`,
		`Battery: ${specs.battery}`,
		`Processor: ${specs.processor}`,

		// 5. Display Info
		`Display: ${specs.display.size} ${specs.display.type} ${specs.display.resolution} ${specs.display.refreshRate}`,

		// 6. Camera Info
		`Rear Camera: ${specs.camera.rear}`,
		`Front Camera: ${specs.camera.front}`,
		`Camera Features: ${specs.camera.features.join(', ')}`,

		// 7. Connectivity & Other
		`5G: ${specs.has5G ? 'Yes' : 'No'}`,
		`Fast Charging: ${specs.fastCharging}`,
		`OS: ${specs.os}`,
	].join('. ');
}

// Example output:
// "Samsung Galaxy S23 Ultra. Price: ₹1,24,999. Premium flagship
//  with best-in-class camera. RAM: 12GB. Storage: 256GB. Battery:
//  5000mAh. Processor: Snapdragon 8 Gen 2. Display: 6.8 inch
//  Dynamic AMOLED 2X 1440x3088 120Hz. Rear Camera: 200MP + 12MP +
//  10MP + 10MP. Front Camera: 12MP. Camera Features: OIS, 100x Space
//  Zoom, Night Mode, 8K Video. 5G: Yes. Fast Charging: 45W. OS: Android 14."

🗂️ Vector Index Configuration

The vector index is defined in the Convex schema (backend/convex/schema.ts):

phones: defineTable({
	brand: v.string(),
	model: v.string(),
	price: v.number(),
	// ... all phone fields ...
	embedding: v.optional(v.array(v.float64())), // ← The embedding data
	// ... metadata fields ...
}).vectorIndex('by_embedding', {
	vectorField: 'embedding', // ← Points to embedding field
	dimensions: 768, // ← Gemini embedding size
	filterFields: ['brand'], // ← Can filter by brand at search time
});

Vector Index Properties:

  • Name: by_embedding (used in ctx.vectorSearch('phones', 'by_embedding', {...}))
  • Vector Field: embedding (the field containing the 768 numbers)
  • Dimensions: 768 (must match Gemini's embedding output)
  • Filter Fields: ['brand'] (can filter by brand using q.eq('brand', 'Samsung'))
  • Price Filtering: ❌ NOT supported in vector index (done post-search)
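Because price cannot be filtered at the index level, the range check runs in application code after the vector search returns. A runnable sketch of that post-filtering step, with a made-up minimal Phone shape:

```typescript
interface Phone { model: string; price: number; }

// Price filtering happens after vector search because Convex vector
// indexes only support q.eq() equality filters on filterFields.
function filterByPrice(
  phones: Phone[],
  minPrice?: number,
  maxPrice?: number,
  limit = 10,
): Phone[] {
  return phones
    .filter((p) =>
      (minPrice === undefined || p.price >= minPrice) &&
      (maxPrice === undefined || p.price <= maxPrice))
    .slice(0, limit);
}
```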

🔄 Complete Embedding & Search Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     ONE-TIME SETUP                              │
├─────────────────────────────────────────────────────────────────┤
│  1. Seed Database with Phones                                   │
│     └─> Phone data (brand, model, specs, etc.)                  │
│                                                                 │
│  2. Run generateEmbeddings.ts Script                            │
│     ├─> Fetches all phones from database                        │
│     ├─> For each phone:                                         │
│     │   ├─> Combines 17+ fields into text                       │
│     │   ├─> Calls Gemini API to generate embedding              │
│     │   └─> Stores 768-dimensional vector in 'embedding' field  │
│     └─> Result: Each phone has embedding field populated        │
│                                                                 │
│  3. Convex Auto-Builds Vector Index                             │
│     └─> 'by_embedding' index ready for searching                │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                     RUNTIME SEARCH                              │
├─────────────────────────────────────────────────────────────────┤
│  1. User Query: "Best camera phone under ₹30,000"               │
│                                                                 │
│  2. Generate Query Embedding                                    │
│     ├─> Enhance with conversation context                       │
│     ├─> Call Gemini API: generateEmbedding(query)               │
│     └─> Result: 768-dimensional query vector                    │
│                                                                 │
│  3. Vector Search via Index                                     │
│     ├─> Use 'by_embedding' vector index                         │
│     ├─> ctx.vectorSearch('phones', 'by_embedding', {            │
│     │     vector: queryEmbedding,                               │
│     │     limit: 30,                                            │
│     │     filter: brand ? (q) => q.eq('brand', brand)           │
│     │   })                                                      │
│     ├─> Index finds nearest embeddings (similarity)             │
│     └─> Returns phone IDs sorted by similarity                  │
│                                                                 │
│  4. Fetch & Filter Results                                      │
│     ├─> Fetch full phone documents                              │
│     ├─> Post-filter by price (₹30,000 max)                      │
│     └─> Return top N results                                    │
└─────────────────────────────────────────────────────────────────┘

🎯 Why This Approach Works

  1. Semantic Understanding:
    • "Best camera phone" matches phones with "200MP", "OIS", "Night Mode" in embedding
    • No need for exact keyword matches
  2. Multi-Field Search:
    • Single query searches across brand, specs, features simultaneously
    • "Gaming phone under ₹50k" matches processor + refresh rate + price
  3. Contextual Search:
    • "Show me something similar" works because embeddings capture meaning
    • Previous conversation enhances current query
  4. Fast Performance:
    • Vector index enables O(log n) search instead of O(n)
    • Can search 1000s of phones in milliseconds

Response Generation

AI-powered response generation with streaming:

Implementation Details:

  • Model: Gemini 2.0 Flash with streaming support (streamGenerateContent API)
  • Context Building: Combines user query with phone data and 30 messages conversation history
  • System Prompt Integration: Prepended to user message (Gemini only accepts 'user'/'model' roles)
  • Prompt Engineering: Uses intent-specific prompts with formatted phone context
  • Streaming: Server-Sent Events (SSE) for real-time chunk-by-chunk response delivery
  • Safety Validation: Validates complete AI response after streaming completes

Streaming Process:

  1. History Fetch - Fetches last 30 messages once (optimized - single DB call)
  2. Context Preparation - Builds search context (10 user msgs) and Gemini context (all 30)
  3. Vector Search - Performs hybrid search with enhanced query (up to 100 results)
  4. Phone Context - Formats found phones into readable text for AI
  5. System Prompt - Creates intent-specific prompt with phone data (prepended to user message)
  6. Gemini Streaming - Sends request with conversation history to streamGenerateContent
  7. Chunk Processing - Processes chunks as they arrive, sends to frontend via SSE
  8. Response Accumulation - Builds complete message while streaming
  9. Response Validation - Validates complete response for safety
  10. Session Storage - Stores validated response in conversation history with metadata
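Steps 7-9 boil down to forwarding each chunk as it arrives while accumulating the full message for validation and storage. A simplified sketch; the validation rule here is a placeholder, not the project's actual safety check:

```typescript
// Forwards each streamed chunk to a sink (e.g. the SSE response) while
// accumulating the complete message for post-stream validation.
function accumulateStream(
  chunks: string[],
  forward: (chunk: string) => void,
): { full: string; valid: boolean } {
  let full = '';
  for (const chunk of chunks) {
    forward(chunk);   // sent to the frontend immediately
    full += chunk;    // accumulated for validation and storage
  }
  const valid = full.trim().length > 0; // placeholder for the real safety validation
  return { full, valid };
}
```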

🔒 Safety & Security

Multi-Layer Safety System

  • Input Sanitization - Cleans and validates user input
  • Intent Safety - Blocks malicious or inappropriate intents
  • Response Validation - Validates AI responses for safety
  • Content Moderation - Filters inappropriate content

Privacy & Data Protection

  • Session Management - Anonymous sessions with expiration
  • Data Minimization - Only stores necessary conversation data
  • Secure Communication - HTTPS for all communications
  • No Personal Data - No collection of personal information

📊 Performance Optimizations

Backend Optimizations

  • Connection Pooling - Efficient database connections
  • Streaming - Real-time response streaming
  • Error Handling - Comprehensive error handling and recovery

Frontend Optimizations

  • Code Splitting - Lazy loading for better performance
  • Bundle Optimization - Optimized build output
  • State Management - Efficient state updates
  • Responsive Design - Mobile-first responsive design

🚀 Deployment

Render Deployment

Both frontend and backend are deployed on Render:

  1. Backend - Web service with auto-deploy from GitHub
  2. Frontend - Static site with build optimization
  3. Environment Variables - Secure configuration management
  4. Health Checks - Automated health monitoring

CI/CD Pipeline

  • GitHub Integration - Automatic deployments on push
  • Build Optimization - Optimized build processes
  • Environment Management - Separate environments for dev/prod

If you have any questions or need help, please:

  1. Check the Issues page
  2. Create a new issue with detailed information
  3. Contact us at your-email@example.com

Built with ❤️ and AI 🤖✨