Introduction
PerfectPick is a production-grade AI recommendation system that delivers personalized smartphone recommendations from Flipkart's product catalog. It integrates classical information retrieval (BM25), modern semantic embeddings (BGE), and large language model (LLM)-based generation to provide accurate, context-aware recommendations. The system is built for scalability, leveraging a modular architecture deployed on Google Cloud Platform (GCP) with Docker, Kubernetes, Prometheus, and Grafana. The project is live at http://136.116.202.138/, accessible for testing and evaluation.
This README provides a comprehensive guide to PerfectPick’s architecture, components, deployment workflow, and monitoring setup, suitable for researchers, developers, and DevOps engineers. It covers the system from data ingestion to production, with a focus on reproducibility, scalability, and observability.
Table of Contents
- Project Overview
- Core Features
- System Architecture
- Directory Structure
- Component-Level Breakdown
- Technology Stack
- Setup and Local Development
- Deployment Workflow
- Monitoring and Observability
- Scalability and Performance
- Future Improvements
- License
Project Overview
PerfectPick is an end-to-end recommendation system that processes user queries (e.g., "best budget phone under 15K with good camera") to suggest relevant smartphones from a dataset of 3903 Flipkart products (1456 unique models). It combines sparse retrieval (BM25), dense vector search (Astra DB), and LLM-based generation (OpenAI/Groq) to achieve high relevance and low latency (<2s per query). The system uses Supabase PostgreSQL for session memory and is deployed on GCP with Kubernetes for orchestration, monitored via Prometheus and Grafana.
Objectives:
- Deliver accurate, personalized recommendations using hybrid retrieval.
- Ensure scalability for high query volumes.
- Provide a modular, maintainable codebase for research and production.
- Enable observability through comprehensive monitoring.
Current Status (October 03, 2025):
- Developed on branch `feat/production-flask-api`.
- Core functionality (ingestion, retrieval, API) complete.
- Supabase connectivity stabilized with IPv4/IPv6 fallback.
- Live deployment: http://136.116.202.138/.
Core Features
- Hybrid Retrieval: Combines BM25 (keyword-based) and vector search (semantic) with neural reranking (BAAI/bge-reranker-base).
- Session Memory: Stores user interaction history in Supabase PostgreSQL for multi-turn personalization.
- Vector Storage: Astra DB for efficient embedding storage and retrieval.
- Production API: Flask-based endpoints for recommendations and health checks.
- Cloud-Native Deployment: Dockerized application orchestrated with Kubernetes on GCP.
- Monitoring: Prometheus for metrics collection, Grafana for visualization.
- Scalability: Horizontal Pod Autoscaler (HPA) for dynamic scaling.
- Modularity: Independent modules for ingestion, retrieval, and generation.
System Architecture
PerfectPick follows a layered, microservices-inspired architecture:
- Frontend Layer: Minimal HTML/CSS interface for user queries and results.
- Backend Layer: Flask API handling requests and orchestrating retrieval/generation.
- Data Layer: Processes and validates Flipkart CSV data.
- Vector Store Layer: Astra DB for storing product embeddings.
- Session Layer: Supabase PostgreSQL for session persistence.
- Monitoring Layer: Prometheus and Grafana for system observability.
- Deployment Layer: Docker containers managed by Kubernetes on GCP.
Directory Structure
```
bhupencoD3-PerfectPick/
├── app.py
├── main.py
├── requirements.txt
├── Dockerfile
├── .env.example
├── data/
│  └── Flipkart_Mobiles_cleaned.csv
├── perfectpick/
│  ├── config.py
│  ├── data_converter.py
│  ├── data_ingestion.py
│  ├── retrieval.py
│  ├── generation.py
│  ├── recommender.py
│  ├── service.py
│  ├── session_memory.py
├── utils/
│  ├── logger.py
│  ├── custom_exception.py
│  ├── validators.py
├── templates/
│  └── index.html
├── static/
│  └── style.css
├── tests/
│  ├── test_config.py
│  ├── test_data_converter.py
│  ├── test_data_ingestion.py
│  ├── test_recommender.py
│  ├── test_service.py
├── notebooks/
│  └── exploration.ipynb
├── docs/
│  └── index.html
├── deployment/
│  ├── k8s/
│  │  ├── deployment.yml
│  │  ├── service.yml
│  │  ├── hpa.yml
│  ├── prometheus/
│  │  └── prometheus.yml
│  ├── grafana/
│  │  ├── dashboards/
│  │  ├── datasource.yml
│  ├── deploy-all.sh
│  ├── entrypoint.sh
│  └── SETUP.md
```
Component-Level Breakdown
- `config.py`
  - Purpose: Loads and validates environment variables from `.env`.
  - Features: Validates `DB_URL` and API keys; logs initialization status.
  - Dependencies: `os`, `python-dotenv`, `logging`.
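To illustrate the fail-fast pattern this describes, here is a minimal sketch; the variable list and function name are assumptions based on the description above, not the actual contents of `config.py`.

```python
# Hypothetical sketch of config.py's load-and-validate pattern; the exact
# variable list and function names are assumptions.
import logging
import os

from dotenv import load_dotenv

logger = logging.getLogger(__name__)

REQUIRED_VARS = ["DB_URL", "OPENAI_API_KEY", "GROQ_API_KEY", "ASTRA_DB_API_ENDPOINT"]

def load_config() -> dict:
    """Load .env, fail fast on missing variables, and log the outcome."""
    load_dotenv()
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise EnvironmentError(f"Missing required environment variables: {missing}")
    logger.info("Configuration initialized; %d variables validated", len(REQUIRED_VARS))
    return {name: os.environ[name] for name in REQUIRED_VARS}
```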
- `data_converter.py`
  - Purpose: Transforms raw product data into an embedding-ready format.
  - Features: Normalizes prices, cleans text fields.
  - Dependencies: `pandas`.
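A short sketch of what the normalization step might look like; column names follow the dataset schema listed further below, and the exact transformations in `data_converter.py` may differ.

```python
# Illustrative preprocessing sketch, not the actual data_converter.py.
import pandas as pd

def to_embedding_ready(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Strip currency symbols and thousands separators, coerce price to numeric.
    df["price"] = pd.to_numeric(
        df["price"].astype(str).str.replace(r"[₹,]", "", regex=True), errors="coerce"
    )
    # Normalize whitespace in the free-text fields used for embedding.
    for col in ["model", "camera", "processor"]:
        df[col] = df[col].astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    return df
```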
- `data_ingestion.py`
  - Purpose: Loads CSV data and indexes embeddings in Astra DB.
  - Features: Processes 3903 rows; uses LangChain `AstraDBVectorStore` (collection: `flipkart_recommendation`) with Hugging Face embeddings (BAAI/bge-base-en-v1.5). Each product is vectorized from a concatenated string of its key features (model, RAM, storage, camera, processor); the full product metadata is stored alongside the vector.
  - Dependencies: `pandas`, `langchain_astradb`, `sentence-transformers`.
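The indexing flow can be sketched with the public langchain-astradb and langchain-huggingface APIs; the feature-string separator below is an assumption consistent with the description above, and the actual wiring in `data_ingestion.py` may differ.

```python
# Sketch of the ingestion flow described above, using public LangChain APIs.
import os

import pandas as pd
from langchain_astradb import AstraDBVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

df = pd.read_csv(os.environ["DATA_FILE_PATH"])

vstore = AstraDBVectorStore(
    embedding=HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5"),
    collection_name="flipkart_recommendation",
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    namespace=os.environ["ASTRA_DB_KEYSPACE"],
)

# Each product is vectorized from a concatenation of its key features;
# the full row is kept as metadata alongside the vector.
texts = (
    df["model"] + " | " + df["RAM"].astype(str) + " | " + df["storage"].astype(str)
    + " | " + df["camera"].astype(str) + " | " + df["processor"].astype(str)
).tolist()
vstore.add_texts(texts=texts, metadatas=df.to_dict(orient="records"))
```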
- `retrieval.py`
  - Purpose: Implements hybrid retrieval (BM25 + vector search).
  - Features: BM25 index (3917 documents) for keyword-based matches; vector search (top-20) for semantic intent (e.g., "good battery life"); neural reranking (BAAI/bge-reranker-base) re-scores the top-40 combined results, yielding a final top-5 product set. Price filtering segments the catalog (Budget: ₹0-15K, Mid-range: ₹15-30K, Premium: ₹30-70K, Flagship: ₹70K+). The initial hybrid score is computed as $\text{Score}_{\text{hybrid}}(q, d) = \alpha \cdot \text{Score}_{\text{BM25}}(q, d) + (1 - \alpha) \cdot \text{Score}_{\text{BGE}}(q, d)$, with $\alpha = 0.3$.
  - Dependencies: `rank_bm25`, `FlagEmbedding`, `langchain`.
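A worked sketch of the hybrid formula: BM25 and BGE scores live on different scales, so some per-query normalization is needed before mixing with $\alpha = 0.3$; min-max normalization is assumed here, and the project's exact normalization may differ.

```python
# Minimal sketch of the hybrid scoring formula above (alpha = 0.3).
def normalize(scores: list[float]) -> list[float]:
    """Min-max normalize a list of scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25: list[float], bge: list[float], alpha: float = 0.3) -> list[float]:
    """Combine per-document BM25 and BGE scores with weight alpha on BM25."""
    b, v = normalize(bm25), normalize(bge)
    return [alpha * bs + (1 - alpha) * vs for bs, vs in zip(b, v)]

# Example: a document weak on keywords but strong on semantics still ranks
# first, because semantic similarity carries 70% of the weight.
print(hybrid_scores([2.0, 8.0, 5.0], [0.9, 0.2, 0.6]))  # [0.7, 0.3, 0.55]
```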
- `generation.py`
  - Purpose: Generates natural language responses using LLMs.
  - Features: Integrates OpenAI/Groq with Retrieval-Augmented Generation (RAG). The LLM uses a system prompt to enforce a persona ("Expert Mobile Consultant") and a strict output format. Groq is prioritized for low-latency inference, with OpenAI as a high-quality fallback.
  - Dependencies: `openai`, `groq`.
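A sketch of the Groq-first, OpenAI-fallback strategy described above; the model names are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical sketch of generation.py's provider-fallback logic.
import os

from groq import Groq
from openai import OpenAI

SYSTEM_PROMPT = "You are an Expert Mobile Consultant. Answer in the required format."

def generate(context: str, query: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuery: {query}"},
    ]
    try:
        # Groq is tried first for low-latency inference.
        groq = Groq(api_key=os.environ["GROQ_API_KEY"])
        resp = groq.chat.completions.create(model="llama-3.1-8b-instant", messages=messages)
    except Exception:
        # OpenAI serves as the high-quality fallback.
        openai = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        resp = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content
```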
- `recommender.py`
  - Purpose: Orchestrates retrieval and generation for recommendations.
  - Features: Follows a clear RAG workflow (sketched below): 1. retrieve session history; 2. hybrid search for candidate products; 3. rerank candidates; 4. contextualize by assembling the final RAG prompt from the current query, session memory, and the top-5 documents; 5. generate the response via LLM; 6. store the session turn for personalization.
  - Dependencies: internal modules (`retrieval`, `generation`, `session_memory`).
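A condensed sketch of that six-step flow; the function names on the internal modules are assumptions that mirror the module responsibilities, not the actual internal API.

```python
# Hypothetical orchestration sketch of recommender.py; internal function
# names are assumptions.
from perfectpick import generation, retrieval, session_memory

def recommend(query: str, session_id: str) -> str:
    history = session_memory.get(session_id)                     # 1. Retrieve session history
    candidates = retrieval.hybrid_search(query, k=40)            # 2. Hybrid search for candidates
    top5 = retrieval.rerank(query, candidates, k=5)              # 3. Rerank to top-5
    prompt = generation.build_rag_prompt(query, history, top5)   # 4. Contextualize (RAG prompt)
    answer = generation.generate(prompt)                         # 5. Generate via LLM
    session_memory.store(session_id, query, answer)              # 6. Store the turn
    return answer
```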
- `service.py`
  - Purpose: Defines Flask API endpoints (`/recommend`, `/health`, `/metrics`).
  - Features: Handles JSON requests, returns structured responses.
  - Dependencies: `flask`.
- `session_memory.py`
  - Purpose: Manages user session history in Supabase PostgreSQL.
  - Features: Uses a `psycopg2` client with IPv4/IPv6 fallback for reliability. The `session_memory` table stores the conversational context as a JSONB object, enabling multi-turn recommendations by providing the LLM with past queries and results for context-aware reasoning.
  - Dependencies: `psycopg2`, `collections.deque`, `socket`.
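A sketch of JSONB session persistence with `psycopg2`; the table layout (a unique `session_id` plus a JSONB `context` array) is an inference from the description above, not the project's actual schema.

```python
# Hypothetical session-persistence sketch; assumes a
# session_memory(session_id UNIQUE, context JSONB) table.
import os

import psycopg2
from psycopg2.extras import Json

def store_turn(session_id: str, query: str, response: str) -> None:
    conn = psycopg2.connect(os.environ["DB_URL"])
    try:
        # The context manager commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO session_memory (session_id, context)
                VALUES (%s, %s)
                ON CONFLICT (session_id)
                DO UPDATE SET context = session_memory.context || EXCLUDED.context
                """,
                (session_id, Json([{"query": query, "response": response}])),
            )
    finally:
        conn.close()
```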
- `logger.py`
  - Purpose: Provides structured JSON logging.
  - Features: Timestamps, log levels, JSON formatting.
  - Dependencies: `logging`, `json`.
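A minimal JSON formatter matching this description; the project's `logger.py` may structure the fields differently.

```python
# Minimal structured-JSON logging sketch.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Emit each record as a single JSON object with timestamp and level.
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```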
- `custom_exception.py`
  - Purpose: Defines custom exceptions for consistent error handling.
  - Features: Structured error messages for debugging.
  - Dependencies: None.
- `validators.py`
  - Purpose: Validates input data and API requests.
  - Features: Checks query format, data integrity.
  - Dependencies: None.
- `app.py`
  - Purpose: Main Flask application entrypoint.
  - Features: Initializes routes, logging, and services; handles `/recommend`, `/health`, `/metrics`.
  - Dependencies: `flask`, internal modules.
- `templates/index.html`
  - Purpose: Basic HTML frontend for user interaction.
  - Features: Query input, recommendation display.
  - Dependencies: None.
- `static/style.css`
  - Purpose: Styles the frontend interface.
  - Features: Minimal, responsive design.
  - Dependencies: None.
- `data/Flipkart_Mobiles_cleaned.csv`
  - Purpose: Source dataset with 3903 products (8 columns: `model`, `price`, `RAM`, `storage`, `camera`, `battery`, `display`, `processor`).
  - Features: Cleaned, UTF-8 encoded.
- `notebooks/exploration.ipynb`
  - Purpose: Exploratory data analysis and model prototyping.
  - Features: Visualizations, embedding experiments.
  - Dependencies: `jupyter`, `pandas`, `matplotlib`.
- `docs/index.html`
  - Purpose: HTML documentation for contributors.
  - Features: API specs, module descriptions.
- `tests/`
  - Purpose: Validates module functionality and integration.
  - Files:
    - `test_config.py`: Tests environment variable loading.
    - `test_data_converter.py`: Validates data preprocessing.
    - `test_data_ingestion.py`: Ensures CSV loading and indexing.
    - `test_recommender.py`: Tests the recommendation pipeline.
    - `test_service.py`: Verifies API endpoints.
  - Dependencies: `pytest`.
- `deployment/`
  - Purpose: Configures production deployment and monitoring.
  - Files:
    - `Dockerfile`: Builds the application image (Python 3.12, non-root user).
    - `deploy-all.sh`: Automates Kubernetes deployment.
    - `entrypoint.sh`: Container startup script.
    - `k8s/deployment.yml`: Defines a 3-replica deployment.
    - `k8s/service.yml`: LoadBalancer for API access.
    - `k8s/hpa.yml`: Autoscales pods (3-10 replicas, 50% CPU); see the sketch after this list.
    - `prometheus/prometheus.yml`: Configures metrics scraping.
    - `grafana/datasource.yml`: Links Grafana to Prometheus.
    - `grafana/dashboards/`: Prebuilt dashboards for CPU, memory, latency.
    - `SETUP.md`: Deployment instructions.
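A minimal manifest consistent with how `k8s/hpa.yml` is described (3-10 replicas at 50% CPU, targeting the `perfectpick` deployment used in the scale commands later); the HPA object name is an assumption.

```yaml
# Illustrative hpa.yml sketch matching the description above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: perfectpick-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: perfectpick
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```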
Technology Stack
| Category | Technology | Purpose |
|---|---|---|
| API | Flask | RESTful endpoints |
| Data Storage | Supabase PostgreSQL | Session memory |
| Vector Storage | Astra DB | Product embeddings |
| Retrieval Models | BM25, BGE Embeddings (BAAI/bge-base-en-v1.5), BGE Reranker (BAAI/bge-reranker-base) | Hybrid retrieval and reranking |
| Generation LLMs | OpenAI, Groq | LLM integration for response generation |
| Containerization | Docker (BuildKit-enabled) | Application packaging |
| Orchestration | Kubernetes (GKE) | Container management |
| Cloud | Google Cloud Platform (GCE, GKE, Secret Manager) | Hosting and infrastructure |
| Monitoring | Prometheus, Grafana | Metrics collection and visualization |
| Testing | Pytest | Unit and integration tests |
| Utilities | python-dotenv, pandas, sentence-transformers | Configuration, data manipulation, embedding utilities |
Setup and Local Development
Prerequisites:
- OS: Arch Linux (or Ubuntu/Debian)
- Python: 3.12+ (use `pyenv`)
- Docker: Install via `sudo pacman -S docker`
- Git: For repository cloning
Setup Steps:
- Clone: `git clone <repo-url>; cd bhupencoD3-PerfectPick`
- Virtual Env: `python -m venv venv; source venv/bin/activate; pip install -r requirements.txt`
- Configure `.env`:
```
DB_URL=postgresql://postgres.lxbououtadfxarleksun:<password>@aws-1-ap-southeast-1.pooler.supabase.com:6543/postgres?sslmode=require
OPENAI_API_KEY=sk-proj-...
GROQ_API_KEY=gsk_...
ASTRA_DB_API_ENDPOINT=https://ad6829b0-39f3-43aa-9a13-43036ed6bed2-us-east-2.apps.astra.datastax.com
ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
ASTRA_DB_KEYSPACE=default_keyspace
HF_TOKEN=hf_qtSBHF...
DATA_FILE_PATH=data/Flipkart_Mobiles_cleaned.csv
FLASK_ENV=development
FLASK_PORT=8000
```
- Run: `python app.py`
- Test:
```bash
curl -X POST http://localhost:8000/recommend -H "Content-Type: application/json" -d '{"query": "best budget phone"}'
```
Docker Setup:
- Build: `export DOCKER_BUILDKIT=1; docker build -t perfectpick:latest .`
- Run: `docker run --env-file .env -p 8000:8000 --name perfectpick perfectpick:latest`
- Compose: `docker-compose up --build` (using `docker-compose.yml`; a sketch follows)
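The repository listing does not show a `docker-compose.yml`, so the following is a hypothetical template consistent with the build and run commands above.

```yaml
# Hypothetical docker-compose.yml matching the commands above.
services:
  perfectpick:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    restart: unless-stopped
```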
Deployment Workflow
Overview: PerfectPick is deployed on GCP using Google Kubernetes Engine (GKE) for orchestration, Artifact Registry for images, Secret Manager for credentials, and Cloud Build for CI/CD. The live instance is accessible at http://136.116.202.138/.
Steps:
- Enable GCP APIs: GKE, Cloud Build, Artifact Registry, Secret Manager.
- Create a GKE cluster:
```bash
gcloud container clusters create perfectpick-cluster --zone us-central1-a --machine-type e2-medium --num-nodes 3 --enable-ip-alias --enable-autoscaling --min-nodes 1 --max-nodes 5
```
- Store secrets:
```bash
gcloud secrets create db-url --data-file=<file>
gcloud secrets create openai-key --data-file=<file>
```
- Build and push the image:
```bash
gcloud auth configure-docker
docker tag perfectpick:latest gcr.io/<project-id>/perfectpick:latest
docker push gcr.io/<project-id>/perfectpick:latest
```
- Apply the Kubernetes manifests:
```bash
kubectl apply -f deployment/k8s/
```
- Verify:
```bash
kubectl get pods
kubectl port-forward service/perfectpick-service 8080:80
```
- Access: http://136.116.202.138/
- Scale:
```bash
kubectl scale deployment perfectpick --replicas=3
kubectl apply -f deployment/k8s/hpa.yml
```
Cloud Build CI/CD (`cloudbuild.yaml`):
```yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/perfectpick:$COMMIT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/perfectpick:$COMMIT_SHA']
  - name: 'gcr.io/cloud-builders/gke-deploy'
    args:
      - run
      - --filename=deployment/k8s/
      - --image=gcr.io/$PROJECT_ID/perfectpick:$COMMIT_SHA
      - --location=us-central1-a
      - --cluster=perfectpick-cluster
```
Optional Cloud SQL Migration:
- Create: `gcloud sql instances create perfectpick-db --database-version=POSTGRES_15 --tier=db-f1-micro`
- Update `DB_URL`: `postgresql://postgres:<pass>@<cloud-sql-ip>:5432/postgres`
- Migrate: `pg_dump <supabase-url> | gcloud sql import sql perfectpick-db -`
Monitoring and Observability
- Enabled via:
```bash
gcloud container clusters update perfectpick-cluster --monitoring=SYSTEM,WORKLOAD
```
- Config (`deployment/prometheus/prometheus.yml`):
```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'perfectpick'
    static_configs:
      - targets: ['perfectpick-service:8000']
    metrics_path: /metrics
```
- Deployment (`deployment/grafana/deployment.yml`):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "admin"
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  type: LoadBalancer
  ports:
    - port: 3000
  selector:
    app: grafana
```
- Access: `<load-balancer-ip>:3000`
- Data Source: Prometheus (`http://prometheus-operated:9090`)
- Dashboards: CPU usage, memory, query latency, DB connection errors, RAG performance breakdown.
PerfectPick tracks key metrics for the RAG pipeline to ensure recommendation quality and efficiency:
```python
from flask import Flask, Response, jsonify
from prometheus_client import Counter, Histogram, Gauge, generate_latest

app = Flask(__name__)

# General metrics
QUERY_COUNTER = Counter('perfectpick_queries_total', 'Total queries')
LATENCY_HISTOGRAM = Histogram('perfectpick_query_duration_seconds', 'Query latency')
SESSION_HITS = Counter('perfectpick_session_memory_retrievals', 'Number of times session memory was retrieved')

# RAG- and LLM-specific metrics
RETRIEVAL_LATENCY = Histogram('perfectpick_retrieval_duration_seconds', 'Duration of hybrid retrieval stage')
RERANKING_TIME = Histogram('perfectpick_reranking_duration_seconds', 'Duration of neural reranking stage')
LLM_PROVIDER_COUNTER = Counter('perfectpick_llm_provider_used', 'Count of responses by LLM provider', ['provider'])
LLM_TOKEN_USAGE = Gauge('perfectpick_llm_tokens_used', 'Tokens used per query (input/output)', ['type', 'provider'])

@app.route('/metrics')
def metrics():
    # Expose all registered metrics in the Prometheus text format.
    return Response(generate_latest(), mimetype='text/plain')

@app.route('/recommend', methods=['POST'])
def recommend():
    # Time the full request so the histogram covers the whole pipeline.
    with LATENCY_HISTOGRAM.time():
        QUERY_COUNTER.inc()
        # ... retrieval, reranking, and generation produce `results` ...
        # RETRIEVAL_LATENCY and RERANKING_TIME are observed internally;
        # LLM_PROVIDER_COUNTER and LLM_TOKEN_USAGE are set after generation.
        return jsonify(results)
```
Critical alerts are configured to monitor both system stability and the quality of the AI service:
- System Health: CPU > 80%, P95 latency > 2s, pod restarts > 3/hr.
- Quality Degradation: DB errors > 5/min (loss of personalization), reranker latency > 100ms (RAG bottleneck), LLM API failures > 2% (critical service interruption).
- Configured via Grafana notifications (Slack/Email); an equivalent Prometheus rule is sketched below.
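Since the alerts are managed through Grafana, the following Prometheus-style alerting rule is only an illustrative equivalent for the P95 latency threshold above, using the `perfectpick_query_duration_seconds` histogram defined in the metrics code.

```yaml
# Illustrative alerting-rule sketch; the project configures alerts in Grafana.
groups:
  - name: perfectpick-alerts
    rules:
      - alert: HighQueryLatencyP95
        expr: histogram_quantile(0.95, rate(perfectpick_query_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "P95 query latency above 2s"
```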
Scalability and Performance
- Horizontal Scaling: HPA scales pods (3-10) based on 50% CPU utilization.
- Performance: Retrieval <1s, total query <2s; model initialization ~3-5min (CPU).
- Optimization: Pre-downloaded BGE models; cached embeddings in Astra DB.
- Load Handling: Tested for 100 concurrent users with <5% latency increase.
Future Improvements
- CI/CD: Integrate GitHub Actions with Cloud Build for automated testing, image building, and canary deployments for new LLM/retrieval models.
- Caching: Implement Redis for both exact query caching (for popular queries) and session caching (to reduce load on Supabase).
- Multi-Modal Search: Incorporate image-based search with CLIP embeddings to allow for visual queries (e.g., "find a phone that looks like this").
- A/B Testing: Use Kubernetes Ingress/Service Mesh to A/B test different retrieval strategies (e.g., varying the $\alpha$ value in hybrid scoring, or testing new rerankers).
- Authentication: Add OAuth2 for secure API access.
- Multi-Product Support: Extend to other Flipkart categories (e.g., laptops).
- Personalization: Explore federated learning for user-specific models.
License
This project is licensed under the MIT License.