Product Review Analyzer

About

The Product Review Analyzer is an AI-driven application designed to help product managers and development teams make data-driven decisions. It analyzes product reviews and user feedback, identifies pain points, extracts feature requests, and highlights positive feedback, turning raw customer input into actionable priorities.

The tool combines Google's Gemini API (using the Gemini 2.0 Flash model), Hugging Face Transformers, and natural language processing libraries to analyze user sentiment and feedback. It also provides real-time data scraping, batch processing with parallel execution, a circuit breaker pattern for API resilience, and interactive visualizations.

Features

Upload CSV files with user feedback
Scrape real-time feedback from Google Play Store
Analyze GitHub repositories for issues and discussions
Advanced sentiment analysis with:
- Context-aware sentiment detection
- Aspect-based sentiment analysis
- Local processing for sentiment analysis to avoid API rate limits
- Parallel processing for improved performance
Categorize feedback into:
- Pain points
- Feature requests
- Positive feedback
Extract keywords from feedback
Generate summaries with top insights
Weekly summaries for product prioritization
Download PDF reports with actionable priorities for product managers
Track analysis history and view past results with full summaries
Dynamic processing time estimation based on historical data
Real-time progress updates via WebSockets
Advanced Gemini API integration with:
- Local processing for sentiment analysis to avoid API rate limits
- Gemini API for insight extraction and summary generation
- Intelligent batch processing for optimal performance
- Adaptive request throttling to prevent rate limits
- Robust JSON parsing with multiple fallback mechanisms
- Multi-level caching system for faster responses
- Circuit breaker pattern for graceful degradation
Multi-language support for analyzing reviews in different languages
MongoDB Atlas integration for scalable data storage
Optimized batch processing with dynamic batch sizing and RAM usage optimization
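
The "robust JSON parsing with multiple fallback mechanisms" item can be illustrated with a small sketch. This is not the project's actual parser; the function name parse_model_json and the specific fallback order (raw text, then a fenced block, then the outermost braces) are assumptions chosen for illustration.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically so this sample stays easy to embed.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def parse_model_json(raw: str) -> Optional[Any]:
    """Try progressively looser strategies; return None if all of them fail."""
    # Strategy 1: the whole reply is already valid JSON.
    candidates = [raw.strip()]

    # Strategy 2: the JSON sits inside the first fenced code block.
    fenced = _FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Strategy 3: take everything between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        candidates.append(raw[start:end + 1])

    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None
```

Returning None rather than raising lets the caller decide whether to retry the request or fall back to local processing.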

Tech Stack

Backend

FastAPI (Python)
WebSockets for real-time progress updates
MongoDB Atlas for database storage
Hugging Face Transformers for sentiment analysis
Google Gemini API for advanced text processing (using Gemini 2.0 Flash model)
NLTK and spaCy for NLP tasks
PyMongo for MongoDB integration
google-play-scraper for Play Store data
WeasyPrint for PDF generation
Parallel processing for improved performance

Frontend

React
Tailwind CSS
Headless UI components
Framer Motion for animations
Axios for API calls
WebSocket for real-time updates
React Dropzone for file uploads
React Markdown for rendering markdown content
Recharts for data visualization

Project Structure

product-review-analyzer/
├── backend/               # FastAPI backend
│   ├── app/
│   │   ├── api/          # API endpoints and models
│   │   ├── auth/         # Authentication
│   │   ├── models/       # Data models
│   │   ├── services/     # Business logic (scraper, analyzer)
│   │   └── utils/        # Utilities and exceptions
│   ├── download_nltk_resources.py  # NLTK setup script
│   ├── requirements.txt  # Python dependencies
│   ├── requirements.aws.txt  # AWS-specific dependencies
│   └── main.py           # FastAPI application entry
├── frontend/             # React frontend
│   ├── src/
│   │   ├── components/   # React components
│   │   ├── config/       # Configuration files
│   │   ├── hooks/        # Custom React hooks
│   │   ├── services/     # API services
│   │   ├── App.jsx       # Main application component
│   │   ├── index.css     # Tailwind CSS styles
│   │   └── main.jsx      # Entry point
│   ├── tailwind.config.js # Tailwind configuration
│   ├── vite.config.js    # Vite configuration
│   └── package.json      # Node dependencies
├── docs/                 # Documentation
│   ├── api/              # API documentation
│   ├── deployment/       # Deployment guides
│   │   ├── aws.md        # AWS deployment guide
│   │   └── railway.md    # Railway deployment guide
│   ├── features/         # Feature documentation
│   └── troubleshooting/  # Troubleshooting guides
├── scripts/              # Deployment and setup scripts
│   ├── aws/              # AWS deployment scripts
│   │   ├── build.sh      # AWS build script
│   │   ├── deploy_aws.sh # AWS deployment script
│   │   ├── setup_aws.sh  # AWS setup script
│   │   ├── install_aws_deps.sh # AWS dependencies installer
│   │   └── build_frontend_aws.sh # AWS frontend build script
│   └── build.sh          # Production build script
├── serve.py              # Production server script
├── start.sh              # Production startup script
├── package.json          # Root package.json with scripts
└── README.md             # Project documentation

Prerequisites

Node.js (v16+)
Python (v3.11 recommended, avoid 3.13 due to compatibility issues)
MongoDB Atlas account
Google Gemini API key (for enhanced analysis)
AWS Account (for AWS deployment)

Setup Instructions

Complete Setup (Backend and Frontend)

Install all dependencies at once:

npm run install:all

Start both backend and frontend:

npm run dev

AWS Deployment

For production deployment on AWS:

Configure your AWS credentials and EC2 instance
Build the application for production:

# Make the build script executable
chmod +x scripts/aws/build.sh

# Run the build script
./scripts/aws/build.sh

Deploy to AWS:

# Make the deployment script executable
chmod +x scripts/aws/deploy_aws.sh

# Deploy to AWS
./scripts/aws/deploy_aws.sh -h <EC2_HOST> -k <PEM_FILE>

The deployment script will:

Package your application
Upload it to your EC2 instance
Install all dependencies
Configure Nginx as a reverse proxy
Set up a systemd service for the application
Start the application

For detailed AWS deployment instructions, see the AWS Deployment Guide (docs/deployment/aws.md).

Railway Deployment

For deployment on Railway:

Connect your GitHub repository to Railway
Configure the environment variables in Railway dashboard
Deploy the application

For detailed Railway deployment instructions, see the Railway Deployment Guide (docs/deployment/railway.md).

Manual Setup

Backend Setup

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

cd backend
pip install -r requirements.txt

Download NLTK resources:

python download_nltk_resources.py

Set up environment variables:

# Copy the example .env file
cp .env.mongodb.example .env

# Edit the .env file to add your MongoDB URI and Gemini API key
# You can get a Gemini API key from https://ai.google.dev/

Run the backend server:

uvicorn main:app --reload

Frontend Setup

Install dependencies:

cd frontend
npm install

Run the development server:

npm run dev

API Endpoints

Core Endpoints

POST /api/upload - Upload and analyze a CSV file
GET /api/scrape - Scrape and analyze data from Google Play Store
POST /api/analyze - Analyze a list of reviews
POST /api/summary - Generate a summary from analyzed reviews
POST /api/summary/pdf - Generate a PDF report
POST /api/summary/weekly - Generate a weekly summary
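
As a rough illustration of calling one of these endpoints from Python, the sketch below builds a POST request for /api/analyze using only the standard library. The request body shape ({"reviews": [{"text": ...}]}) is an assumption, as is the absence of an authentication header; check the interactive API documentation at /docs for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default local backend address


def build_analyze_request(reviews: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST /api/analyze request."""
    # Assumption: the endpoint accepts a JSON object with a "reviews" list.
    payload = {"reviews": [{"text": text} for text in reviews]}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(reviews: list[str]) -> object:
    """Send the request and decode the JSON reply; needs a running backend."""
    request = build_analyze_request(reviews)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the backend running, analyze(["The app keeps crashing"]) would return whatever JSON the server produces for that review.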

Advanced Sentiment Analysis

POST /api/sentiment/analyze - Analyze sentiment of a single text with advanced features
POST /api/sentiment/batch - Analyze sentiment of multiple texts

Gemini API Integration

POST /api/gemini/sentiment - Analyze sentiment using Google's Gemini API
POST /api/gemini/batch - Batch process multiple texts with Gemini
POST /api/gemini/insights - Extract insights from reviews using Gemini
GET /api/gemini/status - Get detailed Gemini API status with performance metrics

History Tracking

POST /api/history - Record an analysis in the history
GET /api/history - Get analysis history
GET /api/history/{analysis_id} - Get a specific analysis by ID
DELETE /api/history/{analysis_id} - Delete an analysis history record

Processing Time Tracking

POST /api/timing/record - Record processing time for an operation
GET /api/timing/estimate/{operation} - Get estimated processing time
GET /api/timing/history - Get processing time history
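
One simple way to turn recorded processing times into an estimate is to average the per-item rate of past runs. The sketch below shows that idea only; the function name, the (items, seconds) history format, and the default rate are hypothetical and may differ from what the backend actually does.

```python
from statistics import mean
from typing import Iterable, Tuple


def estimate_seconds(
    history: Iterable[Tuple[int, float]],
    item_count: int,
    default_rate: float = 0.5,
) -> float:
    """Estimate run time from past (items_processed, seconds_taken) records.

    default_rate is an assumed seconds-per-item value used when no history exists.
    """
    # Seconds per item for every past run that processed at least one item.
    rates = [seconds / items for items, seconds in history if items > 0]
    rate = mean(rates) if rates else default_rate
    return rate * item_count
```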

Authentication

POST /api/auth/register - Register a new user
POST /api/auth/login - Login and get access token
GET /api/auth/me - Get current user information
GET /api/auth/user - Get user information from token

WebSocket Endpoints

ws://localhost:8000/ws/batch-progress - Real-time batch processing progress updates
ws://localhost:8000/ws/sentiment-progress - Real-time sentiment analysis progress updates

Environment Variables

Required Variables

MONGODB_URI - MongoDB Atlas connection string
SECRET_KEY - Secret key for JWT token generation
ACCESS_TOKEN_EXPIRE_MINUTES - JWT token expiration time in minutes

Optional Variables

GEMINI_API_KEY - Google Gemini API key for enhanced analysis
GEMINI_MODEL - Gemini model to use (default: gemini-2.0-flash)
GEMINI_BATCH_SIZE - Number of reviews to process in each Gemini API batch (default: 10)
GEMINI_SLOW_THRESHOLD - Threshold in seconds to detect slow Gemini API processing (default: 5)
PORT - Server port (default: 8000)
HOST - Server host (default: 0.0.0.0)
DEBUG - Enable debug mode (default: False)
ENABLE_WEBSOCKETS - Enable WebSocket support (default: True)
PARALLEL_PROCESSING - Enable parallel processing (default: True)
MAX_WORKERS - Maximum number of worker threads for parallel processing (default: 4)
BATCH_SIZE_MULTIPLIER - Adjust all batch sizes (default: 1.0)
CIRCUIT_BREAKER_TIMEOUT - Time in seconds before the circuit breaker resets (default: 300)
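
For reference, the optional variables and their documented defaults could be read in Python roughly as follows. The Settings class, load_settings function, and the set of strings accepted as "true" are illustrative assumptions, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag; the accepted spellings are an assumption."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_model: str
    gemini_batch_size: int
    gemini_slow_threshold: float
    port: int
    host: str
    debug: bool
    enable_websockets: bool
    parallel_processing: bool
    max_workers: int
    batch_size_multiplier: float
    circuit_breaker_timeout: int


def load_settings() -> Settings:
    """Collect the optional variables, falling back to the documented defaults."""
    env = os.environ
    return Settings(
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash"),
        gemini_batch_size=int(env.get("GEMINI_BATCH_SIZE", "10")),
        gemini_slow_threshold=float(env.get("GEMINI_SLOW_THRESHOLD", "5")),
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
        debug=_env_bool("DEBUG", False),
        enable_websockets=_env_bool("ENABLE_WEBSOCKETS", True),
        parallel_processing=_env_bool("PARALLEL_PROCESSING", True),
        max_workers=int(env.get("MAX_WORKERS", "4")),
        batch_size_multiplier=float(env.get("BATCH_SIZE_MULTIPLIER", "1.0")),
        circuit_breaker_timeout=int(env.get("CIRCUIT_BREAKER_TIMEOUT", "300")),
    )
```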

Production Variables

DEVELOPMENT_MODE - Set to false for production environments (default: True)
FRONTEND_URL - URL of the frontend for CORS configuration (default: varies by environment)
LOG_LEVEL - Logging level (default: INFO)
GUNICORN_WORKERS - Number of Gunicorn workers for production (default: 4)
GUNICORN_TIMEOUT - Timeout for Gunicorn workers in seconds (default: 120)

API Documentation

Once the backend is running, visit http://localhost:8000/docs for the interactive API documentation.

Gemini API Integration

The application includes advanced integration with Google's Gemini API for enhanced text analysis capabilities. For detailed documentation, see Gemini API Integration Documentation.

Key Features

Intelligent Batch Processing: Dynamically adjusts batch sizes based on review length
Multi-Level Caching System: Implements LRU caching for faster responses
Adaptive Request Throttling: Prevents rate limit errors by controlling request rates
Robust JSON Parsing: Multiple fallback mechanisms for handling various response formats
Circuit Breaker Pattern: Gracefully degrades to local processing when API is unavailable
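
The circuit breaker idea can be sketched in a few lines of Python. This is a generic illustration of the pattern, not the project's implementation: the class name, the max_failures threshold, and the injectable clock are assumptions, while the 300-second reset mirrors the documented CIRCUIT_BREAKER_TIMEOUT default.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after repeated failures, then allow a trial call once a timeout passes."""

    def __init__(
        self,
        max_failures: int = 3,  # assumed threshold, not taken from the project
        reset_timeout: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the timeout has elapsed, report closed so one trial call goes through.
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        """Run primary (e.g. a remote API call); degrade to fallback on trouble."""
        if self.is_open:
            return fallback()
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In this sketch, primary would wrap the remote API request and fallback the local analysis path, so callers always receive a result even while the remote service is unavailable.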

Performance Monitoring

The Gemini API integration includes comprehensive performance monitoring accessible through the /api/gemini/status endpoint, which provides detailed metrics on:

API response times
Cache efficiency
Request throttling
Error rates
Circuit breaker status

For implementation details, troubleshooting tips, and best practices, refer to the complete documentation.

CSV Format

The CSV file should contain the following columns:

text (required) - The feedback content
username (optional) - The user who provided the feedback
timestamp (optional) - When the feedback was provided
rating (optional) - A numerical rating (1-5)
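
A file with these columns can be checked before upload using Python's standard csv module. The sketch below is a client-side illustration under stated assumptions: the function name read_feedback_csv, the choice to skip rows with empty text, and the conversion of rating to an integer are not taken from the backend.

```python
import csv
from typing import Dict, List, Optional, TextIO


def read_feedback_csv(stream: TextIO) -> List[Dict[str, Optional[object]]]:
    """Parse feedback rows, requiring the 'text' column and tolerating the rest."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or "text" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'text' column")

    rows = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # skip rows with no feedback content
        rating_raw = (row.get("rating") or "").strip()
        rows.append({
            "text": text,
            "username": (row.get("username") or "").strip() or None,
            "timestamp": (row.get("timestamp") or "").strip() or None,
            # Keep only whole-number ratings; anything else becomes None.
            "rating": int(rating_raw) if rating_raw.isdigit() else None,
        })
    return rows
```

To use it on a file, open the file with newline="" as the csv module recommends, for example: with open("feedback.csv", newline="", encoding="utf-8") as f: rows = read_feedback_csv(f). The file name here is a placeholder.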

Example:

text,username,timestamp,rating
"The app keeps crashing whenever I try to upload photos",user1,2023-04-29,2
"Would love to have dark mode in the next update!",user2,2023-04-30,4
"Everything works flawlessly. Great job!",user3,2023-04-30,5

MongoDB Collections

The application uses the following MongoDB collections:

users - User accounts and authentication information
reviews - Analyzed reviews and feedback
keywords - Extracted keywords and their frequencies
analysis_history - History of analysis operations with full summaries
processing_times - Processing time records for estimation
weekly_summaries - Weekly summaries for product prioritization
batch_progress - Batch processing progress tracking

License

MIT
