soumyadeepsarkar-2004/data-analytics-Internship

Restaurant Data Analytics Product

A beginner-friendly, end-to-end data analytics project for restaurant business insights. This repository demonstrates how to analyze a real-world restaurant dataset using Python, pandas, and visualization libraries. All code and outputs are reproducible and ready for extension or learning.

Features

  • Top Cuisines Analysis: Find the most popular cuisines and their market share.
  • City Analysis: Discover which cities have the most restaurants and the highest average ratings.
  • Price Range Distribution: Visualize how restaurants are distributed across price categories.
  • Online Delivery Impact: See how online delivery availability affects restaurant ratings.
  • Votes & Popularity: Explore the relationship between customer votes and ratings.
  • Service Features: Analyze how price range relates to online delivery and table booking.
  • Review Text Insights: Extract frequent positive & negative keywords, review length distribution, and correlation between review length and rating.
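As an illustration of the Top Cuisines analysis, here is a minimal pandas sketch. It is not taken from analysis/analysis.py: the column name `Cuisines`, the comma-separated format, and the helper `top_cuisines` are assumptions, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample; the real dataset's column names and values may differ.
df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese",
        "Chinese",
        "North Indian",
        "Italian, Chinese",
        None,  # restaurants with no cuisine listed are ignored
    ],
})


def top_cuisines(frame: pd.DataFrame, column: str = "Cuisines", n: int = 3) -> pd.DataFrame:
    """Count restaurants per cuisine and the share of restaurants serving it."""
    # Split "A, B" into one row per cuisine and trim stray spaces.
    cuisines = frame[column].dropna().str.split(",").explode().str.strip()
    counts = cuisines.value_counts().head(n)
    # Share is relative to restaurants that list at least one cuisine.
    total = frame[column].notna().sum()
    return pd.DataFrame({
        "restaurants": counts,
        "share_pct": (counts / total * 100).round(1),
    })


result = top_cuisines(df)
print(result)
```

Because one restaurant can serve several cuisines, the shares computed this way can add up to more than 100%.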

Quick Start

  1. Clone the repository
  2. Create a virtual environment and install dependencies (Windows commands shown)
    python -m venv .venv
    .\.venv\Scripts\activate
    pip install -r analysis\requirements.txt
  3. Run the analysis
    python analysis\analysis.py
  4. View results
    • Output CSVs and images are in analysis/output/
    • Open PNG files to see visualizations

Example Visualizations

  • Top 3 Cuisines
  • City with Most Restaurants
  • Price Range Distribution
  • Online Delivery Impact
  • Votes vs Rating
  • Price Range vs Services

Core Output Files (default minimal run)

These are produced on every run, whether or not FULL_MODE=1 is set:

  • level1_city_analysis.csv
  • level1_price_range_distribution.csv
  • level1_online_delivery_summary.csv
  • level1_online_vs_offline_rating.csv
  • level3_price_range_vs_delivery_table.csv
  • level1_top_cuisines.csv
  • Key charts (PNG):
    • level1_top3_cuisines.png
    • level1_city_most_restaurants.png
    • level1_price_range_distribution.png
    • level1_online_vs_offline_rating_bar.png
    • level3_votes_vs_rating.png
    • level3_price_range_vs_services.png
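To give a feel for what a per-city table might hold, the sketch below counts restaurants and averages ratings per city. The column names `City` and `Aggregate rating` and the sample rows are assumptions for illustration; the actual layout of level1_city_analysis.csv may differ.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not confirmed by the project.
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Pune", "Pune", "Pune", "Goa"],
    "Aggregate rating": [4.0, 3.0, 4.5, 3.5, 4.0, 4.8],
})

# One row per city: how many restaurants it has and their mean rating.
city_summary = (
    df.groupby("City")
    .agg(
        restaurant_count=("Aggregate rating", "size"),
        avg_rating=("Aggregate rating", "mean"),
    )
    .sort_values("restaurant_count", ascending=False)
)

most_restaurants = city_summary["restaurant_count"].idxmax()
highest_rated = city_summary["avg_rating"].idxmax()

print(city_summary)
print("Most restaurants:", most_restaurants)
print("Highest average rating:", highest_rated)
```

Calling `city_summary.to_csv(path)` would write the table to disk; the sketch only prints it so that it has no side effects.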

Additional Outputs (when FULL_MODE=1)

  • Top/bottom city and cuisine lists
  • Extended plots (top10, top15, pies, highest rating city)
  • Votes top/bottom lists
  • level1 summary
  • README_generated
  • Extended review keyword artifacts (if review data is provided)

Optional Review Analysis

Provide a reviews.csv (recommended columns: review_text, rating) in Data analysis dataset/ to enable:

  • reviews_top_positive_keywords.csv / reviews_top_negative_keywords.csv
  • reviews_length_stats.csv
  • reviews_length_rating_corr.txt (only if rating column present)
  • Associated PNG charts
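The review outputs listed above could be derived along these lines. This is a sketch under stated assumptions, not the project's implementation: the rating thresholds for "positive" and "negative", the tiny stopword set, the helper `top_keywords`, and the sample reviews are all invented for illustration.

```python
import re
from collections import Counter

import pandas as pd

# Made-up reviews using the recommended columns review_text and rating.
reviews = pd.DataFrame({
    "review_text": [
        "Great food and great service",
        "Terrible service, cold food",
        "Good food",
        "Bad experience, very slow and cold",
    ],
    "rating": [5, 1, 4, 2],
})

# Assumption: a deliberately tiny stopword list, just for this example.
STOPWORDS = {"and", "very", "the", "a"}


def top_keywords(texts, n=3):
    """Return the n most frequent non-stopword tokens across the texts."""
    words = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.update(t for t in tokens if t not in STOPWORDS)
    return words.most_common(n)


# Assumption: ratings >= 4 count as positive, ratings <= 2 as negative.
positive = top_keywords(reviews.loc[reviews["rating"] >= 4, "review_text"])
negative = top_keywords(reviews.loc[reviews["rating"] <= 2, "review_text"])

# Review length in characters and its Pearson correlation with rating.
reviews["length"] = reviews["review_text"].str.len()
length_rating_corr = reviews["length"].corr(reviews["rating"])

print(positive)
print(negative)
print(reviews["length"].describe())
print(length_rating_corr)
```

The correlation step needs the rating column, which matches the note that reviews_length_rating_corr.txt is only written when ratings are present.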

Run in full mode (Command Prompt):

set "FULL_MODE=1" && python analysis\analysis.py

(PowerShell: $env:FULL_MODE='1'; python analysis\analysis.py)
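A script can pick up such a flag from the environment. The helper below is a hypothetical sketch (`full_mode_enabled` is not a name from analysis/analysis.py) and shows one defensive way to read it. It strips whitespace because Command Prompt keeps any trailing space typed before `&` as part of the variable's value.

```python
import os


def full_mode_enabled(env=None) -> bool:
    """Return True when FULL_MODE is set to 1, ignoring surrounding spaces."""
    # Accept a mapping for easy testing; fall back to the real environment.
    env = os.environ if env is None else env
    return env.get("FULL_MODE", "").strip() == "1"


if full_mode_enabled():
    print("Full mode: core outputs plus additional outputs")
else:
    print("Minimal mode: core outputs only")
```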

How It Works

  • analysis/analysis.py: Main script (supports FULL_MODE flag)
  • analysis/requirements.txt: Python dependencies
  • analysis/output/: Generated results (CSV + PNG)
  • Data placement: put the dataset CSV (named Dataset .csv, dataset.csv, or Dataset.csv) inside Data analysis dataset/.
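Looking up the dataset under several accepted file names can be sketched as follows. The function `find_dataset` and the order in which the names are tried are assumptions; the demo uses a temporary directory rather than the real Data analysis dataset/ folder.

```python
import tempfile
from pathlib import Path
from typing import Optional

# File names the README lists as accepted (the first one contains a space).
CANDIDATE_NAMES = ("Dataset .csv", "dataset.csv", "Dataset.csv")


def find_dataset(data_dir: Path) -> Optional[Path]:
    """Return the first candidate file that exists in data_dir, else None."""
    for name in CANDIDATE_NAMES:
        candidate = data_dir / name
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory so nothing in the repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    missing = find_dataset(data_dir)  # nothing there yet
    (data_dir / "dataset.csv").write_text("City,Votes\n")
    found = find_dataset(data_dir)
    found_name = found.name if found else None

print(missing, found_name)
```

Returning None instead of raising lets the caller decide how to report a missing dataset.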

Extend or Learn

  • Fork and add new analyses (e.g., sentiment, time trends)
  • Use as a template for your own data projects
  • All code is commented for easy understanding

License

MIT License. Free for learning and commercial use.