-BeautyBytes

TRY IT LIVE: Binder

📌 Project Summary

BeautyBytes is a full-cycle data analytics project aimed at uncovering trends, customer preferences, and brand insights in the online cosmetics industry using a dataset of 15,000 makeup products. The project includes:

✅ Market research & sales trend analysis
✅ Content-based recommendation system
✅ Brand performance & customer feedback analysis
✅ Behavioral segmentation for targeted marketing
✅ End-to-end Power BI dashboard


🧰 Tools Used

  • Python (pandas, Seaborn, scikit-learn, including its TfidfVectorizer)
  • Power BI (for executive visuals & trends)
  • Google Colab notebook
  • Cosmetics product dataset with 14 features

📊 1. Exploratory Data Analysis (EDA)

  • Most common product categories: Serum, Mascara, Face Oil
  • Most popular ingredients: Glycerin, Retinol, Vitamin C
  • Highest-rated skin-type focus: Combination & Oily
  • Countries with the most products sold: Italy, USA
  • Preferred packaging type: Jar
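The findings above are the kind of result that frequency counts and grouped averages produce. Here is a minimal sketch of that approach, assuming the column names used elsewhere in this README (`Category`, `Main_Ingredient`, `Skin_Type`, `Packaging_Type`, `Rating`). The rows are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; these rows are invented for illustration only.
df = pd.DataFrame({
    "Category": ["Serum", "Serum", "Mascara", "Face Oil", "Serum", "Mascara"],
    "Main_Ingredient": ["Glycerin", "Retinol", "Glycerin", "Vitamin C", "Glycerin", "Retinol"],
    "Skin_Type": ["Oily", "Combination", "Dry", "Oily", "Combination", "Dry"],
    "Packaging_Type": ["Jar", "Jar", "Tube", "Bottle", "Jar", "Tube"],
    "Rating": [4.5, 4.2, 3.8, 4.7, 4.4, 3.5],
})

# Frequency tables: most common categories, ingredients, and packaging types.
top_categories = df["Category"].value_counts()
top_ingredients = df["Main_Ingredient"].value_counts()
top_packaging = df["Packaging_Type"].value_counts()

# Average rating per skin type, highest first.
rating_by_skin = (
    df.groupby("Skin_Type")["Rating"].mean().sort_values(ascending=False)
)

print(top_categories.head(3))
print(rating_by_skin)
```

On the real data, the same calls would run against the DataFrame loaded from beauty_products_clean.csv.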

🧠 2. Recommendation Engine

  • Uses category, main ingredient, skin type & packaging
  • Built using TF-IDF and cosine similarity
  • Returns 5 similar products for any selected item

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Combine relevant text columns
    df['Combined_Features'] = (
        df['Category'] + ' ' + df['Main_Ingredient'] + ' '
        + df['Skin_Type'] + ' ' + df['Packaging_Type']
    )

    # Vectorize
    tfidf = TfidfVectorizer()
    tfidf_matrix = tfidf.fit_transform(df['Combined_Features'])

    # Cosine similarity
    cosine_sim = cosine_similarity(tfidf_matrix)

    def recommend(product_name, top_n=5):
        # Assumes df has the default 0..n-1 index, so the label is also the row position
        idx = df[df['Product_Name'] == product_name].index[0]
        sim_scores = list(enumerate(cosine_sim[idx]))
        # Sort by similarity, then skip the first hit (normally the product itself)
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:top_n + 1]
        recommended = df.iloc[[i[0] for i in sim_scores]]['Product_Name'].tolist()
        return recommended

    print('A user who likes Ultra Face Mask might also enjoy products with similar skin-type targets and ingredients such as: ', recommend('Ultra Face Mask'))

Output:

    A user who likes Ultra Face Mask might also enjoy products with similar skin-type targets and ingredients such as:
    ['Magic Foundation', 'Super Cc Cream', 'Ultra Foundation', 'Ultra Eye Shadow', 'Divine Face Mask']
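A full pairwise similarity matrix grows quadratically: for about 15,000 products, a dense 15,000 × 15,000 float64 array takes roughly 1.8 GB. The sketch below is one possible alternative, not the notebook's method. It scores only the queried row against the catalogue and removes the query product by position instead of assuming it sorts first. The function name `recommend_lazy` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy rows, invented for illustration only.
df = pd.DataFrame({
    "Product_Name": ["Ultra Face Mask", "Divine Face Mask",
                     "Magic Foundation", "Super Mascara"],
    "Combined_Features": [
        "Face Mask Retinol Oily Jar",
        "Face Mask Retinol Oily Tube",
        "Foundation Glycerin Oily Jar",
        "Mascara Shea Butter Dry Tube",
    ],
}).reset_index(drop=True)

tfidf_matrix = TfidfVectorizer().fit_transform(df["Combined_Features"])


def recommend_lazy(product_name, top_n=5):
    """Hypothetical helper: return up to top_n similar products, or [] if the name is unknown."""
    matches = np.flatnonzero((df["Product_Name"] == product_name).to_numpy())
    if matches.size == 0:
        return []
    pos = int(matches[0])
    # TfidfVectorizer L2-normalises rows by default, so the dot product
    # computed by linear_kernel equals the cosine similarity.
    scores = linear_kernel(tfidf_matrix[pos], tfidf_matrix).ravel()
    order = np.argsort(-scores, kind="stable")
    order = order[order != pos][:top_n]  # drop the query product itself
    return df["Product_Name"].iloc[order].tolist()


print(recommend_lazy("Ultra Face Mask", top_n=2))
```

Each call computes one row of similarities, so memory use stays proportional to the catalogue size.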

🔍 3. Brand Popularity & Feedback

  • Popularity score: Rating × log(1 + Number of Reviews)
  • Brands like HourGlass, Milk Makeup, Becca lead in satisfaction

    # Create popularity score
    import numpy as np
    df['Popularity_Score'] = df['Rating'] * np.log1p(df['Number_of_Reviews'])

    # Brand-wise aggregates
    brand_stats = (
        df.groupby('Brand')[['Rating', 'Number_of_Reviews', 'Popularity_Score']]
        .mean()
        .sort_values(by='Popularity_Score', ascending=False)
    )

    # Top 10 popular brands
    brand_stats.head(10)
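To see how the score behaves, here is a small worked sketch on invented numbers (the brands "A" and "B" and all values are hypothetical). `np.log1p(x)` computes log(1 + x), so a product with zero reviews scores 0 rather than negative infinity. The logarithm also dampens very large review counts, so a high rating backed by only a handful of reviews does not outrank a well-reviewed brand.

```python
import numpy as np
import pandas as pd

# Invented ratings and review counts, for illustration only.
df = pd.DataFrame({
    "Brand": ["A", "A", "B", "B"],
    "Rating": [4.8, 4.6, 4.0, 4.2],
    "Number_of_Reviews": [10, 0, 5000, 3000],
})

# Rating weighted by the log-damped review count.
df["Popularity_Score"] = df["Rating"] * np.log1p(df["Number_of_Reviews"])

# Average per brand, most popular first.
brand_stats = (
    df.groupby("Brand")[["Rating", "Number_of_Reviews", "Popularity_Score"]]
    .mean()
    .sort_values(by="Popularity_Score", ascending=False)
)

print(brand_stats)
```

In this toy table, brand B ranks first despite its lower average rating, because its ratings are backed by far more reviews.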


📊 Power BI Dashboard

📁 Pages:

🟦 PAGE 1: Executive Overview

  • Treemap: Brand dominance
  • Bar plot: Category vs rating
  • Bubble chart: Price vs rating vs review count
  • Donut: Product origin distribution

🟨 PAGE 2: Product Performance and Behavioral Insights

  • Bar plot: Usage frequency vs rating
  • Skin type × category distribution
  • Skin type × average rating

🟥 PAGE 3: Global Trends & Ethics

  • Country vs average rating & price
  • Cruelty-free share by country
  • Ingredient rating heatmap

🟩 PAGE 4: Supporting Analysis

  • Packaging type breakdown
  • Common product sizes
  • Gender targeting distribution

📌 Key Business Insights

  • Italy & USA lead in product volume, but Japan & France offer higher-rated items
  • Retinol and Glycerin are most associated with high ratings
  • Daily-use products are more affordable and receive more reviews
  • Cruelty-free products show higher pricing and better customer feedback
  • Sensitive-skin products are the highest rated across the board

📥 Files in Repo

  • 💄_BeautyBytes_...ipynb → Full notebook with EDA + ML
  • Makeup-Sales-Trend-Analysis.pdf → Power BI dashboard export
  • beauty_products_clean.csv → Preprocessed data
  • README.md → This file

🚀 Future Work

  • Integrate real-time reviews via Sephora API
  • Add NLP sentiment scoring on actual text reviews
  • Deploy recommendation engine with Streamlit or Flask

🤝 Let’s Connect!

If you're a data team, a beauty brand, or just love analytics & ecommerce, let’s talk!
📧 Email • 💼 LinkedIn • 🧠 Portfolio

About

With the surge in online beauty purchases, understanding consumer preferences, identifying popular brands, and recommending suitable products are crucial for business growth.
