-BeautyBytes

TRY IT LIVE:

📌 Project Summary

BeautyBytes is a full-cycle data analytics project aimed at uncovering trends, customer preferences, and brand insights in the online cosmetics industry using a dataset of 15,000 makeup products. The project includes:

✅ Market research & sales trend analysis
✅ Content-based recommendation system
✅ Brand performance & customer feedback analysis
✅ Behavioral segmentation for targeted marketing
✅ End-to-end Power BI dashboard

🧰 Tools Used

Python (Pandas, Seaborn, Scikit-learn, including its TfidfVectorizer)
Power BI (for executive visuals & trends)
Google Colab Notebook
Cosmetics product dataset with 14 features

📊 1. Exploratory Data Analysis (EDA)

Most common product categories: Serum, Mascara, Face Oil
Ingredient popularity: Glycerin, Retinol, Vitamin C
Highest-rated skin-type focus: Combination & Oily
Countries with the most products sold: Italy, USA
Preferred packaging type: Jar
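
Findings like the ones above usually come from frequency counts and grouped means. The sketch below shows one way to produce them with pandas. The toy rows are invented for illustration and are not taken from the project dataset; the column names follow the ones used in the recommendation code in this README.

```python
import pandas as pd

# Toy stand-in for the product table; all values are invented for illustration.
df = pd.DataFrame({
    "Category": ["Serum", "Serum", "Mascara", "Face Oil", "Serum", "Mascara"],
    "Main_Ingredient": ["Glycerin", "Retinol", "Glycerin", "Vitamin C", "Glycerin", "Retinol"],
    "Skin_Type": ["Oily", "Combination", "Dry", "Oily", "Combination", "Dry"],
    "Packaging_Type": ["Jar", "Tube", "Jar", "Jar", "Bottle", "Tube"],
    "Rating": [4.5, 4.0, 3.0, 4.2, 4.4, 3.5],
})

# Frequency counts: value_counts() sorts the most common value first.
top_categories = df["Category"].value_counts()
top_ingredients = df["Main_Ingredient"].value_counts()
top_packaging = df["Packaging_Type"].value_counts()

# Mean rating per skin type, best first.
rating_by_skin = (
    df.groupby("Skin_Type")["Rating"].mean().sort_values(ascending=False)
)

print(top_categories.head(3))
print(rating_by_skin)
```

On the real data, the same calls would run on the full 15,000-row frame loaded from the CSV.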

🧠 2. Recommendation Engine

Uses category, main ingredient, skin type & packaging
Built using TF-IDF and cosine similarity
Returns 5 similar products for any selected item

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Combine relevant text columns
    df['Combined_Features'] = df['Category'] + ' ' + df['Main_Ingredient'] + ' ' + df['Skin_Type'] + ' ' + df['Packaging_Type']

    # Vectorize
    tfidf = TfidfVectorizer()
    tfidf_matrix = tfidf.fit_transform(df['Combined_Features'])

    # Cosine similarity
    cosine_sim = cosine_similarity(tfidf_matrix)

    def recommend(product_name, top_n=5):
        # Row of the first product with this name (assumes df has a default 0..n-1 index)
        idx = df[df['Product_Name'] == product_name].index[0]
        sim_scores = list(enumerate(cosine_sim[idx]))
        # Sort by similarity and skip the first hit, which is normally the product itself
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:top_n+1]
        recommended = df.iloc[[i[0] for i in sim_scores]]['Product_Name'].tolist()
        return recommended

    print('A user who likes Ultra Face Mask might also enjoy products with similar skin-type targets and ingredients such as: ', recommend('Ultra Face Mask'))

Output:

    A user who likes Ultra Face Mask might also enjoy products with similar skin-type targets and ingredients such as:
    ['Magic Foundation', 'Super Cc Cream', 'Ultra Foundation', 'Ultra Eye Shadow', 'Divine Face Mask']
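
The `recommend` function above drops the first sorted result on the assumption that it is the queried product, and it raises an `IndexError` for an unknown name. The sketch below is a variant, not the project's code, that excludes the queried row by position and returns an empty list when the name is missing. The four products and the name `recommend_safe` are hypothetical and exist only to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini catalogue; names and attributes are invented for illustration.
products = pd.DataFrame({
    "Product_Name": ["Mask A", "Mask B", "Serum C", "Mascara D"],
    "Category": ["Face Mask", "Face Mask", "Serum", "Mascara"],
    "Main_Ingredient": ["Retinol", "Retinol", "Glycerin", "Shea Butter"],
    "Skin_Type": ["Oily", "Oily", "Dry", "Normal"],
    "Packaging_Type": ["Jar", "Tube", "Jar", "Stick"],
}).reset_index(drop=True)  # positions and index labels now agree

# Same feature construction as in the README: one text string per product.
combined = (
    products["Category"] + " " + products["Main_Ingredient"] + " "
    + products["Skin_Type"] + " " + products["Packaging_Type"]
)
sim = cosine_similarity(TfidfVectorizer().fit_transform(combined))


def recommend_safe(product_name, top_n=2):
    """Return up to top_n most similar product names, never the query itself."""
    matches = np.flatnonzero(products["Product_Name"].to_numpy() == product_name)
    if matches.size == 0:
        return []  # unknown product: nothing to recommend
    idx = matches[0]
    # Highest similarity first; a stable sort keeps ties in catalogue order.
    order = np.argsort(-sim[idx], kind="stable")
    order = order[order != idx][:top_n]  # drop the query row explicitly
    return products["Product_Name"].iloc[order].tolist()


print(recommend_safe("Mask A"))
print(recommend_safe("Not In Catalogue"))
```

Here "Mask B" shares four of its five tokens with "Mask A", "Serum C" shares only "jar", and "Mascara D" shares nothing, so the ranking follows token overlap as expected.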

🔍 3. Brand Popularity & Feedback

Popularity score: Rating × log(1 + Number of Reviews)
Brands like HourGlass, Milk Makeup, Becca lead in satisfaction
    # Create popularity score
    import numpy as np
    df['Popularity_Score'] = df['Rating'] * np.log1p(df['Number_of_Reviews'])

    # Brand-wise aggregates
    brand_stats = (
        df.groupby('Brand')[['Rating', 'Number_of_Reviews', 'Popularity_Score']]
        .mean()
        .sort_values(by='Popularity_Score', ascending=False)
    )

    # Top 10 popular brands
    brand_stats.head(10)
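
To see how the score behaves, the small worked example below applies the same formula to three invented rows. It is only an illustration: the brands and numbers are made up. It shows two properties of `np.log1p`: a product with zero reviews scores 0 regardless of rating, and review counts are dampened, so going from 99 to 9,999 reviews at the same rating only doubles the score.

```python
import numpy as np
import pandas as pd

# Invented rows; only the formula matches the README.
toy = pd.DataFrame({
    "Brand": ["A", "B", "C"],
    "Rating": [4.8, 4.2, 4.2],
    "Number_of_Reviews": [0, 99, 9999],
})

# log1p(x) = log(1 + x), so zero reviews gives log(1) = 0 instead of -inf.
toy["Popularity_Score"] = toy["Rating"] * np.log1p(toy["Number_of_Reviews"])

scores = toy.set_index("Brand")["Popularity_Score"]
print(scores.round(2))
print("C / B ratio:", round(scores["C"] / scores["B"], 2))
```

Because log(10000) = 2 × log(100), brand C ends up with twice the score of brand B despite having about 100 times as many reviews.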

📊 Power BI Dashboard

📁 Pages:

🟦 PAGE 1: Executive Overview

Treemap: Brand dominance
Bar plot: Category vs Rating
Bubble chart: Price vs Rating vs Review count
Donut: Product origin distribution

🟨 PAGE 2: Product Performance and Behavioral Insights

Bar plot: Usage frequency vs rating
Skin type × category distribution
Skin type × average rating
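
Power BI computes these visuals from the data model, but the underlying tables are easy to reproduce in pandas for a sanity check. The following is a minimal sketch on invented rows, assuming the `Skin_Type`, `Category`, and `Rating` columns used elsewhere in this README.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Skin_Type": ["Oily", "Oily", "Dry", "Dry", "Sensitive"],
    "Category": ["Serum", "Mascara", "Serum", "Serum", "Face Oil"],
    "Rating": [4.0, 3.5, 4.5, 4.1, 4.8],
})

# Skin type × category distribution: product counts per combination.
dist = pd.crosstab(toy["Skin_Type"], toy["Category"])

# Skin type × average rating.
avg_rating = toy.groupby("Skin_Type")["Rating"].mean()

print(dist)
print(avg_rating.round(2))
```

Combinations that never occur, such as Sensitive × Mascara here, appear as 0 in the crosstab rather than being dropped.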

🟥 PAGE 3: Global Trends & Ethics

Country vs Avg Rating & Price
Cruelty-Free share by country
Ingredient rating heatmap
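
The two aggregate views on this page can be approximated in pandas as follows. This is a sketch under assumptions: the column names `Country_of_Origin` and `Cruelty_Free` are hypothetical (the README does not name them), the heatmap axes are assumed to be ingredient by skin type, and all rows are invented.

```python
import pandas as pd

# Invented rows; Country_of_Origin and Cruelty_Free are assumed column names.
toy = pd.DataFrame({
    "Country_of_Origin": ["Italy", "Italy", "USA", "USA", "Japan"],
    "Cruelty_Free": [True, False, True, True, False],
    "Main_Ingredient": ["Retinol", "Glycerin", "Retinol", "Glycerin", "Retinol"],
    "Skin_Type": ["Oily", "Dry", "Oily", "Dry", "Dry"],
    "Rating": [4.0, 3.0, 5.0, 4.0, 4.5],
})

# Cruelty-free share by country: the mean of a boolean column is a proportion.
cf_share = toy.groupby("Country_of_Origin")["Cruelty_Free"].mean()

# Heatmap source table: mean rating per ingredient (rows) and skin type (columns).
heat = toy.pivot_table(
    index="Main_Ingredient", columns="Skin_Type", values="Rating", aggfunc="mean"
)

print(cf_share)
print(heat)
```

Cells with no products, such as Glycerin × Oily in this toy data, come out as NaN, which a heatmap typically renders as blank.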

🟩 PAGE 4: Supporting Analysis

Packaging type breakdown
Common product sizes
Gender targeting distribution

📌 Key Business Insights

Italy & USA lead in product volume, but Japan & France offer higher-rated items
Retinol and Glycerin are most associated with high ratings
Daily-use products are more affordable and receive more reviews
Cruelty-free products show higher pricing and better customer feedback
Sensitive-skin products are the highest rated across the board

📥 Files in Repo

💄_BeautyBytes_...ipynb → Full notebook with EDA + ML
Makeup-Sales-Trend-Analysis.pdf → Power BI dashboard export
beauty_products_clean.csv → Preprocessed data
README.md → This file

🚀 Future Work

Integrate real-time reviews via Sephora API
Add NLP sentiment scoring on actual text reviews
Deploy recommendation engine with Streamlit or Flask

🤝 Let’s Connect!

If you're a data team, beauty brand, or just love analytics & ecommerce — let’s talk!
📧 Email • 💼 LinkedIn • 🧠 Portfolio
