lubobali/employee-sentiment-analysis

📧 Employee Sentiment Analysis

What if you could predict which employees might leave before they do?

This project analyzes 2,191 employee emails to detect mood patterns, identify disengaged employees, and flag potential flight risks — all using NLP (Natural Language Processing — teaching computers to understand human language).


🤔 The Problem

HR teams often find out an employee is unhappy after they've already resigned. By then, it's too late.

This tool solves that by:

  • Reading employee emails and detecting sentiment (positive, negative, neutral)
  • Tracking mood trends over time
  • Automatically flagging employees showing warning signs
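A minimal sketch of the labeling step, assuming each email has already been given a polarity score between -1 and 1 (TextBlob exposes one as `TextBlob(text).sentiment.polarity`). The `label_sentiment` helper and the 0.1 cutoff are illustrative assumptions, not the project's actual settings.

```python
def label_sentiment(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    The 0.1 cutoff is an illustrative assumption, not the project's value.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


# In the real pipeline the polarity would come from TextBlob, e.g.
# TextBlob(email_body).sentiment.polarity. Fixed numbers are used here
# so the sketch runs without extra dependencies.
sample_polarities = [0.45, -0.30, 0.02]
labels = [label_sentiment(p) for p in sample_polarities]
print(labels)
```

A wider threshold pushes more borderline emails into Neutral, so the cutoff directly affects how many emails count toward the flight-risk flag.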

✨ What This Project Does

| Step | What Happens | Tech Used |
|------|--------------|-----------|
| 1. Sentiment Labeling | Classify each email as Positive, Negative, or Neutral | TextBlob NLP |
| 2. Monthly Scoring | Calculate engagement score per employee per month | Pandas aggregation |
| 3. Employee Ranking | Rank employees from most to least positive | Statistical analysis |
| 4. Flight Risk Detection | Flag employees with 4+ negative emails in 30 days | Rolling window algorithm |
| 5. Predictive Modeling | Predict future sentiment scores | scikit-learn Linear Regression |
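The monthly scoring step can be sketched as follows. It assumes the rule stated under Model Performance (each Positive email counts +1, each Negative counts -1, Neutral counts 0) and uses plain Python instead of Pandas so it runs on its own. The `monthly_scores` helper, the record layout, and the sample employees are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumed scoring rule: monthly_score = positive_count - negative_count.
SCORE = {"Positive": 1, "Neutral": 0, "Negative": -1}


def monthly_scores(records):
    """Sum label scores per (employee, 'YYYY-MM') pair."""
    totals = defaultdict(int)
    for employee, sent_on, label in records:
        month = sent_on.strftime("%Y-%m")
        totals[(employee, month)] += SCORE[label]
    return dict(totals)


# Hypothetical records: (employee, date sent, sentiment label).
records = [
    ("alice", date(2010, 5, 3), "Positive"),
    ("alice", date(2010, 5, 17), "Positive"),
    ("alice", date(2010, 5, 20), "Negative"),
    ("alice", date(2010, 6, 1), "Neutral"),
    ("bob", date(2010, 5, 9), "Negative"),
]
scores = monthly_scores(records)
print(scores)
```

Averaging these per-month totals for each employee would give the kind of figure used in the ranking step.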

📊 Key Findings

🏆 Most Positive Employees

| Rank | Employee | Avg Monthly Score |
|------|----------|-------------------|
| 1 | lydia.delgado | 4.38 |
| 2 | john.arnold | 4.08 |
| 3 | sally.beck | 3.62 |

⚠️ Employees Needing Attention

| Rank | Employee | Avg Monthly Score |
|------|----------|-------------------|
| 1 | rhonda.denton | 2.17 |
| 2 | kayne.coulter | 2.58 |
| 3 | bobette.riner | 3.21 |

🚨 Flight Risk Alerts

These employees had 4+ negative emails within a 30-day window:

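| Employee | Max Negatives in 30 Days | Risk Level |
|----------|--------------------------|------------|
| bobette.riner | 5 | 🔴 High |
| sally.beck | 5 | 🔴 High |
| john.arnold | 4 | 🟡 Medium |
| johnny.palmer | 4 | 🟡 Medium |
| lydia.delgado | 4 | 🟡 Medium |
| patti.thompson | 4 | 🟡 Medium |
| rhonda.denton | 4 | 🟡 Medium |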
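One way to compute the "Max Negatives in 30 Days" figure is a sliding window over each employee's negative-email dates. The sketch below is an assumption about how such a check could work, not the notebook's code: the helper names are hypothetical, and it treats two emails as sharing a window when they are fewer than 30 days apart.

```python
from datetime import date, timedelta


def max_negatives_in_window(negative_dates, window_days=30):
    """Largest number of negative emails falling inside any window."""
    dates = sorted(negative_dates)
    best = 0
    left = 0
    for right, current in enumerate(dates):
        # Shrink the window until it spans fewer than window_days days.
        while current - dates[left] >= timedelta(days=window_days):
            left += 1
        best = max(best, right - left + 1)
    return best


def is_flight_risk(negative_dates, threshold=4, window_days=30):
    """Flag when the busiest window reaches the threshold (4+ in the README)."""
    return max_negatives_in_window(negative_dates, window_days) >= threshold


# Hypothetical negative-email dates for two employees.
clustered = [date(2010, 1, 1), date(2010, 1, 10), date(2010, 1, 20),
             date(2010, 1, 28), date(2010, 3, 15)]
spread_out = [date(2010, 1, 1), date(2010, 2, 5),
              date(2010, 3, 10), date(2010, 4, 15)]

print(max_negatives_in_window(clustered), is_flight_risk(clustered))
print(max_negatives_in_window(spread_out), is_flight_risk(spread_out))
```

Both employees have four or more negative emails overall, but only the clustered one is flagged, which is the point of using a rolling window rather than a simple total.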

💡 Business Insights

  1. Good news: 92% of emails are Neutral or Positive — overall healthy workplace
  2. Surprise: Top performers can also be flight risks — lydia.delgado ranks #1 in positivity but still had a rough month
  3. Actionable: Volume doesn't equal happiness — busy employees aren't necessarily engaged employees
  4. Recommendation: Focus on tone, not quantity of communication

🛠️ Technical Details

Tech Stack

  • Python 3.13 — Core language
  • Pandas — Data manipulation
  • TextBlob — NLP sentiment analysis
  • scikit-learn — Machine learning
  • Matplotlib/Seaborn — Visualizations

Model Performance

| Metric | Score | Meaning |
|--------|-------|---------|
| R² | 1.0 | All variance in the target explained |
| MAE | 0.0 | Zero average absolute error |
| RMSE | 0.0 | Zero root-mean-square error |

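Why perfect? The target is an exact linear function of the model's inputs: monthly_score = positive_count - negative_count. The regression recovered that identity, which confirms the scoring logic is internally consistent. It is not evidence of forecasting skill on unseen data.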
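The perfect metrics can be reproduced by hand. The sketch below assumes the model's inputs are the monthly positive and negative counts and that the fitted weights are +1 and -1. The counts are made-up illustration data, and `regression_metrics` is a hypothetical helper rather than the scikit-learn calls used in the notebook.

```python
import math


def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical monthly counts: (positive_count, negative_count).
counts = [(5, 1), (2, 3), (4, 0), (1, 1)]

# Target as defined by the scoring rule.
actual = [pos - neg for pos, neg in counts]

# A linear model with weights +1 and -1 (and no intercept) reproduces it.
predicted = [1 * pos + (-1) * neg for pos, neg in counts]

mae, rmse, r2 = regression_metrics(actual, predicted)
print(mae, rmse, r2)
```

Because the prediction and the target are the same expression, every residual is zero regardless of the data.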

How to Run

# Clone the repo
git clone https://github.com/lubobali/employee-sentiment-analysis.git
cd employee-sentiment-analysis

# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run analysis
jupyter notebook main.ipynb

📁 Project Structure

employee-sentiment-analysis/
├── data/
│   ├── test.csv              # 2,191 raw emails
│   ├── labeled_data.csv      # With sentiment labels
│   └── monthly_scores.csv    # Aggregated scores
├── visualizations/           # 10 charts
│   ├── 01_sentiment_pie.png
│   ├── 02_sentiment_bar.png
│   ├── ...
│   └── 10_feature_importance.png
├── main.ipynb               # Full analysis notebook
├── requirements.txt         # Dependencies
├── README.md               # You're reading it
└── Final Report.docx       # Detailed methodology

🎯 Why This Matters

Companies spend $15,000–$25,000 replacing a single employee. Early detection of disengagement can:

  • Save recruitment costs
  • Improve retention
  • Enable proactive HR interventions

This tool turns email data into actionable HR intelligence.


👤 Author

Lubo Bali — Data Engineer & AI Developer
📍 Chicago, IL
🔗 LinkedIn | Portfolio


Built for Springer Capital AI Internship Assessment — December 2025
