What if you could predict which employees might leave before they do?
This project analyzes 2,191 employee emails to detect mood patterns, identify disengaged employees, and flag potential flight risks — all using NLP (Natural Language Processing — teaching computers to understand human language).
HR teams often find out an employee is unhappy after they've already resigned. By then, it's too late.
This tool aims to surface those warning signs earlier by:
- Reading employee emails and detecting sentiment (positive, negative, neutral)
- Tracking mood trends over time
- Automatically flagging employees showing warning signs
| Step | What Happens | Tech Used |
|---|---|---|
| 1. Sentiment Labeling | Classify each email as Positive, Negative, or Neutral | TextBlob NLP |
| 2. Monthly Scoring | Calculate engagement score per employee per month | Pandas aggregation |
| 3. Employee Ranking | Rank employees from most to least positive | Statistical analysis |
| 4. Flight Risk Detection | Flag employees with 4+ negative emails in 30 days | Rolling window algorithm |
| 5. Predictive Modeling | Predict future sentiment scores | scikit-learn Linear Regression |
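Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code. It assumes each email already has a polarity score in [-1, 1] (TextBlob exposes one as `TextBlob(text).sentiment.polarity`). The ±0.1 cut-offs are hypothetical because the thresholds actually used are not stated here. The +1/-1/0 points reproduce the formula monthly_score = positive_count - negative_count given in the model section. Employee names and dates are made up.

```python
from collections import defaultdict
from datetime import date

# Hypothetical thresholds: the README does not state the cut-offs used.
POS_THRESHOLD = 0.1
NEG_THRESHOLD = -0.1


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] (e.g. TextBlob's) to a sentiment label."""
    if polarity > POS_THRESHOLD:
        return "Positive"
    if polarity < NEG_THRESHOLD:
        return "Negative"
    return "Neutral"


def monthly_scores(emails):
    """emails: iterable of (employee, date, label) tuples.

    Returns {(employee, "YYYY-MM"): positive_count - negative_count}.
    Neutral emails contribute 0 but still create an entry for that month.
    """
    points = {"Positive": 1, "Negative": -1, "Neutral": 0}
    scores = defaultdict(int)
    for employee, sent, label in emails:
        month = sent.strftime("%Y-%m")  # bucket each email by calendar month
        scores[(employee, month)] += points[label]
    return dict(scores)


# Hypothetical sample: polarity values stand in for TextBlob output.
emails = [
    ("alice", date(2011, 1, 3), label_from_polarity(0.45)),
    ("alice", date(2011, 1, 9), label_from_polarity(-0.30)),
    ("alice", date(2011, 1, 20), label_from_polarity(0.05)),
    ("bob", date(2011, 2, 1), label_from_polarity(0.20)),
]
print(monthly_scores(emails))  # {('alice', '2011-01'): 0, ('bob', '2011-02'): 1}
```

In Pandas the same aggregation would typically be a `groupby` on employee and month followed by a sum of the per-email points.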
Most positive employees (highest average monthly score):
| Rank | Employee | Avg Monthly Score |
|---|---|---|
| 1 | lydia.delgado | 4.38 |
| 2 | john.arnold | 4.08 |
| 3 | sally.beck | 3.62 |
Least positive employees (lowest average monthly score):
| Rank | Employee | Avg Monthly Score |
|---|---|---|
| 1 | rhonda.denton | 2.17 |
| 2 | kayne.coulter | 2.58 |
| 3 | bobette.riner | 3.21 |
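Ranking (step 3) then reduces to averaging each employee's monthly scores and sorting. The sketch below assumes "Avg Monthly Score" is the plain mean over the months in which an employee has a score; the helper name `rank_employees` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_employees(scores):
    """scores: {(employee, month): monthly_score}.

    Returns (employee, avg_monthly_score) pairs sorted from most to least
    positive; ties are broken alphabetically by employee name.
    """
    by_employee = defaultdict(list)
    for (employee, _month), score in scores.items():
        by_employee[employee].append(score)
    averages = {emp: mean(vals) for emp, vals in by_employee.items()}
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical monthly scores for three employees.
scores = {
    ("alice", "2011-01"): 5, ("alice", "2011-02"): 3,
    ("bob", "2011-01"): 2, ("bob", "2011-02"): 2,
    ("carol", "2011-01"): 6,
}
ranking = rank_employees(scores)
print("Most positive:", ranking[:3])
print("Least positive:", ranking[::-1][:3])  # lowest average first
```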
These employees had 4+ negative emails within a 30-day window:
| Employee | Max Negatives in 30 Days | Risk Level |
|---|---|---|
| bobette.riner | 5 | 🔴 High |
| sally.beck | 5 | 🔴 High |
| john.arnold | 4 | 🟡 Medium |
| johnny.palmer | 4 | 🟡 Medium |
| lydia.delgado | 4 | 🟡 Medium |
| patti.thompson | 4 | 🟡 Medium |
| rhonda.denton | 4 | 🟡 Medium |
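The flight-risk rule can be implemented with a sliding window over each employee's negative-email dates. This sketch assumes a window spans 30 calendar days, so emails count together only when the earliest and latest are fewer than 30 days apart, and it infers the risk levels from the table above: 5 or more negatives is High, exactly 4 is Medium. Function names and data are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def max_negatives_in_window(neg_dates, window_days=30):
    """Largest number of negative emails inside any `window_days`-day window.

    Two-pointer sliding window over the sorted dates: O(n log n) for the sort,
    then a single O(n) scan.
    """
    dates = sorted(neg_dates)
    best = 0
    start = 0
    for end, current in enumerate(dates):
        # Drop dates from the left until the window spans < window_days days.
        while current - dates[start] >= timedelta(days=window_days):
            start += 1
        best = max(best, end - start + 1)
    return best


def risk_level(max_negatives, flag_at=4, high_at=5) -> Optional[str]:
    """Assumed mapping inferred from the table: 5+ is High, 4 is Medium."""
    if max_negatives >= high_at:
        return "High"
    if max_negatives >= flag_at:
        return "Medium"
    return None  # below the 4-email threshold: not flagged


# Hypothetical negative-email dates per employee.
negatives = {
    "alice": [date(2011, 1, d) for d in (2, 8, 15, 22, 30)],
    "bob": [date(2011, 1, 2), date(2011, 1, 20), date(2011, 2, 10), date(2011, 3, 1)],
}
for employee, neg_dates in negatives.items():
    peak = max_negatives_in_window(neg_dates)
    print(employee, peak, risk_level(peak))
```

The two-pointer scan avoids recounting every window from scratch. In Pandas, a time-based rolling sum such as `series.rolling("30D").sum()` over a sorted, date-indexed series of 0/1 negative flags is the equivalent idiom.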
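- Good news: 92% of emails are Neutral or Positive, which points to an overall healthy workplace
- Surprise: The most positive employees can also be flight risks. lydia.delgado ranks #1 in positivity but still had 4 negative emails within one 30-day window, and john.arnold (#2) and sally.beck (#3) were flagged as well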
- Actionable: Volume doesn't equal happiness — busy employees aren't necessarily engaged employees
- Recommendation: Focus on tone, not quantity of communication
- Python 3.13 — Core language
- Pandas — Data manipulation
- TextBlob — NLP sentiment analysis
- scikit-learn — Machine learning
- Matplotlib/Seaborn — Visualizations
| Metric | Score | Meaning |
|---|---|---|
| R² | 1.0 | Model explains 100% of the variance in monthly scores |
| MAE | 0.0 | Mean absolute error of zero |
| RMSE | 0.0 | Root-mean-squared error of zero |
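Why perfect? The target is defined as monthly_score = positive_count - negative_count, and the model learned exactly that relationship (a coefficient of +1 on positive_count and -1 on negative_count). This confirms the scoring logic is implemented consistently. Because the target is a direct function of the input counts, however, a perfect fit does not by itself show that the model can forecast scores for unseen future months.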
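The perfect scores follow directly from the metric definitions. The sketch below computes R², MAE, and RMSE with the same formulas as scikit-learn's `r2_score`, `mean_absolute_error`, and the square root of `mean_squared_error`, using hypothetical per-month counts. It assumes, as the explanation above implies, that positive and negative counts are among the model's inputs, so predictions equal positive_count - negative_count exactly and every residual is zero.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return (R², MAE, RMSE) for two equal-length numeric sequences.

    `actual` must not be constant, otherwise R² is undefined.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical per-month counts: (positive_count, negative_count).
counts = [(6, 1), (3, 2), (5, 0), (2, 4)]
actual = [pos - neg for pos, neg in counts]                  # target: 5, 1, 5, -2
predicted = [1.0 * pos - 1.0 * neg for pos, neg in counts]   # learned coefficients +1, -1
print(regression_metrics(actual, predicted))  # (1.0, 0.0, 0.0)
```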
```bash
# Clone the repo
git clone https://github.com/lubobali/employee-sentiment-analysis.git
cd employee-sentiment-analysis
# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run analysis
jupyter notebook main.ipynb
```
```
employee-sentiment-analysis/
├── data/
│   ├── test.csv              # 2,191 raw emails
│   ├── labeled_data.csv      # With sentiment labels
│   └── monthly_scores.csv    # Aggregated scores
├── visualizations/           # 10 charts
│   ├── 01_sentiment_pie.png
│   ├── 02_sentiment_bar.png
│   ├── ...
│   └── 10_feature_importance.png
├── main.ipynb                # Full analysis notebook
├── requirements.txt          # Dependencies
├── README.md                 # You're reading it
└── Final Report.docx         # Detailed methodology
```
Companies spend $15,000–$25,000 replacing a single employee. Early detection of disengagement can:
- Save recruitment costs
- Improve retention
- Enable proactive HR interventions
This tool turns email data into actionable HR intelligence.
Lubo Bali — Data Engineer & AI Developer
📍 Chicago, IL
🔗 LinkedIn | Portfolio
Built for Springer Capital AI Internship Assessment — December 2025