
Email Classification using BERT

A transformer-based deep learning model that classifies email messages as spam or ham using BERT (Bidirectional Encoder Representations from Transformers). This project includes complete preprocessing, class balancing, sentence embeddings, model training, evaluation, and inference using TensorFlow & TensorFlow Hub.

🚀 Project Highlights

  • Built using BERT Base uncased (L-12: 12 layers, H-768: hidden size 768, A-12: 12 attention heads)
  • Achieves 93% accuracy, with ~90–95% precision and ~90–95% recall
  • Uses TF Hub BERT Preprocessing + Encoder layers
  • Balanced imbalanced dataset using downsampling
  • Evaluated using confusion matrix, classification report, precision, recall, F1-score
  • Predicts new messages with high confidence
  • Demonstrates semantic similarity using BERT embeddings
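The semantic-similarity highlight comes down to comparing two sentence embeddings (e.g. the encoder's pooled_output vectors) with cosine similarity. A minimal, dependency-free sketch of that comparison; the embedding vectors themselves are assumed to come from the BERT encoder and are not computed here:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With BERT Base, `u` and `v` would each be 768-dimensional pooled_output vectors; semantically close messages score near 1.0.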

📂 Dataset

Dataset from Kaggle: SMS Spam Collection Dataset

Class distribution:

  • Ham: 4825 messages
  • Spam: 747 messages

Strong class imbalance → handled via downsampling.

🧹 Data Preprocessing

  • Removed imbalance using random downsampling
  • Converted categories into binary labels (spam = 1, ham = 0)
  • Train-test split using stratified sampling
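A minimal sketch of the downsampling step, assuming the messages and their binary labels (spam = 1, ham = 0) are already in plain Python lists; the function name and seed are illustrative, not the repo's actual code:

```python
import random

def downsample(messages, labels, seed=42):
    """Randomly shrink the majority class to match the minority class size."""
    spam = [m for m, lab in zip(messages, labels) if lab == 1]
    ham = [m for m, lab in zip(messages, labels) if lab == 0]
    rng = random.Random(seed)
    # Sample the larger class down to the size of the smaller one
    if len(ham) > len(spam):
        ham = rng.sample(ham, len(spam))
    else:
        spam = rng.sample(spam, len(ham))
    balanced = [(m, 0) for m in ham] + [(m, 1) for m in spam]
    rng.shuffle(balanced)  # mix the classes before splitting
    msgs, labs = zip(*balanced)
    return list(msgs), list(labs)
```

On this dataset the 4825 ham messages would be sampled down to 747, matching the spam count before the stratified train-test split.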

🧠 Model Architecture

The BERT pipeline:

  • BERT Preprocessing Layer (TF Hub)
  • BERT Encoder Layer (TF Hub)
  • Dropout
  • Dense Layer (Sigmoid Activation)

Only the final layer is trainable → BERT acts as a feature extractor.

📊 Model Performance

On the held-out test set, the model reaches 93% accuracy with ~90–95% precision and recall.
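The evaluation described above (confusion matrix, precision, recall, F1-score) can be reproduced without external libraries; a small dependency-free sketch of the metric computation on binary labels, equivalent in spirit to scikit-learn's classification report:

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision/recall/F1/accuracy (spam = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

The predicted probabilities from the sigmoid output would be thresholded (e.g. at 0.5) to produce `y_pred` before computing these metrics.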
