Skip to content

ndongchrist/ML_Spam_Classification

Repository files navigation

📧 Email Spam Classifier

A machine learning project to classify emails as spam or ham (not spam) using natural language processing (NLP). The model uses techniques like TF-IDF vectorization and compares multiple classification algorithms including Logistic Regression, Naive Bayes, and Support Vector Machines (SVM).


🚀 Features

  • Text preprocessing with TF-IDF
  • Trained and validated on the completespamassasin dataset
  • Compared multiple models: Logistic Regression, Naive Bayes, and SVM
  • Best performance with SVM (98%+ accuracy)
  • Save and reuse model for prediction
  • CLI script to test your own email

🗂 Dataset

  • Source: Kaggle - arXiv Spam Dataset
  • File used: completespamassasin.csv
  • Columns:
    • Body: The content of the email
    • Label: 0 = Ham, 1 = Spam

🛠️ Setup

📦 Prerequisites

  • Python 3.8+
  • Virtual environment (optional but recommended)

🔧 Install dependencies

pip install -r requirements.txt

If requirements.txt is not provided, you can install manually:

pip install pandas numpy scikit-learn matplotlib seaborn

🧪 How to Train the Model

  1. Open the Jupyter notebook spam_classifier.ipynb or your Python script.
  2. Load and preprocess the dataset.
  3. Vectorize text using TF-IDF.
  4. Train using the best model (e.g., SVM).
  5. Save the model:
import joblib
joblib.dump(svm_model, 'spam_classifier_model.pkl')
joblib.dump(vectorizer, 'tfidf_vectorizer.pkl')

🤖 Predict with Your Own Email (CLI)

Run the classify_email.py script to test any email string:

python classify_email.py "Win a brand new iPhone now! Click here to claim."

Sample Output:

Prediction: SPAM

This uses the saved model and vectorizer (.pkl files).


📊 Model Comparison Summary

Model Accuracy
Logistic Regression 96.6%
Naive Bayes 89.7%
SVM (Best) 98.6%

🤝 Contributing

Feel free to fork this repo and improve on:

  • Text preprocessing (lemmatization, stemming)
  • Adding Flask or Streamlit UI
  • Deploying to web

📬 Contact

Author: [Ndongmo Christian] Email: [christianhonore2003@gmail.com] GitHub: @ndongchrist

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published