📧 Email Spam Classifier

A machine learning project that classifies emails as spam or ham (not spam) using natural language processing (NLP). It vectorizes email text with TF-IDF and compares three classification algorithms: Logistic Regression, Naive Bayes, and Support Vector Machines (SVM).

🚀 Features

Text preprocessing with TF-IDF
Trained and validated on the SpamAssassin dataset (completeSpamAssassin.csv)
Compares multiple models: Logistic Regression, Naive Bayes, and SVM
Best performance with SVM (98.6% accuracy)
Saves the trained model and vectorizer for reuse at prediction time
CLI script to classify your own email

🗂 Dataset

Source: Kaggle - arXiv Spam Dataset
File used: completeSpamAssassin.csv
Columns:
- Body: The content of the email
- Label: 0 = Ham, 1 = Spam
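
A minimal sketch of loading these two columns with pandas. The helper name `load_emails` and the in-memory sample rows are illustrative assumptions, not part of the repository; dropping rows with an empty Body is a precaution, since such rows cannot be vectorized.

```python
import io

import pandas as pd


def load_emails(source):
    """Read the CSV (path or file-like) and return (texts, labels)."""
    # Only the two documented columns are needed.
    df = pd.read_csv(source, usecols=["Body", "Label"])
    # Rows with a missing Body cannot be vectorized, so drop them.
    df = df.dropna(subset=["Body"])
    texts = df["Body"].astype(str).tolist()
    labels = df["Label"].astype(int).tolist()
    return texts, labels


# Made-up stand-in for completeSpamAssassin.csv; the last row has no Body.
sample_csv = io.StringIO(
    "Body,Label\n"
    "Meeting moved to 3pm tomorrow,0\n"
    "Win a free prize now,1\n"
    ",1\n"
)
texts, labels = load_emails(sample_csv)
print(len(texts), labels)
```

With the real file, the call would be `load_emails("completeSpamAssassin.csv")`.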

🛠️ Setup

📦 Prerequisites

Python 3.8+
Virtual environment (optional but recommended)

🔧 Install dependencies

pip install -r requirements.txt

If requirements.txt is not provided, you can install manually:

pip install pandas numpy scikit-learn matplotlib seaborn

🧪 How to Train the Model

Open the Jupyter notebook spam_email.ipynb (or your own Python script).
Load and preprocess the dataset.
Vectorize text using TF-IDF.
Train using the best model (e.g., SVM).
Save the model:

import joblib

# Save both artifacts: prediction needs the classifier and the fitted vectorizer.
joblib.dump(svm_model, 'svm_spam_classifier.pkl')
joblib.dump(vectorizer, 'tfidf_vectorizer.pkl')
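
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the notebook's actual code: the toy emails are made up, `LinearSVC` stands in for "SVM" (the README does not say which SVM variant was used), and the artifacts are written to a temporary directory so the repository's real .pkl files are not overwritten.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Made-up toy emails standing in for the Body and Label columns.
texts = [
    "Win a brand new iPhone now, click here to claim",
    "Cheap loans approved instantly, apply now",
    "You have won a free cruise, claim your prize",
    "Limited offer: earn money fast from home",
    "Can we move the project meeting to Thursday?",
    "Attached are the notes from yesterday's call",
    "Lunch tomorrow with the team at noon?",
    "Please review the pull request when you can",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 0 = Ham, 1 = Spam

# Hold out a stratified test split for validation.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Assumption: a linear SVM; swap in another estimator to experiment.
svm_model = LinearSVC()
svm_model.fit(X_train_vec, y_train)
accuracy = svm_model.score(X_test_vec, y_test)
print("held-out accuracy:", accuracy)

# Save the classifier and the fitted vectorizer side by side.
out_dir = tempfile.mkdtemp()
joblib.dump(svm_model, os.path.join(out_dir, "svm_spam_classifier.pkl"))
joblib.dump(vectorizer, os.path.join(out_dir, "tfidf_vectorizer.pkl"))
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary out of the features, so the held-out accuracy is not inflated.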

🤖 Predict with Your Own Email (CLI)

Run the classify_email.py script to test any email string:

python classify_email.py "Win a brand new iPhone now! Click here to claim."

Sample Output:

Prediction: SPAM

The script loads the saved model and vectorizer (svm_spam_classifier.pkl and tfidf_vectorizer.pkl) to make the prediction.
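
For orientation, a script of this kind might look roughly like the sketch below. It is not the repository's classify_email.py: the function name `classify_email`, the "HAM" output string, and the argument handling are assumptions, and the file names are taken from the repository's saved artifacts.

```python
import sys

import joblib

# Assumed artifact names, matching the saved .pkl files.
MODEL_PATH = "svm_spam_classifier.pkl"
VECTORIZER_PATH = "tfidf_vectorizer.pkl"


def classify_email(text, model, vectorizer):
    """Return 'SPAM' or 'HAM' for a single email string."""
    # The vectorizer expects an iterable of documents, hence the list.
    features = vectorizer.transform([text])
    label = model.predict(features)[0]
    # Label convention from the dataset: 0 = Ham, 1 = Spam.
    return "SPAM" if label == 1 else "HAM"


# Only run when exactly one email string is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    model = joblib.load(MODEL_PATH)
    vectorizer = joblib.load(VECTORIZER_PATH)
    print("Prediction:", classify_email(sys.argv[1], model, vectorizer))
```

The email must go through the same fitted vectorizer that was used in training; fitting a new one at prediction time would produce features the model has never seen.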

📊 Model Comparison Summary

Model	Accuracy
Logistic Regression	96.6%
Naive Bayes	89.7%
SVM (Best)	98.6%
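
A comparison like the one in this table can be produced with a loop over candidate models. The sketch below is a hedged illustration: the toy emails are made up, and `MultinomialNB`, `LinearSVC`, and the `max_iter` setting are assumptions about which variants were used, so its numbers will not match the table.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Made-up toy emails; replace with the Body and Label columns of the dataset.
texts = [
    "Win a brand new iPhone now, click here to claim",
    "Cheap loans approved instantly, apply now",
    "You have won a free cruise, claim your prize",
    "Limited offer: earn money fast from home",
    "Can we move the project meeting to Thursday?",
    "Attached are the notes from yesterday's call",
    "Lunch tomorrow with the team at noon?",
    "Please review the pull request when you can",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 0 = Ham, 1 = Spam

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Assumed variants of the three model families named in the table.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
}

# Train every model on the same features and score it on the same split.
results = {}
for name, model in candidates.items():
    model.fit(X_train_vec, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test_vec))

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{acc:.1%}")
```

Sharing one train/test split and one fitted vectorizer across all models keeps the comparison fair, since every model sees identical inputs.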

🤝 Contributing

Feel free to fork this repo and improve on:

Text preprocessing (lemmatization, stemming)
Adding a Flask or Streamlit UI
Deploying to the web

📬 Contact

Author: Ndongmo Christian
Email: christianhonore2003@gmail.com
GitHub: @ndongchrist
