🔍 Phishing and Spam Detection – Email Classification

A complete machine learning pipeline for classifying emails as Ham (legitimate), Spam, or Phishing using multiple datasets.
This project combines data preprocessing, TF-IDF vectorization, SMOTE oversampling, and Random Forest classification, and provides a FastAPI backend for real-time predictions.

📌 About

With the rise of phishing and spam emails, automated detection is crucial for cybersecurity.
This project:

Loads and preprocesses multiple email datasets
Converts email text into TF-IDF vectors
Handles imbalanced classes using SMOTE
Trains a Random Forest classifier
Exposes the trained model through a FastAPI REST API

It is suitable for AI/ML practitioners, cybersecurity enthusiasts, and anyone interested in real-time email classification.
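
The TF-IDF step listed above can be illustrated with a small, dependency-free sketch. This is not the project's code: the repository presumably relies on a library vectorizer (the `vectorizer.pkl` artifact suggests scikit-learn, but that is an assumption), and the function names `tokenize` and `tfidf_vectors` and the sample emails are made up for illustration. The sketch uses the smoothed IDF form `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using smoothed IDF and L2 normalization."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocabulary]
        # Scale each vector to unit length; guard against empty documents.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors.append([v / norm for v in raw])
    return vocabulary, vectors


# Hypothetical sample emails (illustrative only, not taken from the datasets).
emails = [
    "Meeting moved to Monday morning",
    "Win a free prize, claim your free prize now",
    "Verify your account now or it will be suspended",
]
vocabulary, vectors = tfidf_vectors(emails)
```

In the second email, "free" receives a higher weight than "now" because it occurs twice there and appears in no other email, which is the behaviour that makes TF-IDF useful for surfacing spam-specific vocabulary.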

🛠️ Features

📥 Combines datasets: SpamAssassin, CEAS_08, Nazario, Nigerian_Fraud
🧹 Cleans and merges each email's subject and body
📊 Converts text to numerical features with TF-IDF
🔁 Balances data with SMOTE
🤖 Trains a Random Forest classifier
⚡ Serves predictions through FastAPI endpoints
📈 Generates confusion matrix and classification metrics
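
To make the SMOTE balancing step more concrete, here is a minimal sketch of the core idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. It is a simplified illustration using only the standard library, not the project's implementation; in practice a library implementation such as the `SMOTE` class from imbalanced-learn would normally be used. The function name `smote_oversample`, its parameters, and the toy data are assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors for an under-represented class (hypothetical values).
phishing = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3]]
new_points = smote_oversample(phishing, n_new=4)
balanced = phishing + new_points
```

Because every synthetic point lies on a line segment between two real minority samples, the new data stays inside the region the class already occupies. Oversampling is usually applied to the training split only, so that evaluation metrics reflect the real class distribution.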

📁 Project Structure

phishing_spam_api/
│
├── app.py                # FastAPI application
├── model/
│   ├── model.pkl         # Trained model (generated via notebook)
│   ├── vectorizer.pkl    # TF-IDF Vectorizer (generated via notebook)
│   └── train_model.py    # Script to train and save model/vectorizer
├── data/                 # Original CSV datasets
├── phishing-and-spam-detection.ipynb # Notebook for training & downloading artifacts
├── requirements.txt
└── README.md

⚙️ Installation & Setup

git clone https://github.com/Bilal-73/Phishing-and-Spam-Detection.git
cd Phishing-and-Spam-Detection

🧪 Training the Model & Downloading Artifacts

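The trained model and vectorizer are not included in the repository due to file size. You can generate them by running the Jupyter notebook phishing-and-spam-detection.ipynb, then place model.pkl and vectorizer.pkl in the model/ directory.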
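
As a rough illustration of how artifacts such as `model.pkl` and `vectorizer.pkl` can be written and read back, the sketch below uses Python's built-in `pickle` module. The helper names `save_artifact` and `load_artifact` are hypothetical, plain dictionaries stand in for the fitted classifier and vectorizer, and the notebook may well use a different serializer (for example joblib), so treat this as an assumption-laden sketch.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Serialize a Python object to disk with pickle."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_artifact(path):
    """Load a pickled object; only unpickle files you created or trust."""
    with path.open("rb") as fh:
        return pickle.load(fh)


# Stand-ins for the fitted classifier and TF-IDF vectorizer (hypothetical values).
model_stub = {"kind": "classifier", "classes": ["ham", "spam", "phishing"]}
vectorizer_stub = {"kind": "vectorizer", "vocabulary": {"free": 0, "verify": 1}}

# A temporary directory keeps the example from touching the real model/ folder.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    save_artifact(model_stub, model_dir / "model.pkl")
    save_artifact(vectorizer_stub, model_dir / "vectorizer.pkl")
    restored_model = load_artifact(model_dir / "model.pkl")
    restored_vectorizer = load_artifact(model_dir / "vectorizer.pkl")
```

The vectorizer must be saved alongside the model because predictions are only meaningful when new emails are transformed with the same vocabulary and weights that were used during training.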

👤 Author

Bilal Imran

💼 AI / ML & Full-Stack Enthusiast
🔗 GitHub: https://github.com/Bilal-73

⭐ Show Your Support

If you found this project useful:

⭐ Star the repository
🍴 Fork it
💡 Suggest improvements
