Bilal-73/Phishing-and-Spam-Detection



🔍 Phishing and Spam Detection – Email Classification

A machine learning pipeline for classifying emails as Ham (legitimate), Spam, or Phishing, trained on multiple email datasets.
The project combines data preprocessing, TF-IDF vectorization, SMOTE oversampling, and a Random Forest classifier; a FastAPI backend serves real-time predictions.


📌 About

With the rise of phishing and spam emails, automated detection is crucial for cybersecurity.
This project:

  • Loads and preprocesses multiple email datasets
  • Converts email text into TF-IDF vectors
  • Handles imbalanced classes using SMOTE
  • Trains a Random Forest classifier
  • Exposes the trained model through a FastAPI REST API
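The loading and preprocessing step could look roughly like the sketch below. It is a hypothetical helper (`build_email_text` does not come from the repository) that assumes each dataset row has separate subject and body fields, and it applies one plausible set of cleaning rules; the notebook's actual rules may differ.

```python
import re


def build_email_text(subject, body):
    """Merge an email's subject and body into one cleaned string.

    Hypothetical helper: the cleaning rules below are an assumption,
    not the project's exact preprocessing. None is treated as missing.
    """
    parts = [str(part) for part in (subject, body) if part is not None]
    text = " ".join(parts).lower()
    # Replace every character that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace (spaces, tabs, newlines) into a single space.
    return re.sub(r"\s+", " ", text).strip()


print(build_email_text("URGENT: Verify your account!", "Click here:\nhttp://example.com/login"))
# → urgent verify your account click here http example com login
```

Stripping punctuation splits URLs into tokens such as `http` and `example`. A phishing-focused pipeline might instead keep URLs or domains as separate features, which is a trade-off worth measuring.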

It is suitable for AI/ML practitioners, cybersecurity enthusiasts, and anyone interested in real-time email classification.


🛠️ Features

  • 📥 Combines datasets: SpamAssassin, CEAS_08, Nazario, Nigerian_Fraud
  • 🧹 Cleans each email and merges its subject and body into a single text field
  • 📊 Converts text to numerical features with TF-IDF
  • 🔁 Balances data with SMOTE
  • 🤖 Trains a Random Forest classifier
  • ⚡ Serves predictions through FastAPI endpoints
  • 📈 Generates confusion matrix and classification metrics
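SMOTE balances the classes by creating synthetic minority-class samples on the line segment between a sample and one of its nearest same-class neighbours. The from-scratch sketch below illustrates that interpolation with NumPy and scikit-learn's `NearestNeighbors`. The helper names `smote` and `oversample` are hypothetical, and the project itself most likely uses a library implementation such as imbalanced-learn's `SMOTE` (an assumption, since the source does not say which).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from the rows of a single class.

    Each synthetic row is x + u * (x_nn - x), where x is a random row of the
    class, x_nn is one of its k nearest same-class neighbours, and u ~ U(0, 1).
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    k = min(k, len(X_minority) - 1)  # the class needs at least 2 rows
    # Ask for k + 1 neighbours because each row is returned as its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbour_idx = nn.kneighbors(X_minority)
    base = rng.integers(0, len(X_minority), size=n_new)
    # Columns 1..k skip column 0 (the row itself).
    chosen = neighbour_idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


def oversample(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            synthetic = smote(X[y == cls], target - count, k=k, seed=seed)
            X_parts.append(synthetic)
            y_parts.append(np.full(len(synthetic), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy 2-D features standing in for TF-IDF vectors: 6 ham, 3 spam, 2 phishing.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5, 5], [5, 6], [6, 5],
     [10, 0], [11, 1]]
y = ["ham"] * 6 + ["spam"] * 3 + ["phishing"] * 2
X_bal, y_bal = oversample(X, y)
for label, n in zip(*np.unique(y_bal, return_counts=True)):
    print(label, n)  # prints ham 6, phishing 6, spam 6 (one per line)
```

SMOTE should be applied to the training split only, so that synthetic samples derived from test emails do not leak into training. With imbalanced-learn the equivalent call is `SMOTE(random_state=42).fit_resample(X_train, y_train)`, which also accepts the sparse matrices produced by `TfidfVectorizer`.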

📁 Project Structure

phishing_spam_api/
│
├── app.py                # FastAPI application
├── model/
│   ├── model.pkl         # Trained model (generated via notebook)
│   ├── vectorizer.pkl    # TF-IDF Vectorizer (generated via notebook)
│   └── train_model.py    # Script to train and save model/vectorizer
├── data/                 # Original CSV datasets
├── phishing-and-spam-detection.ipynb # Notebook for training & downloading artifacts
├── requirements.txt
└── README.md
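`model/train_model.py` is described as the script that trains and saves the model and vectorizer. The following is a hypothetical minimal version using scikit-learn. The toy emails, the `train_and_evaluate` name, the hyperparameters, and the use of `pickle` for the `.pkl` files are all assumptions, and the SMOTE step is left as a comment to keep the sketch dependency-light.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["ham", "spam", "phishing"]


def train_and_evaluate(train_texts, train_labels, test_texts, test_labels, out_dir):
    """Fit TF-IDF + Random Forest, print metrics, and pickle both artifacts."""
    vectorizer = TfidfVectorizer(lowercase=True)
    X_train = vectorizer.fit_transform(train_texts)  # learn the vocabulary on training data only
    X_test = vectorizer.transform(test_texts)
    # In the full pipeline, SMOTE would resample (X_train, train_labels) here.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, train_labels)

    predictions = model.predict(X_test)
    cm = confusion_matrix(test_labels, predictions, labels=LABELS)
    print(cm)
    print(classification_report(test_labels, predictions, labels=LABELS, zero_division=0))

    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    with open(os.path.join(out_dir, "vectorizer.pkl"), "wb") as f:
        pickle.dump(vectorizer, f)
    return cm


# Hypothetical toy data; the real project uses the combined email datasets.
train_texts = [
    "meeting agenda for tomorrow is attached",
    "lunch with the team on friday",
    "quarterly report draft ready for review",
    "win a free cruise claim your prize now",
    "cheap meds discount offer buy now",
    "limited offer free gift card for every winner",
    "verify your bank account password immediately",
    "your paypal account is suspended login to restore access",
    "confirm your credentials to avoid account closure",
]
train_labels = ["ham"] * 3 + ["spam"] * 3 + ["phishing"] * 3
test_texts = ["team meeting report on friday", "claim your free prize now",
              "login to verify your account password"]
test_labels = ["ham", "spam", "phishing"]

out_dir = tempfile.mkdtemp()  # the project tree places these files in model/
cm = train_and_evaluate(train_texts, train_labels, test_texts, test_labels, out_dir)
```

Calling `fit_transform` only on the training texts and `transform` on the test texts keeps test vocabulary statistics out of the fitted vectorizer, which is also why the vectorizer must be saved alongside the model.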

⚙️ Installation & Setup

git clone https://github.com/Bilal-73/Phishing-and-Spam-Detection.git
cd Phishing-and-Spam-Detection

🧪 Training the Model & Downloading Artifacts

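The trained model and vectorizer are not included in the repository because of their file size. Generate them by running the Jupyter notebook phishing-and-spam-detection.ipynb (or model/train_model.py), then place model.pkl and vectorizer.pkl in the model/ directory.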
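Once `model.pkl` and `vectorizer.pkl` exist, the API only needs to load them and call `transform` followed by `predict` for each email. The sketch below shows that inference path. `load_artifacts` and `classify` are hypothetical names, the sketch assumes the files were written with `pickle`, and to stay runnable it first writes a tiny stand-in model to a temporary directory instead of using the real artifacts.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def load_artifacts(model_dir):
    """Load the pickled classifier and TF-IDF vectorizer from model_dir."""
    model_dir = Path(model_dir)
    # Only unpickle files you created or trust: pickle can execute arbitrary code.
    with open(model_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(model_dir / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(texts, model, vectorizer):
    """Return one predicted label per email text."""
    # transform (not fit_transform): reuse the vocabulary learned during training.
    return [str(label) for label in model.predict(vectorizer.transform(texts))]


# Stand-in artifacts so the sketch runs without the real model/ directory.
demo_dir = Path(tempfile.mkdtemp())
texts = ["see you at the meeting", "claim your free prize", "verify your account password"]
labels = ["ham", "spam", "phishing"]
vec = TfidfVectorizer().fit(texts)
clf = RandomForestClassifier(random_state=0).fit(vec.transform(texts), labels)
(demo_dir / "model.pkl").write_bytes(pickle.dumps(clf))
(demo_dir / "vectorizer.pkl").write_bytes(pickle.dumps(vec))

model, vectorizer = load_artifacts(demo_dir)
print(classify(["please verify your password"], model, vectorizer))
```

In a FastAPI app these artifacts would typically be loaded once at startup rather than on every request; the route and request schema of app.py are not documented here, so the endpoint itself is not sketched.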

👤 Author

Bilal Imran

⭐ Show Your Support

If you found this project useful:

  • ⭐ Star the repository
  • 🍴 Fork it
  • 💡 Suggest improvements

About

An AI-powered email classification system that detects spam, phishing, and legitimate emails using TF-IDF vectorization, SMOTE for class balancing, and a Random Forest classifier, with a deployable FastAPI interface for real-time predictions.
