📧 NLP Spam Detection System

👋 Overview

This project aims to build a robust NLP tool that:

- Classifies emails as SPAM or NOT SPAM
- Identifies key topics in spam emails and measures their semantic distance
- Extracts organizations mentioned in non-spam emails

Built on spaCy and NLTK, the system covers the full pipeline: text classification, topic modeling, semantic similarity, and named entity extraction.

🎯 Project Objectives

Train a Classifier for SPAM Identification
- Use the provided dataset to train a machine learning model that accurately labels emails as SPAM or NOT SPAM.
- Evaluate performance with metrics like accuracy, precision, recall, and F1-score.
Identify Main Topics in SPAM Emails
- Perform topic modeling (e.g., Latent Dirichlet Allocation (LDA)) to uncover key themes in spam messages.
Calculate the Semantic Distance Between Topics
- Measure how distinct the topics are from one another, for example with cosine distance (1 minus cosine similarity).
- Assess diversity and overlap of the discovered themes.
Extract Organizations from NON-SPAM Emails
- Apply Named Entity Recognition (NER) (using spaCy or NLTK) to detect and extract organization names in non-spam emails.

🏗️ Project Structure

Data Preprocessing
- Clean and prepare the dataset for model training (e.g., removing noise, normalizing text).
- Implement tokenization, stop-word removal, lemmatization, and stemming.
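The preprocessing steps above could look roughly like the sketch below. It is an illustration, not the notebook's actual code: the `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical, and stemming stands in for lemmatization so the example runs without downloading NLTK data.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stop-word set so the sketch needs no downloads.
# A real pipeline would more likely use nltk.corpus.stopwords.words("english"),
# which requires nltk.download("stopwords") first.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "in", "for", "you", "your"}

_tokenizer = RegexpTokenizer(r"[a-z]+")  # keep runs of lowercase letters only
_stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, strip HTML tags and URLs, tokenize, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    tokens = _tokenizer.tokenize(text)
    return [_stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("<b>Claim</b> your FREE prizes at http://spam.example now!"))
```

To lemmatize instead of stemming, NLTK's `WordNetLemmatizer` (after `nltk.download("wordnet")`) or spaCy's `token.lemma_` would be the usual options.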
Classifier Training
- Experiment with various algorithms (e.g., Naive Bayes, SVM, or neural networks).
- Select and fine-tune the best model based on validation metrics.
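As one possible baseline for this step, the sketch below trains a Naive Bayes classifier on TF-IDF features and prints precision, recall, and F1. It assumes scikit-learn, which this README does not list, and uses a hypothetical toy dataset in place of `spam_dataset.csv`, whose column layout is not documented here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, for illustration only.
EMAILS = [
    "Win free money now",
    "Free money prize, claim now",
    "Claim your free prize money",
    "Free money offer, win big",
    "Meeting agenda for Monday",
    "Please review the project report",
    "Lunch meeting with the project team",
    "Monday report attached for review",
]
LABELS = ["SPAM"] * 4 + ["NOT SPAM"] * 4


def train_spam_classifier(texts, labels, test_size=0.25, seed=0):
    """Fit TF-IDF + Multinomial Naive Bayes and report held-out metrics."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(x_train, y_train)
    # Per-class precision, recall, and F1 on the held-out split.
    report = classification_report(y_test, model.predict(x_test), zero_division=0)
    return model, report


model, report = train_spam_classifier(EMAILS, LABELS)
print(report)
print(model.predict(["free money", "project report meeting"]))
```

Swapping `MultinomialNB()` for `LinearSVC()` or another estimator in the same pipeline is a simple way to compare algorithms on identical features.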
Topic Modeling (SPAM Emails)
- Use LDA to identify dominant topics in spam emails.
- Visualize and interpret the most common themes.
Semantic Distance Computation
- Implement methods (like cosine similarity) to measure how similar or different identified topics are.
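One way to turn cosine similarity into a distance is to take 1 minus the similarity for every pair of topic vectors. The sketch below assumes each topic is represented as a numeric vector (for example a row of topic-word weights, or an averaged word embedding); which representation the notebook uses is not stated here, and `cosine_distance_matrix` is a hypothetical helper.

```python
import numpy as np


def cosine_distance_matrix(topic_vectors):
    """Return the pairwise cosine distance (1 - cosine similarity) between rows."""
    vectors = np.asarray(topic_vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - similarity


# Three toy topic vectors: the first and third point in the same direction.
topics = [[2.0, 0.0], [0.0, 3.0], [5.0, 0.0]]
print(cosine_distance_matrix(topics).round(3))
```

A distance near 0 means two topics weight the same words and overlap heavily. For non-negative vectors such as topic-word weights, a distance near 1 means they share almost no vocabulary.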
Named Entity Recognition (NER)
- Use spaCy or NLTK to detect and extract organization names from non-spam emails.
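With spaCy, organization extraction could be sketched as below. The helper names are hypothetical, and `en_core_web_sm` is an assumed model choice that has to be installed separately (`python -m spacy download en_core_web_sm`).

```python
from collections import Counter

import spacy


def load_pipeline(model_name="en_core_web_sm"):
    """Load a pretrained spaCy pipeline (the model name is an assumption)."""
    return spacy.load(model_name)


def extract_organizations(texts, nlp):
    """Count entities labeled ORG across an iterable of email bodies."""
    counts = Counter()
    for doc in nlp.pipe(texts):  # nlp.pipe processes texts in batches
        counts.update(
            ent.text.strip() for ent in doc.ents if ent.label_ == "ORG"
        )
    return counts
```

A typical call would be `extract_organizations(non_spam_texts, load_pipeline())`, where `non_spam_texts` is the list of emails labeled NOT SPAM, followed by `.most_common(10)` to see the most frequently mentioned organizations. Statistical NER will miss some names and mislabel others, so the output is worth spot-checking.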

🔧 Libraries & Tools

- spaCy: Named Entity Recognition, tokenization, lemmatization
- NLTK: Tokenization, stop-word removal, text preprocessing

The code is in the Progetto_Spam_Filter.ipynb notebook. Happy coding! ✨

sylver86/Spam-Detection-System-NLP-project
