Mr-Atanu-Roy/Spam-SMS-Classification

SPAM SMS CLASSIFICATION

This is an ML project for classifying SMS messages as spam or ham (legitimate). The model can also be accessed directly through a web app built with Streamlit. For detailed model evaluation and analysis, see: Analysis

Demos:

Screenshot 2025-10-28 163527 Screenshot 2025-10-28 163535

Tech Stack:

  • Python: The primary programming language used for the project.
  • Pandas: Used for data manipulation and analysis.
  • NumPy: Used for numerical operations.
  • Matplotlib & Seaborn: Used for data visualization.
  • NLTK: Used for natural language processing tasks like tokenization, stop word removal, and stemming.
  • Scikit-learn: Used for machine learning model building, including text vectorization (CountVectorizer, TfidfVectorizer), model training (Naive Bayes classifiers), and evaluation metrics.
  • WordCloud: Used to generate word clouds for visualizing frequent words.
  • Streamlit: Used for making the web app.
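The NLTK steps named above (tokenization, stop word removal, stemming) can be sketched roughly as follows. This is only a minimal illustration, not the exact preprocessing from the notebook: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical stand-ins (NLTK's full stop word corpus needs a separate download), and the regex tokenizer is an assumption chosen because it works without extra data files.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stop word list used only for this sketch.
# NLTK's own list lives in nltk.corpus.stopwords and must be downloaded first.
STOP_WORDS = {"you", "are", "now", "a", "the", "to", "is", "and"}

tokenizer = RegexpTokenizer(r"[a-z0-9]+")  # keep runs of letters and digits
stemmer = PorterStemmer()


def preprocess(message):
    """Lowercase, tokenize, drop stop words, and stem one SMS message."""
    tokens = tokenizer.tokenize(message.lower())
    kept = [token for token in tokens if token not in STOP_WORDS]
    return [stemmer.stem(token) for token in kept]


stems = preprocess("You are WINNING free prizes, call now!")
print(stems)
```

The stemmed tokens can then be joined back into a string and passed to a vectorizer.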

Dataset Used: I used the SMS Spam Collection Dataset, downloaded from Kaggle. It contains one set of 5,574 SMS messages in English, each tagged as ham (legitimate) or spam.
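Loading the dataset with Pandas might look like the sketch below. It assumes the label and text columns are named `v1` and `v2` and that the file is latin-1 encoded, which is how the Kaggle copy is commonly distributed; check your own spam.csv before relying on it. The `load_sms` helper and the in-memory sample are hypothetical and exist only so the example runs without the real file.

```python
import io

import pandas as pd


def load_sms(source, label_col="v1", text_col="v2"):
    """Read the SMS data and add a numeric is_spam column (1 = spam, 0 = ham)."""
    # usecols drops any extra, mostly empty columns in the file.
    df = pd.read_csv(source, encoding="latin-1", usecols=[label_col, text_col])
    df = df.rename(columns={label_col: "label", text_col: "text"})
    df["is_spam"] = (df["label"] == "spam").astype(int)
    return df


# Tiny in-memory stand-in for spam.csv (made-up rows, not real data).
sample = io.StringIO(
    "v1,v2,Unnamed: 2\n"
    "ham,See you at lunch,\n"
    "spam,WIN a free prize now,\n"
)
df = load_sms(sample)
print(df)
```

With the real file, the call would be `load_sms("path/to/spam.csv")`, where the path depends on where you placed the dataset.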

Project setup

Clone the project

git clone https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification

or simply download this project from https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification

In the project directory, create a virtual environment (env)

  virtualenv env

Activate the virtual environment

For Windows:

  env\Scripts\activate

Install dependencies

  pip install -r requirements.txt

To run the web app locally, run the following command

  streamlit run .\app\app.py

After this, you will be automatically redirected to the web app running on your local server.

About the Model:

After analyzing and evaluating different algorithms, metrics, parameters, and hyperparameters, I chose the following configuration for the best results; the web app uses it as well:

  • TfidfVectorizer with the max_features hyperparameter set to 3000 is used for vectorizing the text data.
  • The Multinomial Naive Bayes algorithm is used for classification.
  • Achieved accuracy: 97.68% and precision: 99.19%
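The configuration above can be sketched with scikit-learn as shown below. This is a minimal illustration rather than the notebook's actual training code: the messages are made-up toy data, the pipeline wrapper is an assumption for brevity, and the scores it prints say nothing about the figures reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real project trains on spam.csv.
train_texts = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward",
    "are we meeting for lunch today",
    "see you at home tonight",
    "call me when you reach home",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

# TF-IDF features capped at 3000 terms, fed into Multinomial Naive Bayes.
model = make_pipeline(TfidfVectorizer(max_features=3000), MultinomialNB())
model.fit(train_texts, train_labels)

test_texts = ["claim your free prize", "see you at lunch"]
test_labels = [1, 0]
predicted = model.predict(test_texts)

# Accuracy: share of correct predictions.
# Precision: share of messages flagged as spam that really are spam.
accuracy = accuracy_score(test_labels, predicted)
precision = precision_score(test_labels, predicted, zero_division=0)
print(accuracy, precision)
```

Precision matters here because a false positive means a legitimate message is flagged as spam.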

For detailed model evaluation and analysis see: Analysis

Other Notes

  • I trained the model in the SMS_detection.ipynb notebook in Google Colab.
  • To experiment with or run the notebook, upload it and the spam.csv dataset (from the dataset folder) to Google Colab. Alternatively, you can use Jupyter Notebook.

Author