SPAM SMS CLASSIFICATION

This is a machine learning project for classifying SMS messages as spam or ham (legitimate). The model can also be accessed directly through a web app built with Streamlit. For detailed model evaluation and analysis, see: Analysis (analysis.md)

Demos:

Tech Stack:

Python: The primary programming language used for the project.
Pandas: Used for data manipulation and analysis.
NumPy: Used for numerical operations.
Matplotlib & Seaborn: Used for data visualization.
NLTK: Used for natural language processing tasks like tokenization, stop word removal, and stemming.
Scikit-learn: Used for machine learning model building, including text vectorization (CountVectorizer, TfidfVectorizer), model training (Naive Bayes classifiers), and evaluation metrics.
WordCloud: Used to generate word clouds for visualizing frequent words.
Streamlit: Used for making the web app.
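The NLTK preprocessing listed above (tokenization, stop word removal, and stemming) can be sketched as a single text-cleaning function. This is a minimal sketch, not the notebook's exact code. To avoid NLTK data downloads, it assumes a simple regex tokenizer in place of nltk.word_tokenize and scikit-learn's built-in ENGLISH_STOP_WORDS in place of NLTK's stopwords corpus. Only PorterStemmer comes from NLTK, and the function name transform_text is hypothetical.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()


def transform_text(text: str) -> str:
    """Lowercase, tokenize, drop stop words and punctuation, then stem (hypothetical helper)."""
    # Assumption: a regex tokenizer stands in for nltk.word_tokenize, so no
    # punkt download is needed. Punctuation is discarded by the pattern itself.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Assumption: scikit-learn's English stop-word list stands in for NLTK's stopwords corpus.
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    # Reduce each remaining word to its Porter stem, e.g. "running" -> "run".
    return " ".join(stemmer.stem(t) for t in tokens)


print(transform_text("WINNER!! You have been selected to receive a FREE prize"))
```

Stemming maps inflected forms such as "selected" and "select" to the same token, which shrinks the vocabulary the vectorizer has to learn.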

Dataset Used: I have used the SMS Spam Collection Dataset, downloaded from Kaggle. It contains a single set of 5,574 English SMS messages, each tagged as ham (legitimate) or spam.
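Loading the dataset with pandas might look like the sketch below. It assumes the file is dataset/spam.csv, that the Kaggle copy is Latin-1 encoded, that it stores the label in column v1 and the message text in column v2 (alongside some mostly empty extra columns), and that it may contain repeated messages. The function name load_sms_dataset and the target/text column names are hypothetical.

```python
import pandas as pd


def load_sms_dataset(path: str) -> pd.DataFrame:
    """Load spam.csv into a clean DataFrame with 'target' and 'text' columns (hypothetical helper)."""
    # Assumption: Latin-1 encoding, label in "v1", message in "v2".
    df = pd.read_csv(path, encoding="latin-1")
    df = df[["v1", "v2"]].rename(columns={"v1": "target", "v2": "text"})
    # Encode labels numerically: ham -> 0, spam -> 1.
    df["target"] = df["target"].map({"ham": 0, "spam": 1})
    # Keep only the first copy of any repeated (label, message) row.
    return df.drop_duplicates(keep="first").reset_index(drop=True)


# Usage (path assumed from the project's dataset folder):
# df = load_sms_dataset("dataset/spam.csv")
# print(df["target"].value_counts())
```

Dropping duplicates before splitting keeps identical messages from landing in both the training and test sets, which would otherwise inflate the evaluation scores.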

Project setup

Clone the project

  git clone https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification

Or simply download this project from https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification

In the project directory, create a virtual environment named env:

  virtualenv env

Activate the virtual environment

For Windows:

  env\Scripts\activate

For Linux/macOS:

  source env/bin/activate

Install dependencies

  pip install -r requirements.txt

To run the web app locally, run the following command:

  streamlit run app/app.py

Streamlit will then open the web app in your browser, served from your local machine (by default at http://localhost:8501).

About the Model:

After analyzing and evaluating different algorithms, metrics, parameters, and hyperparameters, I chose the following configuration, which gave the best results and is also used by the web app:

TfidfVectorizer, with the max_features hyperparameter set to 3000, is used to vectorize the text data.
The Multinomial Naive Bayes algorithm is used for classification.
Achieved accuracy: 97.68%, precision: 99.19%
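A minimal scikit-learn sketch of this configuration follows. The make_pipeline wrapper, the 80/20 train/test split, and random_state=2 are assumptions rather than values taken from the notebook, and the helper names build_model and train_and_evaluate are hypothetical. It is not expected to reproduce the exact figures above. In practice, the texts would first go through the NLTK preprocessing described in the Tech Stack section.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_model():
    """TF-IDF vectorizer (3000 features) followed by Multinomial Naive Bayes."""
    return make_pipeline(
        # Keep only the 3000 terms with the highest term frequency in the training corpus.
        TfidfVectorizer(max_features=3000),
        MultinomialNB(),
    )


def train_and_evaluate(texts, labels, test_size=0.2, random_state=2):
    """Fit on a train split; return the model plus accuracy and precision on the test split."""
    # Assumption: split size and seed are illustrative, not the notebook's values.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state
    )
    model = build_model().fit(x_train, y_train)
    y_pred = model.predict(x_test)
    # Precision for the spam class (label 1): of the messages flagged as spam,
    # the fraction that really are spam.
    return model, accuracy_score(y_test, y_pred), precision_score(y_test, y_pred)
```

Precision is the metric to watch here: a false positive sends a legitimate message to spam, which is usually costlier than letting an occasional spam message through. Once trained, the pipeline classifies raw strings directly, e.g. model.predict(["Congratulations! You have won a free ticket"]).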

For detailed model evaluation and analysis, see: Analysis (analysis.md)

Other Notes

I trained the model in the SMS_detection.ipynb notebook on Google Colab.
To experiment with or run the notebook, upload SMS_detection.ipynb and the spam.csv dataset (from the dataset folder) to Google Colab. Alternatively, you can use Jupyter Notebook.

Author

@Atanu Roy
