This is a machine learning project for classifying messages as spam or ham. The model can also be accessed directly through a web app built with Streamlit. For detailed model evaluation and analysis, see: Analysis
Demos:
Tech Stack:
- Python: The primary programming language used for the project.
- Pandas: Used for data manipulation and analysis.
- NumPy: Used for numerical operations.
- Matplotlib & Seaborn: Used for data visualization.
- NLTK: Used for natural language processing tasks like tokenization, stop word removal, and stemming.
- Scikit-learn: Used for machine learning model building, including text vectorization (CountVectorizer, TfidfVectorizer), model training (Naive Bayes classifiers), and evaluation metrics.
- WordCloud: Used to generate word clouds for visualizing frequent words.
- Streamlit: Used for making the web app.
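The NLTK preprocessing steps named above (tokenization, stop word removal, stemming) could look roughly like the sketch below. It is not the code from the project notebook. The `preprocess` function, the regex tokenizer, and the tiny `STOP_WORDS` set are assumptions made for illustration so that the example runs without downloading NLTK corpora; a real pipeline would more likely use NLTK's own tokenizer and stop word corpus.

```python
import re

from nltk.stem.porter import PorterStemmer

# Hypothetical, deliberately tiny stop word list (assumption).
# NLTK's full English list needs nltk.download("stopwords") first.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "for", "of", "and", "in"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem one message."""
    # Simple regex tokenization: keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove stop words, then reduce each remaining word to its stem.
    kept = [token for token in tokens if token not in STOP_WORDS]
    return " ".join(stemmer.stem(token) for token in kept)


cleaned = preprocess("You are running to the store")
print(cleaned)
```

The cleaned string is what would then be passed to a vectorizer such as CountVectorizer or TfidfVectorizer.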
Dataset Used: I used the SMS Spam Collection Dataset, downloaded from Kaggle. It contains a set of 5,574 SMS messages in English, each tagged as either ham (legitimate) or spam.
Clone the project
git clone https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification
or simply download this project from https://github.com/Mr-Atanu-Roy/Spam-SMS-Classification
In the project directory, create a virtual environment (env)
virtualenv env
Activate the virtual environment
For Windows:
env\Scripts\activate
Install dependencies
pip install -r requirements.txt
To run the web app locally, run the following command
streamlit run .\app\app.py
After this, you will be automatically redirected to the web app running on your local server.
After analyzing and evaluating different algorithms, metrics, parameters, and hyperparameters, I chose the following configuration for the best results; it is also the one used by the web app:
- TfidfVectorizer is used with the max_features hyperparameter set to 3000 for vectorizing the text data.
- The Multinomial Naive Bayes algorithm is used for classification.
- Achieved accuracy: 97.68% and precision: 99.19%
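As a rough illustration of the configuration above, the following sketch chains a TfidfVectorizer with max_features=3000 and a Multinomial Naive Bayes classifier. It is a minimal example, not the notebook's actual training code. The toy `messages`, the label encoding (1 for spam, 0 for ham), and the use of `make_pipeline` are assumptions, and the classifier is left at scikit-learn's defaults. The reported accuracy and precision come from the full dataset and will not be reproduced by this toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only (assumption); the project trains on spam.csv.
messages = [
    "win a free prize now",
    "free cash prize claim now",
    "claim your free ticket",
    "are we meeting for lunch today",
    "see you at home tonight",
    "call me when you reach home",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# Vectorize with at most 3000 TF-IDF features, then classify with Multinomial NB.
model = make_pipeline(TfidfVectorizer(max_features=3000), MultinomialNB())
model.fit(messages, labels)

prediction = model.predict(["free prize claim"])[0]
print("spam" if prediction == 1 else "ham")
```

Wrapping both steps in one pipeline means new messages are vectorized with the same vocabulary that was learned during training.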
For detailed model evaluation and analysis see: Analysis
- I trained the model in the SMS_detection.ipynb file in Google Colab.
- To experiment with or run the notebook, upload it and the spam.csv dataset (from the dataset folder) to Google Colab. Alternatively, you can use Jupyter Notebook.

