SMS Spam Detection NLP Project

A machine learning web application that detects spam SMS messages using Natural Language Processing (NLP) techniques. It pairs a trained Random Forest model (98.92% accuracy on the test set) with a web interface for real-time spam detection.

Features

Frontend

Modern UI: Clean, professional design with responsive layout
SMS Detection: Real-time spam detection with confidence scores
Dashboard: View all analyzed messages with search and pagination
Statistics: Overview of spam detection metrics
Loading States: Visual feedback during processing
Mobile Responsive: Works seamlessly on desktop and mobile devices

Backend

Flask API: RESTful endpoints for prediction and data management
Machine Learning Model: Trained Random Forest classifier with 98.92% accuracy
Feature Engineering: Advanced NLP feature extraction (TF-IDF, text analysis, pattern matching)
SQLite Database: Persistent storage for message history
CORS Support: Cross-origin requests enabled
Error Handling: Comprehensive error handling and validation

Installation

Prerequisites

Python 3.7 or higher
pip (Python package installer)
Git (for cloning the repository)

Setup Instructions

Clone the repository

```
git clone https://github.com/rk-python5/spam_sms_detection_nlp.git
cd spam_sms_detection_nlp
```

Create virtual environment (recommended)

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```
Run the application
```
python app.py
```
Access the application

Open your browser and navigate to http://localhost:5001

API Endpoints

POST /predict

Analyze an SMS message for spam detection.

Request:

{
    "sms_text": "Your SMS message here"
}

Response:

{
    "label": "spam" | "not spam",
    "confidence": 0.85,
    "score": 0.75
}
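
As a rough illustration of calling this endpoint from Python, the sketch below uses only the standard library. The base URL is an assumption taken from the setup steps above, and the helper names `build_predict_request` and `predict` are hypothetical; they are not part of the project.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from the setup steps.
BASE_URL = "http://localhost:5001"


def build_predict_request(sms_text, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request with a JSON body."""
    payload = json.dumps({"sms_text": sms_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(sms_text, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(sms_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("You have won a free prize")` should return a dict with the `label`, `confidence`, and `score` fields shown above.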

GET /messages

Retrieve paginated list of analyzed messages.

Query Parameters:

page: Page number (default: 1)
limit: Items per page (default: 10)
search: Search term for filtering

Response:

{
    "messages": [...],
    "total_count": 100,
    "page": 1,
    "limit": 10,
    "total_pages": 10
}
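
The relationship between `total_count`, `limit`, and `total_pages` can be sketched as follows. This is a minimal illustration and not the project's implementation: the `paginate` helper is hypothetical, it works on an in-memory list of strings, and treating `search` as a case-insensitive substring match is an assumption.

```python
import math


def paginate(messages, page=1, limit=10, search=""):
    """Return one page of messages plus the metadata fields of the response."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    if search:
        # Assumption: search is a case-insensitive substring filter.
        needle = search.lower()
        messages = [m for m in messages if needle in m.lower()]
    total_count = len(messages)
    start = (page - 1) * limit
    return {
        "messages": messages[start:start + limit],
        "total_count": total_count,
        "page": page,
        "limit": limit,
        # Round up so a partially filled last page still counts as a page.
        "total_pages": math.ceil(total_count / limit),
    }
```

For 100 stored messages and the default `limit` of 10 this yields `total_pages` of 10, matching the sample response.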

GET /stats

Get statistics about analyzed messages.

Response:

{
    "total_messages": 100,
    "spam_count": 25,
    "not_spam_count": 75,
    "spam_percentage": 25.0
}
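
The fields in this response follow directly from the stored predictions. A minimal sketch, assuming predictions are available as a list of label strings (the `compute_stats` helper is hypothetical, and rounding to two decimals is an assumption):

```python
def compute_stats(labels):
    """Summarize a list of 'spam' / 'not spam' labels into the /stats fields."""
    total = len(labels)
    spam_count = sum(1 for label in labels if label == "spam")
    # Guard against division by zero when no messages have been analyzed yet.
    spam_percentage = round(100.0 * spam_count / total, 2) if total else 0.0
    return {
        "total_messages": total,
        "spam_count": spam_count,
        "not_spam_count": total - spam_count,
        "spam_percentage": spam_percentage,
    }
```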

How It Works

Spam Detection Algorithm

The application uses a trained machine learning model (Random Forest) with 98.92% accuracy that analyzes multiple features:

TF-IDF Features: Term frequency-inverse document frequency analysis
Text Length: Message length, word count, line count
Character Analysis: Uppercase ratio, number ratio, special characters
Spam Keywords: 500+ common spam terms and phrases
Pattern Matching: URLs, excessive punctuation, phone numbers
Word Analysis: Average word length, unique word ratio
Machine Learning: Trained on 5,574 real SMS messages (747 spam, 4,827 ham)
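
To make the hand-crafted features above more concrete, here is a minimal sketch of how several of them could be computed with the standard library. The function name, the feature names, and the regular expressions are assumptions for illustration; the actual extraction code in app.py may differ, and the TF-IDF and keyword features are not covered here.

```python
import re

# Assumed patterns; the project's own rules may be stricter or looser.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"\b\d{5,}\b")    # five or more consecutive digits
PUNCTUATION_RUN = re.compile(r"[!?]{2,}")    # e.g. "!!!" or "?!"


def _ratio(count, total):
    """Safe division that returns 0.0 for empty input."""
    return count / total if total else 0.0


def extract_features(text):
    """Compute length, character, pattern, and word features for one message."""
    words = text.split()
    return {
        "length": len(text),
        "word_count": len(words),
        "line_count": text.count("\n") + 1,
        "uppercase_ratio": _ratio(sum(c.isupper() for c in text), len(text)),
        "digit_ratio": _ratio(sum(c.isdigit() for c in text), len(text)),
        "special_char_count": sum(
            not c.isalnum() and not c.isspace() for c in text
        ),
        "has_url": bool(URL_PATTERN.search(text)),
        "has_phone_number": bool(PHONE_PATTERN.search(text)),
        "punctuation_runs": len(PUNCTUATION_RUN.findall(text)),
        "avg_word_length": _ratio(sum(len(w) for w in words), len(words)),
        "unique_word_ratio": _ratio(len({w.lower() for w in words}), len(words)),
    }
```

In a full pipeline, values like these would typically be concatenated with the TF-IDF vector before being passed to the classifier.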

Model Performance

Test Accuracy: 98.92% on the held-out test set
Real-world Testing: 86.7% accuracy on diverse test cases
Spam Detection: Excellent at identifying obvious spam patterns
Threshold-based Classification: A message is labeled "not spam" only when the model is at least 80% confident that it is not spam
Aggressive Spam Detection: Anything below that 80% confidence is classified as spam
Confidence Scores: High confidence for clear cases, moderate for edge cases
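
The threshold rule described above can be sketched in a few lines. This assumes the model exposes a probability for the "not spam" class (for example via a `predict_proba`-style call); the `classify` helper and the constant name are hypothetical.

```python
NOT_SPAM_THRESHOLD = 0.80  # 80% confidence required for a "not spam" label


def classify(not_spam_probability, threshold=NOT_SPAM_THRESHOLD):
    """Label a message from the model's probability that it is not spam."""
    if not 0.0 <= not_spam_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Anything below the threshold is treated as spam, which biases the
    # classifier toward catching spam at the cost of more false positives.
    return "not spam" if not_spam_probability >= threshold else "spam"
```

Under this rule a message the model considers 79% likely to be legitimate is still labeled spam, which is the trade-off the aggressive threshold makes.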

Project Structure

spam_sms_detection_nlp/
├── app.py                    # Flask backend application
├── train_model.py            # Machine learning model training script
├── test_threshold.py         # Threshold testing script
├── test_enhanced_model.py    # Enhanced model testing script
├── test_api.py               # API testing script
├── requirements.txt          # Python dependencies
├── sms_spam_model.pkl        # Trained Random Forest model
├── spam_collection.txt       # Training dataset (5,574 SMS messages)
├── templates/
│   └── index.html            # Main HTML template
├── sms_detection.db          # SQLite database (created automatically)
├── start.sh                  # Linux/Mac startup script
├── start.bat                 # Windows startup script
├── PROJECT_SUMMARY.md        # Detailed project summary
└── README.md                 # This file

Usage

SMS Detection

Navigate to the "Detection" tab
Enter or paste an SMS message in the text area
Click "Check SMS" to analyze
View the result with confidence score

Dashboard

Navigate to the "Dashboard" tab
View statistics and message history
Use search to filter messages
Navigate through pages using pagination

Customization

Adding New Spam Keywords

Edit the spam_keywords list in app.py:

self.spam_keywords = [
    'free', 'win', 'winner', 'congratulations', 'urgent',
    # Add your keywords here
]
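
For context, a keyword list like this is usually turned into a numeric feature by counting matches. The sketch below is one possible way to do that and is not taken from app.py: the `count_spam_keywords` helper is hypothetical, and matching whole lowercase tokens (so "win" does not match "windows") is an assumption.

```python
import re

# The five sample keywords from the list above; extend as needed.
SPAM_KEYWORDS = ['free', 'win', 'winner', 'congratulations', 'urgent']


def count_spam_keywords(text, keywords=SPAM_KEYWORDS):
    """Count how many tokens in the message are spam keywords."""
    # Lowercase and split into word tokens so matching ignores case
    # and surrounding punctuation.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    keyword_set = set(keywords)
    return sum(1 for token in tokens if token in keyword_set)
```

Token matching like this only handles single-word keywords; multi-word phrases would need substring or n-gram matching instead.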

Modifying Detection Rules

Adjust the scoring system in the predict method:

# Example: Increase penalty for spam keywords
score += features[4] * 0.4  # Instead of 0.3

Styling Changes

Modify the CSS in the <style> section of templates/index.html to customize colors, fonts, and layout.

Technical Details

Database Schema

CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sms_text TEXT NOT NULL,
    prediction TEXT NOT NULL,
    confidence REAL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
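
A minimal sketch of working with this schema through Python's built-in `sqlite3` module is shown below. The helper names are hypothetical, the in-memory default path is chosen only so the example runs without touching sms_detection.db, and `IF NOT EXISTS` is added here so the setup can be repeated safely.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sms_text TEXT NOT NULL,
    prediction TEXT NOT NULL,
    confidence REAL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
"""


def open_database(path=":memory:"):
    """Open a connection and make sure the messages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_message(conn, sms_text, prediction, confidence):
    """Insert one analyzed message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO messages (sms_text, prediction, confidence) "
            "VALUES (?, ?, ?)",
            (sms_text, prediction, confidence),
        )
    return cursor.lastrowid


def search_messages(conn, term="", limit=10, offset=0):
    """Return matching messages, newest first, using a parameterized LIKE."""
    rows = conn.execute(
        "SELECT id, sms_text, prediction, confidence FROM messages "
        "WHERE sms_text LIKE ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (f"%{term}%", limit, offset),
    )
    return rows.fetchall()
```

Using `?` placeholders rather than string formatting keeps user-supplied search terms from being interpreted as SQL.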

Dependencies

Flask: Web framework
Flask-CORS: Cross-origin resource sharing
SQLite3: Database (built-in with Python)
NumPy: Numerical operations (optional, for future ML models)

Future Enhancements

Machine Learning Model: Replace the remaining rule-based scoring with the trained ML model
User Authentication: Add user accounts and personal dashboards
Bulk Upload: Support for analyzing multiple messages at once
Export Features: Download results as CSV/JSON
Real-time Updates: WebSocket support for live updates
Advanced Analytics: More detailed statistics and visualizations

Troubleshooting

Common Issues

Port already in use
- The app listens on port 5001 by default; change it to an unused port in app.py, for example app.run(port=5002)
Database errors
- Delete sms_detection.db to reset the database
CORS issues
- Ensure Flask-CORS is installed: pip install Flask-CORS
Missing dependencies
- Reinstall requirements: pip install -r requirements.txt
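
To confirm whether a port conflict is really the problem, a quick check like the following can help. It is a small diagnostic sketch using the standard library, not part of the project; the `is_port_in_use` helper is hypothetical.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # meaning another process is listening there.
        return sock.connect_ex((host, port)) == 0
```

Calling `is_port_in_use(5001)` before starting the app indicates whether the default port is already taken.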

License

This project is open source and available under the MIT License.

GitHub Repository

This project is hosted on GitHub under the rk-python5 organization:

Repository: rk-python5/spam_sms_detection_nlp
Organization: rk-python5

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

Development Setup

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes and commit: git commit -m "Add feature"
Push to your fork: git push origin feature-name
Submit a pull request
