A machine learning web application that detects spam SMS messages using Natural Language Processing (NLP) techniques. It pairs a trained Random Forest model (98.92% accuracy on the test set) with a web interface for real-time spam detection.
- Modern UI: Clean, professional design with responsive layout
- SMS Detection: Real-time spam detection with confidence scores
- Dashboard: View all analyzed messages with search and pagination
- Statistics: Overview of spam detection metrics
- Loading States: Visual feedback during processing
- Mobile Responsive: Works seamlessly on desktop and mobile devices
- Flask API: RESTful endpoints for prediction and data management
- Machine Learning Model: Trained Random Forest classifier with 98.92% accuracy
- Feature Engineering: Advanced NLP feature extraction (TF-IDF, text analysis, pattern matching)
- SQLite Database: Persistent storage for message history
- CORS Support: Cross-origin requests enabled
- Error Handling: Comprehensive error handling and validation
- Python 3.7 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- Clone the repository
  git clone https://github.com/rk-python5/spam_sms_detection_nlp.git
  cd spam_sms_detection_nlp
- Create virtual environment (recommended)
  python3 -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Run the application
  python app.py
- Access the application: open your browser and navigate to
  http://localhost:5001
Analyze an SMS message for spam detection.
Request:
{
"sms_text": "Your SMS message here"
}
Response:
{
"label": "spam" | "not spam",
"confidence": 0.85,
"score": 0.75
}
Retrieve a paginated list of analyzed messages.
Query Parameters:
- page: Page number (default: 1)
- limit: Items per page (default: 10)
- search: Search term for filtering
Response:
{
"messages": [...],
"total_count": 100,
"page": 1,
"limit": 10,
"total_pages": 10
}
Get statistics about analyzed messages.
Response:
{
"total_messages": 100,
"spam_count": 25,
"not_spam_count": 75,
"spam_percentage": 25.0
}
The application uses a trained machine learning model (Random Forest) with 98.92% accuracy that analyzes multiple features:
- TF-IDF Features: Term frequency-inverse document frequency analysis
- Text Length: Message length, word count, line count
- Character Analysis: Uppercase ratio, number ratio, special characters
- Spam Keywords: 500+ common spam terms and phrases
- Pattern Matching: URLs, excessive punctuation, phone numbers
- Word Analysis: Average word length, unique word ratio
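The hand-crafted features listed above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code in app.py: the function name `extract_features`, the regular expressions, and the five-word keyword set are assumptions made for the example, and the TF-IDF features are left out entirely.

```python
import re

# Hypothetical, heavily shortened keyword set; the project describes 500+ terms.
SPAM_KEYWORDS = {"free", "win", "winner", "congratulations", "urgent"}

# Illustrative patterns only; the real patterns in app.py may differ.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\b\d{10,11}\b")
PUNCT_RE = re.compile(r"[!?]{2,}")


def extract_features(text: str) -> dict:
    """Compute simple length, character, keyword, and pattern features."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    letters = [c for c in text if c.isalpha()]
    n_chars = len(text)
    return {
        "length": n_chars,
        "word_count": len(words),
        "line_count": text.count("\n") + 1,
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "special_char_count": sum(not c.isalnum() and not c.isspace() for c in text),
        "spam_keyword_count": sum(w in SPAM_KEYWORDS for w in lowered),
        "has_url": bool(URL_RE.search(text)),
        "has_phone_number": bool(PHONE_RE.search(text)),
        "excessive_punctuation": bool(PUNCT_RE.search(text)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "unique_word_ratio": len(set(lowered)) / len(words) if words else 0.0,
    }
```

Calling `extract_features("WIN a FREE prize!!! Call 08001234567")` returns a dictionary whose values could be concatenated with a TF-IDF vector before being passed to a classifier.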
- Machine Learning: Trained on 5,574 real SMS messages (747 spam, 4,827 ham)
- Test Accuracy: 98.92% on the test set
- Real-world Testing: 86.7% accuracy on diverse test cases
- Spam Detection: Excellent at identifying obvious spam patterns
- Threshold-based Classification: A message is labeled "not spam" only when the model is at least 80% confident that it is not spam
- Aggressive Spam Detection: Anything below that 80% confidence is classified as spam
- Confidence Scores: High confidence for clear cases, moderate for edge cases
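The 80% rule described above amounts to a one-line decision. The sketch below assumes the model exposes a probability that the message is legitimate; the names `classify` and `p_not_spam` are hypothetical, and the way app.py derives that probability is not shown in this README.

```python
NOT_SPAM_THRESHOLD = 0.80  # the 80% confidence threshold described above


def classify(p_not_spam: float, threshold: float = NOT_SPAM_THRESHOLD) -> str:
    """Label a message from the estimated probability that it is legitimate."""
    if not 0.0 <= p_not_spam <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Anything below the threshold is treated as spam, even if the model
    # leans towards "not spam" (for example at 0.79).
    return "not spam" if p_not_spam >= threshold else "spam"
```

With this rule, `classify(0.79)` returns "spam" and `classify(0.95)` returns "not spam". The asymmetric threshold accepts more false alarms on legitimate messages in exchange for fewer missed spam messages.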
spam_sms_detection_nlp/
├── app.py # Flask backend application
├── train_model.py # Machine learning model training script
├── test_threshold.py # Threshold testing script
├── test_enhanced_model.py # Enhanced model testing script
├── test_api.py # API testing script
├── requirements.txt # Python dependencies
├── sms_spam_model.pkl # Trained Random Forest model
├── spam_collection.txt # Training dataset (5,574 SMS messages)
├── templates/
│ └── index.html # Main HTML template
├── sms_detection.db # SQLite database (created automatically)
├── start.sh # Linux/Mac startup script
├── start.bat # Windows startup script
├── PROJECT_SUMMARY.md # Detailed project summary
└── README.md # This file
- Navigate to the "Detection" tab
- Enter or paste an SMS message in the text area
- Click "Check SMS" to analyze
- View the result with confidence score
- Navigate to the "Dashboard" tab
- View statistics and message history
- Use search to filter messages
- Navigate through pages using pagination
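The same check can also be scripted against the API instead of the web interface. The sketch below uses only the standard library and the request body documented above. The route `/predict` is a placeholder assumption, since the endpoint paths are not listed here; check app.py for the actual route before using it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"
PREDICT_PATH = "/predict"  # hypothetical route; confirm the real one in app.py


def build_predict_request(sms_text: str, base_url: str = BASE_URL,
                          path: str = PREDICT_PATH) -> urllib.request.Request:
    """Build a JSON POST request carrying the documented 'sms_text' field."""
    body = json.dumps({"sms_text": sms_text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_sms(sms_text: str) -> dict:
    """Send the message to a running instance and return the decoded reply."""
    request = build_predict_request(sms_text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the application running, `check_sms("Your SMS message here")` should return a dictionary with the `label`, `confidence`, and `score` fields shown in the response example.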
Edit the spam_keywords list in app.py:
self.spam_keywords = [
'free', 'win', 'winner', 'congratulations', 'urgent',
# Add your keywords here
]
Adjust the scoring system in the predict method:
# Example: Increase penalty for spam keywords
score += features[4] * 0.4 # Instead of 0.3
Modify the CSS in the <style> section of templates/index.html to customize colors, fonts, and layout.
CREATE TABLE messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sms_text TEXT NOT NULL,
prediction TEXT NOT NULL,
confidence REAL,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
- Flask: Web framework
- Flask-CORS: Cross-origin resource sharing
- SQLite3: Database (built-in with Python)
- NumPy: Numerical operations (optional, for future ML models)
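To see how the messages table above can back the paginated history and the statistics, here is a minimal sketch using the built-in sqlite3 module. The helper names (`open_db`, `save_message`, `list_messages`, `stats`) are hypothetical and the queries are one plausible approach, not necessarily what app.py does; the returned keys mirror the response examples in the API section.

```python
import math
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sms_text TEXT NOT NULL,
    prediction TEXT NOT NULL,
    confidence REAL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database (in memory by default) and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_message(conn, sms_text, prediction, confidence):
    """Store one analyzed message; id and timestamp are filled in by SQLite."""
    conn.execute(
        "INSERT INTO messages (sms_text, prediction, confidence) VALUES (?, ?, ?)",
        (sms_text, prediction, confidence),
    )
    conn.commit()


def list_messages(conn, page=1, limit=10, search=""):
    """Return one page of messages, newest first, optionally filtered by text."""
    pattern = f"%{search}%"
    total = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE sms_text LIKE ?", (pattern,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, sms_text, prediction, confidence FROM messages "
        "WHERE sms_text LIKE ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (pattern, limit, (page - 1) * limit),
    ).fetchall()
    return {"messages": rows, "total_count": total, "page": page,
            "limit": limit, "total_pages": math.ceil(total / limit)}


def stats(conn):
    """Count spam and non-spam rows and derive the spam percentage."""
    total, spam = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(prediction = 'spam'), 0) FROM messages"
    ).fetchone()
    return {"total_messages": total, "spam_count": spam,
            "not_spam_count": total - spam,
            "spam_percentage": round(100.0 * spam / total, 2) if total else 0.0}
```

Passing "sms_detection.db" to `open_db` would target the on-disk file instead of an in-memory database. Parameterized queries keep user-supplied search terms out of the SQL text.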
- Machine Learning Model: Replace rule-based detection with trained ML model
- User Authentication: Add user accounts and personal dashboards
- Bulk Upload: Support for analyzing multiple messages at once
- Export Features: Download results as CSV/JSON
- Real-time Updates: WebSocket support for live updates
- Advanced Analytics: More detailed statistics and visualizations
- Port already in use
  - Change the port in app.py: app.run(port=5001)
- Database errors
  - Delete sms_detection.db to reset the database
- CORS issues
  - Ensure Flask-CORS is installed: pip install Flask-CORS
- Missing dependencies
  - Reinstall requirements: pip install -r requirements.txt
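For the "port already in use" case, a quick way to confirm the diagnosis before editing app.py is to probe the port from Python. This is a small standard-library sketch; the helper name `port_in_use` is made up for the example and is not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(5001)` returns True while the application is not running, another process holds the port and a different value should be passed to `app.run`.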
This project is open source and available under the MIT License.
This project is hosted on GitHub under the rk-python5 organization:
- Repository: rk-python5/spam_sms_detection_nlp
- Organization: rk-python5
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and commit: git commit -m "Add feature"
- Push to your fork: git push origin feature-name
- Submit a pull request