💬 Comment Categorization & Reply Assistant Tool

A minimal, student-friendly NLP project that automatically categorizes user comments into 8 distinct categories and suggests appropriate reply templates.

🎯 Project Objective

Help brands and content creators efficiently manage user feedback by automatically categorizing comments as:

Praise - Positive appreciation
Support - Encouragement
Constructive Criticism - Helpful negative feedback
Hate/Abuse - Offensive content
Threat - Dangerous/threatening content
Emotional - Personal/emotional responses
Spam - Irrelevant promotional content
Question - User inquiries

📁 Project Structure

comment_categorise/
│
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
│
├── data/
│   ├── generate_dataset.py           # Generate synthetic training data
│   └── comments_dataset.csv          # Generated dataset (1199+ samples)
│
├── models/                            # Trained models (created after training)
│   ├── comment_classifier.pkl        
│   └── label_encoder.pkl
│
├── src/                               # Source code
│   ├── __init__.py
│   ├── utils.py                      # Preprocessing utilities
│   ├── train.py                      # Model training script
│   └── predict.py                    # Prediction script
│
├── app.py                             # Streamlit web UI
│
└── outputs/                           # Categorized results

🚀 Quick Start

1. Setup Environment

# Create virtual environment (recommended)
python -m venv venv

# Activate virtual environment (Windows PowerShell)
.\venv\Scripts\Activate.ps1

# On macOS/Linux, use this instead:
# source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Generate Dataset

python data/generate_dataset.py

This creates data/comments_dataset.csv with ~1200 labeled comments across 8 categories.
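
As a rough illustration of how template-based synthetic data can be produced, the sketch below fills a few comment patterns with random topics and serializes the result as CSV. The templates, the topic list, and the column names `comment` and `label` are assumptions made for this example; the actual `generate_dataset.py` may work differently and covers all 8 categories.

```python
import csv
import io
import random
from typing import List, Tuple

# Hypothetical templates; only four of the eight labels are shown for brevity.
TEMPLATES = {
    "praise": ["Amazing {thing}!", "Loved the {thing}."],
    "constructive": ["The {thing} was okay but felt rushed."],
    "question": ["Can you make one on {thing}?"],
    "spam": ["Buy cheap {thing} at my site!"],
}
THINGS = ["animation", "video", "voiceover", "tutorial"]


def generate_rows(n_per_label: int, seed: int = 42) -> List[Tuple[str, str]]:
    """Return shuffled (comment, label) pairs, n_per_label for each label."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for label, patterns in TEMPLATES.items():
        for _ in range(n_per_label):
            text = rng.choice(patterns).format(thing=rng.choice(THINGS))
            rows.append((text, label))
    rng.shuffle(rows)
    return rows


def rows_to_csv(rows: List[Tuple[str, str]]) -> str:
    """Serialize rows as CSV text with an assumed 'comment,label' header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["comment", "label"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `rows_to_csv(generate_rows(150))` would yield 600 rows for the four labels shown; writing that string to a file gives a dataset in the same general shape as the one described above.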

3. Train Model

python src/train.py --data data/comments_dataset.csv --output models

Expected output:

Classification report with accuracy metrics
Saved model files in models/ directory

4. Test Predictions

Single comment:

python src/predict.py --text "Amazing work! Loved the animation."

Batch CSV:

python src/predict.py --input data/comments_dataset.csv --output outputs/categorized_comments.csv

5. Launch Web UI (Bonus)

streamlit run app.py

Open your browser at http://localhost:8501

📊 Sample Outputs

Single Comment Prediction

Input: "Amazing work! Loved the animation."
Predicted Category: PRAISE
Confidence: 82.29%

Suggested Reply:
"Thank you so much for your kind words! 😊 We're thrilled you enjoyed it. 
Stay tuned for more content!"
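
A reply suggestion like the one above can be modeled as a plain dictionary lookup keyed by the predicted category. In this sketch only the praise text is taken from the sample output; the other templates, the fallback message, and the function name `suggest_reply` are hypothetical.

```python
# Only the "praise" template comes from the sample output above;
# every other entry is a hypothetical placeholder.
REPLY_TEMPLATES = {
    "praise": (
        "Thank you so much for your kind words! 😊 We're thrilled you "
        "enjoyed it. Stay tuned for more content!"
    ),
    "constructive": "Thanks for the honest feedback. We'll work on it.",
    "question": "Great question! We'll get back to you shortly.",
    "hate_abuse": "",  # assumed policy: no public reply, flag for moderation
}
DEFAULT_REPLY = "Thanks for your comment!"


def suggest_reply(category: str) -> str:
    """Look up a reply template, ignoring case and surrounding whitespace."""
    return REPLY_TEMPLATES.get(category.strip().lower(), DEFAULT_REPLY)


print(suggest_reply("PRAISE"))
```

Normalizing the key lets the same lookup serve both the upper-case label shown for single predictions and the lower-case labels shown in batch output.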

Batch Processing Results

The tool generates a CSV file (outputs/categorized_comments.csv) with:

Original comment text
Predicted category
Confidence score

Sample output:

Comment	Predicted Category	Confidence
Amazing work! Loved the animation.	praise	0.8229
The animation was okay but the voiceover felt off.	constructive	0.7863
This is trash, quit now.	hate_abuse	0.7546
Can you make one on topic X?	question	0.7836

UI Screenshots

Single comment analysis with confidence scores and reply template

Batch CSV upload for processing multiple comments

Category distribution visualization

Note: To capture screenshots, run the Streamlit app and use your browser's screenshot tool or Snipping Tool (Windows).

🛠️ Technical Stack

Component	Technology
Language	Python 3.8+
ML Framework	scikit-learn
NLP	NLTK (tokenization, lemmatization)
Feature Extraction	TF-IDF
Classifier	Logistic Regression (multinomial)
Web UI	Streamlit
Visualization	Matplotlib, Seaborn

📊 Model Architecture

Input Comment
    ↓
Text Cleaning (lowercase, remove URLs/mentions)
    ↓
Tokenization (split into words)
    ↓
Lemmatization (reduce to base forms)
    ↓
TF-IDF Vectorization (convert to numerical features)
    ↓
Logistic Regression Classifier
    ↓
Predicted Category + Confidence Scores
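
The first three stages of this pipeline (cleaning, tokenization, lemmatization) can be sketched with the standard library alone. The regular expressions and the `lemmatize` parameter are assumptions for illustration: the lemmatizer is passed in as a callable so the example runs without NLTK data, and it defaults to leaving tokens unchanged.

```python
import re
from typing import Callable

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # assumption: drop digits and punctuation


def clean_comment(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> str:
    """Lowercase, strip URLs/mentions/symbols, tokenize, lemmatize, rejoin."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()  # whitespace tokenization
    return " ".join(lemmatize(tok) for tok in tokens)


print(clean_comment("Check https://x.com @bob Amazing WORK!!"))
```

With NLTK installed and the WordNet corpus downloaded, `WordNetLemmatizer().lemmatize` could be passed as `lemmatize`. Note that it treats words as nouns by default, so mapping "running" to "run" needs `pos="v"`.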

📝 Code Explanation

src/utils.py - Preprocessing

def preprocess_comment(text: str) -> str:
    """
    Clean, tokenize, and lemmatize comment text.
    
    Steps:
    1. Lowercase conversion
    2. Remove URLs and @mentions
    3. Remove special characters
    4. Tokenize into words
    5. Lemmatize (running → run)
    6. Join back to string for TF-IDF
    """

src/train.py - Model Training

Loads CSV dataset
Applies preprocessing
Splits into train/test (80/20)
Trains TF-IDF + LogisticRegression pipeline
Evaluates on test set
Saves model and label encoder
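
The training steps listed above map fairly directly onto scikit-learn. The following is a minimal sketch rather than the project's actual `train.py`: the function name, the seed, the stratified split, and the `max_iter` setting are assumptions, and saving the fitted objects to disk is left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder


def train_classifier(texts, labels, seed=42):
    """Fit a TF-IDF + logistic regression pipeline and report test metrics."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(labels)  # string labels -> integer ids

    # 80/20 split; stratify keeps class proportions equal in both parts
    X_train, X_test, y_train, y_test = train_test_split(
        texts, y, test_size=0.2, random_state=seed, stratify=y
    )

    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)

    report = classification_report(
        y_test,
        model.predict(X_test),
        labels=list(range(len(encoder.classes_))),
        target_names=list(encoder.classes_),
        zero_division=0,
    )
    return model, encoder, report
```

Wrapping the vectorizer and classifier in one `Pipeline` means the TF-IDF vocabulary is learned from the training split only, which avoids leaking test-set vocabulary into the features.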

src/predict.py - Prediction

Loads trained model
Preprocesses input text
Returns predicted category + confidence scores
Supports single text or batch CSV
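
To show how a predicted category and its confidence scores relate, here is a small sketch that works with any model exposing a scikit-learn style `predict_proba`. The helper name, the `StubModel` stand-in, and its fixed probabilities are invented for the example; it also assumes `class_names` is in the same order as the model's probability columns.

```python
from typing import Dict, Sequence, Tuple


def predict_with_confidence(
    model, class_names: Sequence[str], cleaned_text: str
) -> Tuple[str, Dict[str, float]]:
    """Return the top category and a per-category probability mapping."""
    probs = model.predict_proba([cleaned_text])[0]
    scores = {name: float(p) for name, p in zip(class_names, probs)}
    best = max(scores, key=scores.get)  # category with the highest probability
    return best, scores


class StubModel:
    """Stand-in for a trained pipeline; always returns the same probabilities."""

    def predict_proba(self, texts):
        return [[0.1, 0.7, 0.2] for _ in texts]


category, scores = predict_with_confidence(
    StubModel(), ["praise", "question", "spam"], "can you make one on topic x"
)
print(category, f"{scores[category]:.2%}")
```

The reported confidence is simply the probability of the winning class, so a low top score can be used as a signal that a comment needs manual review.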

app.py - Streamlit UI

Interactive web interface
Single comment analysis with reply templates
Batch CSV upload and categorization
Visual analytics (bar charts)
Download categorized results
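
The category distribution behind the bar chart comes down to counting predicted labels. A minimal sketch, with a hypothetical helper name and made-up sample labels:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def category_distribution(categories: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (category, count, share) tuples, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


for cat, n, share in category_distribution(["praise", "praise", "spam", "question"]):
    print(f"{cat:<10} {n:>3} {share:.0%}")
```

In a Streamlit app, counts like these could be turned into a dictionary or DataFrame and passed to `st.bar_chart`.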

🎓 Learning Outcomes

Students will understand:

Text Preprocessing: Cleaning, tokenization, lemmatization
Feature Engineering: TF-IDF vectorization for text
Classification: Multi-class logistic regression
Model Evaluation: Precision, recall, F1-score
Deployment: Building interactive ML apps with Streamlit
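
To make the feature-engineering outcome concrete, the sketch below computes TF-IDF weights by hand for tokenized comments. It follows the smoothed IDF formula and L2 normalization that scikit-learn's `TfidfVectorizer` uses by default; the function name and sample tokens are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute L2-normalized TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({t: v / norm for t, v in raw.items()})
    return vectors


vectors = tfidf([["love", "animation"], ["hate", "animation"]])
print(vectors[0])
```

Because "animation" appears in both comments, it receives a lower weight than "love" or "hate", which is the property that helps the classifier focus on distinguishing words.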

🌟 Bonus Features Implemented

✅ Reply templates for each category
✅ Streamlit web UI with upload
✅ Confidence scores for predictions
✅ Visual analytics (category distribution chart)
✅ Batch processing with CSV export
✅ Well-documented, modular code

📈 Results

Expected Performance (on synthetic data):

Overall Accuracy: about 95% or higher
Per-category F1-scores: 0.90-0.99

Note: Real-world performance depends on training data quality and diversity.
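
For readers who want to see where per-category F1-scores come from, this sketch computes precision, recall, and F1 for each label from two lists of labels. The function name and the tiny example are hypothetical; in practice scikit-learn's `classification_report` produces the same quantities.

```python
from typing import Dict, Sequence, Tuple


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for every label seen."""
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results[label] = (precision, recall, f1)
    return results


metrics = per_class_metrics(
    ["praise", "praise", "spam", "spam"],
    ["praise", "spam", "spam", "spam"],
)
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here one praise comment is misread as spam, so praise keeps perfect precision but loses recall, while spam keeps perfect recall but loses precision.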

🔄 Next Steps / Improvements

Better Dataset: Use real social media comments (Twitter API, Reddit)
Advanced Models: Try BERT/DistilBERT for better accuracy
Imbalanced Data: Add class weights or SMOTE sampling
More Features: Sentiment scores, toxicity detection
Deployment: Host on Streamlit Cloud or Hugging Face Spaces
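
As a small illustration of the class-weight idea, the sketch below computes "balanced" weights with the formula n_samples / (n_classes * class_count), which is what scikit-learn applies when `class_weight="balanced"` is set. The function name and the label counts are made up for the example.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical imbalance: threats are much rarer than praise
weights = balanced_class_weights(["praise"] * 3 + ["threat"])
print(weights)
```

Rare classes receive larger weights, so mistakes on them cost more during training. With scikit-learn, passing `class_weight="balanced"` to `LogisticRegression` has the same effect without computing the weights manually.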

📚 Resources

👤 Author

Jay Nagose

📄 License

This project is for educational purposes.

Need help? Check the inline code comments or run any script with the --help flag.

RADson2005official/Comment-categorisation
