An end-to-end Natural Language Processing (NLP) project that analyzes student feedback text and classifies sentiment as Positive, Neutral, or Negative using TF-IDF and Logistic Regression.
This project demonstrates the complete NLP pipeline: data preprocessing, feature extraction, supervised text classification, explainability, and deployment-ready inference.
- NLP-based sentiment classification on real student feedback
- TF-IDF vectorization for text feature extraction
- Logistic Regression with class balancing
- Explainable AI using feature coefficient analysis
- Flask-based inference web application
- Deployment-ready project structure
- Python
- pandas
- scikit-learn
- TF-IDF Vectorizer
- Logistic Regression
- Flask
- HTML / CSS
- Gunicorn (deployment-ready)
```
student-feedback-analyzer/
├── app/
│   ├── app.py
│   ├── sentiment_model.pkl
│   ├── vectorizer.pkl
│   ├── label_encoder.pkl
│   ├── templates/
│   │   └── index.html
│   └── static/
│       └── style.css
├── data/
│   └── finalDataset0.2.csv
├── model/
│   └── train.py
├── requirements.txt
└── README.md
```
```bash
git clone https://github.com/YOUR_USERNAME/student-feedback-analyzer.git
cd student-feedback-analyzer
pip install -r requirements.txt
cd app
python app.py
```

Open in browser: http://127.0.0.1:5000
- User enters student feedback text
- Text is transformed using the saved TF-IDF vectorizer
- Logistic Regression model predicts the sentiment class
- The numeric class is mapped to a label:
  - 0 → Negative
  - 1 → Neutral
  - 2 → Positive
- Result is displayed in the UI
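The inference steps above can be sketched end to end. This is a minimal, self-contained illustration: it fits a tiny toy model inline, whereas the real app loads the pickled `vectorizer.pkl` and `sentiment_model.pkl` artifacts instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for the saved artifacts (assumption:
# in the real app these objects are unpickled, not fit here).
texts = [
    "excellent teaching and clear explanations",
    "the course was okay, nothing special",
    "terrible lectures, very confusing",
]
labels = [2, 1, 0]  # 0 = Negative, 1 = Neutral, 2 = Positive

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression(class_weight="balanced").fit(X, labels)

LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}

def predict_sentiment(feedback: str) -> str:
    """Vectorize raw feedback, predict a class, and map it to its label."""
    pred = model.predict(vectorizer.transform([feedback]))[0]
    return LABELS[int(pred)]

print(predict_sentiment("excellent teaching and clear explanations"))
```

The Flask route in `app.py` wraps exactly this function: read the form text, call the predictor, render the label in the template.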
The model’s predictions are interpretable by analyzing TF-IDF feature coefficients. Key words contributing to each sentiment class were extracted to validate model reasoning.
This project is deployment-ready and can be hosted on platforms like Render.
Build command:

```bash
pip install -r requirements.txt
```

Start command:

```bash
gunicorn app.app:app
```
- Training and inference are fully separated
- Model artifacts are versioned for reproducibility
- Designed for clone-and-deploy usage
This sentiment analyzer uses a TF-IDF + Logistic Regression pipeline, which provides fast, interpretable, and deployment-friendly NLP inference.
- Bag-of-words models do not fully capture semantic context or negation.
- Phrases like "not good" or "very bad" may be misclassified in rare cases.
- Minority sentiment classes have limited samples in the dataset, affecting recall.
- Class-weighted Logistic Regression to address imbalance.
- Bigram features (1–2 grams) to improve handling of negation and sentiment phrases.
- Utilize all textual feedback columns by combining them into a unified input.
- Explore hybrid models combining text and structured features.
- Evaluate transformer-based models (e.g., BERT) for deeper semantic understanding.
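The first two improvements can be combined in a single scikit-learn pipeline. This is a hypothetical configuration sketch, not the committed training script; column names and data are illustrative.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Unigrams + bigrams let the model see negation phrases such as
# "not good" as single features; class_weight="balanced" reweights
# the loss to compensate for minority sentiment classes.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tiny illustrative fit (assumption: real training uses the dataset CSV).
texts = ["not good at all", "very good lectures", "it was fine"]
pipeline.fit(texts, [0, 2, 1])
```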
This repository includes an experimental branch that explores improving sentiment prediction by utilizing all available textual feedback fields in the dataset.
Branch Name: feature/full-text-combination
The baseline model was trained using a single high-signal feedback column to establish a clean and interpretable NLP pipeline. However, the dataset contains multiple complementary text fields (e.g., teaching, course content, lab work, extracurricular feedback), which provide additional context about student experience.
To better leverage this information, an experimental branch was created to combine all textual inputs into a unified document for model training and inference.
- Combined multiple text columns into a single `combined_text` feature
- Retained the same sentiment target (teaching) to avoid label ambiguity
- Reused the same NLP pipeline: TF-IDF vectorization with unigram + bigram features and class-weighted Logistic Regression
- Updated the Flask inference app to accept multiple feedback inputs and combine them consistently with training
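The `combined_text` construction can be sketched with pandas. The column names below are hypothetical; the real dataset (`finalDataset0.2.csv`) may use different headers.

```python
import pandas as pd

# Illustrative rows; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "teaching_feedback": ["clear lectures", "confusing pace"],
    "course_feedback": ["useful material", "outdated slides"],
    "lab_feedback": ["well organized labs", "labs felt rushed"],
})

text_cols = ["teaching_feedback", "course_feedback", "lab_feedback"]

# Join all text fields into one document per student, replacing NaNs
# with empty strings, so training and inference apply the same
# concatenation order.
df["combined_text"] = (
    df[text_cols].fillna("").agg(" ".join, axis=1).str.strip()
)

print(df["combined_text"].iloc[0])
```

Applying the identical join at inference time (in the Flask form handler) is what keeps the experimental model's inputs consistent with training.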
- Improved contextual understanding of feedback
- Better handling of mixed-sentiment statements
- More realistic behavior for negative and neutral feedback cases
The baseline model remains available on the main branch for simplicity and stability, while this branch serves as a documented enhancement and experimentation path.
This branching approach reflects real-world ML development practices:
- Stable baseline maintained on main
- Experimental improvements isolated in feature branches
- Trade-offs documented rather than hidden
This project follows an iterative ML development approach, balancing deployable baselines with documented experimentation using Git branching.