Binary sentiment classification on movie reviews using a frozen DistilBERT encoder with a trainable Dense head.
This project builds a binary sentiment classifier on the Rotten Tomatoes movie-review dataset. A pre-trained DistilBERT model is loaded and frozen; two trainable Dense layers are stacked on top of its [CLS] token output to produce a softmax prediction over positive / negative sentiment. The model is trained for 3 epochs and evaluated on the held-out test split. A second task prints 10 correctly and 10 incorrectly classified examples. A third task computes cosine similarity between sentence-level DistilBERT embeddings for 5 sentence pairs.
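The classification head described above can be sketched in plain Python. This is a minimal, framework-free illustration, not the code in `classify_sentiment.py`. It assumes a 768-dimensional [CLS] vector (the hidden size of DistilBERT base), a hypothetical first Dense layer of 64 units with ReLU, and randomly initialised weights standing in for trained ones.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

HIDDEN_SIZE = 768  # DistilBERT-base hidden size (length of the [CLS] vector)
HEAD_UNITS = 64    # hypothetical width of the first Dense layer
NUM_CLASSES = 2    # negative / positive


def init_dense(n_in: int, n_out: int, rng: random.Random) -> tuple:
    """Return (weights, bias) with small random weights; stands in for trained values."""
    limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot-uniform range
    weights = [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]
    bias = [0.0] * n_out
    return weights, bias


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Compute x @ W + b for a single example."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(cls_vector: Vector, layer1: tuple, layer2: tuple) -> Vector:
    """Dense(ReLU) -> Dense(softmax); the encoder that produced cls_vector stays frozen."""
    hidden = relu(dense(cls_vector, *layer1))
    return softmax(dense(hidden, *layer2))


rng = random.Random(0)
layer1 = init_dense(HIDDEN_SIZE, HEAD_UNITS, rng)
layer2 = init_dense(HEAD_UNITS, NUM_CLASSES, rng)
cls_vector = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]  # fake [CLS] embedding
probs = head_forward(cls_vector, layer1, layer2)
print(probs)  # two probabilities that sum to 1
```

Because the encoder is frozen, only the parameters in `layer1` and `layer2` would be updated during training; in the actual script these are presumably Keras `Dense` layers rather than hand-written functions.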
- Python 3.8+
- transformers >= 4.30
- datasets >= 2.12
- tensorflow >= 2.12
- numpy >= 1.21
- pandas >= 1.4
```bash
pip install -r requirements.txt
```

```
distilbert-sentiment-classifier/
├── classify_sentiment.py   # Classification, analysis, and similarity script
├── requirements.txt        # Python dependencies
├── .gitignore
└── README.md
```
```bash
python classify_sentiment.py
```

Steps performed:
- Downloads the DistilBERT base model and the Rotten Tomatoes dataset from the Hugging Face Hub.
- Trains a two-layer Dense classifier on top of frozen DistilBERT for 3 epochs.
- Prints test-set accuracy (computed on the held-out test split).
- Prints 10 correctly classified and 10 incorrectly classified review examples.
- Prints cosine similarity scores for 5 sentence pairs using DistilBERT embeddings.
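The evaluation and example-printing steps reduce to taking the argmax of each softmax output and comparing it with the gold label. The sketch below assumes label `1` = positive and `0` = negative, and uses hypothetical helper names (`evaluate`, `split_examples`). It works on already-computed probabilities rather than calling the model, and the three review strings are invented placeholders, not dataset examples.

```python
from typing import List, Sequence, Tuple

LABEL_NAMES = {0: "negative", 1: "positive"}  # assumed label mapping


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def evaluate(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose argmax prediction matches the gold label."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    correct = sum(argmax(p) == y for p, y in zip(probs, labels))
    return correct / len(labels)


def split_examples(
    texts: Sequence[str],
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 10,
) -> Tuple[List[tuple], List[tuple]]:
    """Return up to k correct and k incorrect (text, predicted, gold) triples."""
    correct, incorrect = [], []
    for text, p, y in zip(texts, probs, labels):
        pred = argmax(p)
        bucket = correct if pred == y else incorrect
        if len(bucket) < k:
            bucket.append((text, LABEL_NAMES[pred], LABEL_NAMES[y]))
        if len(correct) >= k and len(incorrect) >= k:
            break  # both lists are full; no need to scan further
    return correct, incorrect


# Placeholder inputs; in the script these would come from the test split.
texts = ["a gripping, well-acted film", "dull and overlong", "not bad at all"]
probs = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]
labels = [1, 0, 1]

print(f"Test accuracy: {evaluate(probs, labels):.4f}")
good, bad = split_examples(texts, probs, labels, k=10)
for text, pred, gold in bad:
    print(f"[WRONG] predicted={pred} gold={gold} :: {text}")
```

Stopping early once both lists hold `k` items keeps the scan cheap on a large test split.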
All results are printed to stdout:
- Test accuracy on the Rotten Tomatoes test set.
- 10 correct / 10 incorrect example reviews with their predictions.
- 5 cosine-similarity scores between pairs of semantically related sentences.
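The cosine-similarity task compares two fixed-length sentence vectors. This README does not say how token embeddings are pooled into a sentence embedding; the sketch below assumes mean pooling over non-padding tokens (selected by the attention mask), which is one common choice alongside taking the [CLS] vector. Toy 3-dimensional vectors stand in for DistilBERT's 768-dimensional hidden states, and `mean_pool` and `cosine_similarity` are illustrative names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [tok for tok, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(tok[d] for tok in kept) / len(kept) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy hidden states for two "sentences"; the last row of each is padding.
sent_a = [[1.0, 0.0, 1.0], [3.0, 2.0, 1.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 0]
sent_b = [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
mask_b = [1, 1, 0]

emb_a = mean_pool(sent_a, mask_a)
emb_b = mean_pool(sent_b, mask_b)
print(f"cosine similarity: {cosine_similarity(emb_a, emb_b):.4f}")
```

The padding rows differ between the two toy inputs, yet both pool to the same vector, which is why masking matters. In the real pipeline, `token_embeddings` would presumably be the encoder's `last_hidden_state` for one sentence and `attention_mask` the tokenizer's mask.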
Author: Biswajeet Sahoo
MIT License