Bsahoo99/distilbert-sentiment-classifier

DistilBERT Sentiment Classifier

Binary sentiment classification on movie reviews using a frozen DistilBERT encoder with a trainable Dense head.

Overview

This project builds a binary sentiment classifier on the Rotten Tomatoes movie-review dataset. A pre-trained DistilBERT model is loaded and frozen; two trainable Dense layers are stacked on top of its [CLS] token output to produce a softmax prediction over positive / negative sentiment. The model is trained for 3 epochs and evaluated on the held-out test split. A second task prints 10 correctly and 10 incorrectly classified examples. A third task computes cosine similarity between sentence-level DistilBERT embeddings for 5 sentence pairs.
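As a rough illustration of what the trainable head computes, the sketch below runs the forward pass of two Dense layers over a batch of [CLS] vectors in plain NumPy. It is not the project's Keras code: the random vectors stand in for DistilBERT outputs, the weights are untrained, and `HEAD_UNITS` and the ReLU activation are assumptions, since the README does not state the layer width or activation. Only the 768-dimensional hidden size comes from DistilBERT base itself.

```python
import numpy as np

HIDDEN_SIZE = 768  # hidden size of DistilBERT base
HEAD_UNITS = 64    # hypothetical width of the first Dense layer
NUM_CLASSES = 2    # negative / positive


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def head_forward(cls_vectors, w1, b1, w2, b2):
    """Dense (ReLU, assumed) followed by Dense (softmax) over [CLS] vectors."""
    hidden = np.maximum(cls_vectors @ w1 + b1, 0.0)
    return softmax(hidden @ w2 + b2)


rng = np.random.default_rng(0)
# Stand-in for the frozen encoder's [CLS] output on a batch of 4 reviews.
cls_vectors = rng.normal(size=(4, HIDDEN_SIZE))
# Untrained weights; in the real model only these parameters would be updated.
w1 = rng.normal(scale=0.02, size=(HIDDEN_SIZE, HEAD_UNITS))
b1 = np.zeros(HEAD_UNITS)
w2 = rng.normal(scale=0.02, size=(HEAD_UNITS, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

probs = head_forward(cls_vectors, w1, b1, w2, b2)
print(probs.shape)  # one probability pair per review
```

Because the encoder is frozen, training only has to fit the head's weight matrices and bias vectors, which is why 3 epochs can be enough for a usable classifier.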

Requirements

  • Python 3.8+
  • transformers >= 4.30
  • datasets >= 2.12
  • tensorflow >= 2.12
  • numpy >= 1.21
  • pandas >= 1.4

Installation

pip install -r requirements.txt

Project Structure

distilbert-sentiment-classifier/
├── classify_sentiment.py   # Classification, analysis, and similarity script
├── requirements.txt        # Python dependencies
├── .gitignore
└── README.md

Usage

python classify_sentiment.py

Steps performed:

  1. Downloads the DistilBERT base model and the Rotten Tomatoes dataset from HuggingFace.
  2. Trains a two-layer Dense classifier on top of frozen DistilBERT for 3 epochs.
  3. Prints accuracy on the held-out test split.
  4. Prints 10 correctly classified and 10 incorrectly classified review examples.
  5. Prints cosine similarity scores for 5 sentence pairs using DistilBERT embeddings.
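Steps 3 and 4 can be sketched roughly as follows, given predicted and true labels for the test split. The helper names `accuracy` and `sample_examples` are hypothetical, and the toy reviews and labels are made up for illustration (with 1 taken to mean positive); the actual script may select and print its examples differently.

```python
import numpy as np


def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    return float((labels == preds).mean())


def sample_examples(texts, labels, preds, n=10):
    """Return up to n correctly and n incorrectly classified examples.

    Each example is a (text, true_label, predicted_label) tuple.
    """
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    correct_idx = np.flatnonzero(labels == preds)[:n]
    wrong_idx = np.flatnonzero(labels != preds)[:n]
    correct = [(texts[i], int(labels[i]), int(preds[i])) for i in correct_idx]
    wrong = [(texts[i], int(labels[i]), int(preds[i])) for i in wrong_idx]
    return correct, wrong


# Made-up toy data, not real model output.
texts = [
    "a warm and funny film",
    "tedious from start to finish",
    "uneven but watchable",
    "a complete mess",
]
labels = [1, 0, 1, 0]
preds = [1, 0, 0, 0]

acc = accuracy(labels, preds)
correct, wrong = sample_examples(texts, labels, preds, n=10)
print(f"accuracy: {acc:.2f}")
for text, true_label, pred_label in wrong:
    print(f"MISCLASSIFIED true={true_label} pred={pred_label}: {text}")
```

With a softmax output, `preds` would typically come from taking the argmax over the two class probabilities for each review.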

Results

All results are printed to stdout:

  • Test accuracy on the Rotten Tomatoes test set.
  • 10 correct / 10 incorrect example reviews with their predictions.
  • 5 cosine-similarity scores between pairs of semantically related sentences.
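For reference, the cosine similarity of two embedding vectors is their dot product divided by the product of their norms. The minimal sketch below uses short hand-written vectors in place of real DistilBERT sentence embeddings; the function name and the `eps` guard against zero-length vectors are assumptions, and the README does not say how the script pools token outputs into a sentence vector.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(a @ b / denom)


# Toy vectors standing in for sentence embeddings.
u = [1.0, 0.0, 1.0]
v = [2.0, 0.0, 2.0]  # same direction as u, different length
w = [0.0, 1.0, 0.0]  # orthogonal to u

sim_same_direction = cosine_similarity(u, v)
sim_orthogonal = cosine_similarity(u, w)
print(sim_same_direction, sim_orthogonal)
```

Vectors pointing the same way score close to 1 regardless of length, and orthogonal vectors score 0, so higher scores for a sentence pair indicate embeddings that point in more similar directions.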

Author

Biswajeet Sahoo

License

MIT License
