For a live demonstration of the sentiment analysis tool, visit the Movie Sentiment Analyzer.
Explore the Project Notebook for an in-depth look at the implementation.
As a self-proclaimed cinephile, I am always looking for ways to gauge the public's consensus opinion of a movie before watching it. The current major outlets (Rotten Tomatoes, IMDb, Metacritic, etc.) are decent options, but their scores can be arbitrary: they reflect reviewer bias, and scoring systems vary from person to person. Calculating the sentiment of movie review text offers a standardized way to measure the consensus opinion of a movie.
After finding an already-cleaned dataset of movie reviews, I developed a sentiment analysis model using a convolutional neural network (CNN). After testing various hyperparameters and numbers of training epochs, I fit the final model and created a scoring system that weights the sentiments based on the inverse of their Gaussian distribution. Then, ~250 reviews were collected per movie and passed through the model, and each resulting score was uploaded to the website.
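The notebook's model code is not reproduced in this README, so the following is only a rough, dependency-free sketch of what a convolutional text classifier computes for one review: an embedding lookup, a 1D convolution with ReLU, global max pooling over word positions, and a sigmoid output. The layer layout is an assumed, common text-CNN design rather than the project's confirmed architecture, and the vocabulary, sizes, and the names `VOCAB`, `tokenize`, and `TinyTextCNN` are hypothetical. The weights are random and untrained; in practice a framework such as Keras or PyTorch would define and fit these layers.

```python
import math
import random

# Hypothetical toy vocabulary; the project's real tokenizer/vocabulary is not shown in this README.
VOCAB = {"<pad>": 0, "<unk>": 1, "great": 2, "boring": 3, "movie": 4, "loved": 5, "awful": 6}


def tokenize(text, max_len=8):
    """Map lowercase words to integer ids, then truncate/pad to a fixed length."""
    ids = [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()][:max_len]
    return ids + [VOCAB["<pad>"]] * (max_len - len(ids))


class TinyTextCNN:
    """Embedding -> 1D convolution + ReLU -> global max pooling -> dense -> sigmoid.

    Weights are random (untrained); this only illustrates the forward computation.
    """

    def __init__(self, vocab_size, embed_dim=4, num_filters=3, kernel_size=3, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.kernel_size = kernel_size
        # One embedding vector per vocabulary entry.
        self.embedding = [[rand() for _ in range(embed_dim)] for _ in range(vocab_size)]
        # Each filter spans kernel_size consecutive word vectors.
        self.filters = [
            [[rand() for _ in range(embed_dim)] for _ in range(kernel_size)]
            for _ in range(num_filters)
        ]
        self.filter_bias = [0.0] * num_filters
        self.dense_w = [rand() for _ in range(num_filters)]
        self.dense_b = 0.0

    def predict(self, token_ids):
        """Return the probability that the review is positive."""
        x = [self.embedding[t] for t in token_ids]  # shape: (seq_len, embed_dim)
        if len(x) < self.kernel_size:
            raise ValueError("sequence is shorter than the convolution kernel")
        pooled = []
        for kernel, bias in zip(self.filters, self.filter_bias):
            activations = []
            # Slide the filter over every window of kernel_size consecutive tokens.
            for start in range(len(x) - self.kernel_size + 1):
                window = x[start:start + self.kernel_size]
                z = bias + sum(
                    w * v
                    for w_row, x_row in zip(kernel, window)
                    for w, v in zip(w_row, x_row)
                )
                activations.append(max(0.0, z))  # ReLU
            pooled.append(max(activations))  # global max pooling over positions
        logit = self.dense_b + sum(w * p for w, p in zip(self.dense_w, pooled))
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> P(positive)


model = TinyTextCNN(vocab_size=len(VOCAB))
print(model.predict(tokenize("Loved this great movie")))  # a value in (0, 1); meaningless until trained
```

Global max pooling is what lets a text CNN handle reviews of any length with a fixed-size output: each filter reports only its strongest match anywhere in the review (for example, a phrase like "loved this great"), regardless of position.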
Link to the project UI demo
To use the sentiment analysis tool locally, follow these steps:
- Clone the project notebook
- Go through all of the cells in the model building section, making sure there are no errors
- Go through all of the cells in the model testing and scoring section, making sure there are no errors
- Connect your Google Drive in the appropriate cells in the database building section
- Set the names variable to the movie titles of your choice
- Run the rest of the cells in the database building section
- Model Training Data: Provided courtesy of Stanford NLP; contains 50,000 movie reviews with binary sentiment labels (positive/negative). To reduce model bias, only 25 reviews per movie were used.
- Model Test Results: Post-training test results are tracked on this sheet. The model was tested on 10,000 data points and reached 83.75% accuracy.
- Movie Score Database: The movie scores and poster URLs shown on the website come from this sheet, which is continually updated. Each movie's score is generated from its top ~250 IMDb reviews.
- Project Notebook: All intermediate data and the entire process of building the model and gathering data can be found in the project notebook.
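The README does not spell out the exact formula behind "weighting the sentiments based on the inverse of their Gaussian distribution," so the sketch below is only one plausible reading, not the project's actual scoring code. It assumes that a normal distribution is fit to one movie's per-review sentiment probabilities (e.g. from its ~250 IMDb reviews), that each review is weighted by the reciprocal of the fitted density at its score, and that the weighted mean is reported on a 0 to 100 scale. The function name and the 0 to 100 scale are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def inverse_gaussian_weighted_score(sentiments):
    """Aggregate per-review sentiment probabilities (0..1) into a 0-100 movie score.

    Assumed interpretation: fit a normal distribution to the reviews' sentiments and
    weight each review by 1 / pdf(sentiment), so less common (more extreme) opinions
    count more than opinions near the center of the distribution.
    """
    if not sentiments:
        raise ValueError("need at least one review sentiment")
    mu = fmean(sentiments)
    sigma = pstdev(sentiments)
    if sigma == 0:
        return 100 * mu  # every review agrees, so weighting changes nothing
    dist = NormalDist(mu, sigma)
    weights = [1 / dist.pdf(s) for s in sentiments]
    weighted_mean = sum(w * s for w, s in zip(weights, sentiments)) / sum(weights)
    return 100 * weighted_mean


# Hypothetical model outputs for a handful of reviews of one movie.
print(inverse_gaussian_weighted_score([0.9, 0.9, 0.9, 0.1]))
```

Because 1/pdf grows quickly in the tails, a single strongly dissenting review can move the result a lot (in the example above, the lone 0.1 review pulls the score well below the plain average of 70); clipping the weights would be a reasonable safeguard if this reading is adopted.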
This project is licensed under the MIT License.
- Special thanks to Stanford NLP for allowing the use of their dataset for model training.
- Thanks to Streamlit for providing a fantastic platform for building interactive web apps.
- Thanks to Google Colab for providing a free and powerful environment for running Jupyter notebooks.
