Letterboxd Reviews Sentiment Analysis Project

Contributors: Esther Lan and Hannah Wen

This project performs sentiment analysis on movie reviews scraped from the social platform Letterboxd.

Procedure

This project is written in Python. We used BeautifulSoup and Selenium to scrape the first 500 pages of movies sorted by popularity, then the first 5 pages of reviews, also sorted by popularity, for each movie. We stored the collected reviews in MySQL. Using pandas, we cleaned and tokenized the data for sentiment analysis, which we performed with two models: VADER (from NLTK) and RoBERTa.
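As an illustration of the cleaning and tokenizing step, here is a minimal sketch. It uses only the standard library `re` module as a stand-in for the pandas-based cleaning in the notebooks, and the function names `clean_review` and `tokenize` are hypothetical, not taken from the project code.

```python
import re

# Assumption: URLs and repeated whitespace are the noise worth removing.
# The actual notebooks may clean the text differently.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_review(text):
    """Drop URLs and collapse runs of whitespace into single spaces."""
    text = URL_RE.sub(" ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def tokenize(text):
    """Lowercase the cleaned review and split it into word tokens."""
    return TOKEN_RE.findall(clean_review(text).lower())
```

With pandas, the same functions could be applied to a column of review text through `Series.apply`.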

Our Findings

We found that the sentiment scores generated by RoBERTa and VADER have almost no correlation. In the scatter plots, the positive, neutral, and negative scores from the two methods are scattered with no visible pattern. The boxplot also showed a significant difference in the spread of the sentiment scores. We analyzed the reviews both with and without emojis: VADER's accuracy improved greatly with the added context from emojis, while RoBERTa's stayed about the same.
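One way to put a number on "almost no correlation" is the Pearson coefficient between the two models' scores for the same reviews. The sketch below is an assumption about how that check could be done, not the notebook's actual code; `pearson` is a hypothetical helper that takes two equal-length lists, for example VADER's `pos` scores and RoBERTa's positive-class probabilities.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0 would match the randomly scattered plots described above, while values near 1 or -1 would indicate that the two models move together or in opposite directions.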

Lessons Learned

We faced many obstacles in the first part of this project, the web scraping. Neither of us had scraped a website before, and because runtime was a problem we learned how to make our code more efficient. We ended up scraping nearly 100,000 reviews, but we were unable to scrape reviews for all 36,000 movies we had originally planned for, so we aim to improve our code further to collect more. Due to the volume of data, we decided to use a MySQL database, which was also a new experience for us. We made further edits to our code to speed up collecting and inserting the data.
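A common way to speed up inserts at this volume is to send rows in batches rather than one statement per review. The sketch below shows that idea under stated assumptions: it uses the standard library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `reviews` table, its columns, and the `insert_reviews` helper are hypothetical rather than the project's schema.

```python
import sqlite3


def insert_reviews(conn, rows, batch_size=1000):
    """Insert (film, review) tuples in batches and return the number inserted."""
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews (film TEXT NOT NULL, review TEXT NOT NULL)"
    )
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One transaction per batch: committed on success, rolled back on error.
        with conn:
            conn.executemany(
                "INSERT INTO reviews (film, review) VALUES (?, ?)", batch
            )
        inserted += len(batch)
    return inserted
```

The common Python MySQL drivers also provide `cursor.executemany`, but they use `%s` placeholders instead of `?`, so the SQL string would need that change.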

If we were to do this project again, we would also scrape the rating attached to each individual review so that we could measure the accuracy of our sentiment analysis. We plan to keep improving this project by including ratings and by creating a Letterboxd web scraping Python package.
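Once ratings are available, a comparison like the one sketched below could serve as an accuracy check. Everything here is an assumption for illustration: that ratings arrive as star strings such as "★★★½", that 3.5 stars and above counts as positive and 2.5 and below as negative, and the helper names themselves.

```python
def stars_to_number(stars):
    """Convert a star string such as '★★★½' to a number (3.5)."""
    return stars.count("★") + 0.5 * stars.count("½")


def rating_to_label(rating, low=2.5, high=3.5):
    """Map a numeric rating to a sentiment label (thresholds are assumed)."""
    if rating >= high:
        return "positive"
    if rating <= low:
        return "negative"
    return "neutral"


def accuracy(predicted_labels, star_ratings):
    """Share of reviews whose predicted label matches the rating-derived label."""
    if len(predicted_labels) != len(star_ratings) or not star_ratings:
        raise ValueError("need two non-empty lists of the same length")
    matches = sum(
        predicted == rating_to_label(stars_to_number(stars))
        for predicted, stars in zip(predicted_labels, star_ratings)
    )
    return matches / len(star_ratings)
```

The thresholds are the main design choice, since the label assigned to a middling rating such as 3 stars directly changes the measured accuracy.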
