Sentiment analysis is the process of detecting positive or negative sentiment in text. It is a common natural language processing (NLP) technique that helps businesses track customer satisfaction, understand customer needs, segment customers and more. The tidytext package provides a set of functions that make NLP tasks, including sentiment analysis, more convenient to perform alongside the tidyverse. In particular, the package gives access to sentiment lexicons: collections of words, each labelled with its sentiment orientation. In today’s exercises we will practise performing a sentiment analysis with the help of these lexicons.
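To make the idea of a sentiment lexicon concrete, here is a minimal sketch of lexicon-based scoring with tidytext. The two reviews are invented for illustration and are not taken from the Disneyland dataset, and the names `reviews`, `review_id` and `sentiment_counts` are hypothetical. The sketch uses the Bing lexicon, one of the lexicons available through `get_sentiments()`; the exercises may use a different one.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy reviews, invented for illustration only
reviews <- tibble(
  review_id = c(1, 2),
  text = c(
    "The parade was wonderful and the staff were friendly",
    "Terrible queues and rude service"
  )
)

# The Bing lexicon: one row per word, labelled "positive" or "negative"
bing <- get_sentiments("bing")

sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%      # split text into one lowercase word per row
  inner_join(bing, by = "word") %>%  # keep only words that appear in the lexicon
  count(review_id, sentiment)        # tally positive/negative words per review

sentiment_counts
```

Words that are not in the lexicon are dropped by the join, so the counts reflect only the words the lexicon recognises.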
The “Disneyland Reviews” dataset contains 42,000 reviews of three Disneyland branches (Paris, California and Hong Kong), posted by visitors on Trip Advisor. It is freely available at https://www.kaggle.com/datasets/arushchillar/disneyland-reviews. Because downloading from Kaggle requires an account, you can also download the data [here].
In today’s tutorial, we will first revisit some of the Alice in Wonderland exercises to familiarise ourselves with the functions provided by the tidytext package and to see how they help us perform the same tasks more efficiently. We will then identify the most and least popular Disneyland attractions and compare the sentiment of visitors from different countries. By the end of this exercise, we will present the results as a [ ] similar to Figure 1.