Background:
An article by New York Times wrote:
"The U.S. is falling to the lowest vaccination rates of the world's wealthiest democracies."
- Project Overview
- Code and Resources Used
- Database Schema
- Data Cleaning
- Analysis Results
- Limitations
- Contributors
Problem statement:
- US is experiencing significant decrease in COVID-19 vaccination rate.
- Evaluate President Joe Biden's sentiments towards COVID-19 vaccinations on Twitter and its impact on US vaccination rate.
Python Version: 3.9
Packages: Tweepy, Pandas, Time, RE, Sqlalchemy, Matplotlib, NLTK, Wordcloud, Numpy, Pillow, Itertools, Collections, Textblob, Datetime, Decimal
Dataset: Utilized Our World In Data's COVID Deaths dataset to observe Biden's tweets impact

The data crawled will be loaded into 3 tables and stored in a PostGreSQL database namely:
user_info:
- Twitter user id
- User's name
- User's profile description
- User's location
tweets:
- Tweet ID
- Tweet
- Tweet's like counts
- Tweet's retweet counts
- User's tweet
- User's name
user_social_network:
- User's followers count
- User's follow count
- User's name
- Crawled tweets are filtered based on keywords "vaccinated" and "covid-19"
- Tweets are stored in a list
- Tweets are changed to lowercase and split into individual elements
- Unwanted characters and URLs are removed using Regular Expressions library
- Stop words are removed using NLTK library
- Words are then sent for analysis
- After pre-processing Joe Biden’s tweets by removing stop words and url links, word cloud (on the right) was created using python in Jupyter notebook.
- The size of each word indicates its frequency in his tweets. The bigger the word, the higher the frequency count. Other than the two search keywords “vaccinated” and “covid-19”, the most frequently appearing words are “get”, and “vaccine”.
- Using the python module “Textblob”, we calculated the polarity score of his tweets with a range of -1 to 1. Values closer to 1 indicate more positivity, while values closer to -1 indicate more negativity.
- We then categorized all polarity scores into 3 buckets with a positive tweet having a polarity score > 0, neutral tweet having a score of 0 and negative tweet having a score of < 0
- From the pie chart on the left, we can observe that Joe Biden’s tweets are largely positive at 58%, hence, we can conclude his sentiments about vaccination for covid-19 are more positive.

Red lines indicate when the President tweeted about vaccination
- From the previous observations we know that Joe Biden's sentiment towards getting vaccinated is positive and he also has a high level of influence on Twitter as compared to Boris Johnson on the same issue.
- The assumption then is that Joe Biden's tweets should have a positive and powerful impact on Covid-19's vaccination rate in USA.
- However, as shown in the below graph of Joe Biden's tweet date vs number of new vaccinations, there is no obvious trend to support our assumption. It is therefore inconclusive.
- Limited number of tweets pulled due to twitter constraints (7 days limitation)
- Popularity score and reach score might not be a sufficient influence indicator of JoeBiden. Better indicator would be klout whereby it takes all his social media presence into account
- To better measure the impact of Biden's tweet on vaccination rate, we would need to execute specific statistical analysis such as time series which is not in the scope of this project
- Textblob may not be sophisticated enough to properly determine if a statement is truly "positive" or "negative"
- Accuracy of analysis may not be good enough as usually large amount of data is needed
Hello there! We are a group of mid-career switchers undergoing a 3 months bootcamp in Business Intelligence and Data Analytics. This is our 2 weeks interim project where we showcased what we have learnt and brainstormed a solution to a given problem.
By Team Yellow Minions



