GitHub - panjialalam/2.-Web-Scraping-and-Text-Sentiment-Analysis

=== SENTIMENT ANALYSIS FOR ONE ARTICLE ===

I started by parsing the report text at the word level with the unnest_tokens function. In the parsed text, the five most frequent words are health, report, December, countries, and infections. I then used bigrams to check for potential valence shifts and joined the words to three sentiment lexicons: NRC, AFINN, and BING.
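A minimal sketch of the word-level steps described above, assuming the dplyr and tidytext packages. The two-sentence `report` tibble and the `negators` vector are hypothetical stand-ins, not the real article or the repository's code. Only the Bing lexicon is joined here because it ships with tidytext; AFINN and NRC need a separate download through the textdata package.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the report text (not the real article).
report <- tibble(text = c(
  "Health officials did not report good news in December.",
  "Countries saw infections rise, and health services struggled."
))

# One row per lowercased word, with common stop words removed.
words <- report %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequencies, most frequent first.
word_counts <- words %>% count(word, sort = TRUE)

# Bigrams whose first word is a negator hint at possible valence shifts.
# The negator list is an assumption chosen for illustration.
negators <- c("not", "no", "never", "without")
shifts <- report %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(sub(" .*", "", bigram) %in% negators)

# Count positive and negative words according to the Bing lexicon.
bing_counts <- words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)
```

In this toy input, `shifts` holds the bigram "not report", which is the kind of pair worth inspecting by hand before trusting word-level scores.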

In the word-level analysis with NRC, the positive and negative categories are both prominently represented. Fear, trust, and sadness are also among the five most common NRC sentiments. With AFINN and BING, I found significantly more words with negative sentiment than with positive sentiment.

For the sentence-level analysis, I split the text into sentences with the unnest_tokens function and scored them with SentimentR. More sentences are negative than positive, and the mean sentiment score is -0.166. In conclusion, the overall sentiment of the January 2023 report is mostly negative.
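The sentence-level step could look roughly like the sketch below, assuming the dplyr, tidytext, and sentimentr packages. The sample text is hypothetical, so the scores it produces are unrelated to the -0.166 reported above.

```r
library(dplyr)
library(tidytext)
library(sentimentr)

# Hypothetical stand-in for the report text.
report <- tibble(
  text = "The outbreak caused many deaths. Vaccination campaigns brought welcome relief."
)

# One row per sentence; to_lower = FALSE keeps the original casing.
sents <- report %>%
  unnest_tokens(sentence, text, token = "sentences", to_lower = FALSE)

# Score each sentence; sentiment() returns one row per sentence.
scores <- sentiment(get_sentences(sents$sentence))

# Summaries comparable to the ones quoted in the text.
mean_score <- mean(scores$sentiment)
share_negative <- mean(scores$sentiment < 0)
```

Unlike a word-level lexicon join, sentimentr accounts for valence shifters such as negators and amplifiers when it scores a sentence.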

To find the countries discussed, I lemmatized the words and checked the resulting list of lemmas against the country names from the country code library. The countries discussed are Oman, Switzerland, Sudan, Uganda, and Kenya.
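The lookup described above can be illustrated in base R. Both vectors below are hypothetical: `lemmas` stands in for the lemmatized report, and `country_names` is a short hand-written list in place of the full list a country code package would supply.

```r
# Hypothetical lemmas; in practice these come from the lemmatized report.
lemmas <- c("health", "oman", "report", "uganda", "country", "kenya", "uganda")

# Hypothetical short list of country names (assumption for illustration).
country_names <- c("Kenya", "Oman", "Sudan", "Switzerland", "Uganda", "Zambia")

# Keep each country whose lowercased name appears among the lemmas.
mentioned <- country_names[tolower(country_names) %in% lemmas]
```

Matching single-word lemmas this way misses multi-word country names, which would need a search over the untokenized text instead.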

Online sources:

• Change histogram fill color: https://learn.saylor.org/mod/book/view.php?id=58485&chapterid=45033

• Find country: https://stackoverflow.com/questions/70787674/is-it-possible-to-get-r-to-identify-countries-in-a-dataframe

=== WEB SCRAPING AND SENTIMENT ANALYSIS FOR ALL WHO ARTICLES (AFRICA REGION) ===

The scraping has two essential steps: 1) obtain the hyperlinks and dates with a loop that stops at the designated year and month, and 2) scrape all the articles from the hyperlinks that pass the date filter. I used a while loop because it stops the scraping once the designated date is reached. The essential part of the code is appending each batch of newly scraped hyperlinks and dates to the stored data. I then saved all the articles as .txt files, plus one additional file containing all the texts.
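The loop-and-accumulate pattern described above can be sketched in base R. Everything here is an assumption for illustration: `fetch_listing()` is a hypothetical stand-in that fabricates links and dates instead of downloading a real listing page, and `cutoff` and `max_pages` are made-up values, so this is not the repository's scraping code.

```r
# Hypothetical stand-in for scraping one listing page. A real version
# would download the page and extract each article's link and date.
fetch_listing <- function(page) {
  offsets <- (0:2 + (page - 1) * 3) * 20  # days before the newest article
  data.frame(
    url = sprintf("https://example.org/news/%d-%d", page, 1:3),
    date = as.Date("2023-12-20") - offsets,
    stringsAsFactors = FALSE
  )
}

cutoff <- as.Date("2023-06-01")  # assumed stopping date
max_pages <- 50                  # safety guard against an endless loop
links <- NULL
page <- 1
reached_cutoff <- FALSE

while (!reached_cutoff && page <= max_pages) {
  batch <- fetch_listing(page)
  links <- rbind(links, batch)               # append new rows to stored data
  reached_cutoff <- min(batch$date) < cutoff # stop once a page predates cutoff
  page <- page + 1
}

# Drop the few rows from the last page that fall before the cutoff.
links <- links[links$date >= cutoff, ]
```

A while loop fits here because the number of listing pages needed is not known in advance; the stopping condition depends on the dates found on each page.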

For the text analysis, I used lemmatization because it generates base words accurately; the lemmas are later joined with the sentiment categories.

Based on NRC, positive and trust are the most common sentiments, followed by negative, fear, and anticipation. Using AFINN, there are more negative words in the overall texts. Even so, the tone is mixed between those sentiments, because the numbers of words are fairly balanced.

I chose Uganda for the country-level analysis and compared it with the overall set of articles. I prepared a function to get the articles for a specific country.
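A country filter of the kind mentioned above might look like this base R sketch. The function name `get_country_articles()`, the `articles` data frame, and its column names are all hypothetical and not taken from the repository.

```r
# Hypothetical article table; real texts would come from the saved files.
articles <- data.frame(
  title = c("a1", "a2", "a3"),
  text = c(
    "Uganda declares the end of the outbreak.",
    "Kenya expands its vaccination campaign.",
    "Health workers in uganda receive new training."
  ),
  stringsAsFactors = FALSE
)

# Return the articles whose text mentions the country, ignoring case.
get_country_articles <- function(articles, country) {
  hits <- grepl(tolower(country), tolower(articles$text), fixed = TRUE)
  articles[hits, , drop = FALSE]
}

uganda_articles <- get_country_articles(articles, "Uganda")
```

The filtered subset can then go through the same sentiment joins as the full set of articles, which keeps the country comparison like for like.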

Online sources:

• Accumulate data from inside a loop: https://stackoverflow.com/questions/28553872/in-r-how-do-you-accumulate-data-from-inside-a-loop-to-a-numeric-vector-above-th

• Remove space at the end of a sentence string: https://stackoverflow.com/questions/28553872/in-r-how-do-you-accumulate-data-from-inside-a-loop-to-a-numeric-vector-above-th

• Combine all vector lists into one: https://www.r-bloggers.com/2023/08/the-unlist-function-in-r/

Repository files:

• Saved articles

• .DS_Store

• .gitattributes

• 1. Sentiment Analysis_WHO News.R

• 2. Scraping and Sentiment Analysis_WHO Pages.R

• Pic 1 - News NRC.jpg

• Pic 2 - News AFINN.jpg

• Pic 3 - News BING.jpg

• Pic 4 - News SentimentR.jpg

• Pic 5 - Africa News NRC.jpg

• Pic 6 - Africa News AFINN.jpg

• Pic 7 - Africa News BING Country Comparison.jpg

• README.md

• vol101_1_publichealthroundup.txt
