blog1 #127
379 changes: 379 additions & 0 deletions _templates/d_e_art.csv


262 changes: 262 additions & 0 deletions _templates/d_f_art.csv


334 changes: 334 additions & 0 deletions _templates/k_e_art.csv


178 changes: 178 additions & 0 deletions _templates/k_f_art.csv


59 changes: 59 additions & 0 deletions _templates/post1_summit-youngsoo choi.qmd
---
title: "Blog 1"
description: "Topic and Research Problem"
author: "Young Soo Choi"
date: "9/16/2022"
format:
html:
toc: true
code-copy: true
code-tools: true
categories:
- Topic and Research Problem
---

# Topic: Disaster Management and Public Safety Policy

Since my main job is establishing and executing policies related to disaster safety management, my research topic for this text analysis class naturally became disaster response and public safety management.

While working in the field, I encountered and produced numerous documents, including daily safety management status reports, in-depth analysis reports, evaluation reports, and media press releases, but I have never analyzed any of them quantitatively. Through this research I hope to acquire analysis techniques, or at least inspiration, for working with such documents.

# Review of Existing Literature

First, I reviewed some existing studies that apply text analysis in this field.

Several studies analyze text data such as social media posts for disaster awareness and prediction, including Karami and Shah et al. (2019), Seo and Son et al. (2021), and Neubig and Matsubayashi (2011).

Next, there are text analysis studies focused on residents: some analyze the contents of residents' reports to support disaster prediction, while others examine disaster-related educational materials to understand their contents.

There is also a text analysis study of the political dynamics surrounding disaster victims, which I found an interesting topic.

Personally, it was all the more interesting because some of this research comes from experts I collaborated with and consulted during my time in that role.

In short, I found that text analysis is being applied in this field in a variety of ways, and on that basis I drew up several candidate research questions.

# Finding a Research Problem

My first idea was to examine, through text analysis of the safety management reports used internally by various agencies, the correlation between the contents of those reports and the actual occurrence of accidents. I had often encountered such activity reports inside institutions and always wondered whether the data could be used for capacity management. After reviewing feasibility, however, I found that it is difficult for me to obtain these internal reports at present: because they often contain information the institutions treat as confidential, they are nearly impossible to secure.

The second idea concerns how people's perceptions of disasters are formed. Disasters are usually classified as natural or human-made (the latter legally termed social disasters in Korea), and public criticism tends to be weaker for natural disasters than for social disasters. From the government's point of view, rapid disaster management and recovery are important, and so is how public perception forms during that process; when criticism is high, a faster response and resolution matter even more. Under these assumptions, it is important for the government to act visibly in the event of a social disaster, for example by resolving the incident quickly and announcing measures to prevent recurrence. With this in mind, the topic I chose is to check whether the coverage of major media outlets, which strongly shape public opinion, differs between social disasters and natural disasters.

Therefore, my research question for this text analysis class is as follows.

Is there a difference in the tone of media coverage between natural and human-made disasters?


# Bibliography
Karami, A., Shah, V., et al. (2019). Twitter speaks: A case of national disaster situational awareness.

Seo, H. J., Son, M., et al. (2021). Trends in Civic Engagement Disaster Safety Education Research: Systematic Literature Review and Keyword Network Analysis.

Neubig, G., Matsubayashi, Y., et al. (2011). Safety Information Mining: What can NLP do in a disaster.

Lee, J., Shin, J., et al. (2019). Research Suggestion for Disaster Prediction using Safety Report of Korea Government.

Seddighi, H., Yousefzadeh, S., et al. (2021). Disaster Risk Reduction in Iranian Primary and Secondary School Textbooks: A Content Analysis.

Chung, J.-B., Choi, E., et al. (2022). Politicization of a disaster and victim blaming: Analysis of the Sewol ferry case in Korea.

Gephart, R. P., Jr. (2017). The Textual Approach: Risk and Blame in Disaster Sensemaking.
133 changes: 133 additions & 0 deletions _templates/post2_summit-youngsoo choi.qmd
---
title: "Blog 2"
description: "Scraping the web"
author: "Young Soo Choi"
date: "09/30/2022"
format:
html:
toc: true
code-copy: true
code-tools: true
categories:
- blog 2
---

# Korean or English?

An important choice remains: whether to analyze text in English or in Korean.

Of course, I am currently studying in a US program, and the people here speak English. But Korean is more useful to me than English. After completing my degree, I must return to my position as a national disaster management policy officer in Korea, so it makes more sense to carry out my project in Korean rather than in English. If I had the capacity, I could run the class project in English and a Korean project separately, but I don't think I can manage that yet.

So, with apologies to the others, my project will be carried out in Korean. I will translate the important Korean terms into English, but you will not be able to follow the full texts. If you have any questions, please ask me individually, and I hope you will find it interesting to see that text analysis can be conducted in languages other than English.

# Selection of data

To examine the chosen research topic, I will first import the data and run a preliminary procedure for analyzing the text. The target data are news articles about a major earthquake in Korea in 2017 that exceeded magnitude 5.

Since the earthquake occurred on November 15, 2017, I analyzed the articles published in the Seoul Newspaper from November 15 to 22.

The one-week window was chosen arbitrarily, to avoid political issues unrelated to the disaster itself that emerge as time passes after an accident.

# Importing data

First, I load the necessary packages and look at the URL produced by searching the portal site under these conditions. The search results span eight pages. Looking at the URL structure, the base URL is followed by a changing number: 1, 11, 21, and so on up to 71. Using this pattern, I build each URL in a loop and store them all in n_d_urls.

```{r}
library(tidyverse)
library(rvest)

# Scraping earthquake-related articles

b_n_url<-"https://search.naver.com/search.naver?where=news&sm=tab_pge&query=%ED%8F%AC%ED%95%AD%20%EC%A7%80%EC%A7%84&sort=0&photo=3&field=0&pd=3&ds=2017.11.15&de=2017.11.25&cluster_rank=10&mynews=1&office_type=1&office_section_code=1&news_office_checked=1081&nso=so:r,p:from20171115to20171125,a:all&start="


# Build the eight search-result URLs (start = 1, 11, ..., 71)
n_d_urls <- NULL
for (x in 0:7) {
  n_d_urls <- c(n_d_urls, paste0(b_n_url, x * 10 + 1))
}
n_d_urls
```

Eight URLs have been saved, and now I look for the links to the individual articles on each page. I used the browser's inspector to find the CSS selector for the links, then used that selector to extract each article's link.

```{r}
# Finding individual news link

n_d_news_links <- NULL
for (url in n_d_urls) {
html <- read_html(url)
n_d_news_links <- c(n_d_news_links, html %>%
html_nodes('a.info')%>%
html_attr('href'))
}

n_d_news_links
```

However, the results also include the bare URL of the newspaper's website. I'll remove it.

```{r}
# Delete unnecessary parts

n_d_news_links = n_d_news_links[n_d_news_links!="https://www.seoul.co.kr"]
n_d_news_links
```

With that removed, 79 individual news articles remain. Likewise, I use the inspector to find the CSS selector for the article body, and use it to scrape the text of each article.

```{r}
# Saving Individual Articles

n_d_contents <- NULL

for (link in n_d_news_links) {
html <- read_html(link)
n_d_contents <- c(n_d_contents, html %>%
html_nodes("div#dic_area.go_trans._article_content") %>%
html_text())
}

n_d_contents
```

This retrieved the full text of all 79 articles.

# Basic Analysis

## Preprocessing

At first glance, the text includes reporters' e-mail addresses and tag symbols, so as a basic preprocessing step I kept only Korean characters, replacing everything that is not Hangul with a space.

```{r}
library("stringr")

n_d_contents_pre<-n_d_contents %>%
str_replace_all("[^가-힣]", " ") %>%
str_squish()

n_d_contents_pre
```

## Create WordCloud

I made a word cloud with this data.

After loading the necessary packages, I converted the preprocessed data into a corpus, built a document-feature matrix from it, and then drew the word cloud.

```{r}
library(quanteda)
library(quanteda.textplots)

# convert to corpus
n_d_corpus <- corpus(n_d_contents_pre)

# create a word cloud
n_d_dfm <- tokens(n_d_corpus, remove_punct=TRUE) %>%
dfm()

textplot_wordcloud(n_d_dfm)

```
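Alongside the word cloud, it can be useful to inspect token frequencies numerically. This is a minimal sketch using quanteda's topfeatures() on the n_d_dfm object built above; the exact counts depend on the scraped articles.

```{r}
# Show the 20 most frequent tokens in the document-feature matrix
topfeatures(n_d_dfm, 20)
```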

The biggest words (apologies that they are in Korean) are Pohang (포항), the area struck by the earthquake, and the disaster type (지진, earthquake). The cloud also contains uninformative tokens such as work (일), be (있다), Seoul Newspaper (서울신문), and Facebook (페이스북); these appear to have entered the article bodies through the newspaper's branding and the social media links on the web page. They are meaningless for text analysis, so they should be removed during preprocessing next time.
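As a sketch of that cleanup, the uninformative tokens noted above can be dropped with quanteda's tokens_remove() before building the document-feature matrix, reusing the n_d_corpus object and packages loaded earlier. The stopword list here is only illustrative; in practice a fuller Korean stopword list, or proper morphological analysis, would be needed.

```{r}
# Illustrative stopword list based on the first word cloud
my_stopwords <- c("일", "있다", "서울신문", "페이스북")

# Rebuild the dfm without these tokens and redraw the word cloud
n_d_dfm_clean <- tokens(n_d_corpus, remove_punct = TRUE) %>%
  tokens_remove(pattern = my_stopwords) %>%
  dfm()

textplot_wordcloud(n_d_dfm_clean)
```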