risk/report.Rmd at main · cdsbasel/risk · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
---
title: "Report"
description: |
  Using data visualization to understand risk perception
editor_options:
  chunk_output_type: console

output:
  distill::distill_article:
    md_extensions: +emoji
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

# Background

The past years have shown that risk perception and risk communication are key issues in today's societies. A number of voices have argued for the need to understand and estimate the public's risk perceptions in many domains to better target specific interventions and help both laypeople and policy-makers make better, informed decisions, for example, when fighting global pandemics or climate change (e.g., Jenny & Betsch, 2022).

The Risk Hackathon aimed to provide a training opportunity in data analytics while showcasing ongoing work on risk perception by the Center for Cognitive and Decision Sciences. The hackathon proposed a set challenges linked to analysing a new data set on risk perception and evaluating the performance of novel models to predict risk perception.

The hackathon used data from a pre-registered [study](https://osf.io/6m7xr) by Hussain et al. conducted in 2022 designed to compare the performance of the classic psychometric paradigm (Fischhoff et al., 1978) against novel models that rely on language embeddings (e.g., Bhatia, 2019). In this context, embeddings refer to vector representations of words obtained from natural language processing models trained on large amounts of text data or other data sources (e.g., free associations).

The Hussain et al. study involved asking thousands of participants to rate over 1000 words or terms (e.g., handgun, vaccination, artificial intelligence) concerning their perceived risk, with each word being rated on a scale of -100 (safe) to +100 (risky). The same words were also rated on 9 (psychometric) dimensions concerning, for example, whether the risk is typically fatal, controllable, voluntary, etc. The ratings from several respondents were averaged to create an average risk and psychometric ratings. These average risk ratings were utilized in the hackathon to understand average risk perception for different risks and compare the predictions of different models of risk perception.

# Hackathon

A hackathon is a group event that gets individuals to work on a common problem using code. In the CDS Risk Hackathon participants were asked to answer a number of questions pertaining to how people perceive risks. The specific questions and data descriptors are provided [here](hackathon.html). The participants were organized intro groups of 4 to 5 people and collaborated to produce answers to the challenges in the form of data visualizations in R. CDS members provided training prior to the hackathon in the form of an R refresher course in the morning of the event. After the hackthon event, participants discussed their experience over drinks :beers: and pizza :pizza:.

# Results

The results presented below are a subset of the data visualizations produced by the hackathon participants and are used to give an idea of the sort of insights generated by the hackathon. The full analysis of the data used will be provided in a scientific manuscript (Hussain et al., in preparation).

```{r}
library(tidyverse)
require(ggrepel)
library(ggside)
library(fmsb)
library(patchwork)
library(ggplot2)
library(ggpubr)
library(ggcorrplot)
library(ggridges)

risk = read_csv("risk_hackathon.csv")
cols = viridis::cividis(5,end=.9)

```

#### Items related to crime, war, and violence, are seen as most risky

Many of the words (i.e., events or actions) rated as most risky have to do with crime, war, or violence. A number of health-related items and technologies are also seen as quite risky on average. Some hackathon participants used density plots to obtain an overview of risk ratings that make clear the average differences between these different types of risks.

```{r, fig.asp = .6}
groups = c("Crime / War / Violence",
           "Health",
           "Transportation",
           "Nature / Food",
           "Activities / Technologies")
p = risk %>%
  mutate(cluster_fct = factor(cluster, levels = groups)) %>%
  ggplot(mapping = aes(x = risk_ratings,
                       fill = cluster_fct,
                       col = cluster_fct)) +
    geom_density() +
    xlim(c(-100, 100)) +
    theme_minimal() +
  scale_fill_viridis_d(option = "E", alpha = .5, end = .95) +
  scale_color_viridis_d(option = "E", alpha = .7, end = .95) +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none") +
  labs(x = "Risk rating")
pos = ggplot_build(p)$data[[1]] %>%
  group_by(group) %>%
  summarize(x = x[which.max(y)],
            y = max(y)) %>%
  mutate(label = groups,
         cluster_fct = factor(label, levels = label))

p + geom_label_repel(data = pos,
                     mapping = aes(x = x, y = y, label = groups),
                     fill = "white",nudge_y = .001, force = .01)
```

#### Items differ considerably in their psychometric profile

The different types of risks have different profiles concerning the 9 psychometric dimensions that were considered in the study in line with past work (cf. Bhatia, 2019; Fischhoff et al., 1978). Some hackathon participants used radar charts to explore the profiles of different types of risks, including older ones, like handgungs, and newer ones, like artificial intelligence.

```{r, fig.asp = .6}
dat <- risk %>%
  rename(involuntary = psych_voluntary,
         fatal = psych_fatal,
         delayed = psych_immediate,
         dread = psych_dread,
         catastrophic = psych_catastrophic,
         controllable = psych_controllable,
         unknown_sci = psych_known_sci,
         unknown = psych_known,
         old = psych_new)

df_radarchart <- dat %>% filter(word %in% c("handgun", "vaccination", "artificial intelligence")) %>% select(word, involuntary:old) %>%
  add_row(.before = TRUE, word = "Min", involuntary = 1, fatal = 1, delayed = 1, dread = 1, catastrophic = 1, controllable = 1, unknown_sci = 1, unknown = 1, old = 1) %>%
  add_row(.before = TRUE, word = "Max", involuntary = 7, fatal = 7, delayed = 7, dread = 7, catastrophic = 7, controllable = 7, unknown_sci = 7, unknown = 7, old = 7) %>%
  column_to_rownames(var = "word")

create_radarchart <- function(data, color = "#00AFBB",
                              vlabels = colnames(data), vlcex = 0.7,
                              caxislabels = NULL, title = NULL, ...)
{
  radarchart(
    data,
    # Customize the polygon
    pcol = color, pfcol = scales::alpha(color, 0.5), plwd = 2, plty = 1,
    # Customize the grid
    cglcol = "grey", cglty = 3, cglwd = 0.8,
    # Customize the axis
    axislabcol = "grey",
    # Variable labels
    vlcex = vlcex, vlabels = vlabels,
    caxislabels = caxislabels, title = NA, ...
  )
  mtext(title, line=-3, font=1)
}

colors <- cols[c(5,2,1)]
titles <- rownames(df_radarchart)[3:nrow(df_radarchart)]

par(mfrow=c(1,3), mar=c(0,0,1,0))

# Plot 3 (handgun):
create_radarchart(
  data = df_radarchart[c(1, 2, 5), ], seg = 6,
  color = colors[3], title = titles[3]
)

# Plot 2 (vaccination):
create_radarchart(
  data = df_radarchart[c(1, 2, 4), ],seg = 6,
  color = colors[2], title = titles[2]
)

# Plot 1 (artificial intelligence):
create_radarchart(
  data = df_radarchart[c(1, 2, 3), ], seg = 6,
  color = colors[1], title = titles[1]
)

```

#### Replicating the two-dimensional representation of risk perception

Participants were asked to replicate the classic two-dimensional visual representation of risks first obtained by Fischhoff et al. (1978).

<a href="https://link.springer.com/article/10.1007/BF00143739">
  <img alt="Qries" src="0_website/Fischoff.png"
  width=700" height="500">
</a>

Most participants created a modern version of the plot - including 1004 risks - and highlighting those that were also present in Fischhoff et al. (1978).

```{r}
data <- risk
data$category <- factor(data$cluster, levels=c("Activities / Technologies", "Crime / War / Violence", "Health", "Nature / Food", "Transportation" ))

plot_main <- data %>%
  bind_cols(tibble(risk = scales::rescale(data$risk_ratings))) %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(color = risk), alpha = 0.8, size = 5) +
  scale_colour_viridis_c() +
  geom_label(data = data %>% filter(in_Fisch == TRUE), aes(label = word)) +
  theme_minimal() +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-9.5, 9.5)) +
  annotate("label", x = 5, y = 1, label = "known", fill = "grey20", color = "white") +
  annotate("label", x = 5, y = 0, label = "voluntary", fill = "grey20", color = "white") +
  annotate("label", x = 5, y = -1, label = "immediate", fill = "grey20", color = "white") +
  annotate("label", x = -5, y = 1, label = "unknown", fill = "grey20", color = "white") +
  annotate("label", x = -5, y = 0, label = "involuntary", fill = "grey20", color = "white") +
  annotate("label", x = -5, y = -1, label = "delayed", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = 9.5, label = "dread", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = 9, label = "fatal", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = 8.5, label = "uncontrollable", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = 8, label = "involuntary", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = -8, label = "calm", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = -8.5, label = "non-fatal", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = -9, label = "controllable", fill = "grey20", color = "white") +
  annotate("label", x = 0, y = -9.5, label = "voluntary", fill = "grey20", color = "white")  +
  theme(legend.position="none") +
    theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())

data$alpha <- (data$risk_ratings + 100) / 200

# for individual categories
data.activ <- data %>% filter(category=="Activities / Technologies")
activities <- data.activ %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2, color=alpha)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point( size = 3, alpha=.8) +
  scale_colour_viridis_c() +
  theme_minimal() + labs(x=NULL, y=NULL) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-8, 8)) +
  labs(subtitle="Activities / Technologies", alpha=NULL) + theme(legend.position = "none") +
    theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())

data.crime <- data %>% filter(category=="Crime / War / Violence")
crime <- data.crime %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2, color=alpha)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point( size = 3, alpha=.8) +
  scale_colour_viridis_c() +
  theme_minimal() + labs(x=NULL, y=NULL) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-8, 8)) +
  labs(subtitle="Crime / War / Violence", alpha=NULL) + theme(legend.position = "none") +
    theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())

data.health <- data %>% filter(category=="Health")
health <- data.health %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2, color=alpha)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point( size = 3, alpha=.8) +
  scale_colour_viridis_c() +
  theme_minimal() + labs(x=NULL, y=NULL) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-8, 8)) +
  labs(subtitle="Health", alpha=NULL) + theme(legend.position = "none") +
    theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())

data.nature <- data %>% filter(category=="Nature / Food")
nature <- data.nature %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2, color=alpha)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point( size = 3, alpha=.8) +
  scale_colour_viridis_c() +
  theme_minimal() + labs(x=NULL, y=NULL) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-8, 8)) +
  labs(subtitle="Nature / Food", alpha=NULL) + theme(legend.position = "none") +
    theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())

data.transport <- data %>% filter(category=="Transportation")
transport <- data.transport %>%
  ggplot(aes(y = psych_pc1, x = psych_pc2, color=alpha)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point( size = 3, alpha=.8) +
  scale_colour_viridis_c() +
  theme_minimal() + labs(x=NULL, y=NULL) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-8, 8)) +
  labs(subtitle="Transportation", alpha=NULL) + theme(legend.position = "none") +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title =element_blank())


layout <- "
AAAAAB
AAAAAC
AAAAAD
AAAAAE
AAAAAF
"
#plot_main + crime + health + transport + nature + activities + plot_layout(design = layout)

plot_main

ggsave("task2.png", width = 24, height = 16, units = "cm", scale = 1.5)
```

#### The results question the two-dimensional representation of risk perception

Fischhoff et al. (1978) conduct a principal component analysis on the psychometric items and concluded that risk could be accounted for by a two-dimensional representation. However, the novel results by Hussain et al. (2022) suggest that a two-dimensional representation may not be the best way to think about this larger set of risks.

As can be seen in the tile plot below, the first principal component (pc1) is highly correlated with the risk ratings (r = .81) but a number of additional components (pc3, pc5, pc2) show similar correlations to risk ratings (r ≈ .2). This questions a two-dimensional representation of risks as initially proposed by Fischhoff and colleagues.

```{r}
dat = risk %>%
  arrange(risk_ratings, in_Fisch) %>%
  mutate(cluster_fct = factor(cluster, levels = groups))

dat %>%
  select(starts_with("psych"),-psych_pred_risk,risk_ratings) %>%
  cor() %>%
  as.data.frame() %>%
  mutate(var = rownames(.)) %>%
  pivot_longer(-var) %>%
  filter(!str_detect(var, "[:digit:]"),
         str_detect(name, "[:digit:]")) %>%
  mutate(var = str_remove(var, "psych_"),
         name = str_remove(name, "psych_"),
         var = as_factor(var),
         name = factor(name,
                       levels = sort(unique(name), decreasing = T))) %>%
  ggplot(aes(x = var, y = name,
             label = round(value,2), fill = value)) +
   geom_tile() +
   geom_text(col = "white") +
   theme_minimal() +
   scale_fill_viridis_c() +
   theme(axis.text.x = element_text(angle = 45, hjust=1),
         axis.title = element_blank()) +
  theme(legend.position = "none") +
  geom_vline(xintercept = 9.5, col = "white", size=1)
```

#### Embedding and psychometric models show similar performance levels

The groups were asked to compare the results of different prediction models. As it stands, the new models do not seem to beat the psychometric paradigm. This suggests that the features identified by Fischhoff and colleagues are important predictors of risk perception. It will be interesting to assess to what extent future, more sophisticated embeddings and ensemble models are able to beat this classical model from the psychology of risk perception.

```{r}
risk_data <- risk
predict_data <- risk_data %>%
  select(risk_ratings, cluster, psych_pred_risk, glove_pred_risk, swow_pred_risk)  %>%
  pivot_longer(cols=c("psych_pred_risk", "glove_pred_risk", "swow_pred_risk"), names_to = "model", values_to = "predictions")

predict_data %>%
  ggplot(aes(x=risk_ratings, y=predictions, colour=model, legend())) +
  geom_point(alpha=0.3) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_colour_viridis_d() +
  theme_bw() +
  facet_wrap(~model)+
  stat_cor(aes(label = ..r.label..), digits = 2, color="black") +
  theme(axis.title = element_text(face = "bold"), legend.position = "none")
```

# Limitations and Future Directions

The CDS Risk Hackathon has mostly a pedagogical character and, consequently, focused providing participants with an opportunity to explore a novel data set. Nevertheless, some of the limitations of the data and analyses conducted should be emphasized.

- While great care was taken to ensure high reliability of the mean risk ratings and psychometric dimensions, the data used are averages of small numbers of English-speaking participants and, therefore, are not representive of global views of risk perception. Past work suggested that are important individual and group differences in risk assessments and it could be important to assess these to understand global risk perceptions as well as phenomena like risk polarization (cf. Wulff & Mata, 2022).
- Despite the data set of words representing the largest set of risks collected, there are nevertheless a number of risks that have not been considered but that are very relevant to public health and safety (e.g., cyberfraud, microplastic, fusion). Future work may consider yet additional sources of risk by pooling experts and laypeople's thoughts on relevant risk sources.

- The models considered in the prediction exercise are only a subset of possible models and more sophisticated ones may yet be developed. For example, ensemble models that make use of several embeddings can be used, which is something that will be explored in Hussain et al. (in preparation). More generally, the models considered were obtained from single words or bigrams and did not consider full sentences or larger amounts of text that can be important for contextualizing certain risks.

# Conclusion and Testimonials

Overall, the event was a success in that it helped participants reflect on the topic of risk perception while providing training on data wrangling and visualization skills. We are very happy to have received some positive testimonials from a number of participants!

"I really liked the close group collaboration and the fact that we had experienced team members who could help us."

"I learned a lot as a R Novice thanks to the instructors' guidance"

"It's a cool idea to split in different groups with one dataset and the same task. As in the end, we have different outcomes of visualizations and insights."

"the pizza was great :))"


```{r, echo = FALSE, message=FALSE, fig.cap='Intro/Conclusion Presentation', fig.align='center', out.width='500', out.height='281'}
knitr::include_graphics("presentation.pdf")
```


# References

Bhatia, S. (2019). Predicting risk perception: New insights from data science. Management Science, 65(8), 3800–3823. https://doi.org/10.1287/mnsc.2018.3121

Fischhoff, B., Slovic, P., Lichtenstein, S., Read, S., & Combs, B. (1978). How safe is safe enough? A psychometric study of attitudes towards technological risks and benefits. Policy Sciences, 9(2), 127–152. https://doi.org/10.1007/BF00143739

Jenny, M. A., & Betsch, C. (2022). Large-scale behavioural data are key to climate policy. Nature Human Behaviour, 6(11), 1444–1447. https://doi.org/10.1038/s41562-022-01479-4

Wulff, D. & Mata, R. (2022). On the semantic representation of risk. Science Advances, 8. https://science.org/doi/10.1126/sciadv.abm1883