book/03-decision-03-working.Rmd at master · StatsWithR/book · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
## Working with Loss Functions

Now we illustrate why certain estimates minimize certain loss functions.

```{example, label="car"}
You work at a car dealership. Your boss wants to know how many cars the dealership will sell per month. An analyst who has worked with past data from your company provided you a distribution that shows the probability of number of cars the dealership will sell per month. In Bayesian lingo, this is called the posterior distribution. A dot plot of that posterior is shown in Figure \@ref(fig:posterior-decision). The mean, median and the mode of the distribution are also marked on the plot. Your boss does not know any Bayesian statistics though, so he/she wants you to report **a single number** for the number of cars the dealership will sell per month.
```

```{r posterior-decision, fig.height=3, fig.width=10, fig.align='center', fig.cap="Posterior", echo=FALSE, message=FALSE}
# posterior ---------------------------------------------------------

posterior <- c(47, 33, 35, 32, 19, 33, 34, 36, 47, 32, 35, 41, 32, 29, 35,
25, 32, 36, 20, 47, 37, 32, 35, 25, 37, 40, 36, 38, 40, 35, 49,
23, 33, 35, 38, 28, 36, 4, 28, 45, 37, 39, 34, 41, 28, 33, 27,
26, 30, 34, 23)

# length(posterior) # 51

# t(sort(posterior))

# mean(posterior) # 33.45098
# median(posterior) # 34
# table(posterior)[which.max(table(posterior))] # 35

# dotplot of posterior ----------------------------------------------

# pdf("posterior.pdf", width = 10, height = 3)
par(mar = c(2, 0, 0, 0), cex.axis = 1.5, cex = 1.5)
BHH2::dotPlot(posterior, pch = 19, xlim = c(0, 50), axes = FALSE)
axis(1, at = seq(from = 0, to = 50, by = 5))
abline(v = mean(posterior), col = "orange", lwd = 4)
abline(v = median(posterior), col = "turquoise4", lwd = 4)
abline(v = 35, col = "pink", lwd = 4)
legend("topleft", col = c("orange", "turquoise4", "pink"),
c("mean", "median", "mode"), lty = 1, lwd = 4,
bty = "n")
# dev.off()
```

Suppose your single guess is 30, and we call this $g$ in the following calculations. If your loss function is $L_0$ (i.e., a 0/1 loss), then you lose a point for each value in your posterior that differs from your guess and do not lose any points for values that exactly equal your guess. The total loss is the sum of the losses from each value in the posterior.

In mathematical terms, we define $L_0$ (0/1 loss) as

$$L_{0,i}(0,g) = \left\{ \begin{array}{cc}
0 & \text{if } g=x_i \\ 1 & \text{otherwise}
\end{array}\right.$$

The total loss is $L_0 = \sum_i L_{0,i}(0,g)$.

Let's calculate what the total loss would be if your guess is 30. Table \@ref(tab:L0-table) summarizes the values in the posterior distribution sorted in descending order.

The first value is 4, which is not equal to your guess of 30, so the loss for that value is 1. The second value is 19, also not equal to your guess of 30, and the loss for that value is also 1. The third value is 20, also not equal to your guess of 30, and the loss for this value is also 1.

There is only one 30 in your posterior, and the loss for this value is 0 -- since it's equal to your guess (good news!). The remaining values in the posterior are all different than 30 hence, the loss for them are all ones as well.

To find the total loss, we simply sum over these individual losses in the posterior distribution with 51 observations where only one of them equals our guess and the remainder are different. Hence, the total loss is 50.

Figure \@ref(fig:L0-mode) is a visualization of the posterior distribution, along with the 0-1 loss calculated for a series of possible guesses within the range of the posterior distribution. To create this visualization of the loss function, we went through the process we described earlier for a guess of 30 for all guesses considered, and we recorded the total loss. We can see that the loss function has the lowest value when $g$, our guess, is equal to **the most frequent observation** in the posterior. Hence, $L_0$ is minimized at the **mode** of the posterior, which means that if we use the 0/1 loss, the best point estimate is the mode of the posterior.

```{r L0-table, echo=FALSE}
i = c(1,2,3,"",14,"",50,51)
xi = c(4,19,20,"...",30,"...",47,49)
L0_loss = c(1,1,1,"...",0,"...",1,1)

temp <- cbind(i,xi,L0_loss)
temp <- rbind(temp, c("","Total",50))
temp <- data.frame(temp)
names(temp) <- c("i","x_i","L0: 0/1")

knitr::kable(
x = temp, booktabs = TRUE,
caption = "L0: 0/1 loss for g = 30", align = 'c'
)
```

```{r L0-mode, fig.height=5, fig.width=10, fig.align='center', fig.cap="L0 is minimized at the mode of the posterior", echo=FALSE, message=FALSE}

# g = 30 ------------------------------------------------------------

g = 30

# L0 for g ----------------------------------------------------------

# (abs(sort(posterior)-g) > 1e-6)[c(1,2,3,12,13,14,50,51)]
# sum( abs(posterior-g) > 1e-6 )

# L0 ----------------------------------------------------------------

# pdf("L0.pdf", width = 10, height = 5)
par(mfrow = c(2, 1), cex = 1.5, mar = c(2, 4, 0.5, 0), las = 1)
s = seq(0, 50, by = 0.01)
L0 = sapply(s, function(s) sum( abs(posterior - s) > 1e-6 ))
plot(s, L0, ylab = "L0", type = "l", axes = FALSE, xlim = c(0, 50))
axis(2, at = c(44, 46, 48, 50))
#
BHH2::dotPlot(posterior, pch = 19, xlim = c(0, 50), axes = FALSE)
axis(1, at = seq(from = 0, to = 50, by = 5))
abline(v = mean(posterior), col = "orange", lwd = 4)
abline(v = median(posterior), col = "turquoise4", lwd = 4)
abline(v = 35, col = "pink", lwd = 4)
legend("topleft", col = c("orange", "turquoise4", "pink"),
c("mean", "median", "mode"), lty = 1, lwd = 4,
bty = "n")
# dev.off()
```

Let's consider another loss function. If your loss function is $L_1$ (i.e., linear loss), then the total loss for a guess is the sum of the **absolute values** of the difference between that guess and each value in the posterior. Note that the absolute value function is required, because overestimates and underestimates do not cancel out.

In mathematical terms, $L_1$ (linear loss) is calculated as $L_1(g) = \sum_i |x_i - g|$.

We can once again calculate the total loss under $L_1$ if your guess is 30. Table \@ref(tab:L1-table) summarizes the values in the posterior distribution sorted in descending order.

The first value is 4, and the absolute value of the difference between 4 and 30 is 26. The second value is 19, and the absolute value of the difference between 19 and 30 is 11. The third value is 20 and the absolute value of the difference between 20 and 30 is 10.

There is only one 30 in your posterior, and the loss for this value is 0 since it is equal to your guess. The remaining value in the posterior are all different than 30 hence their losses are different than 0.

To find the total loss, we again simply sum over these individual losses, and the total is to 346.

Again, Figure \@ref(fig:L1-median) is a visualization of the posterior distribution, along with a linear loss function calculated for a series of possible guesses within the range of the posterior distribution. To create this visualization of the loss function, we went through the same process we described earlier for all of the guesses considered. This time, the function has the lowest value when $g$ is equal to the **median** of the posterior. Hence, $L_1$ is minimized at the **median** of the posterior one other loss function.

```{r L1-table, echo=FALSE}
# i = c(1,2,3,"",14,"",50,51)
# xi = c(4,19,20,"...",30,"...",47,49)
L1_loss = c(26,11,10,"...",0,"...",17,19)

temp <- cbind(i,xi,L1_loss)
temp <- rbind(temp, c("","Total",346))
temp <- data.frame(temp)
names(temp) <- c("i","x_i","L1: |x_i-30|")

knitr::kable(
  x = temp, booktabs = TRUE,
  caption = "L1: linear loss for g = 30", align = 'c'
)
```

```{r L1-median, fig.height=5, fig.width=10, fig.align='center', fig.cap="L1 is minimized at the median of the posterior", echo=FALSE, message=FALSE}
# L1 for g = 30 -----------------------------------------------------

# abs(sort(posterior) - g)[c(1,2,3,12,13,14,50,51)]
# sum(abs(sort(posterior) - g))

# L1 ----------------------------------------------------------------

# pdf("L1.pdf", width = 10, height = 5)
par(mfrow = c(2, 1), cex = 1.5, mar = c(2, 4, 0.5, 0), las = 1)
s = seq(0, 50, by = 0.01)
L1 = sapply(s,function(s) sum( abs(posterior - s) ))
plot(s, L1, ylab = "L1", type = "l", xlim = c(0, 50), axes = FALSE)
axis(2, at = seq(250, 1650, 350))
#
BHH2::dotPlot(posterior, pch = 19, xlim = c(0, 50), axes = FALSE)
axis(1, at = seq(from = 0, to = 50, by = 5))
abline(v = mean(posterior), col = "orange", lwd = 4)
abline(v = median(posterior), col = "turquoise4", lwd = 4)
abline(v = 35, col = "pink", lwd = 4)
legend("topleft", col = c("orange", "turquoise4", "pink"),
       c("mean", "median", "mode"), lty = 1, lwd = 4,
       bty = "n")
# dev.off()
```

If your loss function is $L_2$ (i.e. a squared loss), then the total loss for a guess is the sum of the squared differences between that guess and each value in the posterior.

We can once again calculate the total loss under $L_2$ if your guess is 30. Table \@ref(tab:L2-table) summarizes the posterior distribution again, sorted in ascending order.

The first value is 4, and the squared difference between 4 and 30 is 676. The second value is 19, and the square of the difference between 19 and 30 is 121. The third value is 20, and the square difference between 20 and 30 is 100.

There is only one 30 in your posterior, and the loss for this value is 0 since it is equal to your guess. The remaining values in the posterior are again all different than 30, hence their losses are all different than 0.

To find the total loss, we simply sum over these individual losses again and the total loss comes out to 3,732. We have the visualization of the posterior distribution. Again, this time along with the squared loss function calculated for a possible serious of possible guesses within the range of the posterior distribution.

Creating the visualization in Figure \@ref(fig:L2-mean) had the same steps. Go through the same process described earlier for a guess of 30, for all guesses considered, and record the total loss. This time, the function has the lowest value when $g$ is equal to the **mean** of the posterior. Hence, $L_2$ is minimized at the **mean** of the posterior distribution.

```{r L2-table, echo=FALSE}
# i = c(1,2,3,"",14,"",50,51)
# xi = c(4,19,20,"...",30,"...",47,49)
L2_loss = c(676,121,100,"...",0,"...",289,361)

temp <- cbind(i,xi,L2_loss)
temp <- rbind(temp, c("","Total",3732))
temp <- data.frame(temp)
names(temp) <- c("i","x_i","L2: (x_i-30)^2")

knitr::kable(
  x = temp, booktabs = TRUE,
  caption = "L2: squared loss for g = 30", align = 'c'
)
```

```{r L2-mean, fig.height=5, fig.width=10, fig.align='center', fig.cap="L2 is minimized at the mean of the posterior", echo=FALSE, message=FALSE}
# L2 for g = 30 -----------------------------------------------------

# ((sort(posterior) - g)^2)[c(1,2,3,12,13,14,50,51)]
# sum((sort(posterior) - g)^2)

# L2 ----------------------------------------------------------------

# pdf("L2.pdf", width = 10, height = 5)
par(mfrow = c(2, 1), cex = 1.5, mar = c(2, 4, 0.5, 0), las = 1)
s = seq(0, 50, by = 0.01)
L2 = sapply(s,function(s) sum( (posterior - s)^2 ))
plot(s, L2, ylab = "L2", type = "l", xlim = c(0, 50), ylim = c(0, 61000), axes = FALSE)
axis(2, at = seq(0, 60000, 20000))
#
BHH2::dotPlot(posterior, pch = 19, xlim = c(0, 50), axes = FALSE)
axis(1, at = seq(from = 0, to = 50, by = 5))
abline(v = mean(posterior), col = "orange", lwd = 4)
abline(v = median(posterior), col = "turquoise4", lwd = 4)
abline(v = 35, col = "pink", lwd = 4)
legend("topleft", col = c("orange", "turquoise4", "pink"),
       c("mean", "median", "mode"), lty = 1, lwd = 4,
       bty = "n")
# dev.off()
```

To sum up, the point estimate to report to your boss about the number of cars the dealership will sell per month **depends on your loss function**. In any case, you will choose to report the estimate that minimizes the loss.

* $L_0$ is minimized at the **mode** of the posterior distribution.
* $L_1$ is minimized at the **median** of the posterior distribution.
* $L_2$ is minimized at the **mean** of the posterior distribution.