scan-Book/ch_randomization_test.qmd at master · researchtool/scan-Book · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
# Randomization tests {#sec-rand}

```{r}
#| results: asis
#| echo: false

function_structure("rand_test")
```

The `rand_test` function computes a randomization test for single or multiple baseline single-case data. The function is based on an algorithm from the SCRT package [@bulté2008; @bulté2009], but has been rewritten and extended.

The basic idea of a randomization test is to think counterfactually: "Assuming that the phase had no effect on the measured data, what would be the difference between the phases of my case if I had started phase B at a different time? Given the possible phase differences under the assumption that phase had no effect, how likely are the actual phase differences of the original case?"

Therefore, a number of new cases are generated with a random start for each phase. This means that these new cases have the same data as the original case, but different starting points for each phase. A specific statistic (e.g., the mean difference between Phase A and Phase B data) is now calculated for each new case. When enough random cases have been generated, we also generate a series of new statistics (e.g., mean differences). The statistic for the original case is now compared to these new statistic values. The percentile of the original statistic within the new generated statistic values is the probability of the original statistic assuming a random distribution of starting points for each phase. This percentile is returned as the p-value in the randomization test analyses.

```{r}
#| echo: false
#| label: fig-ex-rand-test
#| fig-cap: Illustration of a randomization test

df <- scdf(c(44, 51, 55, 38, 52, 63, 57, 57, 57, 56, 58, 55, 56, 60, 65), phase_design = list(A = 5, B = 10), name = "Original case")
df2 <- scdf(c(44, 51, 55, 38, 52, 63, 57, 57, 57, 56, 58, 55, 56, 60, 65), phase_design = list(A = 4, B = 11), name = "Random variation 1")
df3 <- scdf(c(44, 51, 55, 38, 52, 63, 57, 57, 57, 56, 58, 55, 56, 60, 65), phase_design = list(A = 6, B = 9), name = "Random variation 2")
df4 <- scdf(c(44, 51, 55, 38, 52, 63, 57, 57, 57, 56, 58, 55, 56, 60, 65), phase_design = list(A = 7, B = 8), name = "Random variation 3")
df5 <- scdf(c(44, 51, 55, 38, 52, 63, 57, 57, 57, 56, 58, 55, 56, 60, 65), phase_design = list(A = 8, B = 7), name = "Random variation 4")

study <- c(df, df2, df3, df4, df5)

d <- describe(study)$descriptives
diff_m <- paste0("Difference B - A: ", round(d$m.B - d$m.A, 1))

p <- scplot(study) %>% set_theme("basic")

for(i in 1:length(study))
  p <- p %>%
    add_text(diff_m[i], case = i, x = 9, y = 45, hjust = 0, color = "darkred")

p

```

## Arguments of the `rand_test()` function

The `statistics` argument defines the statistics on which the phase comparison is based. The following comparisons can be made:

|                |                                                                                                                                          |
|----------------|-----------------------------------------------------------|
| Mean A-B       | The difference between the mean of Phase A and the mean of Phase B. This is appropriate if a decrease in scores is expected for Phase B. |
| Mean B-A       | The difference between the mean of Phase B and the mean of Phase A. This is appropriate if a increase in scores is expected for Phase B. |
| Mean \|A-B\|   | The absolute value of the difference between the means of Phases A and B.                                                                |
| Median A-B     | The same as *Mean A-B*, but based on the median.                                                                                         |
| Median B-A     | The same as *Mean B-A*, but based on the median.                                                                                         |
| Median \|A-B\| | The same as *Mean \|A-B\|*, but based on the median.                                                                                     |
|                | **Some further experimental statistics are also available:**                                                                             |
| SMD hedges     | The standardized difference between phases as Hedge's g.                                                                                 |
| SMD glass      | The standardized difference between phases as Glass's delta.                                                                             |
| NAP            | The Nonoverlap of all pairs.                                                                                                             |
| NAP decreasing | The Nonoverlap of all pairs assuming decreasing values in phase B.                                                                       |
| Slope B-A      | The difference of a linear slope line of the phases.                                                                                     |
| Slope A-B      | The difference of a linear slope line of the phases expecting a decreasing effect in phase B.                                            |
| W-test         | Calculates a Wilcoxon test and takes the W statistic.                                                                                    |
| T-test         | Calculates a T-test test and takes the t statistic.                                                                                      |

Further argument are

| Argument        | What it does ...                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|-----------|-------------------------------------------------------------|
| *number*        | Sample size of the randomization distribution. The exactness of the p-value can not exceed 1/number (i.e., number = 100 results in p-values with an exactness of one percent). Default is number = 500. For faster processing use number = 100. For more precise p-values set number = 1000.                                                                                                                                                         |
| *complete*      | If TRUE, the distribution is based on a complete permutation of all possible starting combinations. This setting overwrites the number Argument. The default setting is FALSE.                                                                                                                                                                                                                                                                       |
| *limit*         | Minimal number of data points per phase in the sample. The first number refers to the A-phase and the second to the B-phase (e.g., limit = c(5, 3)). If only one number is given, this number is applied to both phases. Default is limit = 5.                                                                                                                                                                                                       |
| *startpoints*   | Alternative to the limit-parameter, startpoints exactly defines the possible start points of phase B (e.g., startpoints = 4:9 restricts the phase B start points to measurements 4 to 9. startpoints overwrite the limit-parameter.                                                                                                                                                                                                                  |
| *exclude.equal* | If set to FALSE, which is the default, random distribution values equal to the observed distribution are counted as null-hypothesis conform. That is, they decrease the probability of rejecting the null-hypothesis (increase the p-value). exclude.equal should be set to TRUE if you analyse one single-case design (not a multiple baseline data set) to reach a sufficient power. But be aware, that it increases the chance of an alpha-error. |

: Arguments of the randomization test function.

## Example

```{r rand}
res <- rand_test(exampleAB)
res
```

## Visuaization

The `plot_rand()` function plots a distribution of the random sample against the observed statistic:

```{r}
#| label: plot_rand()
plot_rand(res)
```

A more sophisticated histogram is available from the `scplot` package since version 0.4.1.

```{r}
scplot(res)
```

Another visualization is available since scplot version 0.5.1 for a randomization test with one case.
It gives you an overview of the respective statistic for each available startpoint of phase B.

```{r}
rand_test(
  byHeart2011$`Lisa (Turkish)`,
  statistic = "Median B-A",
  limit = 1,
  complete = TRUE) |>
  scplot(type = "xy")
```

## Providing a custom function

If you want to experiment around with new statistics for the randomization test, you can add your own analyses functions.

Use the `statistic_function argument` to provide your own function in a list. This list must have an element named `statistic` with a function that takes two arguments `a` and `b` and returns a single numeric value. For example, `list(statistic = function(a, b) mean(a) - mean(b))`. The second element of the list is named `aggregate` which takes a function with one numeric argument that returns a numeric argument. This function is used to aggregate the values of a multiple case design. If you do not provide this element, the default `function(x) sum(x) / length(x)`.  is used. The third optional argument is `name` which provides a name for your custom function.

Here is an example that implements the PND statistic and uses the median to aggregate the PNDs in a multiple case study.

```{r}
new_statistic <- list(
  statistic = function(a, b) sum(b > max(a), na.rm = TRUE) / sum(!is.na(b)),
  aggregate = function(x) median(x),
  name = "PND"
)

rand_test(Huber2014, statistic_function = new_statistic)
```