---
title: "Parametric survival distributions"
bibliography: references.bib
format:
  html:
    code-copy: true
editor_options:
  chunk_output_type: console
---
## Introduction
A wide range of parametric models is available, and each has its own characteristics that make it suitable for different data sets. Exponential, Weibull, Gompertz, log-logistic, log normal and Generalised Gamma parametric models should all be considered. These models, and methods to assess which of them suit a particular data set, are described below. Further details on the properties of the individual parametric models can be found in [@Collett2013], including diagrams of hazard, survivor and probability density functions which show the variety of shapes that the different models can take, depending upon their parameters. The hazard function is the event rate at time $t$ conditional upon survival until time $t$. The survivor function is the probability that the survival time is greater than or equal to $t$.
The general equations for survival analysis are as follows. The hazard is given by
$$
h(t) = \frac{f(t)}{S(t)}
$$
The survival (or survivor) curve is
$$
S(t) = 1 - F(t) = \int_t^{\infty} f(u) \; \mbox{d}u
$$
and the cumulative hazard is given by
$$
H(t) = \int_0^t h(u) \; \mbox{d}u
$$
where $f(t)$ is the density function, and $F(t)$ is the cumulative distribution function.
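Because $f(t) = -\mbox{d}S(t)/\mbox{d}t$, the hazard can also be written as $h(t) = -\mbox{d}\log S(t)/\mbox{d}t$, so that $S(t) = \exp(-H(t))$ when the cumulative hazard is accumulated from time 0. The following is a minimal numerical check of this identity in base R. The Weibull parameter values are arbitrary illustrations, and the helper names `haz` and `cum_haz` are ours.

```{r}
# Illustrative values only; any valid density/distribution pair would do
shape <- 1.5
scale <- 3

# Hazard from the definition h(t) = f(t) / S(t)
haz <- function(t) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

# Cumulative hazard H(t): numerically integrate the hazard from 0 to t
cum_haz <- function(t) integrate(haz, lower = 0, upper = t)$value

t_grid <- c(0.5, 1, 2, 5)
S_from_H <- exp(-sapply(t_grid, cum_haz))
S_direct <- pweibull(t_grid, shape = shape, scale = scale, lower.tail = FALSE)

# The two columns should agree up to numerical integration error
cbind(t = t_grid, S_from_H, S_direct)
```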
We now introduce the separate functions for the most common distributions.
## Exponential
The hazard is constant, i.e.
$$
h(t) = \lambda
$$
and this then corresponds to
$$
S(t) = \exp \left(- \int_0^t \lambda \;\mbox{d}u \right) = e^{-\lambda t}
$$
In R, random samples from the exponential distribution can be obtained using `rexp`, e.g. `rexp(n = 10, rate = 2)` makes 10 draws. The related functions are `pexp` for the cumulative distribution function, `qexp` for the quantile function and `dexp` for the density.
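As a short sketch of how these four functions relate to the formulas above, the chunk below evaluates each of them for a rate of 2 and confirms that the ratio of density to survival is constant. The seed and the evaluation times are arbitrary choices for illustration.

```{r}
set.seed(1)   # arbitrary seed, for reproducibility only
rate <- 2

draws <- rexp(n = 10, rate = rate)   # 10 random survival times
draws

t <- 0.5
dexp(t, rate = rate)                       # density f(t) = rate * exp(-rate * t)
pexp(t, rate = rate)                       # F(t) = 1 - exp(-rate * t)
pexp(t, rate = rate, lower.tail = FALSE)   # S(t) = exp(-rate * t)
qexp(0.5, rate = rate)                     # median survival time, log(2) / rate

# Constant hazard: f(t) / S(t) equals the rate at every t
t_grid <- c(0.1, 1, 5)
haz_exp <- dexp(t_grid, rate = rate) / pexp(t_grid, rate = rate, lower.tail = FALSE)
haz_exp
```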
We can plot the density curve as follows.
```{r}
curve(dexp(x, rate = 0.5), from=0, to=10, col='blue')
```
For each of the following distributions we also give an example density plot.
Where a base R function is available we use it.
If not, we use the equivalent function from the `flexsurv` package.
## Weibull
Hazard function
$$
h(t) = \lambda \gamma t^{\gamma-1}
$$
Survival function
$$
S(t) = \exp \left( - \int_0^t \lambda \gamma u^{\gamma-1} \; \mbox{d}u \right) = \exp(- \lambda t^{\gamma})
$$
```{r}
curve(dweibull(x, shape = 2, scale = 4), from=0, to=10, col='blue')
```
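Note that base R parametrises the Weibull through a `shape` $a$ and a `scale` $b$, with $S(t) = \exp(-(t/b)^a)$. Matching this to the formulas above gives $\gamma = a$ and $\lambda = b^{-a}$. The sketch below checks that correspondence numerically for the values used in the plot; the variable names are ours.

```{r}
shape <- 2   # gamma in the formulas above
scale <- 4   # R's scale parameter, as in the plot above

lambda <- scale^(-shape)   # implied lambda = 1/16

t_grid <- c(0.5, 1, 2, 5)

# Survival function: formula vs. base R
S_formula <- exp(-lambda * t_grid^shape)
S_base    <- pweibull(t_grid, shape = shape, scale = scale, lower.tail = FALSE)

# Hazard: formula vs. f(t) / S(t)
h_formula <- lambda * shape * t_grid^(shape - 1)
h_base    <- dweibull(t_grid, shape = shape, scale = scale) / S_base

all.equal(S_formula, S_base)
all.equal(h_formula, h_base)
```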
## Gompertz
Hazard function
$$
h(t) = \lambda e^{\theta t}
$$
Survival function
$$
S(t) = \exp \left( \frac{\lambda}{\theta} \left(1 - e^{\theta t} \right) \right)
$$
```{r}
curve(flexsurv::dgompertz(x, shape = 0.3, rate = 0.25), from=0, to=10, col='blue')
```
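As we read the `flexsurv` documentation, `shape` plays the role of $\theta$ and `rate` the role of $\lambda$ in the formulas above. The following base R sketch does not call `flexsurv`; it writes the Gompertz functions directly from the formulas (the helper names are hypothetical) and checks that they are mutually consistent for $\theta > 0$.

```{r}
theta  <- 0.3    # shape, matching the plot above
lambda <- 0.25   # rate, matching the plot above

# Hypothetical helpers, written directly from the formulas above
gomp_haz  <- function(t) lambda * exp(theta * t)
gomp_surv <- function(t) exp(lambda / theta * (1 - exp(theta * t)))

# f(t) = h(t) S(t), computed on the log scale so that very large t
# gives 0 rather than Inf * 0
gomp_dens <- function(t) {
  exp(log(lambda) + theta * t + lambda / theta * (1 - exp(theta * t)))
}

# For theta > 0 the density should integrate to 1 over (0, Inf)
total_mass <- integrate(gomp_dens, lower = 0, upper = Inf)$value
total_mass

# S(t0) should equal the density mass beyond t0
t0 <- 2
S_numeric <- integrate(gomp_dens, lower = t0, upper = Inf)$value
c(formula = gomp_surv(t0), numeric = S_numeric)
```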
## Log-Logistic
Hazard function
$$
h(t) = \frac{e^{\theta} \kappa t^{\kappa - 1}}{1 + e^{\theta} t^{\kappa}}
$$
Survival function
$$
S(t) = \left( 1 + e^{\theta} t^{\kappa} \right)^{-1}
$$
```{r}
# Plot the density itself (log = TRUE would plot the log-density instead)
curve(flexsurv::dllogis(x, shape = 3, scale = 1), from=0, to=10, col='blue')
```
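The formulas above can be cross-checked in base R, because $T$ is log-logistic exactly when $\log T$ follows a logistic distribution, here with location $-\theta/\kappa$ and scale $1/\kappa$. The sketch below assumes $\kappa = 3$ and $\theta = 0$, which is our reading of a `flexsurv` shape of 3 with scale 1 (taking $e^{\theta} = \mathrm{scale}^{-\kappa}$); the variable names are ours.

```{r}
kappa <- 3   # shape
theta <- 0   # e^theta = 1, i.e. scale = exp(-theta / kappa) = 1

t_grid <- c(0.5, 1, 2, 5)

# Survival and hazard written directly from the formulas above
S_formula <- 1 / (1 + exp(theta) * t_grid^kappa)
h_formula <- exp(theta) * kappa * t_grid^(kappa - 1) /
  (1 + exp(theta) * t_grid^kappa)

# Base R route: log(T) is logistic with location -theta / kappa, scale 1 / kappa
S_base <- plogis(log(t_grid), location = -theta / kappa, scale = 1 / kappa,
                 lower.tail = FALSE)

# Density of T by change of variables: f_T(t) = f_logis(log t) / t
f_base <- dlogis(log(t_grid), location = -theta / kappa, scale = 1 / kappa) / t_grid

all.equal(S_formula, S_base)
all.equal(h_formula, f_base / S_base)
```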
## Log normal
Survival function
$$
S(t) = 1 - \Phi\left( \frac{\log t - \mu}{\sigma} \right)
$$
```{r}
curve(dlnorm(x, meanlog = 2, sdlog = 0.5), from=0, to=10, col='blue')
```
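The hazard for the log normal is obtained from the general relation $h(t) = f(t) / S(t)$. As a sketch, the chunk below first checks the survival formula against `plnorm` and then plots the hazard, which rises to a peak and then declines. The helper name `haz_lnorm`, the evaluation times and the plotting range are our own choices.

```{r}
meanlog <- 2     # mu, as in the plot above
sdlog   <- 0.5   # sigma, as in the plot above

t_grid <- c(1, 5, 10, 20)

# Survival function from the formula above vs. base R
S_formula <- 1 - pnorm((log(t_grid) - meanlog) / sdlog)
S_base    <- plnorm(t_grid, meanlog = meanlog, sdlog = sdlog, lower.tail = FALSE)
all.equal(S_formula, S_base)

# Hazard computed as h(t) = f(t) / S(t)
haz_lnorm <- function(t) {
  dlnorm(t, meanlog = meanlog, sdlog = sdlog) /
    plnorm(t, meanlog = meanlog, sdlog = sdlog, lower.tail = FALSE)
}
curve(haz_lnorm(x), from = 0, to = 30, col = 'blue', ylab = "h(t)")
```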
## Generalised Gamma
Survival function
$$
S(t) = 1 - \Gamma_{(\lambda t)^{\theta}} (\rho)
$$
where $\Gamma_{(\lambda t)^{\theta}} (\rho)$ is the incomplete gamma function, normalised by $\Gamma(\rho)$ so that it lies between 0 and 1.
```{r}
curve(flexsurv::dgengamma(x, mu = 5, sigma = 2, Q = 0.1), from=0, to=10, col='blue')
```
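The $(\lambda, \theta, \rho)$ parametrisation in the formula is not the same as the `mu`, `sigma`, `Q` parametrisation used in the plot. As a minimal sketch, reading the survival function as $S(t) = 1 - \Gamma_{(\lambda t)^{\theta}}(\rho)$ with the normalised lower incomplete gamma function, it can be evaluated in base R with `pgamma`. The helper `gengamma_surv` and the parameter values are hypothetical. Two special cases serve as a check: $\rho = 1$ gives a Weibull survival function of the form $\exp(-(\lambda t)^{\theta})$, and $\theta = 1$ gives a gamma distribution.

```{r}
# Hypothetical helper: S(t) = 1 - Gamma_{(lambda t)^theta}(rho)
gengamma_surv <- function(t, lambda, theta, rho) {
  pgamma((lambda * t)^theta, shape = rho, lower.tail = FALSE)
}

t_grid <- c(0.5, 1, 2, 5)
lambda <- 0.4   # illustrative values only
theta  <- 1.5

# rho = 1: reduces to a Weibull with S(t) = exp(-(lambda t)^theta)
S_rho1 <- gengamma_surv(t_grid, lambda = lambda, theta = theta, rho = 1)
all.equal(S_rho1, exp(-(lambda * t_grid)^theta))

# theta = 1: reduces to a gamma distribution with shape rho and rate lambda
rho <- 2.5
S_theta1 <- gengamma_surv(t_grid, lambda = lambda, theta = 1, rho = rho)
all.equal(S_theta1, pgamma(t_grid, shape = rho, rate = lambda, lower.tail = FALSE))
```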