QuantitativeMethodsEcologyEvolution/BIO708_CentralLimitTheorem.Rmd at main · DworkinLab/QuantitativeMethodsEcologyEvolution · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: "Central Limit THeorom"
author: "Ian Dworkin"
date: "`r format(Sys.time(),'%d %b %Y')`"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    number_sections: yes
    keep_md: yes
editor_options:
  chunk_output_type: console
---

# Central limit theorem, demonstration via simulation

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(list(digits = 4, show.signif.stars = F, show.coef.Pvalues = FALSE))
```


Let's look at the quote from Vasihsth and Broe (2010) again:

*Provided the sample size is large enough, the sampling distributionof the sample mean will be close to normal irrespective of what the population’s distribution looks like.*


What is this telling us?

Her we have a normal distribution $\sim N(\mu = 0, \sigma = 1)$

```{r}
par(mfrow=c(2,1))
curve(dnorm(x, mean = 0, sd = 1), -5, 5,
      lwd = 2, ylab = '~density',
      main = "ye olde Normal/Gaussian Distribution")
```


So it is of no great surprise that if repeatedly generate samples from this distribution, calculate the sample means each iteration, and then draw a histogram of alll the sample means, we get another approximately normal distribution for the sample means.

```{r}
random_normal_sample_means <- replicate(1000,
    expr = mean(rnorm(n = 50, mean = 0, sd = 1)))

plot(density(random_normal_sample_means),
    xlim = c(-5, 5),
    main = "hey, why is this distribution so much narrower?",
    lwd = 2, xlab = "x")

hist(random_normal_sample_means, add = T, col = "grey", freq = F)

par(mfrow = c(1,1))
```


Because it is the distribution of the sampling means!!!! Not the data distribution of an individual sample.


## non-Gaussian distributions of data

 What happens to the distribution of sample means when the distribution of the sample is not normal? This is where we start to see one of the deeper meanings of the central limit theorem (though not the only one).


 Here we try a poisson distribution, ($\sim poisson(\lambda = 1)$). The poisson is a discrete distribution, bounded by zero.

```{r}
par(mfrow = c(2, 1))

random_poisson_data <- rpois(n = 10000, lambda = 1)

hist(random_poisson_data, freq = F)
```

### sampling distribution for the mean of poisson distributed samples.

What do we expect to see if we look at the sampling distribution of the means for this distribution?


```{r}
random_poisson_sample_means <- replicate(10000,
  expr = mean(rpois(n = 100, lambda = 1)))

plot(density(random_poisson_sample_means),
    main = "This looks pretty normal....",
    lwd = 2, xlab = "x")

hist(random_poisson_sample_means, add = T , col = "grey", freq = F)

par(mfrow = c(1, 1))
```


## exponential


```{r}
par(mfrow = c(2, 1))
random_exponential_sample <- rexp(1000, rate = 1)

plot(density(random_exponential_sample),
    ylim = c(0, 1))

hist(random_exponential_sample, add = T, col = "grey", freq = F)
```


```{r}
random_exponential_sample_means <- replicate(10000,
    expr = mean(rexp(n = 1000, rate = 1)))

plot(density(random_exponential_sample_means),
    main = "This looks pretty normal....",
    lwd = 2, xlab = "x")

hist(random_exponential_sample_means , add = T, col = "grey", freq = F)
par(mfrow = c(1, 1))
```


## try for yourself.

Individually or in groups, play with this, you can try some other distributions (gamma, beta, binomial...), play with them and see what you get....