6.4 Summary Statistics and Independence -> Example 6.4 could be wrong.

Hello. 

I noticed something that might be wrong in the notebook where you replicate Example 6.4.

I'm not quite sure what you did here:
```
first = np.round(np.random.multivariate_normal(mean1, cov1, int(n/4))*.4,3) # n/4 to adjust distribution to book figure for countour plot.
second = np.round(np.random.multivariate_normal(mean2, cov2, n)*.6,3)
data = np.vstack([first,second])
```

But this isn't a mixture of Gaussians that matches the book's description. The coefficients should be applied not to the random variables themselves but to their pdfs!

Furthermore, according to the book, the mean (expected value) of a Gaussian mixture is given by:

E(x) = alpha1 * mu1 + alpha2 * mu2

Plugging the corresponding means into this equation gives the analytical mean:

E(x) = 0.4 * [10,2] + 0.6 * [0,0] = [4,0.8]

Checked against the plot in your notebook, this doesn't match.

<img width="825" alt="Screenshot 2020-08-09 at 14 29 26" src="https://user-images.githubusercontent.com/26815719/89732098-c1157500-da4c-11ea-848e-8a3a84c9b762.png">

It looks like E(x) is around [0.7,0.1].

Finally, the pdf of the distribution you actually sample from is:
p(x) = 0.2 * N1 + 0.8 * N2
where N1 is the pdf of the first Gaussian random variable after the transformation f(x) = 0.4 * x is applied,
and N2 is the pdf of the second one after g(x) = 0.6 * x is applied.
The mean of N1 is given by 0.4 * [10,2] = [4,0.8]
The mean of N2 is given by 0.6 * [0,0] = [0,0]
The mean of your actual distribution is given by 0.2 * mean_of_N1 + 0.8 * mean_of_N2 = 0.2 * [4,0.8] + 0.8 * [0,0] = [0.8,0.16]
Which is rather close to what's in your notebook!
The 0.2 and 0.8 coefficients come from the sample sizes in your notebook: with n = 3000, there are n/4 = 750 samples for N1 and n = 3000 samples for N2, so 750 / 3750 = 0.2 and 3000 / 3750 = 0.8.
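
To make this easier to check, here is a minimal sketch that reproduces the notebook's construction and compares the empirical mean with the [0.8,0.16] derived above. The means and n come from the discussion above; the identity covariances are my own placeholder, since I'm not quoting the notebook's actual `cov1` and `cov2`, and the `rng` seed and variable names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

mean1, mean2 = np.array([10.0, 2.0]), np.array([0.0, 0.0])
# Assumption: identity covariances stand in for the notebook's cov1 and cov2.
cov1 = cov2 = np.eye(2)
n = 3000

# Same construction as the notebook: scale the samples by 0.4 and 0.6,
# drawing n/4 points from the first Gaussian and n from the second.
first = rng.multivariate_normal(mean1, cov1, n // 4) * 0.4
second = rng.multivariate_normal(mean2, cov2, n) * 0.6
data = np.vstack([first, second])

empirical_mean = data.mean(axis=0)

# Mixture weights implied by the sample sizes, applied to the scaled means.
w1 = (n // 4) / (n // 4 + n)  # 750 / 3750
predicted_mean = w1 * 0.4 * mean1 + (1 - w1) * 0.6 * mean2

print("empirical:", empirical_mean)
print("predicted:", predicted_mean)
```

The empirical mean should land near the predicted one regardless of the covariances, since the mean of the stacked data only depends on the component means, the scaling factors and the sample sizes.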

Here is the simple change I propose:
```
first = np.round(np.random.multivariate_normal(mean1, cov1, int(n*0.4)),3)
second = np.round(np.random.multivariate_normal(mean2, cov2, int(n*0.6)),3)
```

Instead of applying the coefficients to the random variables, we apply them to the sample sizes. This should be analogous to applying the coefficients to the respective pdfs: 40% of the samples come from the first Gaussian and 60% from the second.
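
As a rough sanity check of that analogy, here is a small sketch that samples the mixture the "textbook" way: draw a component label for each point with probabilities 0.4 and 0.6, then sample from the chosen Gaussian. With fixed sample sizes (as in my proposal) the split is exactly 40/60; with random labels it is 40/60 on average, and both should give a mean close to [4,0.8]. The identity covariances are again an assumption on my side, not the notebook's values, and the names `alpha1`, `alpha2`, `labels` and `rng` are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

mean1, mean2 = np.array([10.0, 2.0]), np.array([0.0, 0.0])
# Assumption: identity covariances stand in for the notebook's cov1 and cov2.
cov1 = cov2 = np.eye(2)
alpha1, alpha2 = 0.4, 0.6
n = 3000

# Pick a component for every sample: True -> first Gaussian, False -> second.
labels = rng.random(n) < alpha1
n1 = int(labels.sum())

# Fill each row from the Gaussian selected by its label.
data = np.empty((n, 2))
data[labels] = rng.multivariate_normal(mean1, cov1, n1)
data[~labels] = rng.multivariate_normal(mean2, cov2, n - n1)

analytical_mean = alpha1 * mean1 + alpha2 * mean2
empirical_mean = data.mean(axis=0)

print("analytical:", analytical_mean)
print("empirical: ", empirical_mean)
```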

With those changes, we get this new plot : 
<img width="801" alt="Screenshot 2020-08-09 at 14 31 47" src="https://user-images.githubusercontent.com/26815719/89732136-118cd280-da4d-11ea-8350-ef292574fcdd.png">

There might be some work needed for the contour lines, which I am not familiar with, but now the empirical mean agrees with the analytical one!

I could be wrong since I've only carefully read this particular section of the notebook, and am open to any discussion regarding this matter.

Best,
Lam





