6.4 Summary Statistics and Independence -> Example 6.4 could be wrong. #1

@lthiet

Hello.

I noticed something that could be wrong in the notebook where you replicate Example 6.4.

I'm not quite sure what you did here:

first = np.round(np.random.multivariate_normal(mean1, cov1, int(n/4))*.4,3) # n/4 to adjust distribution to book figure for countour plot.
second = np.round(np.random.multivariate_normal(mean2, cov2, n)*.6,3)
data = np.vstack([first,second])

However, this isn't a Gaussian mixture that matches the book's description. The coefficients should be applied not to the random variables themselves but to their pdfs!

Furthermore, according to the book, the mean (expected value) of a Gaussian mixture is given by:

E(x) = alpha1 * mu1 + alpha2 * mu2

Plugging the corresponding means into this equation gives the analytical mean:

E(x) = 0.4 * [10,2] + 0.6 * [0,0] = [4,0.8]
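
The same arithmetic can be reproduced in a couple of lines of NumPy. This is only a sketch: the names `alpha` and `means` are introduced here for illustration, with the values taken from the formula above.

```python
import numpy as np

# Mixture weights and component means, as quoted from the book's example.
alpha = np.array([0.4, 0.6])
means = np.array([[10.0, 2.0],   # mu1
                  [0.0, 0.0]])   # mu2

# E(x) = alpha1 * mu1 + alpha2 * mu2, written as a weighted sum over rows.
mixture_mean = alpha @ means
print(mixture_mean)
```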

Checked against the plot in your notebook, this doesn't match.

Screenshot 2020-08-09 at 14 29 26

It looks like E(x) is around [0.7,0.1].

Finally, the pdf of the distribution your code actually samples from is:
p(x) = 0.2 * N1 + 0.8 * N2
where N1 is the first Gaussian after the transformation f(x) = 0.4 * x is applied, and N2 is the second Gaussian after g(x) = 0.6 * x is applied.
The mean of N1 is given by 0.4 * [10,2] = [4,0.8].
The mean of N2 is given by 0.6 * [0,0] = [0,0].
The mean of your actual distribution is therefore 0.2 * mean_of_N1 + 0.8 * mean_of_N2 = 0.2 * [4,0.8] = [0.8,0.16],
which is rather close to what's in your notebook!
The 0.2 and 0.8 coefficients come from the sample sizes in your notebook: with n = 3000, there are n/4 = 750 samples for N1 and n = 3000 samples for N2, so 750 / 3750 = 0.2 and 3000 / 3750 = 0.8.
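
This reasoning can be checked numerically with a rough simulation of the notebook's sampling scheme. Two things here are assumptions, not taken from the notebook: the covariances (identity matrices are used because the real `cov1` and `cov2` are not shown in this issue) and the use of `np.random.default_rng` instead of the legacy `np.random.multivariate_normal` call. The rounding step is omitted since it barely affects the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
mean1, mean2 = np.array([10.0, 2.0]), np.array([0.0, 0.0])
cov1 = cov2 = np.eye(2)  # ASSUMPTION: placeholder covariances, not the notebook's

# Same scheme as the notebook: scale the samples themselves by 0.4 and 0.6.
first = rng.multivariate_normal(mean1, cov1, n // 4) * 0.4
second = rng.multivariate_normal(mean2, cov2, n) * 0.6
data = np.vstack([first, second])

# Effective mixture weights are set by the sample counts, not by 0.4 / 0.6.
weights = np.array([len(first), len(second)]) / len(data)
predicted = weights[0] * 0.4 * mean1 + weights[1] * 0.6 * mean2
empirical = data.mean(axis=0)

print("weights:  ", weights)
print("predicted:", predicted)
print("empirical:", empirical)
```

The mean does not depend on the covariances, so the predicted value should hold whatever `cov1` and `cov2` actually are; only the spread of the empirical estimate changes.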

Here is the simple change I propose:

first = np.round(np.random.multivariate_normal(mean1, cov1, int(n*0.4)),3)
second = np.round(np.random.multivariate_normal(mean2, cov2, int(n*0.6)),3)

Instead of applying the coefficients to the random variables, we apply them to the sample sizes. Drawing 40% of the samples from the first Gaussian and 60% from the second should be analogous to weighting their respective pdfs by 0.4 and 0.6.

With those changes, we get this new plot:
Screenshot 2020-08-09 at 14 31 47

There might be some work needed on the contour lines, which I am not familiar with, but now the empirical mean agrees with the analytical one!
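
For anyone who wants to reproduce the check on the mean without the plot, here is a minimal sketch of the proposed sampling. As before, the identity covariances are an assumption (the notebook's `cov1` and `cov2` are not shown here), and `np.random.default_rng` is swapped in for the legacy `np.random` call so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
mean1, mean2 = np.array([10.0, 2.0]), np.array([0.0, 0.0])
cov1 = cov2 = np.eye(2)  # ASSUMPTION: placeholder covariances, not the notebook's

# Proposed change: the coefficients set the sample sizes, not the sample values.
first = np.round(rng.multivariate_normal(mean1, cov1, int(n * 0.4)), 3)
second = np.round(rng.multivariate_normal(mean2, cov2, int(n * 0.6)), 3)
data = np.vstack([first, second])

analytical = 0.4 * mean1 + 0.6 * mean2  # E(x) = alpha1 * mu1 + alpha2 * mu2
empirical = data.mean(axis=0)

print("analytical:", analytical)
print("empirical: ", empirical)
```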

I could be wrong since I've only carefully read this particular section of the notebook, and am open to any discussion regarding this matter.

Best,
Lam
