Forgive me if this is a known counterintuitive point that has been deemed irrelevant, but I noticed that the total variance explained by all components is greater than one. This is true in my dataset with missing values, but also in the complete-data example below.
```python
import numpy as np
from ppca import PPCA

x = np.random.randn(50, 20)
m = PPCA()
m.fit(data=x)
print(m.var_exp)
```

```
[0.11460246 0.21691676 0.30977113 0.40169889 0.4885789  0.56032857
 0.62697946 0.68458968 0.73693932 0.78439966 0.82519526 0.86416853
 0.89399395 0.9215888  0.94654215 0.9696493  0.98877802 1.00211534
 1.01186025 1.02040816]
```
It seems to be related to the fact that the sum of all eigenvalues is greater than the number of dimensions in the original dataset. Since the sum of the eigenvalues should equal the trace of the correlation matrix, I would not expect that to be the case.
```python
print(np.cumsum(m.eig_vals) / 20.)
```

```
[0.11460246 0.21691676 0.30977113 0.40169889 0.4885789  0.56032857
 0.62697946 0.68458968 0.73693932 0.78439966 0.82519526 0.86416853
 0.89399395 0.9215888  0.94654215 0.9696493  0.98877802 1.00211534
 1.01186025 1.02040816]
```

```python
print(m.eig_vals.sum())
```

```
20.40816326530614
```
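For comparison, the property I would expect can be checked with plain numpy, independent of `ppca`: for a correlation matrix, the eigenvalues sum to the trace, which equals the number of dimensions because every diagonal entry is 1. This is just a minimal sketch of that sanity check, not a claim about how `ppca` computes `eig_vals`:

```python
import numpy as np

# Sanity check: eigenvalues of a correlation matrix sum to its trace,
# and the trace equals the number of dimensions (all diagonal entries are 1).
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 20))

corr = np.corrcoef(x, rowvar=False)   # 20x20 sample correlation matrix
eig_vals = np.linalg.eigvalsh(corr)   # eigenvalues of a symmetric matrix

print(np.trace(corr))                              # 20.0
print(np.isclose(eig_vals.sum(), np.trace(corr)))  # True
print(np.cumsum(eig_vals[::-1] / 20.)[-1])         # cumulative variance explained ends at 1.0
```

So with ordinary PCA the cumulative variance explained tops out at exactly 1, which is why the `ppca` output above (ending at ~1.02) looks off.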