Total variance explained > 1 #7

Forgive me if this is a known, counterintuitive point that has been deemed irrelevant, but I noticed that the total variance explained by all components is greater than one. This happens with my own dataset, which has missing values, and also with the complete-data example below.

```python
import numpy as np
from ppca import PPCA

# 50 samples x 20 features, complete data (no missing values)
x = np.random.randn(50, 20)
m = PPCA()
m.fit(data=x)

# cumulative fraction of variance explained by the first k components
print(m.var_exp)
# [0.11460246 0.21691676 0.30977113 0.40169889 0.4885789  0.56032857
#  0.62697946 0.68458968 0.73693932 0.78439966 0.82519526 0.86416853
#  0.89399395 0.9215888  0.94654215 0.9696493  0.98877802 1.00211534
#  1.01186025 1.02040816]
```
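For comparison, a plain NumPy PCA on data of the same shape (50 samples, 20 features, no missing values), computed from the eigenvalues of the correlation matrix, ends its cumulative explained variance at 1. This is a minimal sketch that does not use ppca at all; the function name `cumulative_var_explained` is hypothetical and the seed is arbitrary.

```python
import numpy as np

def cumulative_var_explained(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cumulative variance explained via the correlation matrix."""
    corr = np.corrcoef(x, rowvar=False)        # D x D correlation matrix (columns = features)
    eig_vals = np.linalg.eigvalsh(corr)[::-1]  # symmetric matrix: real eigenvalues, largest first
    # The eigenvalues sum to trace(corr), which is D because every diagonal entry is 1.
    return np.cumsum(eig_vals) / np.trace(corr)

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 20))
var_exp = cumulative_var_explained(x)
print(var_exp[-1])  # 1 up to floating-point rounding
```

Because numerator and denominator are both built from the same matrix, the last entry cannot exceed 1 except by rounding error.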
This seems to be related to the fact that the sum of all eigenvalues is greater than the number of dimensions (20) in the original dataset. Since the sum of the eigenvalues should equal the trace of the correlation matrix, which is the number of dimensions because every diagonal entry is 1, I would not expect that to be the case.
```python
# dividing the cumulative eigenvalue sum by the 20 dimensions reproduces var_exp exactly
print(np.cumsum(m.eig_vals) / 20.)
# [0.11460246 0.21691676 0.30977113 0.40169889 0.4885789  0.56032857
#  0.62697946 0.68458968 0.73693932 0.78439966 0.82519526 0.86416853
#  0.89399395 0.9215888  0.94654215 0.9696493  0.98877802 1.00211534
#  1.01186025 1.02040816]

print(m.eig_vals.sum())
# 20.40816326530614
```
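The reported sum, 20.40816326530614, equals 20 * 50 / 49, that is D * N / (N - 1) for D = 20 features and N = 50 samples, and the final `var_exp` value 1.02040816 equals 50 / 49. One explanation consistent with those numbers, not checked against the ppca source, is a mismatch of variance conventions: the data standardized with a population standard deviation (`ddof=0`, the default of `np.std`), the eigenvalues taken from `np.cov` (whose default denominator is N - 1), and the total variance measured with `ddof=0` again. The sketch below reproduces that assumed mismatch with plain NumPy only; it is a hypothesis, not the library's actual code path.

```python
import numpy as np

N, D = 50, 20
rng = np.random.default_rng(0)
x = rng.standard_normal((N, D))

# Assumption: standardize with the population std (ddof=0, np.std's default),
# so every column of z has ddof=0 variance exactly 1.
z = (x - x.mean(axis=0)) / x.std(axis=0)

# Assumption: eigenvalues come from np.cov, which divides by N - 1 (ddof=1).
eig_vals = np.linalg.eigvalsh(np.cov(z, rowvar=False))[::-1]

# Assumption: total variance is measured with ddof=0, giving D.
total_var = z.var(axis=0).sum()

var_exp = np.cumsum(eig_vals) / total_var
print(eig_vals.sum())  # about D * N / (N - 1) = 20.408..., for any input data
print(var_exp[-1])     # about N / (N - 1) = 1.0204...
```

Under this hypothesis the overshoot depends only on N, not on the data, which matches the identical final value seen in the issue; dividing by `eig_vals.sum()` (or using the same `ddof` everywhere) would bring the last entry back to 1.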

