plsmfa

plsmfa is a package to perform partial least squares multiple factor analysis (PLSMFA). PLSMFA is an extension of partial least squares correlation (PLSC) for the case when the data contains groups of variables called subtables. The subtables are normalized by their first singular value, so that each contributes equally to the analysis. This is package is in development.

Installation

You can install plsmfa from GitHub with:

# install.packages("devtools")
devtools::install_github("LukeMoraglia/plsmfa")

plsmfa depends on several packages that are mostly on CRAN, but there are some packages that are available only through GitHub. Download them with the code below:

# Available through GitHub
devtools::install_github("HerveAbdi/PTCA4CATA")
devtools::install_github("HerveAbdi/data4PCCAR")
devtools::install_github("derekbeaton/GSVD")

# Optional modified version of ExPosition
devtools::install_github("LukeMoraglia/ExPosition1/ExPosition")

Example: Simulated Data

The sim_data contains two data tables, X and Y, which have groups of variables, stored in X_design and Y_design. The subtables of X are SubX1 and SubX2 and of Y are SubY1 and SubY2.

library(plsmfa)
data("sim_data")
X <- sim_data$X
Y <- sim_data$Y
X_design <- sim_data$X_design
Y_design <- sim_data$Y_design

head(X) %>% kable()

X1	X2	X3	X4	X5	X6	X7	X8	X9
1.4298969	1.5087200	1.5593433	1.8359701	1.2409016	1.2009654	-2.0009292	-0.0046208	1.3349126
-0.4861499	-0.4196816	-0.6144210	-0.4598737	-0.7653348	1.0447511	0.3337772	0.7602422	-0.8692718
0.1629597	0.4066045	0.3824242	0.5572751	0.2561056	-1.0032086	1.1713251	0.0389909	0.0554870
0.5677172	0.5925313	0.5460764	0.7082573	0.6107797	1.8484819	2.0595392	0.7350721	0.0490669
0.2025986	0.1311304	0.8400019	0.2050816	0.5243544	-0.6667734	-1.3768616	-0.1464726	-0.5783557
-0.2332108	-0.1679120	-0.6978804	-0.2256211	-0.0229555	0.1055138	-1.1508556	-0.0578873	-0.9987387

head(Y) %>% kable()

Y1	Y2	Y3	Y4	Y5	Y6
0.6244420	0.1854115	-1.4745524	-2.3617590	0.0129692	1.8980617
-0.5280921	-1.2230875	-0.7302457	0.2522875	1.9208308	0.8632615
0.6864380	1.4526364	1.1439127	0.5789615	0.5414974	-0.9467181
1.0125386	1.1416485	1.8226159	2.5586943	0.6238643	0.9114952
1.2808248	0.2683617	-1.4536954	-1.8538973	0.1735249	-0.6694729
0.8272635	-0.2149073	-0.4931129	-0.9465768	1.0130864	0.3821033

t(X_design)

#>      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]    [,9]   
#> [1,] "SubX1" "SubX1" "SubX1" "SubX1" "SubX1" "SubX2" "SubX2" "SubX2" "SubX2"

t(Y_design)

#>      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]   
#> [1,] "SubY1" "SubY1" "SubY2" "SubY2" "SubY2" "SubY2"

Run `plsmfa`

res_plsmfa <- plsmfa(data1 = X, 
                     data2 = Y,
                     column_design1 = X_design,
                     column_design2 = Y_design,
                     center1 = TRUE, center2 = TRUE,
                     scale1 = "SS1", scale2 = "SS1")

The cross-product matrix of the normalized data

Once each subtable is normalized by its first singular value, PLSMFA computes the cross-product matrix of the normalized data tables and decomposes this cross-product matrix using the SVD. We can see the cross-product matrix using:

X_var_color <- sim_data$X_var_color$oc
Y_var_color <- sim_data$Y_var_color$oc

normed_R <- t(res_plsmfa$normed_X) %*% res_plsmfa$normed_Y
R_bound <- max(max(normed_R), abs(min(normed_R)))

superheat::superheat(normed_R,
                     membership.rows = X_design,
                     membership.cols = Y_design,
                     left.label = 'variable',
                     bottom.label = 'variable',
                     heat.lim = c(-R_bound, R_bound),
                     heat.pal = c("#67001F", "white", "#053061"),
                     heat.pal.values = c(0, 0.4, 0.5, 0.6, 1),
                     left.label.text.col = X_var_color,
                     bottom.label.text.col = Y_var_color
                     )

Statistical inference

The package contains permutation testing to test the eigenvalues, and bootstrap resampling to evaluate the stability of the loadings. These features are fully developmental, and they have not been fully validated. Note that if your data are large, these can take a while to run.

res_perm <- perm_plsmfa(data1 = X, 
                        data2 = Y,
                        column_design1 = X_design,
                        column_design2 = Y_design,
                        center1 = TRUE, center2 = TRUE,
                        scale1 = "SS1", scale2 = "SS1",
                        n_iter = 1000,
                        compact = TRUE,
                        bootstrap_first_singval = TRUE)

res_boot <- boot_plsmfa(data1 = X, 
                        data2 = Y,
                        column_design1 = X_design,
                        column_design2 = Y_design,
                        center1 = TRUE, center2 = TRUE,
                        scale1 = "SS1", scale2 = "SS1",
                        n_iter = 1000,
                        n_dimensions = 3,
                        boot_ratio_threshold = 2)

Some graphs

Scree

First up, check the eigenvalues.

PTCA4CATA::PlotScree(ev = res_plsmfa$pls$l,
                     p.ev = res_perm$p_eig,
                     plotKaiser = TRUE, 
                     alpha = 0.01)

The first 3 dimensions explain nearly all of the variance, and each one has p < .01.

Loadings (a.k.a saliences)

The loadings of the variables show their importance for a dimension (in PLSC, these loadings are also called saliences). We can look at the loadings as arrows on a 2D plot, where arrows with a small angle between them are positively related on these dimensions. Loadings for X are stored in res_plsmfa$pls$u and for Y in res_plsmfa$pls$v.

X_load_12 <- loading_plot(loadings = res_plsmfa$pls$u,
                          axis1 = 1, axis2 = 2,
                          col_points = X_var_color,
                          arrows = TRUE)

X_load_12$plot

Y_load_12 <- loading_plot(loadings = res_plsmfa$pls$v,
                          axis1 = 1, axis2 = 2,
                          col_points = Y_var_color,
                          arrows = TRUE)

Y_load_12$plot

Latent Variables

The observations of the analysis can be plotted on latent variables (akin to principal components in PCA). Observations can be colored by a design variable (i.e., a categorical grouping variable) and the means of the groups can be plotted. Bootstrap confidence intervals of the means give an idea of how stable the group means are on the latent variables.

lv1 <- latent_variable_XY_map(res_plsmfa,
                              lv_num = 1,
                              design = sim_data$obs_design,
                              col_obs = sim_data$obs_color,
                              col_group = sim_data$group_color)
lv1$plot

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
R		R
data-raw		data-raw
data		data
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
plsmfa.Rproj		plsmfa.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

plsmfa

Installation

Example: Simulated Data

Run `plsmfa`

The cross-product matrix of the normalized data

Statistical inference

Some graphs

Scree

Loadings (a.k.a saliences)

Latent Variables

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

plsmfa

Installation

Example: Simulated Data

Run plsmfa

The cross-product matrix of the normalized data

Statistical inference

Some graphs

Scree

Loadings (a.k.a saliences)

Latent Variables

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Run `plsmfa`

Packages