The R package rrda provides functions for performing ridge redundancy analysis (rrda) on high-dimensional datasets. It models the relationship between a matrix of response variables (Y; n × q) and a matrix of explanatory variables (X; n × p) under a ridge penalty and a rank restriction. The method is designed for high-dimensional data, with efficient computation and memory-saving storage of the results.
Hello / Bonjour / Konichiwa
- rrda/script_rrda: Scripts and my own functions used in our article.
- rrda/RDAdata: Application data used in our article.
- rrda/src: Source code for the package "rrda"
The link to our article -> https://doi.org/10.1101/2025.04.16.649138
You can install the package from CRAN.
install.packages("rrda")
# Update rrda if it is missing or older than the required version
required_version <- "0.2.3"
if (!requireNamespace("rrda", quietly = TRUE) ||
    packageVersion("rrda") < required_version) {
  message("rrda will be updated")
  install.packages("rrda", repos = "https://cloud.r-project.org", type = "source")
}
The rdasim1 function generates rank-restricted matrices X and Y.
library(rrda)
set.seed(10)
simdata <- rdasim1(n = 50, p = 100, q = 100, k = 5)
X <- simdata$X
Y <- simdata$Y
The rrda.fit function fits the ridge redundancy analysis (rrda) model for X and Y.
This corresponds to predicting Y from X under the model Y = XB + E.
nrank sets the rank restrictions for the model, for example 1 to 5.
lambda sets the ridge penalties for the model, for example 0.1, 1, and 10.
The function fits several ranks and lambdas efficiently in one call. With the default settings, it returns every combination of 15 ranks and a grid of 50 lambda values.
Bhat <- rrda.fit(Y = Y, X = X)
names(Bhat)
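For intuition, the model Y = XB + E with a ridge penalty and a rank restriction can be sketched in a few lines of base R. This is a minimal illustration of a standard ridge reduced-rank solution on small toy data, not the package's optimized implementation. The helper name `ridge_rrr_sketch`, the toy dimensions, and the omission of centering are all assumptions made for this example.

```r
# Toy data (hypothetical sizes, chosen only for illustration)
set.seed(1)
n <- 20; p <- 8; q <- 6; k <- 2
X_toy <- matrix(rnorm(n * p), n, p)
B_true <- matrix(rnorm(p * k), p, k) %*% matrix(rnorm(k * q), k, q)
Y_toy <- X_toy %*% B_true + matrix(rnorm(n * q, sd = 0.1), n, q)

# Hypothetical helper: ridge regression followed by a rank-r projection
ridge_rrr_sketch <- function(X, Y, lambda, r) {
  p <- ncol(X)
  # Ridge solution: (X'X + lambda I)^{-1} X'Y
  B_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, Y))
  # Fitted values in the ridge-augmented design [X; sqrt(lambda) I]
  X_aug <- rbind(X, sqrt(lambda) * diag(p))
  # Leading r right singular vectors of the augmented fit
  V_r <- svd(X_aug %*% B_ridge)$v[, seq_len(r), drop = FALSE]
  # Project the ridge coefficients onto that r-dimensional subspace
  B_ridge %*% V_r %*% t(V_r)
}

B_sketch <- ridge_rrr_sketch(X_toy, Y_toy, lambda = 1, r = 2)
dim(B_sketch)      # p x q
qr(B_sketch)$rank  # numerical rank, at most r
```

Larger lambda values shrink the coefficients more strongly, and smaller r values force the p × q coefficient matrix into fewer latent dimensions.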
Bhat is a list with one entry per lambda value. Each entry holds the coefficient B for each rank.
(Note: Bhat is stored in a decomposed form, because the function is designed for high-dimensional settings.)
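The benefit of a decomposed form can be illustrated with a small base R sketch. It assumes the coefficient matrix factors as B = UD V^T with r columns in each factor; the sizes and the random factors below are hypothetical stand-ins, not output of rrda.fit.

```r
# Hypothetical high-dimensional sizes and rank
p <- 2000; q <- 1500; r <- 5
set.seed(2)
ud <- matrix(rnorm(p * r), p, r)  # plays the role of UD (p x r)
v  <- matrix(rnorm(q * r), q, r)  # plays the role of V  (q x r)

# Numbers stored: the two factors versus the full coefficient matrix
n_factors <- length(ud) + length(v)  # (p + q) * r
n_full    <- p * q
c(factors = n_factors, full = n_full)

# Rebuild only the block of B that is needed, here a 3 x 4 corner
B_block <- ud[1:3, , drop = FALSE] %*% t(v[1:4, , drop = FALSE])
dim(B_block)
```

Storing the factors scales with (p + q) × r rather than p × q, which matters when both p and q are large.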
Here we illustrate the parameter tuning process (regularization path), which helps identify the parameters that maximize prediction accuracy from X to Y.
How do we choose the best lambda and rank for the model?
-> Cross-validation with the rrda.cv function
cv_result <- rrda.cv(Y = Y, X = X)
rrda.summary(cv_result = cv_result)
p <- rrda.plot(cv_result) # cv result plot
print(p)
rrda.summary reports the parameters suggested by cross-validation.
=== opt_min ===
MSE:
[1] 3.179695
rank:
[1] 5
lambda:
[1] 22.43
rrda.plot and rrda.heatmap also produce figures that help select the parameters.
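The idea behind selecting rank and lambda by cross-validation can be sketched in base R as a grid search over K folds. This is only an illustration under stated assumptions: `fit_sketch` is a hypothetical ridge reduced-rank fit used as a stand-in for rrda.fit, the grid and fold count are arbitrary, and rrda.cv may organize the computation differently.

```r
set.seed(3)
n <- 40; p <- 10; q <- 6; k <- 2
X_toy <- matrix(rnorm(n * p), n, p)
B_true <- matrix(rnorm(p * k), p, k) %*% matrix(rnorm(k * q), k, q)
Y_toy <- X_toy %*% B_true + matrix(rnorm(n * q), n, q)

# Hypothetical stand-in for the model fit (ridge + rank-r projection)
fit_sketch <- function(X, Y, lambda, r) {
  p <- ncol(X)
  B_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, Y))
  X_aug <- rbind(X, sqrt(lambda) * diag(p))
  V_r <- svd(X_aug %*% B_ridge)$v[, seq_len(r), drop = FALSE]
  B_ridge %*% V_r %*% t(V_r)
}

# Candidate parameters and a random assignment of samples to 5 folds
grid <- expand.grid(rank = 1:4, lambda = c(0.1, 1, 10))
folds <- sample(rep(1:5, length.out = n))

# Mean held-out squared error for every (rank, lambda) pair
grid$mse <- vapply(seq_len(nrow(grid)), function(i) {
  fold_mse <- vapply(1:5, function(f) {
    test <- folds == f
    B <- fit_sketch(X_toy[!test, ], Y_toy[!test, ],
                    grid$lambda[i], grid$rank[i])
    mean((Y_toy[test, ] - X_toy[test, ] %*% B)^2)
  }, numeric(1))
  mean(fold_mse)
}, numeric(1))

# Parameter pair with the smallest cross-validated MSE
best <- grid[which.min(grid$mse), ]
best
```

The row with the minimum MSE plays the same role as opt_min in the rrda.summary output.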
# Choose the parameter set that gives the minimum MSE
best_lambda <- cv_result$opt_min$lambda # selected lambda
best_rank <- cv_result$opt_min$rank # selected rank
# Fitting with the best parameters
Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank, lambda = best_lambda)
# Prediction
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X)
Yhat <- Yhat_mat[[1]][[1]][[1]] # predicted values
plot(Yhat, Y)
abline(0, 1, col = "red")
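Alongside the scatter plot, a numeric summary of the fit can be useful. The sketch below computes the overall MSE and the per-response correlation between observed and predicted values in base R. The matrices `Y_obs` and `Y_pred` are simulated stand-ins for Y and Yhat, introduced only for this example. Predictions made on the same X used for fitting tend to look more accurate than predictions on new data.

```r
set.seed(4)
Y_obs  <- matrix(rnorm(50 * 4), 50, 4)                    # stand-in for Y
Y_pred <- Y_obs + matrix(rnorm(50 * 4, sd = 0.5), 50, 4)  # stand-in for Yhat

# Overall mean squared error across all entries
mse <- mean((Y_obs - Y_pred)^2)

# Correlation between observed and predicted values, one per response
cor_by_col <- vapply(seq_len(ncol(Y_obs)),
                     function(j) cor(Y_obs[, j], Y_pred[, j]),
                     numeric(1))
mse
round(cor_by_col, 2)
```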
To plot the X and Y features in a two-dimensional space (as in the classic RDA approach):
ud <- Bhat$Bhat_comp[[1]][[1]] # SVD component of B (UD) for the first lambda in Bhat
v <- Bhat$Bhat_comp[[1]][[2]] # SVD component of B (V) for the first lambda in Bhat
ud12 <- ud[, 1:2]
v12 <- v[, 1:2]
# Base plot: v (Y features) as points
plot(v12,
     xlab = "RRDA1", ylab = "RRDA2",
     xlim = range(c(ud12[,1], v12[,1])) * 1.1,
     ylim = range(c(ud12[,2], v12[,2])) * 1.1,
     pch = 19, col = "darkgreen",
     main = "RRDA")
# Add ud (X features) as arrows from the origin
arrows(0, 0, ud12[,1], ud12[,2], col = "blue3", length = 0.1)
# Optionally add text labels
text(ud12, labels = paste0("X", 1:nrow(ud12)), pos = 3, col = "blue3", cex = 0.6)
text(v12, labels = paste0("Y", 1:nrow(v12)), pos = 3, col = "darkgreen", cex = 0.6)
For easier interpretation, we visualize the feature-feature matrix at a selected rank, highlighting the most informative features based on their L2 norm.
best_lambda <- cv_result$opt_min$lambda
best_rank <- cv_result$opt_min$rank
rrda.top(Y = Y, X = X, nrank = best_rank, lambda = best_lambda, mx = 20, my = 20)
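The idea of ranking features by L2 norm can be sketched in base R. The example assumes the coefficient matrix is available as two factors, `ud` (one row per X feature) and `v` (one row per Y feature), and keeps the rows with the largest norms. The random factors, the sizes, and the small `mx` and `my` values are hypothetical, and rrda.top may compute its selection differently.

```r
set.seed(5)
p <- 30; q <- 25; r <- 3
ud <- matrix(rnorm(p * r), p, r, dimnames = list(paste0("X", 1:p), NULL))
v  <- matrix(rnorm(q * r), q, r, dimnames = list(paste0("Y", 1:q), NULL))

mx <- 5; my <- 4  # number of X and Y features to keep

# L2 norm of each feature across the r retained dimensions
x_norm <- sqrt(rowSums(ud^2))
y_norm <- sqrt(rowSums(v^2))

# Indices of the features with the largest norms
top_x <- order(x_norm, decreasing = TRUE)[seq_len(mx)]
top_y <- order(y_norm, decreasing = TRUE)[seq_len(my)]

# Feature-feature submatrix restricted to the selected features
B_top <- ud[top_x, , drop = FALSE] %*% t(v[top_y, , drop = FALSE])
dim(B_top)
```

The resulting `B_top` can be passed to a plotting function such as `heatmap(B_top, Rowv = NA, Colv = NA, scale = "none")` for a quick look.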
Go to Rpubs (https://rpubs.com/Yoska393/1351133).
The breast cancer and soybean application data are stored as .rds files in the folder rrda/RDAdata. For the methylation data, refer to the R package MEAL (Ruiz-Arenas and González 2024).
Please cite :)
- Yoshioka, H., Aubert, J., Iwata, H., and Mary-Huard, T., 2025. Ridge Redundancy Analysis for High-Dimensional Omics Data. bioRxiv, doi: 10.1101/2025.04.16.649138
- Yoshioka H, Aubert J, and Mary-Huard T (2025). rrda: Ridge Redundancy Analysis for High-Dimensional Omics Data. https://CRAN.R-project.org/package=rrda (CRAN R Package)




