Marie-PerrotDockes/Fus2mod

Install and load the package

devtools::install_github("Marie-PerrotDockes/Fus2mod")
## Downloading GitHub repo Marie-PerrotDockes/Fus2mod@master
## from URL https://api.github.com/repos/Marie-PerrotDockes/Fus2mod/zipball/master

## Installing Fus2mod

## '/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore  \
##   --quiet CMD INSTALL  \
##   '/tmp/RtmpAzRAn9/devtools12aa5107a63f/Marie-PerrotDockes-Fus2mod-0b60b6f'  \
##   --library='/home/perrot-dockes/R/x86_64-pc-linux-gnu-library/3.4'  \
##   --install-tests

## 
require(Fus2mod)
## Loading required package: Fus2mod

## Loading required package: Matrix

## Loading required package: glmnet

## Loading required package: foreach

## Loaded glmnet 2.0-16

## Loading required package: parallel

## Loading required package: tidyverse

## ── Attaching packages ────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ purrr::accumulate() masks foreach::accumulate()
## ✖ tidyr::expand()     masks Matrix::expand()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ purrr::when()       masks foreach::when()

## Loading required package: stabs

Dataset Simulation

We start by generating a toy example with two modalities, H and I, 10 regressors and 40 individuals. For now we only have one response. We generate the dataset such that the first 3 regressors have no effect on the response; regressors 4, 5 and 6 have an effect on the response only for the samples that come from modality I; regressors 7 and 8 have an effect on the response for modality H but not for modality I; and regressors 9 and 10 have the same effect on the response for both modalities.

n <- 40
p <- 10
K <- 2
q <- 1


# B <- Simul_B(p, q, s, f, k)
B <- c(0, 0,
       0, 0,
       0, 0,
       0, 0,
       0, 3,
       0, 2.5,
       0, 2,
       2.2, 0,
       1.8, 0,
       2, 2,
       2, 2
       )

group <- c(rep("H", n / 2), rep("I", n / 2))
K <- nlevels(as.factor(group))
regressors <- matrix(rnorm(n * p), ncol = p)

X <- model.matrix(~ group + group:regressors - 1)
y <- as.matrix(X %*% B + matrix(rnorm(n * q), ncol = q))

y <- scale(y)

Model Selection

Here we focus on the model Y = XB + E, where Y, B and E are vectors and X is a one-way ANCOVA design matrix. We have two objectives. First, we want to find which columns of X explain the response y. Second, we want to see whether the different modalities influence the value of a regressor's coefficient on the response, in other words whether the column groupH:regressors_i has the same coefficient as the column groupI:regressors_i. To do that we apply a lasso criterion to the model Y = X2 B + E, where X2 is the concatenation of X and the matrix of the regressors. To tune the level of fusion we put weights on the penalties: a weight b if the coefficient belongs to a regressor-modality pair and a weight a if it belongs to a whole regressor. Since 2p·b + p·a must equal 3p + 2, fixing a gives the value of b. Thus, the smaller a is, the more the model encourages the coefficients of two different modalities for the same regressor to be equal. For more detail about this model we refer the reader to the file "2_modalities.pdf".
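As an illustration, here is a minimal sketch of this weighted lasso built directly with glmnet's penalty.factor argument. This is not the package's internal code: the choice a = 1 and giving the two intercept columns the weight b are assumptions made for the example.

a <- 1                                    # weight for the whole-regressor coefficients
b <- (3 * p + 2 - p * a) / (2 * p)        # from the constraint 2p*b + p*a = 3p + 2
X2 <- cbind(X, regressors)                # concatenate X and the raw regressors
w  <- c(rep(b, ncol(X)), rep(a, p))       # per-coefficient penalty weights
fit <- glmnet::glmnet(X2, y, penalty.factor = w, intercept = FALSE)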

We first propose a cross-validation step. For different values of a, we apply a 5-fold CV on our data and keep the minimal error (cvm) and the degrees of freedom (ddl) that correspond to this minimal error. We do this nrep = 10 times for each value of a; the means over these 10 replicates are displayed below.

CV <- cv.fl2(response = y, regressors = regressors, group = group, mina = 0.1,
             nfold = 5, nrep = 10, nb.cores = 3, plot = TRUE)

Looking at this plot, we propose to take the a that minimises the degrees of freedom, because its prediction error is the lowest (or very close to the lowest).

ddl <- CV[CV$Criterion == "ddl", ]
a <- ddl[which.min(ddl$mean), "a"]
a
## # A tibble: 1 x 1
## # Groups:   a [1]
##       a
##   <dbl>
## 1  1.20

If we want, we can add a stability selection step to keep the most stable variables. We perform the fusion before the stability selection.

stab <- stab.fl2_fixa(response = y, regressors, group, a,
                      lambda = NULL, nrep = 1000,
                      nb.cores = 3, plot = TRUE)
stab
##  Stability Selection with unimodality assumption
## 
## Selected variables:
##             groupH             groupI groupI:regressors4 
##                  1                  2                  7 
## groupH:regressors8        regressors9 
##                 13                 20 
## 
## Selection probabilities:
##  groupI:regressors2  groupH:regressors5  groupH:regressors2 
##              0.1840              0.1930              0.2880 
##  groupI:regressors3  groupH:regressors3  groupI:regressors7 
##              0.3830              0.4060              0.4145 
##  groupI:regressors8         regressors1  groupH:regressors9 
##              0.4350              0.4945              0.4975 
## groupI:regressors10  groupI:regressors5  groupI:regressors6 
##              0.5425              0.6145              0.6930 
##         regressors6         regressors4        regressors10 
##              0.7060              0.7290              0.8970 
##  groupH:regressors7  groupH:regressors8         regressors9 
##              0.9270              0.9805              0.9810 
##  groupI:regressors4              groupH              groupI 
##              0.9875              1.0000              1.0000 
## 
## ---
## Cutoff: 0.98; q: 16; PFER (*):  0.999 
##    (*) or expected number of low selection probability variables
## PFER (specified upper bound):  1 
## PFER corresponds to signif. level 0.0476 (without multiplicity adjustment)
par(mfrow = c(1, 2))
plot(stab, type="maxsel", main="Maximum selection frequencies")

plot(stab, type="path", main="Stability paths")

sel <- names(stabsel(stab, cutoff = 0.8)$selected)[-c(1, 2)]
sel
## [1] "groupI:regressors4" "groupH:regressors7" "groupH:regressors8"
## [4] "regressors9"        "regressors10"
names(B) <- colnames(X)

To assess the selection, we rewrite the true coefficients in the long parameterisation (the columns of X followed by the whole-regressor columns), where an effect common to both modalities appears only in the whole-regressor part:

B_long <- c(0, 0,
       0, 0,
       0, 0,
       0, 0,
       0, 1.4,
       0, 1.5,
       0, 2,
       1.2, 0,
       1.8, 0,
       0, 0,
       0, 0, 
       rep(0, 8), 1, 1
       )
colnames(regressors) <- paste0('regressors', 1:p)
names(B_long) <- c(colnames(X), colnames(regressors))
names(B_long[B_long != 0])
## [1] "groupI:regressors4" "groupI:regressors5" "groupI:regressors6"
## [4] "groupH:regressors7" "groupH:regressors8" "regressors9"       
## [7] "regressors10"

If we take a threshold of 0.8, we obtain a True Positive Rate of 5/7, about 0.71, and a False Positive Rate equal to 0.
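These rates can be checked directly from the objects computed above (a minimal sketch; the variable names true_set and all_vars are ours, not part of the package):

true_set <- names(B_long[B_long != 0])                    # true support (7 variables)
all_vars <- setdiff(names(B_long), c("groupH", "groupI")) # candidates, intercepts excluded
TPR <- sum(sel %in% true_set) / length(true_set)          # 5/7, about 0.71
FPR <- sum(!(sel %in% true_set)) / length(setdiff(all_vars, true_set)) # 0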
