GLEANR: GWAS latent embeddings accounting for noise and regularization

GLEANR is a GWAS matrix factorization tool for estimating sparse latent pleiotropic genetic factors. Factors map traits to a distribution of SNP effects that may capture biological pathways or mechanisms shared by these traits. This repo contains the gleanr R package (in development), which we recommend using in conjunction with the gleanr_workflow repository. The bioRxiv preprint describing the gleanr method in detail is available here:

Sparse matrix factorization robust to sample sharing across GWAS reveals interpretable genetic components.

Installing GLEANR

The package can be installed directly from GitHub using the devtools package:

devtools::install_github("aomdahl/gleanr")

GLEANR method:

This is an ongoing project to develop a flexible, interpretable, and sparse factorization framework to integrate GWAS data across studies and cohorts. We employ a basic alternating least-squares matrix factorization algorithm with sparse priors on the learned matrices, while accounting for study uncertainty. Our approach was inspired by work from Yuan He here.
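
For orientation, a generic objective of this kind can be written as follows. This is a sketch under stated assumptions rather than the exact GLEANR objective: the symbols $S$, $U$, $V$, $W$, $K$, $\lambda_U$, and $\lambda_V$ are introduced here for illustration only, the weights are assumed to be inverse standard errors, an $L_1$ penalty stands in for the sparse priors, and the adjustment for sample sharing is left out.

```latex
\min_{U,\,V} \;
  \left\lVert W \odot \left( B - U V^{\top} \right) \right\rVert_F^2
  + \lambda_U \lVert U \rVert_1
  + \lambda_V \lVert V \rVert_1,
\qquad W_{ij} = \frac{1}{S_{ij}}
```

Here $B$ is the $N \times M$ matrix of GWAS effect sizes, $S$ is the matching matrix of standard errors, $U$ is an $N \times K$ matrix of SNP loadings, $V$ is an $M \times K$ matrix of trait loadings, and $\odot$ denotes the element-wise product. An alternating scheme holds $V$ fixed while solving for $U$, then holds $U$ fixed while solving for $V$, and repeats until the fit stabilizes.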

Running GLEANR

Development of tutorials and vignettes for gleanr is ongoing. For a basic interactive use case in R, see the vignette associated with this package. If you'd like to run gleanr from the command line (our recommended use), first install this package, then use the script src/gleanr_run.R from the gleanr_workflow repository to run the analysis directly on input matrices of summary statistics.

GLEANR inputs:

To run GLEANR, a user must provide:

- a matrix $B$ of GWAS effect sizes (e.g. $\beta$'s) for $N$ SNPs by $M$ studies (required)
  - Each SNP and trait should have a label, as in the example file here
- an $N \times M$ matrix of GWAS standard error estimates, in the same order as $B$ (required, example file here)
- an $M \times M$ matrix of estimated correlation due to sample sharing ($C$); this may be estimated using LDSC and should have (optional, example file here)
- an $N \times M$ matrix of estimation error correlation due to sample sharing; this will be used to regularize $C$ (optional, example file here)
- an $M \times 1$ list of trait names corresponding to the $M$ studies (required). This can be used to specify cleaner names for the columns of $B$. The names should be unique.
- an $M \times 1$ list of standard deviation estimates across trait Z-scores (optional; only provide if using XT-LDSC to estimate the degree of sample sharing)
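
The shape requirements above can be checked before running an analysis. The following is a minimal sketch in base R using simulated toy values. The helper `check_gleanr_inputs()`, the object names `S` and `trait_names`, and the SNP and trait labels are all hypothetical and are not part of the gleanr package.

```r
# Toy dimensions: N SNPs by M studies (simulated values, not real GWAS data).
set.seed(1)
N <- 5
M <- 3
snp_ids <- paste0("rs", seq_len(N))                 # hypothetical SNP labels
trait_names <- c("trait_a", "trait_b", "trait_c")   # hypothetical trait names

# Effect sizes (B) and standard errors (S): both N x M with matching labels.
B <- matrix(rnorm(N * M, sd = 0.05), nrow = N, ncol = M,
            dimnames = list(snp_ids, trait_names))
S <- matrix(runif(N * M, min = 0.01, max = 0.02), nrow = N, ncol = M,
            dimnames = list(snp_ids, trait_names))

# Optional M x M sample-sharing correlation matrix (identity = no sharing).
C <- diag(M)
dimnames(C) <- list(trait_names, trait_names)

# Hypothetical helper (NOT a gleanr function): stops with an error if the
# inputs do not have the shapes described in the list above.
check_gleanr_inputs <- function(B, S, trait_names, C = NULL) {
  stopifnot(
    is.matrix(B), is.matrix(S),
    identical(dim(B), dim(S)),            # same N x M shape
    identical(dimnames(B), dimnames(S)),  # same SNP and trait order
    length(trait_names) == ncol(B),       # one name per study
    !anyDuplicated(trait_names),          # names must be unique
    all(S > 0)                            # standard errors are positive
  )
  if (!is.null(C)) {
    stopifnot(
      identical(dim(C), c(ncol(B), ncol(B))),  # M x M
      isSymmetric(unname(C))
    )
  }
  invisible(TRUE)
}
```

Calling `check_gleanr_inputs(B, S, trait_names, C)` on the toy objects passes silently, while a standard error matrix with a missing row or a duplicated trait name raises an error.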

Development versions of gleanr (preceding Nov 2024)

To review development versions of gleanr from before the reorganization of this GitHub repository in Nov. 2024, please see the gleanr_source_backup directory in the gleanr_workflow repository.

