Feature roadmap #36

As our work on this package progresses, this issue can help us enumerate possible future features of the package, depending on the time and interests of contributors. Some features will be needed for the manuscript submission; others will make more sense to consider for future releases.

## TWAS
### Individual
- [x] normal prior with SuSiE
- [x] `mr.ash`
- [x] elastic net with `glmnet`
- [x] LASSO with `glmnet`
- [x] Bayesian alphabet from `qbayes`
- [ ] Rcpp wrapper for Dirichlet process regression ([manuscript](https://www.nature.com/articles/s41467-017-00470-2), [GitHub](https://github.com/biostatpzeng/DPR))
- [ ] MCP and SCAD from `ncvreg`
- [ ] L0Learn from `L0Learn`
- [ ] BayesB and Bayesian Lasso from `BGLR`

### Summary
- [x] normal prior with SuSiE
- [x] `mr.ash`
- [x] Bayesian alphabet from `qbayes`
- [x] Rcpp reimplementation of the existing summary-based PRS-cs (see links under *Longer term* below)
- [x] Rcpp wrapper for summary-based Dirichlet process regression ([manuscript](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009697), [GitHub](https://github.com/eldronzhou/SDPR))
- [ ] `lassosum` for LASSO and elastic net ([manuscript](https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.22050), [GitHub](https://github.com/tshmak/lassosum))

### Longer term
- Determine whether the continuous shrinkage prior in PRS-cs ([manuscript](https://www.nature.com/articles/s41467-019-09718-5), [GitHub](https://github.com/getian107/PRScs)) can be extended from summary statistics to individual-level data and, if so, implement it in Rcpp.
- Explore the feasibility of `ncvreg`, `L0Learn`, and `BGLR` for summary data - this might be a lot of work for little gain if `mr.ash` generalizes all of these.
- Extension to genome-wide TWAS (this will be a *separate manuscript*) - see the discussion of genome-wide extensions for MR and polygenic molecular risk scores below.
- Extend `mr.ash` to work with other `ebnm` priors - `deconvolveR` is the most interesting because it is a smooth approximation of the NPMLE rather than a scale mixture of normals.

## Mendelian randomization 
- Egger regression as an additional horizontal pleiotropy test to complement heterogeneity tests - only useful with enough independent instruments
- EDIT: verify that this is not already how we are doing MR ~~"Omnigenic model" that incorporates all variants as instruments (this will be a **separate manuscript**, possibly in combination with the trans-QTL extension) - inspired by [OMR](https://academic.oup.com/bib/article/22/6/bbab322/6347949), and could exploit the fact that SuSiE gives us posterior effect sizes and standard errors, unlike most other fine-mapping methods.~~
    - ~~How will this method handle weak instrument bias without removing variants - consider debiasing estimators like [dIVW](https://projecteuclid.org/journals/annals-of-statistics/volume-49/issue-4/Debiased-inverse-variance-weighted-estimator-in-two-sample-summary-data/10.1214/20-AOS2027.full) and [pIVW](https://onlinelibrary.wiley.com/doi/10.1111/biom.13732) - does OMR have this issue too?~~
    - ~~Should show that SuSiE does a comparable job of adjusting for LD as LD scores (used by OMR and [MRAID](https://www.science.org/doi/10.1126/sciadv.abl5744)) and the variant selection methods used by [MR.LDP](https://academic.oup.com/nargab/article/2/2/lqaa028/5828855), [MR-Corr2](https://academic.oup.com/bioinformatics/article/38/2/303/6367765) and [MR-CUE](https://www.nature.com/articles/s41467-022-34164-1).~~
    - ~~Are the heterogeneity tests and Egger regression still valid for testing for horizontal pleiotropy in the presence of so many weak instruments?~~
- Extension to genome-wide analysis with trans-QTLs (this will definitely be a **separate manuscript**):
    - Proper handling of correlated horizontal pleiotropy (CHP) is **critical**.  The most conservative existing approach is to just remove pleiotropic variants - other solutions are provided by [cause](https://www.nature.com/articles/s41588-020-0631-4), [MRAID](https://www.science.org/doi/10.1126/sciadv.abl5744), [MR-Corr2](https://academic.oup.com/bioinformatics/article/38/2/303/6367765), [MR-CUE](https://www.nature.com/articles/s41467-022-34164-1) and [MRcML](https://www.cell.com/ajhg/pdfExtended/S0002-9297(21)00219-6).  See also [this review](https://www.sciencedirect.com/science/article/pii/S2001037022001738), which does not include some of the more recent methods but does discuss CHP.
    - Can we estimate CHP in trans-QTLs by looking at the effect of the same variant across all tested molecular traits?
    - Existing methods for handling CHP do not seem to use empirical Bayes methods - can we use SuSiE to help us do this?
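
A minimal, dependency-free sketch of the Egger idea in the first bullet, computed alongside the IVW estimate and Cochran's Q heterogeneity statistic it would complement. Assumptions (not the package's implementation): first-order weights `1/se_y**2`, independent instruments, uncertainty in the exposure effects ignored, and standard weighted-least-squares formulas for the intercept standard error. The function name `ivw_and_egger` is hypothetical.

```python
import math


def ivw_and_egger(beta_x, beta_y, se_y):
    """IVW slope, Cochran's Q, and MR-Egger intercept/slope from summary statistics.

    beta_x, beta_y: per-instrument effects on the exposure and the outcome.
    se_y: standard errors of beta_y (first-order weights; se of beta_x ignored).
    Requires more than two instruments for the intercept standard error.
    """
    # Orient each instrument so its exposure effect is positive;
    # the Egger intercept is only interpretable under a fixed orientation.
    bx, by = [], []
    for x, y in zip(beta_x, beta_y):
        sign = -1.0 if x < 0 else 1.0
        bx.append(sign * x)
        by.append(sign * y)
    w = [1.0 / s ** 2 for s in se_y]

    # IVW: weighted regression of beta_y on beta_x through the origin.
    ivw = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sum(wi * x * x for wi, x in zip(w, bx))

    # Cochran's Q: weighted residual heterogeneity around the IVW fit.
    q = sum(wi * (y - ivw * x) ** 2 for wi, x, y in zip(w, bx, by))

    # MR-Egger: the same weighted regression, but with an intercept.
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, bx)) / sw
    ybar = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Residual scale estimated from the data, then the usual WLS intercept variance.
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, bx, by))
    sigma2 = rss / (len(bx) - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / sw + xbar ** 2 / sxx))

    return {
        "ivw": ivw,
        "cochran_q": q,
        "egger_intercept": intercept,
        "egger_slope": slope,
        "egger_intercept_se": se_intercept,
    }
```

An intercept that is large relative to its standard error points to directional horizontal pleiotropy. With only a handful of instruments that standard error is wide, which is why the test is only useful with enough independent instruments.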

## Colocalization
- Another model of colocalization (this will definitely be a **separate manuscript**) - can we treat gene-level colocalization as a Kullback-Leibler divergence between two multivariate normal distributions? Can we penalize this divergence for LD using the entropy of the distribution of the (top or all?) eigenvalues of the LD matrix? How does this compare to correlating PIPs?
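
The two quantities in this question can be sketched as follows. `kl_mvn` uses the closed-form KL divergence between two k-dimensional normals, and `eigen_entropy` computes the Shannon entropy of the normalized eigenvalue spectrum (all eigenvalues) of an LD matrix. Which two normals to compare (hypothetically, something like N(z_QTL, R) and N(z_GWAS, R) with R the LD matrix) and how to turn the entropy into a penalty are open questions, so the sketch deliberately does not combine them. The pure-Python Cholesky and Jacobi helpers are stand-ins for a real linear algebra library; all names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def kl_mvn(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) = 0.5 * [tr(S1^-1 S0) + d^T S1^-1 d - k + ln det S1 - ln det S0], d = mu1 - mu0."""
    k = len(mu0)
    l0, l1 = cholesky(cov0), cholesky(cov1)
    # tr(S1^-1 S0): solve against column j of S0 and keep entry j.
    trace = sum(chol_solve(l1, [cov0[i][j] for i in range(k)])[j] for j in range(k))
    diff = [m1 - m0 for m0, m1 in zip(mu0, mu1)]
    quad = sum(d * s for d, s in zip(diff, chol_solve(l1, diff)))
    logdet0 = 2.0 * sum(math.log(l0[i][i]) for i in range(k))
    logdet1 = 2.0 * sum(math.log(l1[i][i]) for i in range(k))
    return 0.5 * (trace + quad - k + logdet1 - logdet0)


def sym_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def eigen_entropy(ld):
    """Shannon entropy of the normalized eigenvalues of an LD matrix (ln n = no LD)."""
    lam = [max(v, 0.0) for v in sym_eigenvalues(ld)]  # clip tiny negative values
    total = sum(lam)
    return -sum((v / total) * math.log(v / total) for v in lam if v > 0.0)
```

The entropy is maximal (ln n) when variants are uncorrelated and shrinks toward 0 as a few eigenvalues dominate, i.e. under strong LD, which is what would make it a candidate penalty.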

## Polygenic molecular risk scores (PMRS)
- The SuSiE model is ready-made for predicting molecular traits - make it easy to do predictions from new genotype data
   - Genome-wide prediction will have the same concerns about CHP - this doesn't matter for predicting traits from PMRS, but does matter for model interpretation
   - Could also use SuSiE to predict traits from PMRS - similar idea to CTWAS, definitely a **separate manuscript** and probably a separate package, would want to extend to survival models.
- `mr.ash` and other penalized regression methods can be used for prediction in genome-wide TWAS but not for MR, because penalized regression doesn't produce valid standard errors
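
Predicting a molecular trait from new genotypes with a fitted SuSiE model reduces to a dot product with the posterior mean effects, where the posterior mean of variant j is the sum over single effects l of `alpha[l][j] * mu[l][j]`. A minimal sketch, assuming `alpha` and `mu` (L x p) are already on the original genotype scale (any column standardization undone), that new genotypes are coded like the training data (same effect allele, same variant order), and that `intercept` is known; the function names are hypothetical.

```python
def susie_posterior_mean(alpha, mu):
    """Posterior mean effect per variant: sum over single effects of alpha * mu."""
    p = len(alpha[0])
    return [sum(a_l[j] * m_l[j] for a_l, m_l in zip(alpha, mu)) for j in range(p)]


def predict_pmrs(genotypes, alpha, mu, intercept=0.0):
    """Predicted molecular trait for each new individual: intercept + x_i . b_bar."""
    b_bar = susie_posterior_mean(alpha, mu)
    return [intercept + sum(x * b for x, b in zip(row, b_bar)) for row in genotypes]


# Usage: two single effects, two variants, three new individuals.
scores = predict_pmrs([[1, 0], [0, 2], [1, 1]], [[1, 0], [0, 0.5]], [[2, 0], [0, 4]], intercept=1.0)
```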
 
## Interfaces with other packages
- [x] `mvsusier`/`mvsusiF`
   - Straightforward for integrating with TWAS and MR - we just use the posterior effect size estimates as we normally do
   - Challenging for colocalization - `colocBoost` is the current solution, hopefully we can figure something out here later
- [x] `susiF`
- [ ] vignette for INTACT
- [ ] vignette for CTWAS - currently challenging to run CTWAS

## Other
- [x] Easy approach to adjust fine mapping to remove variants that were not tested in the GWAS but were tested in the QTL - this doesn't work for TWAS with penalized regression!
- [ ] Vignette on imputing GWAS summary statistics (and QTL summary statistics if not using individual-level QTL data) - this would ideally be tied to future efforts to improve the approach methodologically.
- [ ] Data package for LD blocks for GWAS fine mapping
    - What about windows for QTL summary stats?
    - Could pre-computed LD windows be stored on a queryable server? Alternatively, could we download a 1000 Genomes population as a reference panel and compute the LD matrix for the user?
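
For the imputation vignette, a common starting point is the conditional-Gaussian approach used by tools such as ImpG: z-scores at untyped variants are predicted from observed z-scores and reference LD as z_u = R_uo (R_oo + λI)^-1 z_o, with r2_u = diag(R_uo (R_oo + λI)^-1 R_ou) as a per-variant imputation-quality score. This is a minimal sketch, not the package's method; the ridge value `lam`, the function names, and the pure-Python solver are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def impute_z(z_obs, ld_oo, ld_uo, lam=0.1):
    """Impute z-scores at untyped variants from observed z-scores and reference LD.

    ld_oo: LD among observed variants; ld_uo: rows = untyped, columns = observed.
    lam: ridge term added to ld_oo, guarding against a singular reference LD matrix.
    """
    n = len(z_obs)
    reg = [[ld_oo[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    weights = solve(reg, z_obs)  # (R_oo + lam*I)^-1 z_o
    z_u = [sum(r * w for r, w in zip(row, weights)) for row in ld_uo]
    r2 = [sum(r * s for r, s in zip(row, solve(reg, row))) for row in ld_uo]
    return z_u, r2
```

The ridge term trades a little shrinkage of the imputed z-scores for stability when the reference panel's LD matrix is close to singular, which ties this vignette to the LD-reference questions above.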