| layout | title | permalink |
|---|---|---|
| default | Thomas H. Keefe \| Research | research |
This work develops the singular value decomposition in the Bayesian modeling language Stan. The Bayesian approach allows us to model the uncertainty in the singular values and singular vectors estimated from an observed dataset. Because the matrices of left and right singular vectors have orthonormal columns, this model requires sampling from the Stiefel manifold of orthonormal matrices. It is a "full Bayes" approach in that every parameter in the model has a prior; none are plug-in estimates. I presented a poster at the Bayes, Fiducial, and Frequentist Conference in May 2023, and a manuscript is in preparation.
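To make "a point on the Stiefel manifold" concrete, here is a minimal sketch in plain Python. It is not the Stan model described above and does no posterior sampling. It only draws a matrix with orthonormal columns uniformly at random by applying Gram-Schmidt to independent standard normal vectors, and checks that the columns are orthonormal. The function names `random_stiefel` and `gram` are hypothetical and chosen for this illustration.

```python
import math
import random


def random_stiefel(n, k, rng):
    """Draw an n x k matrix with orthonormal columns, returned as a list of k columns.

    Gram-Schmidt applied to i.i.d. standard normal vectors yields a draw from
    the uniform distribution on the Stiefel manifold. Requires k <= n.
    """
    columns = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components of v along the columns found so far.
        for q in columns:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        # Scale the remainder to unit length.
        norm = math.sqrt(sum(a * a for a in v))
        columns.append([a / norm for a in v])
    return columns


def gram(columns):
    """Return U^T U as a nested list; it should equal the k x k identity."""
    return [[sum(a * b for a, b in zip(p, q)) for q in columns] for p in columns]


rng = random.Random(0)  # fixed seed so the draw is reproducible
U = random_stiefel(n=5, k=2, rng=rng)
G = gram(U)  # numerically close to [[1, 0], [0, 1]]
```

In a Bayesian SVD, both the left and right singular vector matrices are constrained in this way, which is why a sampler has to move on the manifold rather than in unconstrained Euclidean space.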
Clustering is a large topic within machine learning, with many algorithms available and applications in diverse scientific areas. However, there are few methods for statistically determining whether the clusters are "really there" or merely an artifact of noise in the sample at hand. The SigClust methodology is one approach that has been useful in validating clusters, but it fails in the important case where the clusters of interest are of very unbalanced sizes, as with rare subtypes of disease. We develop a method that is statistically powerful in both the balanced and unbalanced regimes, using a novel measure of cluster quality that accounts for cluster size. A preprint is available.
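For context, SigClust bases its test on the 2-means cluster index: the within-cluster sum of squares divided by the total sum of squares, compared against its distribution under a single-Gaussian null. The sketch below computes only that standard index for a given labeling. It is not the size-aware measure developed in this work, and it omits the null simulation. The helper names and the toy data are assumptions made for the illustration.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sum_sq(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)


def cluster_index(points, labels):
    """Within-cluster sum of squares divided by total sum of squares.

    Values near 0 indicate tight, well-separated clusters; values near 1
    indicate that the labeling explains little of the variation.
    """
    total = sum_sq(points, mean(points))
    within = 0.0
    for lab in set(labels):
        group = [p for p, l in zip(points, labels) if l == lab]
        within += sum_sq(group, mean(group))
    return within / total


# Toy data: two tight pairs of points, far apart along the first coordinate.
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
ci = cluster_index(toy_points, [0, 0, 1, 1])  # 1/101, about 0.0099
```

A small index on its own does not establish significance; the test asks whether the observed value is smaller than would be expected from data with no cluster structure.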
Using a novel dataset of knee cartilage thickness maps estimated from MRI, we applied the data-integration methodology Angle-based Joint and Individual Variation Explained (AJIVE) to find modes of variation expressed simultaneously across femoral cartilage, tibial cartilage, and clinical/demographic features. We found three significant and interpretable modes of variation: the first corresponds to overall size; the second to the extent of arthritic cartilage thinning in weight-bearing regions of the knee; and the third to medial/lateral predominance of cartilage thinning. Our manuscript was published in Osteoarthritis and Cartilage Open in February 2023, and I presented a poster at OARSI Connect 2021 (online).
This work uses biclustering, a family of methods that cluster both the rows and columns of a dataset, to identify candidate phenotypes of knee osteoarthritis from a large dataset of clinical and demographic measurements from the Osteoarthritis Initiative. Our manuscript was published in PLOS ONE in May 2022.