Data and code for the manuscript “Identity‑by‑descent captures shared environmental factors at biobank scale.”
Pre‑print: https://doi.org/10.1101/2025.05.03.652048
This repository is part of the Biorepository, Informatics & Genomics (BIG) initiative at the University of Tennessee Health Science Center.
Project home: https://uthsc.edu/cbmi/big/
Data resource publication: https://www.nature.com/articles/s41467-025-59375-0
Live dashboard: https://francomarsico.shinyapps.io/BIG_Communities/
- Identity‑by‑descent (IBD) clustering at biobank scale, summarising four continent‑level communities and seventeen sub‑communities.
- Aggregated environmental indicators (for example PM2.5) aligned to the same ZIP code geography used for the communities.
- Interactive Shiny application to compare communities, exposures, and health outcomes side by side.
- app.r – Shiny dashboard that ties together the data layers and visualisations.
- data/ – Aggregated data products used by the dashboard (no individual‑level records).
- rsconnect/ – Deployment metadata for shinyapps.io.
- Install R (≥ 4.3) and RStudio or your preferred IDE.
- Install the required packages:
install.packages(c(
  "shiny", "shinyWidgets", "shinydashboard", "shinydashboardPlus",
  "leaflet", "leaflet.extras", "sf", "geojsonio", "dplyr", "tidyr",
  "readr", "DT", "ggplot2", "viridisLite", "plotly", "tibble",
  "forcats", "stringr", "RColorBrewer"
))
- Launch the dashboard from the repository root:
shiny::runApp(".")
The app reads the pre‑computed objects in data/ by default. No credentials are required.
Jointly analysing genetic factors (G), environmental factors (E), and health outcomes (H) enables probabilistic modeling of causal pathways. The diagram illustrates a gene-environment mediation framework in which genetic factors G influence health outcome H both directly and indirectly through environmental factors E, which comprise measured (Em) and unmeasured (Eu) components. The indirect pathway reflects the tendency of individuals who share more identity-by-descent (IBD) segments to also share environmental exposures, owing to geographic and social proximity. The direct pathway may reflect the genetic variants involved or social and cultural traits. This framework decomposes the effect of genetic relatedness into direct and environmentally mediated components, which helps identify potentially modifiable pathways in the genotype-phenotype relationship.
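As an illustration only, the decomposition can be written out under a simple linear assumption. The linear form and the coefficients c, a_m, a_u, b_m, and b_u are hypothetical notation introduced here, not the model used in the manuscript; see the pre‑print for the actual methodology.

$$
\begin{aligned}
E_m &= a_m G + \varepsilon_m, \\
E_u &= a_u G + \varepsilon_u, \\
H   &= c\,G + b_m E_m + b_u E_u + \varepsilon_H, \\
\text{total effect of } G \text{ on } H &= \underbrace{c}_{\text{direct}} + \underbrace{a_m b_m + a_u b_u}_{\text{indirect, via } E}.
\end{aligned}
$$

Under these assumptions, only the a_m b_m term can be estimated from measured exposures; the a_u b_u term stays absorbed in whatever appears to be the direct effect unless Eu is captured by some proxy.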
- All data are aggregated to protect participant privacy; counts are filtered so that ZIP codes with fewer than 100 individuals are removed from the map.
- Environmental layers (for example PM2.5) come from public sources referenced in the manuscript; see the pre‑print for complete methodology.
- If you need to regenerate any tables, follow the scripts bundled in data/ or reach out via the contact below.
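The small-count filter mentioned in the notes above could look roughly like the following sketch. The table and its column names (zip, community, n) are hypothetical and do not reflect the actual schema of the files in data/; the sketch also assumes the threshold applies to the total count per ZIP code.

```r
# Hypothetical aggregated table: one row per ZIP code and community.
# Column names are illustrative, not the repository's schema.
zip_counts <- data.frame(
  zip = c("38103", "38103", "38104", "38105"),
  community = c("A", "B", "A", "A"),
  n = c(80, 40, 60, 150),
  stringsAsFactors = FALSE
)

min_n <- 100  # threshold stated in the notes above

# Total individuals per ZIP code, summed across communities (an assumption).
zip_totals <- aggregate(n ~ zip, data = zip_counts, FUN = sum)

# Keep only rows whose ZIP code meets the threshold.
keep <- zip_totals$zip[zip_totals$n >= min_n]
zip_filtered <- zip_counts[zip_counts$zip %in% keep, ]
```

In this toy input, the ZIP code with a total of 60 individuals is dropped, while the two ZIP codes with totals of 120 and 150 are kept.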
If you use this code or dashboard, please cite the pre‑print above and acknowledge the BIG Initiative.
Questions or feedback: fmarsico@uthsc.edu