Interpretable Machine Learning on Soybean Multi-Omics Data Reveals Drought-Driven Shifts of Plant–Microbe Interactions
This repository contains all scripts and data used for the analysis in “Interpretable Machine Learning on Soybean Multi-Omics Data Reveals Drought-Driven Shifts of Plant–Microbe Interactions.”
📄 Preprint available at: https://doi.org/10.1101/2025.08.13.670005
All scripts
| File | Description |
|---|---|
| c0_RF-markers.Rmd | Random Forest marker evaluation |
| c1_RF_prediction.Rmd | Random Forest prediction |
| c2_BLUP_prediction.Rmd | BLUP (Best Linear Unbiased Prediction) |
| c3_GWAS.Rmd | GWAS (Genome-Wide Association Study) analysis |
| c4_SNP_Mapping.Rmd | SNP selection by RF and GWAS |
| c5_SHAP_single.Rmd | SHAP analysis for single feature selection |
| c6_SHAP_interaction.Rmd | SHAP for interaction feature selection |
| c7_SHAP_difference.Rmd | SHAP difference matrix visualization |
| c8_hyperparameter_xgboost.Rmd | Hyperparameter tuning for XGBoost |
| c9_hyperparameter_ranger.Rmd | Hyperparameter tuning for ranger |
| c10_rf_null.Rmd | Null distribution of RF prediction accuracy |
Multi-omics data (Genome, Microbiome, Metabolome, Phenotype) ready for analysis.
| File | Description |
|---|---|
| SoyData_Drought2.RDS | Multi-omics data - Drought |
| SoyData_Control2.RDS | Multi-omics data - Control |
Genome marker data (with different LD thresholds)
| File | Description |
|---|---|
| genoMarker_LD0.001_SNP3078.RDS | SNP markers filtered at LD = 0.001 |
| genoMarker_LD0.01_SNP10143.RDS | SNP markers filtered at LD = 0.01 |
| genoMarker_LD0.1_SNP16419.RDS | SNP markers filtered at LD = 0.1 |
| genoMarker_LD0.3_SNP34632.RDS | SNP markers filtered at LD = 0.3 |
| genoMarker_LD0.001_SNP3078.csv | SNP markers filtered at LD = 0.001 |
| genoMarker_LD0.01_SNP10143.csv | SNP markers filtered at LD = 0.01 |
| genoMarker_LD0.1_SNP16419.csv | SNP markers filtered at LD = 0.1 |
| genoMarker_LD0.3_SNP34632.csv | SNP markers filtered at LD = 0.3 |
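
The marker file names above appear to encode the LD threshold and a SNP count. The following is a minimal Python sketch, not part of the repository, that parses those names so a file can be picked by threshold. The pattern `genoMarker_LD<threshold>_SNP<count>.<ext>` and the reading of the number after `SNP` as the marker count are assumptions inferred from the listing; `parse_marker_name` and `index_by_threshold` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, NamedTuple

# Assumed naming pattern, inferred from the file names listed in the table:
#   genoMarker_LD<threshold>_SNP<count>.<RDS|csv>
MARKER_NAME = re.compile(
    r"^genoMarker_LD(?P<ld>[0-9.]+)_SNP(?P<snps>[0-9]+)\.(?P<ext>RDS|csv)$"
)


class MarkerFile(NamedTuple):
    name: str
    ld_threshold: float
    n_snps: int  # assumption: the number after "SNP" is the marker count
    ext: str


def parse_marker_name(name: str) -> MarkerFile:
    """Split a marker file name into LD threshold, SNP count and extension."""
    match = MARKER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised marker file name: {name!r}")
    return MarkerFile(name, float(match["ld"]), int(match["snps"]), match["ext"])


# File names copied from the table above.
FILES = [
    "genoMarker_LD0.001_SNP3078.RDS",
    "genoMarker_LD0.01_SNP10143.RDS",
    "genoMarker_LD0.1_SNP16419.RDS",
    "genoMarker_LD0.3_SNP34632.RDS",
    "genoMarker_LD0.001_SNP3078.csv",
    "genoMarker_LD0.01_SNP10143.csv",
    "genoMarker_LD0.1_SNP16419.csv",
    "genoMarker_LD0.3_SNP34632.csv",
]


def index_by_threshold(names: Iterable[str]) -> Dict[float, Dict[str, MarkerFile]]:
    """Group marker files by LD threshold, then by file extension."""
    index: Dict[float, Dict[str, MarkerFile]] = {}
    for name in names:
        info = parse_marker_name(name)
        index.setdefault(info.ld_threshold, {})[info.ext] = info
    return index


if __name__ == "__main__":
    # Print one line per threshold, smallest threshold first.
    for ld, by_ext in sorted(index_by_threshold(FILES).items()):
        csv_file = by_ext["csv"]
        print(f"LD {ld}: {csv_file.n_snps} SNPs -> {csv_file.name}")
```

Each threshold is provided in both RDS and CSV form, so `index_by_threshold(FILES)[0.1]["csv"].name` gives the CSV file name for LD = 0.1. The sketch does not read the files themselves, since their internal layout is not described here.
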
Raw multi-omics data files. The data include both control and drought conditions.
| File | Description |
|---|---|
| metabolome.csv | Metabolome (265 features) |
| microbiome.csv | Microbiome (16457 features) |
| phenotype.csv | Phenotype (9 features) |
For the microbiome and metabolome data, see also Dang et al. (2025), "I-SVVS: integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data": https://doi.org/10.1093/bib/bbaf132
For the full SNP data, see also Kanegae et al. (2021), "Whole-genome sequence diversity and association analysis of 198 soybean accessions in mini-core collections": https://doi.org/10.1093/dnares/dsaa032
