Yoska393/ShapPlantMicro

Interpretable Machine Learning on Soybean Multi-Omics Data Reveals Drought-Driven Shifts of Plant–Microbe Interactions

This repository contains all scripts and data used for the analysis in “Interpretable Machine Learning on Soybean Multi-Omics Data Reveals Drought-Driven Shifts of Plant–Microbe Interactions.”

📄 Preprint available at: https://doi.org/10.1101/2025.08.13.670005



📂 script_ShapPlantMicro

All analysis scripts (R Markdown).

| File | Description |
| --- | --- |
| c0_RF-markers.Rmd | Random Forest marker evaluation |
| c1_RF_prediction.Rmd | Random Forest prediction |
| c2_BLUP_prediction.Rmd | BLUP (Best Linear Unbiased Prediction) |
| c3_GWAS.Rmd | GWAS (Genome-Wide Association Study) analysis |
| c4_SNP_Mapping.Rmd | SNP selection by RF and GWAS |
| c5_SHAP_single.Rmd | SHAP analysis for single feature selection |
| c6_SHAP_interaction.Rmd | SHAP for interaction feature selection |
| c7_SHAP_difference.Rmd | SHAP difference matrix visualization |
| c8_hyperparameter_xgboost.Rmd | Hyperparameter tuning for XGBoost |
| c9_hyperparameter_ranger.Rmd | Hyperparameter tuning for ranger |
| c10_rf_null.Rmd | Null distribution of RF prediction accuracy |

📂 data

Multi-omics data (Genome, Microbiome, Metabolome, Phenotype) ready for analysis.

| File | Description |
| --- | --- |
| SoyData_Drought2.RDS | Multi-omics data, drought condition |
| SoyData_Control2.RDS | Multi-omics data, control condition |

📂 data/genome

Genome marker data, filtered at different linkage disequilibrium (LD) thresholds. Each marker set is provided in both RDS and CSV format.

| File | Description |
| --- | --- |
| genoMarker_LD0.001_SNP3078.RDS | SNP markers filtered at LD = 0.001 |
| genoMarker_LD0.01_SNP10143.RDS | SNP markers filtered at LD = 0.01 |
| genoMarker_LD0.1_SNP16419.RDS | SNP markers filtered at LD = 0.1 |
| genoMarker_LD0.3_SNP34632.RDS | SNP markers filtered at LD = 0.3 |
| genoMarker_LD0.001_SNP3078.csv | SNP markers filtered at LD = 0.001 |
| genoMarker_LD0.01_SNP10143.csv | SNP markers filtered at LD = 0.01 |
| genoMarker_LD0.1_SNP16419.csv | SNP markers filtered at LD = 0.1 |
| genoMarker_LD0.3_SNP34632.csv | SNP markers filtered at LD = 0.3 |
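
The marker filenames appear to encode the LD threshold and, presumably, the number of SNPs retained at that threshold. The following is a minimal Python sketch for recovering both values from a filename, assuming the naming pattern in the listing above holds for every file. The helper `parse_marker_filename` and the `MarkerFile` tuple are hypothetical names used for illustration; they are not part of this repository.

```python
import re
from typing import NamedTuple


class MarkerFile(NamedTuple):
    """Values read off a marker filename (hypothetical helper type)."""
    ld_threshold: float
    n_snps: int
    extension: str


# Assumed pattern: genoMarker_LD<threshold>_SNP<count>.<RDS|csv>
_PATTERN = re.compile(
    r"^genoMarker_LD(?P<ld>[0-9.]+)_SNP(?P<n>[0-9]+)\.(?P<ext>RDS|csv)$"
)


def parse_marker_filename(name: str) -> MarkerFile:
    """Split a marker filename into LD threshold, SNP count, and extension."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected marker filename: {name!r}")
    return MarkerFile(float(match["ld"]), int(match["n"]), match["ext"])


info = parse_marker_filename("genoMarker_LD0.01_SNP10143.csv")
print(info.ld_threshold, info.n_snps, info.extension)
```

A check like this can help confirm that a loaded marker matrix has the expected number of SNPs before running the prediction scripts.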

📂 data/raw

Raw multi-omics data files. Each file covers both control and drought conditions.

| File | Description |
| --- | --- |
| metabolome.csv | Metabolome (265 features) |
| microbiome.csv | Microbiome (16457 features) |
| phenotype.csv | Phenotype (9 features) |
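
As a quick sanity check, the dimensions of each raw file can be compared against the feature counts listed above. The sketch below assumes only that each CSV has a single header row. Whether features are stored as columns or rows, and whether an ID column is present, is not stated here, so the function reports both dimensions. The helper name `table_shape` is hypothetical.

```python
import csv
from typing import Iterable, Tuple


def table_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (n_data_rows, n_columns) for CSV text with one header row."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input
        return 0, 0
    n_rows = sum(1 for row in reader if row)  # blank lines are skipped
    return n_rows, len(header)


# Usage (path taken from the listing above; not executed here):
# with open("data/raw/phenotype.csv", newline="", encoding="utf-8") as fh:
#     print(table_shape(fh))
```

The function accepts any iterable of lines, so it works on an open file handle as well as on a list of strings.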

For the microbiome and metabolome data, see also Dang et al. (2025), "I-SVVS: integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data", https://doi.org/10.1093/bib/bbaf132

For the full SNP data, see also Kanegae et al. (2021), "Whole-genome sequence diversity and association analysis of 198 soybean accessions in mini-core collections", https://doi.org/10.1093/dnares/dsaa032

About

Shapley Additive exPlanations (SHAP) for plant-microbe interactions
