EXGEP

A framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models

Related Software and Tools

DNNGP – Deep neural network for genomic prediction.
AutoGS – A framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models.
GxEtoolkit – An automated and explainable machine learning framework for Genome Prediction.

Getting started

Requirements

Python 3.9
pip

Installation

Install packages:

Create a Python environment with conda.

conda create -n exgep python=3.9
conda activate exgep

Clone this repository and cd into it.

git clone https://github.com/AIBreeding/EXGEP.git
cd ./EXGEP
pip install -r requirements.txt
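
After installation, a quick check of the environment can save a failed run later. The following is a minimal sketch, not part of EXGEP: `check_environment` is a hypothetical helper that compares the interpreter version with the documented Python 3.9 and reports whether the named packages can be located. It assumes it is run from the repository root so that the `exgep` package is importable.

```python
import importlib.util
import sys


def check_environment(required=(3, 9), packages=("pandas", "exgep")):
    """Report whether the interpreter version and packages look usable.

    Hypothetical helper for illustration; it only inspects the current
    environment and does not import or modify anything.
    """
    report = {"python_ok": sys.version_info[:2] == tuple(required)}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, ok in check_environment().items():
    print(f"{key}: {'ok' if ok else 'missing or mismatched'}")
```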

Base Usage

import os
import time

import pandas as pd

from exgep.data import datautils
from exgep.model import RegEXGEP
from exgep.data.reg_metrics import (
    mae_score as mae,
    mse_score as mse,
    rmse_score as rmse,
    r2_score as r2,
    rmsle_score as rmsle,
    mape_score as mape,
    medae_score as medae,
    pcc_score as pcc,
)

# Input tables: genotype, phenotype, soil, and weather data.
geno = './data/genotype.csv'
phen = './data/pheno.csv'
soil = './data/soil.csv'
weather = './data/weather.csv'

# Merge the four tables into a single DataFrame.
data = datautils.merge_data(geno, phen, soil, weather)
# Features: every column from the fourth onward.
X = pd.DataFrame(data.iloc[:, 3:])
# Target: the 'Yield' column.
y = pd.Series(data['Yield'])

regression = RegEXGEP(
    y=y,
    X=X,
    test_frac=0.1,
    n_splits=10,
    n_trial=5,
    reload_study=True,
    reload_trial_cap=True,
    write_folder=os.getcwd()+'/result/',
    metric_optimise=r2,
    metric_assess=[mae, mse, rmse, pcc, rmsle, mape, medae],
    optimisation_direction='maximize',
    models_to_optimize=['LightGBM'],
    models_to_assess=['LightGBM'],
    boosted_early_stopping_rounds=5,
    random_state=2024
)

# Time the full optimisation and training run.
start = time.perf_counter()
regression.train()
elapsed = time.perf_counter() - start
print(f'Training took {elapsed:.1f} s')
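
The usage above optimises `r2` with `optimisation_direction='maximize'` and reports several error metrics. As a reminder of what some of these abbreviations measure, here is a minimal sketch using the standard textbook definitions in plain Python. These functions are illustrations only; the implementations in `exgep.data.reg_metrics` may differ in detail, and `rmsle`, `mape`, and `medae` are omitted for brevity.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)
```

Because R² grows as predictions improve while the error metrics shrink, an error metric such as RMSE would presumably be paired with a minimising direction instead.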

🌲 Optional Tree Models

DTR – Decision Tree Regressor
ETR – Extra Trees Regressor
LightGBM – Light Gradient Boosting Machine
XGBoost – Extreme Gradient Boosting
CatBoost – Categorical Boosting
AdaBoost – Adaptive Boosting
GBDT – Gradient Boosting Decision Tree
Bagging – Bagging Regressor
RF – Random Forest Regressor
HistGradientBoosting – Histogram-based Gradient Boosting
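
A misspelled model name is easy to catch before a long run starts. The sketch below is a hypothetical helper, not part of EXGEP: it checks requested names against the abbreviations listed above. It assumes the strings accepted by `models_to_optimize` and `models_to_assess` match these abbreviations exactly; only `'LightGBM'` and `'XGBoost'` are confirmed by the examples in this README.

```python
# Abbreviations copied from the list above.
SUPPORTED_MODELS = (
    "DTR", "ETR", "LightGBM", "XGBoost", "CatBoost",
    "AdaBoost", "GBDT", "Bagging", "RF", "HistGradientBoosting",
)


def validate_models(names):
    """Return the names as a list, or raise ValueError for unknown entries.

    Hypothetical helper: the match is case-sensitive on purpose, since the
    exact spelling EXGEP expects is an assumption here.
    """
    unknown = [name for name in names if name not in SUPPORTED_MODELS]
    if unknown:
        raise ValueError(f"Unknown model name(s): {', '.join(unknown)}")
    return list(names)


models = validate_models(["XGBoost", "LightGBM"])
```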

Training Example (Parameter Configuration)

python train_exgep.py \
--geno ./data/genotype.csv \
--phen ./data/pheno.csv \
--soil ./data/soil.csv \
--weather ./data/weather.csv \
--target Yield \
--test_size 0.1 \
--n_splits 10 \
--n_trial 5 \
--models_optimize XGBoost LightGBM \
--models_assess XGBoost LightGBM
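
For orientation, `--test_size 0.1` and `--n_splits 10` in the command above can be read as a hold-out fraction and a cross-validation fold count. The sketch below shows one common way such settings partition sample indices: hold out a test set first, then split the remainder into folds. Whether EXGEP splits in exactly this order is an assumption, and `holdout_then_kfold` is a hypothetical name.

```python
import random


def holdout_then_kfold(n_samples, test_size=0.1, n_splits=10, seed=2024):
    """Split sample indices into a held-out test set and k validation folds.

    Illustrative only: indices are shuffled once, the first block is held
    out for testing, and the rest is dealt round-robin into n_splits folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    test_idx, dev_idx = indices[:n_test], indices[n_test:]
    folds = [dev_idx[i::n_splits] for i in range(n_splits)]
    return test_idx, folds


test_idx, folds = holdout_then_kfold(100)
print(len(test_idx), [len(fold) for fold in folds])
```

With 100 samples this leaves 10 for testing and 9 in each of the 10 folds.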

Training Parameters Details

| Parameter | Description | Default | Type |
| --- | --- | --- | --- |
| `--geno` | Path to genotype CSV file | `./data/genotype.csv` | string |
| `--phen` | Path to phenotype CSV file | `./data/pheno.csv` | string |
| `--soil` | Path to soil CSV file | `./data/soil.csv` | string |
| `--weather` | Path to weather CSV file | `./data/weather.csv` | string |
| `--target` | Target column name in the dataset | `Yield` | string |
| `--test_size` | Fraction of data to be used for testing | `0.1` | float |
| `--n_splits` | Number of splits for cross-validation | `10` | integer |
| `--n_trial` | Number of optimization trials | `5` | integer |
| `--models_optimize` | Models to use for hyperparameter optimization | `['XGBoost']` | list |
| `--models_assess` | Models to use for performance assessment | `['XGBoost']` | list |
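
The flags and defaults in this table map naturally onto an `argparse` parser. The following is a sketch of how such a parser could be declared, using only the names, defaults, and types documented above; the actual training script may define its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented training flags (illustrative)."""
    parser = argparse.ArgumentParser(description="EXGEP training options (sketch)")
    parser.add_argument("--geno", type=str, default="./data/genotype.csv")
    parser.add_argument("--phen", type=str, default="./data/pheno.csv")
    parser.add_argument("--soil", type=str, default="./data/soil.csv")
    parser.add_argument("--weather", type=str, default="./data/weather.csv")
    parser.add_argument("--target", type=str, default="Yield")
    parser.add_argument("--test_size", type=float, default=0.1)
    parser.add_argument("--n_splits", type=int, default=10)
    parser.add_argument("--n_trial", type=int, default=5)
    # nargs="+" lets several model names follow one flag, as in the example.
    parser.add_argument("--models_optimize", nargs="+", default=["XGBoost"])
    parser.add_argument("--models_assess", nargs="+", default=["XGBoost"])
    return parser


args = build_parser().parse_args(
    ["--n_trial", "5", "--models_optimize", "XGBoost", "LightGBM"]
)
print(args.models_optimize, args.models_assess, args.test_size)
```

Flags that are not passed fall back to the defaults in the table, so here `models_assess` stays at `['XGBoost']`.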

Model Explanation Example

python test_explain.py \
--geno ./data/genotype.csv \
--phen ./data/pheno.csv \
--soil ./data/soil.csv \
--weather ./data/weather.csv \
--target Yield \
--model EXGEP \
--job_id 20240813103950 \
--sample 2 \
--feature_i pc2 \
--feature_j RH2M \
--top_features 10 \
--top_interactions 20 \
--test_size 0.1 \
--random_state 2024

Model Explanation Parameters Details

| Parameter | Description | Default | Type |
| --- | --- | --- | --- |
| `--model` | Model to explain | - | string |
| `--job_id` | Job directory containing trained models | - | string |
| `--geno` | Path to genotype CSV file | `./data/genotype.csv` | string |
| `--phen` | Path to phenotype CSV file | `./data/pheno.csv` | string |
| `--soil` | Path to soil CSV file | `./data/soil.csv` | string |
| `--weather` | Path to weather CSV file | `./data/weather.csv` | string |
| `--target` | Target column name | `Yield` | string |
| `--test_size` | Test set fraction | `0.2` | float |
| `--random_state` | Random seed for reproducibility | `2024` | integer |
| `--sample` | Sample index for waterfall plot | `2` | integer |
| `--feature_i` | Primary feature for dependence plots | `pc2` | string |
| `--feature_j` | Secondary feature for interaction plots | `RH2M` | string |
| `--top_features` | Number of top features to display | `20` | integer |
| `--top_interactions` | Number of top interactions for network | `20` | integer |
| `--cluster` | Use KMeans clustering for background data | `False` | flag |
| `--n_train_points` | Training points for background | `200` | integer |
| `--n_test_points` | Test samples to explain | `None` | integer |
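
The `--job_id` value in the example, `20240813103950`, looks like a `YYYYmmddHHMMSS` timestamp. Assuming that is the naming scheme for job directories (the README does not state it), the sketch below shows how such an identifier could be parsed, and how the most recent job could be picked from a list of directory names. `parse_job_id` and `latest_job_id` are hypothetical helpers.

```python
from datetime import datetime

# Assumed format, inferred only from the example value 20240813103950.
JOB_ID_FORMAT = "%Y%m%d%H%M%S"


def parse_job_id(job_id):
    """Convert a job identifier into a datetime; raises ValueError if malformed."""
    return datetime.strptime(job_id, JOB_ID_FORMAT)


def latest_job_id(job_ids):
    """Return the most recent identifier from an iterable of job IDs."""
    return max(job_ids, key=parse_job_id)


print(parse_job_id("20240813103950"))
print(latest_job_id(["20240813103950", "20240101090000"]))
```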

📚 Citation

You can read our paper explaining EXGEP.

Yu T, Zhang H, Chen S, et al. EXGEP: a framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models. Brief Bioinform, 2025. https://doi.org/10.1093/bib/bbaf414

📜 Copyright and License

This project is free to use for non-commercial purposes - see the LICENSE file for details.

👥 Contacts

For more information, please contact Huihui Li (lihuihui@caas.cn).
