AutoGS

Leveraging Automated Machine Learning for Environmental Dimension-Data Genetic Analysis and Genomic Prediction in Maize Hybrids

Automated Machine Learning for Environmental Data-Driven Genome Prediction. An automated machine learning framework integrating environmental and genomic data enhances genetic analysis and genomic prediction in maize. By leveraging dimension-reduced environmental parameters, it reveals trait-environment relationships and identifies genetic markers that govern phenotypic plasticity and genotype-by-environment interactions. The combined use of markers and environmental features improves genomic prediction accuracy, offering a scalable solution for developing climate-resilient maize varieties.

Related Software and Tools

DNNGP – Deep neural network for genomic prediction.
EXGEP – A framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models.
GxEtoolkit – An automated and explainable machine learning framework for Genome Prediction.

🌐Operating systems

Windows
Linux

Getting started

Requirements

Python 3.9
pip

Installation

Install packages:

Create a Python environment.

conda create -n autogs python=3.9
conda activate autogs

Clone this repository and cd into it.

git clone https://github.com/AIBreeding/AutoGS.git
cd ./AutoGS
pip install -r requirements.txt

Test AutoGS.

python test_autogs.py

Basic Usage

import os
import sys
import pprint
import sklearn
import pandas as pd
from scipy.stats import pearsonr
from matplotlib import pyplot as plt
from sklearn.preprocessing import StandardScaler
from autogs.data.tools.reg_metrics import (mae_score as mae,
                             mse_score as mse,
                             rmse_score as rmse,
                             r2_score as r2,
                             rmsle_score as rmsle,
                             mape_score as mape,
                             medae_score as medae,
                             pcc_score as pcc)

from autogs.model import RegAutoGS
from autogs.data import datautils

# read data
phen_file_path = "./dataset/trainset/Pheno/"
env_file_path = "./dataset/trainset/Env/"
geno_file_path = "./dataset/trainset/Geno/YI_All.vcf"
ref_path = "./docs/maizeRef(ALL).csv"
file_names = ["DEH1_2020", "DEH1_2021", "IAH2_2021", "IAH3_2021", "IAH4_2021", "WIH2_2020", "WIH2_2021"]

com_phen_data, com_env_data, dynamic_window_avg, env_transformed_data, \
gendata, PGE = datautils.process_data(phen_file_path, env_file_path, geno_file_path, ref_path, file_names)

# Extract the phenotype (Yield_Mg_ha) and the feature data (Geno and Env)
columns_to_extract = [0, 1, 8] # columns Env, Hybrid, and Yield_Mg_ha

columns_from_11_to_end = list(range(11, PGE.shape[1])) 
columns_indices = columns_to_extract + columns_from_11_to_end
extracted_columns = PGE.iloc[:, columns_indices]
extracted_columns = pd.DataFrame(extracted_columns.dropna().reset_index(drop=True))

snp = pd.DataFrame(extracted_columns.iloc[:,3:])
scaler = StandardScaler()
scaled_snp = scaler.fit_transform(snp)
X = pd.DataFrame(scaled_snp,columns=snp.columns)
y = extracted_columns['Yield_Mg_ha']
y = pd.Series(y)

# train AutoGS model for reg prediction
reg = RegAutoGS(
    y=y,
    X=X, 
    test_size=0.2, 
    n_splits=5, 
    n_trial=5, 
    reload_study=True,
    reload_trial=True, 
    write_folder=os.getcwd()+'/results/', 
    metric_optimise=r2, 
    metric_assess=[mae, mse, rmse, pcc, r2, rmsle, mape, medae],
    optimization_objective='maximize', 
    models_optimize=['LightGBM','XGBoost','CatBoost','BayesianRidge'], 
    models_assess=['LightGBM','XGBoost','CatBoost','BayesianRidge'], 
    early_stopping_rounds=5, 
    random_state=2024
)
reg.train() # train model
reg.CalSHAP(n_train_points=200,n_test_points=200,cluster=False) # AutoGS SHAP interaction
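The column selection in the example above relies on positional indices into PGE. The following is a minimal sketch of that step on a toy table, assuming a layout where columns 0, 1 and 8 hold Env, Hybrid and Yield_Mg_ha and the marker and environment features start at column 11. The toy frame, the meta_* and feat_* column names, and the X_toy / y_toy variables are made up for illustration and are not produced by datautils.process_data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_rows = 6

# Hypothetical stand-in for PGE: 11 leading metadata/trait columns, then features.
toy = pd.DataFrame({f"meta_{i}": rng.normal(size=n_rows) for i in range(11)})
toy = toy.rename(columns={"meta_0": "Env", "meta_1": "Hybrid", "meta_8": "Yield_Mg_ha"})
toy["Env"] = ["DEH1_2020"] * 3 + ["WIH2_2021"] * 3
toy["Hybrid"] = [f"H{i}" for i in range(n_rows)]
for j in range(4):
    toy[f"feat_{j}"] = rng.normal(size=n_rows)
toy.loc[2, "Yield_Mg_ha"] = np.nan  # one record with a missing phenotype

# Same positional selection as in the usage example: Env, Hybrid, yield, then features.
cols = [0, 1, 8] + list(range(11, toy.shape[1]))
subset = toy.iloc[:, cols].dropna().reset_index(drop=True)

# Everything after the first three columns is treated as a feature and standardized.
features = subset.iloc[:, 3:]
X_toy = pd.DataFrame(StandardScaler().fit_transform(features), columns=features.columns)
y_toy = subset["Yield_Mg_ha"]

print(X_toy.shape, y_toy.shape)
```

Because the selection is positional, a PGE table with a different column order would silently pick the wrong columns; checking subset.columns[:3] before training is a cheap safeguard.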

Supported Evaluation Metrics

mae — Mean Absolute Error (MAE)
mse — Mean Squared Error (MSE)
rmse — Root Mean Squared Error (RMSE)
r2 — R² Score (Coefficient of Determination)
rmsle — Root Mean Squared Logarithmic Error (RMSLE)
mape — Mean Absolute Percentage Error (MAPE)
medae — Median Absolute Error (MEDAE)
pcc — Pearson Correlation Coefficient (PCC)
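For reference, the metrics listed above can be reproduced with scikit-learn, NumPy and SciPy. This is a sketch of the standard definitions on a small made-up example; it does not call the functions in autogs.data.tools.reg_metrics, whose exact implementations may differ (for instance in argument order or edge-case handling).

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    mean_squared_log_error,
    median_absolute_error,
    r2_score,
)

# Made-up observed and predicted yields (illustrative values only).
y_true = np.array([8.0, 10.0, 12.0, 14.0])
y_pred = np.array([9.0, 10.0, 11.0, 16.0])

scores = {
    "mae": mean_absolute_error(y_true, y_pred),
    "mse": mean_squared_error(y_true, y_pred),
    "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
    "r2": r2_score(y_true, y_pred),
    # RMSLE is only defined for non-negative targets and predictions.
    "rmsle": np.sqrt(mean_squared_log_error(y_true, y_pred)),
    "mape": mean_absolute_percentage_error(y_true, y_pred),
    "medae": median_absolute_error(y_true, y_pred),
    "pcc": pearsonr(y_true, y_pred)[0],
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that r2 and pcc are better when larger, while the error metrics are better when smaller, which is why the usage example pairs metric_optimise=r2 with optimization_objective='maximize'.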

🔍 Supported Base Regression Models

The AutoGS framework currently supports 28 base regression models, including tree-based ensembles, linear models, and other widely used regressors. Each model can be optionally selected by name via the selected_regressors list.

🌲 Tree Models

DTR – Decision Tree Regressor
ETR – Extra Trees Regressor
LightGBM – Light Gradient Boosting Machine
XGBoost – Extreme Gradient Boosting
CatBoost – Categorical Boosting
AdaBoost – Adaptive Boosting
GBDT – Gradient Boosting Decision Tree
Bagging – Bagging Regressor
RF – Random Forest Regressor
HistGradientBoosting – Histogram-based Gradient Boosting

📈 Linear and Sparse Models

BayesianRidge – Bayesian Ridge Regression
LassoLARS – Lasso Least Angle Regression
ElasticNet – Elastic Net Regression
SGD – Stochastic Gradient Descent Regressor
Linear – Ordinary Least Squares Linear Regression
Lasso – Lasso Regression
Ridge – Ridge Regression
OMP – Orthogonal Matching Pursuit
ARD – Automatic Relevance Determination Regression
PAR – Passive Aggressive Regressor
TheilSen – Theil-Sen Estimator
Huber – Huber Regressor
KernelRidge – Kernel Ridge Regression
RANSAC – RANSAC (RANdom SAmple Consensus) Regressor

🔍 Other Models

KNN – k-Nearest Neighbors Regressor
SVR – Support Vector Regressor
Dummy – Dummy Regressor (Baseline)
MLP – Multi-Layer Perceptron (Deep Neural Network)
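To illustrate what selecting models by name amounts to, here is a minimal sketch that maps a few of the names above to scikit-learn estimators and compares them with 5-fold cross-validation on synthetic data. The registry dictionary, the selected list and the X_demo / y_demo data are hypothetical; AutoGS's internal name-to-estimator mapping and its hyperparameter search are not shown and may differ.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical name-to-estimator registry using default-like settings.
registry = {
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "BayesianRidge": BayesianRidge(),
    "Ridge": Ridge(alpha=1.0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Dummy": DummyRegressor(strategy="mean"),
}

# Names chosen by the user; only these are evaluated.
selected = ["RF", "BayesianRidge", "Dummy"]

# Synthetic regression data standing in for scaled marker/environment features.
X_demo, y_demo = make_regression(
    n_samples=120, n_features=20, noise=5.0, random_state=0
)

# Mean R2 over 5 folds for each selected model.
cv_r2 = {}
for name in selected:
    fold_scores = cross_val_score(registry[name], X_demo, y_demo, cv=5, scoring="r2")
    cv_r2[name] = fold_scores.mean()

for name, score in sorted(cv_r2.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV R2 = {score:.3f}")
```

Including the Dummy baseline in a comparison like this gives a floor against which the other regressors can be judged.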

🍀For more detailed analysis steps see ./test/test_autogs.ipynb

⚠️When using Jupyter Notebook/Lab, please ensure your Python environment is properly configured and added to Jupyter Notebook/Lab.

👉Add the created autogs environment to Jupyter Notebook/Lab:

conda create -n autogs python=3.9
conda activate autogs
conda install ipykernel
python -m ipykernel install --user --name autogs --display-name "autogs"

📚 Citation

You can read our paper explaining AutoGS here.

He K, Yu T, Gao S, et al. Leveraging Automated Machine Learning for Environmental Data-Driven Genetic Analysis and Genomic Prediction in Maize Hybrids. Adv Sci (Weinh), e2412423, 2025. https://doi.org/10.1002/advs.202412423

📜Copyright and License

This project is free to use for non-commercial purposes - see the LICENSE file for details.

👥Contacts

For more information, please contact Huihui Li (lihuihui@caas.cn).
