Explainable and Evolutionary AutoML for Biochemical Property Prediction

By Alex G. C. de Sá, Gisele L. Pappa, Alex A. Freitas and David B. Ascher

AutoML for Biochemical Property Prediction

This repository contains the code and resources for Evolutionary AutoML for Biochemical Property Prediction. It focuses on interpreting the machine learning pipelines generated by the AutoML method for biochemical property prediction, particularly for ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties.

📦 Project Structure

├── datasets/             # Raw and processed datasets for biochemical property prediction
├── grammar/              # Context-free grammar (CFG), in Backus–Naur form (.bnf), describing the AutoML search space
├── automl_biochem.py     # Python implementation of the evolutionary AutoML method (i.e., a Bayesian Optimisation Algorithm) for biochemical property prediction
├── requirements.yml      # Conda environment specification
└── README.md             # Project documentation
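The grammar in grammar/ defines which pipelines the search is allowed to build. Grammar-based evolutionary AutoML methods typically obtain each candidate pipeline by deriving it from the grammar's start symbol. The sketch below illustrates that idea on a tiny, hypothetical grammar: `TOY_BNF`, `parse_bnf` and `derive` are illustrative names, the grammar is NOT the contents of grammar/automl.bnf, and the parser only handles single-line `<lhs> ::= alt | alt` rules.

```python
import random

# Hypothetical toy grammar, NOT the contents of grammar/automl.bnf.
TOY_BNF = """
<pipeline> ::= <preprocessing> <classifier> | <classifier>
<preprocessing> ::= StandardScaler | MinMaxScaler
<classifier> ::= RandomForest | LogisticRegression | KNN
"""


def parse_bnf(text):
    """Parse single-line '<lhs> ::= alt1 | alt2' rules into {lhs: [[symbol, ...], ...]}."""
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        rules[lhs.strip()] = [alt.split() for alt in rhs.split("|")]
    return rules


def derive(rules, symbol, rng):
    """Expand `symbol` depth-first, picking one production uniformly at random."""
    if symbol not in rules:  # terminals are emitted as-is
        return [symbol]
    tokens = []
    for sym in rng.choice(rules[symbol]):
        tokens.extend(derive(rules, sym, rng))
    return tokens


rules = parse_bnf(TOY_BNF)
print(derive(rules, "<pipeline>", random.Random(1)))  # one randomly derived toy pipeline
```

In a real grammar-based search, the derivation (or its tree) is the individual that crossover and mutation operate on; this toy version only shows random initialisation.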

🛠️ Environment Setup

To set up the project environment using Conda, follow the steps below:

1. Create the conda environment

Make sure you have Anaconda or Miniconda installed.

Then run:

conda env create -f requirements.yml

2. Activate the environment

conda activate automl_biochem

3. Deactivate the environment (when you're done)

conda deactivate
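If you call the method from your own scripts, a small guard can catch the common mistake of running outside the environment. This sketch assumes the environment was activated by name, in which case `conda activate` sets the CONDA_DEFAULT_ENV variable to that name; `in_conda_env` is a hypothetical helper, not part of this repository.

```python
import os


def in_conda_env(expected="automl_biochem", environ=None):
    """Return True if the active conda environment (by name) is `expected`.

    Assumes the environment was activated by name, so CONDA_DEFAULT_ENV holds that name.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    if not in_conda_env():
        print("Warning: the automl_biochem environment does not appear to be active.")
```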

📖 How to use the proposed AutoML method?

After activating the automl_biochem environment, run:

python automl_biochem.py training_file.csv testing_file.csv grammar output_directory

For example, using:

"datasets/01_caco2_train.csv" as the training file
"datasets/01_caco2_blindtest.csv" as the testing file
"grammar/automl.bnf" as the grammar defining the AutoML search space
"." as the output directory

python automl_biochem.py datasets/01_caco2_train.csv datasets/01_caco2_blindtest.csv grammar/automl.bnf .
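To repeat this command over several datasets, a small batch script can pair each training file with its blind-test file. This is a sketch under assumptions: it infers a `<prefix>_train.csv` / `<prefix>_blindtest.csv` naming convention from the Caco-2 example above, it must be run from the repository root, and `find_dataset_pairs`, `build_commands`, `run_all` and the `results` output root are hypothetical names, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def find_dataset_pairs(datasets_dir):
    """Pair <prefix>_train.csv with <prefix>_blindtest.csv (assumed naming convention)."""
    pairs = []
    for train in sorted(Path(datasets_dir).glob("*_train.csv")):
        prefix = train.name[: -len("_train.csv")]
        test = train.with_name(f"{prefix}_blindtest.csv")
        if test.exists():  # skip training files without a matching blind test
            pairs.append((prefix, train, test))
    return pairs


def build_commands(datasets_dir, grammar="grammar/automl.bnf", out_root="results"):
    """Build one automl_biochem.py command per dataset pair, each with its own output directory."""
    commands = []
    for prefix, train, test in find_dataset_pairs(datasets_dir):
        out_dir = Path(out_root) / prefix
        commands.append([sys.executable, "automl_biochem.py",
                         str(train), str(test), grammar, str(out_dir)])
    return commands


def run_all(datasets_dir="datasets", **kwargs):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_commands(datasets_dir, **kwargs):
        Path(cmd[-1]).mkdir(parents=True, exist_ok=True)  # ensure the output directory exists
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
```

Calling `run_all("datasets")` launches the runs sequentially rather than in parallel, since each run already uses multiple cores on its own.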

Other parameters are available, including:

"-s": define the random seed, which controls the method's pseudorandom number generation. Default: 1.
"-m": define the optimisation metric to be used. Options: "auc", "mcc", "recall", "precision", "auprc", "accuracy". Default: "auc".
"-e": define the experiment name. Default: "Exp_ADMET".
"-t": define the time budget (in minutes) to run the method. Default: 5 (min).
"-n": define the number of cores to run the evolutionary AutoML method. Default: 20.
"-ta": define the time budget (in minutes) to run each individual (i.e., each ML pipeline). Default: 1 (min).
"-p": define the population size. Default: 100.
"-mr": define the mutation rate. Default: 0.15.
"-cr": define the crossover rate. Default: 0.80.
"-cmr": define the rate of applying crossover followed by mutation. Default: 0.05.
"-es": define the elitism size. Default: 1.
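The default variation rates sum to 1.0 (0.80 + 0.15 + 0.05), which suggests that each offspring is produced by exactly one of the three operators. The sketch below illustrates that reading together with elitism; it is an assumption, not the repository's confirmed implementation, and `choose_operator`, `next_generation` and the `make_offspring` callback are hypothetical names.

```python
import random
from collections import Counter


def choose_operator(rng, mr=0.15, cr=0.80, cmr=0.05):
    """Pick one variation operator per offspring, treating the rates as exclusive probabilities."""
    if abs(mr + cr + cmr - 1.0) > 1e-9:
        raise ValueError("this sketch assumes -mr, -cr and -cmr sum to 1")
    r = rng.random()
    if r < cr:
        return "crossover"
    if r < cr + mr:
        return "mutation"
    return "crossover+mutation"


def next_generation(population, fitness, make_offspring, rng, pop_size=100, elitism=1):
    """Copy the `elitism` fittest individuals unchanged (-es), then fill up to `pop_size` (-p)."""
    elite = sorted(population, key=fitness, reverse=True)[:elitism]
    children = [make_offspring(choose_operator(rng), rng) for _ in range(pop_size - elitism)]
    return elite + children


# With the defaults, the counts should be close to 80% / 15% / 5% of the draws.
rng = random.Random(1)
print(Counter(choose_operator(rng) for _ in range(10_000)))
```

Treating the rates as mutually exclusive keeps the number of offspring per generation fixed at the population size minus the elite.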

To run with every option set explicitly (here, each to its default value), use:

python automl_biochem.py datasets/01_caco2_train.csv datasets/01_caco2_blindtest.csv grammar/automl.bnf . -s 1 -m auc -e Exp_ADMET -t 5 -n 20 -ta 1 -p 100 -mr 0.15 -cr 0.80 -cmr 0.05 -es 1
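The documented interface can be mirrored with Python's standard argparse module, which makes it easy to check that the fully specified command above is equivalent to the four-argument version. This is a sketch of the documented options, not the parser inside automl_biochem.py; `build_parser` is a hypothetical name, and the argument types (for example, float minutes for -t and -ta) are assumptions.

```python
import argparse

METRICS = ["auc", "mcc", "recall", "precision", "auprc", "accuracy"]


def build_parser():
    """Sketch of a parser matching the documented CLI (types are assumptions)."""
    p = argparse.ArgumentParser(description="Evolutionary AutoML for biochemical property prediction")
    p.add_argument("training_file")
    p.add_argument("testing_file")
    p.add_argument("grammar")
    p.add_argument("output_directory")
    p.add_argument("-s", type=int, default=1, help="random seed")
    p.add_argument("-m", choices=METRICS, default="auc", help="optimisation metric")
    p.add_argument("-e", default="Exp_ADMET", help="experiment name")
    p.add_argument("-t", type=float, default=5, help="total time budget (minutes)")
    p.add_argument("-n", type=int, default=20, help="number of cores")
    p.add_argument("-ta", type=float, default=1, help="time budget per pipeline (minutes)")
    p.add_argument("-p", type=int, default=100, help="population size")
    p.add_argument("-mr", type=float, default=0.15, help="mutation rate")
    p.add_argument("-cr", type=float, default=0.80, help="crossover rate")
    p.add_argument("-cmr", type=float, default=0.05, help="crossover-then-mutation rate")
    p.add_argument("-es", type=int, default=1, help="elitism size")
    return p


positional = ["datasets/01_caco2_train.csv", "datasets/01_caco2_blindtest.csv", "grammar/automl.bnf", "."]
full = positional + "-s 1 -m auc -e Exp_ADMET -t 5 -n 20 -ta 1 -p 100 -mr 0.15 -cr 0.80 -cmr 0.05 -es 1".split()
print(vars(build_parser().parse_args(full)) == vars(build_parser().parse_args(positional)))
```

argparse resolves exact option strings first, so short flags such as -t and -ta, or -e and -es, do not clash.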

📚 Publication

This work is associated with a paper accepted at the Evolutionary Computing and Explainable Artificial Intelligence workshop at GECCO 2025.

de Sá, A. G. C., Pappa, G. L., Freitas, A. A., & Ascher, D. B. (2025). Interpreting machine learning pipelines produced by evolutionary AutoML for biochemical property prediction. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO ’25 Companion) (pp. 1–9). ACM. https://doi.org/10.1145/3712255.3734339

📬 Contact

For questions or contributions, please open an issue on the alexgcsa/auto-admet GitHub repository.
