Official implementation of "ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods" [NeurIPS 2025]
Paper link
conda env create -f environment.yml
conda activate prospero_env
pip install -e .
chmod +x bash/*
./bash/download_oracles.sh
To reproduce the results of ProSpero across all the protein landscapes (Sections 5.1 and 5.3), run:
./bash/run_all_landscapes.sh <path_to_results_dir>
To reproduce the results of ProSpero under the noisy surrogate setting (Section 5.4), run:
./bash/run_all_noise_levels.sh <path_to_results_dir>
To run ProSpero on a single landscape, run:
python ./src/prospero/runners/run_protein.py
with the following command-line parameters available (the list is not exhaustive):
| Option | Type | Default | Description |
|---|---|---|---|
| `--task` | str | – | Choices: 8 protein fitness landscapes + 3 covariate shifts on UBE2I |
| `--results_dirpath` | str | – | Directory where results are saved |
| `--n_queries` | int | 128 | Oracle per-round evaluation budget |
| `--n_iters` | int | 10 | Number of active learning iterations |
| `--full_deterministic` | flag | false | Enable deterministic behavior for reproducibility |
| `--batch_size` | int | 256 | SMC batch size |
| `--alphabet` | str | CHARGE | RAA used in the biologically-constrained SMC |
| `--kappa_scan` | float | 1.0 | UCB exploitation-exploration hyperparam used in the targeted masking |
| `--kappa_guidance` | float | 0.1 | UCB exploitation-exploration hyperparam used in the biologically-constrained SMC |
| `--n_checks_multiplier` | int | 16 | Number of scans used in the targeted masking (in the paper denoted by "S") |
| `--min_corruptions` | int | 3 | Minimum number of alanine substitutions in the targeted masking |
| `--max_corruptions` | int | 10 | Maximum number of alanine substitutions in the targeted masking |
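
As an illustration, the documented options can be assembled into a single-landscape run from Python. This is a minimal sketch and not part of the repository: `build_command` is a hypothetical helper, `<task_name>` is a placeholder for one of the supported tasks (the valid names are not listed here), and the numeric values only restate the documented defaults.

```python
import shlex


def build_command(task, results_dirpath, n_queries=128, n_iters=10,
                  full_deterministic=False):
    """Hypothetical helper: assemble a run_protein.py command line
    from the documented options."""
    cmd = [
        "python", "./src/prospero/runners/run_protein.py",
        "--task", task,
        "--results_dirpath", results_dirpath,
        "--n_queries", str(n_queries),
        "--n_iters", str(n_iters),
    ]
    if full_deterministic:
        # Documented as a flag, so it is assumed to take no value.
        cmd.append("--full_deterministic")
    return cmd


# Placeholders must be replaced with a real task name and output directory.
cmd = build_command("<task_name>", "<path_to_results_dir>",
                    full_deterministic=True)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))

# To launch the run from the repository root instead of printing it:
# import subprocess; subprocess.run(cmd, check=True)
```

With the defaults, and assuming every iteration spends its full per-round budget, a run would issue at most 128 × 10 = 1280 oracle evaluations.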
See ./src/prospero/inference.py for code annotated with references to the paper's algorithms
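
For intuition about the `--kappa_scan` and `--kappa_guidance` options, a UCB score is conventionally the predictive mean plus kappa times the predictive standard deviation. The sketch below illustrates only that generic form. It is an assumption that ProSpero scores candidates this way, and `ucb_score` and the example predictions are hypothetical; the annotated code in `inference.py` is the authoritative reference.

```python
from statistics import mean, pstdev


def ucb_score(predictions, kappa):
    """Generic upper confidence bound: mean + kappa * std of the predictions."""
    return mean(predictions) + kappa * pstdev(predictions)


# Hypothetical fitness predictions from several surrogate models
# for two candidate sequences.
confident = [1.0, 1.0, 1.0]  # higher mean, no disagreement
uncertain = [0.0, 0.9, 1.8]  # lower mean, strong disagreement

# 0.1 and 1.0 are the documented defaults of --kappa_guidance and --kappa_scan.
for kappa in (0.1, 1.0):
    name, _ = max(
        (("confident", confident), ("uncertain", uncertain)),
        key=lambda item: ucb_score(item[1], kappa),
    )
    print(f"kappa={kappa}: preferred candidate is '{name}'")
```

A small kappa favors candidates with a high predicted mean (exploitation), whereas a larger kappa rewards candidates on which the predictions disagree (exploration).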
If you encounter a problem or have a question, please either open an issue in this repository or email us at michal.kmicikiewicz@helmholtz-munich.de
If you find this work useful, please cite:
@inproceedings{
kmicikiewicz2025prospero,
title={ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods},
author={Michal Kmicikiewicz and Vincent Fortuin and Ewa Szczurek},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=wSDE3karoF}
}