Structure prediction and design of proteins with noncanonical amino acids.
RareFold predicts single-chain protein structures containing rare noncanonical amino acids and enables the design of novel peptide binders through the EvoBindRare framework.
RareFold supports 49 different amino acid types: the 20 regular ones and the following 29 rare ones:
MSE, TPO, MLY, CME, PTR, SEP, SAH, CSO, PCA, KCX, CAS, CSD, MLZ, OCS, ALY, CSS, CSX, HIC, HYP, YCM, YOF, M3L, PFF, CGU, FTR, LLP, CAF, CMH, MHO
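Before preparing an input, it can help to check that every residue code is one of the 49 supported types. The snippet below is a minimal sketch, not part of RareFold: `SUPPORTED` and `unsupported_residues` are hypothetical names, and it assumes residues are given as standard three-letter codes (how RareFold itself expects noncanonical residues to be encoded is not covered here).

```python
# Three-letter codes of the 20 regular amino acids.
CANONICAL = (
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL"
).split()

# The 29 rare residue codes listed above.
RARE = (
    "MSE TPO MLY CME PTR SEP SAH CSO PCA KCX CAS CSD MLZ OCS ALY "
    "CSS CSX HIC HYP YCM YOF M3L PFF CGU FTR LLP CAF CMH MHO"
).split()

# Hypothetical helper set: all 49 supported residue types.
SUPPORTED = frozenset(CANONICAL) | frozenset(RARE)


def unsupported_residues(residues):
    """Return the codes in `residues` that are not supported, in input order."""
    return [r for r in residues if r.upper() not in SUPPORTED]


if __name__ == "__main__":
    print(len(SUPPORTED))
    print(unsupported_residues(["ALA", "SEP", "XYZ"]))
```

An empty list from `unsupported_residues` means every residue in the sequence belongs to one of the supported types.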
EvoBindRare designs both linear and cyclic binders directly from a protein target sequence; no prior knowledge of binding sites is required. The framework enables rapid and flexible design with expanded chemical diversity through the incorporation of noncanonical amino acids. EvoBindRare has been experimentally validated for both linear and cyclic designs, achieving high-affinity binding in each case.
- RareFold
- LICENSE
- Colab
- Installation
- Predict using RareFold
- Design using EvoBindRare
- Citation
- Data
- The EvoBind ecosystem
RareFold is available under the Apache License, Version 2.0.
The RareFold parameters for prediction are made available under the terms of the CC BY 4.0 license.
The design protocol EvoBindRare and the parameters for design are made available under the terms of the CC BY-NC 4.0 license.
You may not use these files except in compliance with the licenses.
EvoBindRare can be run online in Google Colab.
The entire installation takes <1 hour on a standard computer.
We assume you have CUDA 12. For CUDA 11, you will have to change the installation of some packages.
The runtime will depend on the GPU you have available and the size of the protein you are predicting.
On an NVIDIA A100 GPU, the prediction time is a few minutes on average.
First, install Miniconda; see https://docs.conda.io/projects/miniconda/en/latest/miniconda-install.html or https://docs.conda.io/projects/miniconda/en/latest/miniconda-other-installer-links.html
bash install_dependencies.sh
- Install the RareFold environment
- Get the RareFold parameters for single-chain structure prediction
- Get the EvoBindRare parameters for binder design
- Get Uniclust for MSA search
- Install HHblits for MSA search
Run the test case (a few minutes)
conda activate rarefold
bash predict.sh
EvoBindRare designs novel peptide binders based only on a protein target sequence. It is not necessary to specify any target residues within the protein sequence - let RareFold choose! EvoBindRare is the only protocol with experimentally verified cyclic-noncanonical amino acid design capacity.
The target structure is shown in green, with canonical peptide residues in blue and noncanonical residues in magenta.
For the original version of EvoBind with regular amino acids, see: https://github.com/patrickbryant1/EvoBind
conda activate rarefold
bash design_eff.sh
- This runs lengths 10-15 using 5 initialisations per length on a single GPU. Compare this with running each design thread on its own device, which would need 6 lengths x 5 initialisations = 30 GPUs. This fits with 40 GB of RAM; if you have less, reduce the number of lengths or initialisations.
The design_eff.sh script also calls many CPU instances to handle data processing, mutation and feature updates efficiently and simultaneously. This reduces GPU off-time from minutes to seconds for each iteration, resulting in many hours saved.
- For your own design, take the time to optimise GPU utilisation before starting. Try different numbers of lengths and initialisations and check the GPU utilisation (until you run out of RAM).
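One way to check utilisation while tuning the number of lengths and initialisations is to poll `nvidia-smi`. The following is a minimal sketch that assumes `nvidia-smi` is on the `PATH`; `parse_gpu_stats` and `query_gpu_stats` are hypothetical helper names and are not part of the RareFold scripts.

```python
import subprocess

# nvidia-smi query returning one "utilisation, used, total" line per GPU
# (utilisation in percent, memory in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse the CSV output of QUERY into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(x) for x in line.split(","))
        stats.append({
            "util_pct": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
            "mem_frac": used / total,
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed statistics."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `query_gpu_stats()` periodically during a short trial run shows whether the GPU is kept busy and how close memory use is to the limit. Devices that report `[N/A]` for a field are not handled by this sketch.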
By consolidating all 30 (6 lengths x 5 initializations) design threads onto a single device, we not only slash the GPU requirement but also eliminate a substantial amount of redundant CPU overhead per design step.
| Design Step Component | Old Time (30x Multiplier) | New Time (Consolidated) | Time Saved (per step) |
|---|---|---|---|
| Prediction (GPU) | 91.29 s (per process, on 30 separate GPUs) | 91.29 s (on 1 consolidated GPU) | 30× Hardware Reduction |
| Non-Prediction CPU Overhead | 600.60 s | 20.02 s | 580.58 s |
| Total Step Time | | 111.31 s | (Faster wall-clock time + massive resource savings) |
| Step | Time (s) |
|---|---|
| Mutating sequences | 3.16 |
| Loss calcs | 3.41 |
| Making new feats | 0.04 |
| Adding metrics | 0.01 |
| Saving structures | 13.40 |
| Total | 20.02 |
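The figures in the two tables above can be cross-checked with a few lines of arithmetic. This is a sketch under one assumption: that the old CPU overhead is the consolidated overhead multiplied by the 30 design threads, which is how the "30x Multiplier" column header reads and which matches the reported 600.60 s. The old total computed here is derived from the table rows and is not stated in the source.

```python
# Values copied from the two tables above (seconds).
GPU_PREDICTION = 91.29
OVERHEAD_PARTS = {
    "Mutating sequences": 3.16,
    "Loss calcs": 3.41,
    "Making new feats": 0.04,
    "Adding metrics": 0.01,
    "Saving structures": 13.40,
}
N_THREADS = 6 * 5  # 6 lengths x 5 initialisations

# Consolidated CPU overhead: sum of the per-step breakdown.
new_overhead = round(sum(OVERHEAD_PARTS.values()), 2)
# Assumption: the old overhead is the same work repeated once per thread.
old_overhead = round(new_overhead * N_THREADS, 2)

new_total = round(GPU_PREDICTION + new_overhead, 2)
old_total = round(GPU_PREDICTION + old_overhead, 2)  # derived, not in the source
overhead_saved = round(old_overhead - new_overhead, 2)

if __name__ == "__main__":
    print(f"new overhead:   {new_overhead:.2f} s")
    print(f"old overhead:   {old_overhead:.2f} s")
    print(f"new total:      {new_total:.2f} s")
    print(f"old total:      {old_total:.2f} s")
    print(f"overhead saved: {overhead_saved:.2f} s")
```

The breakdown sums to the 20.02 s reported as consolidated overhead, and adding the 91.29 s prediction time gives the 111.31 s total step time. Saving structures accounts for roughly two thirds of the remaining overhead.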
If you use RareFold in your research, please cite:
Li Q, Daumiller D, Zuo F, Marcotte H, Pan-Hammarstrom Q and Bryant P. RareFold: Structure prediction and design of proteins with noncanonical amino acids. bioRxiv. 2025. p. 2025.05.19.654846. doi:10.1101/2025.05.19.654846
https://zenodo.org/records/17071355
Unfortunately, the entire training data with MSAs is too large to share through Zenodo. This repository mainly contains predictions and metrics.
EvoBind - designs novel [cyclic] peptide binders based only on a protein target sequence.
RareFold - prediction & design with noncanonical amino acids
RareFoldGPCR - GPCR agonist design with noncanonical amino acids

