Ultra-high-throughput inference with AlphaFold 3. AlphaFast replaces Jackhmmer with MMseqs2-GPU for an over 68x speedup in homology search and an over 22x speedup in end-to-end inference on a single H200 GPU.
AlphaFast scales across multiple GPUs, reaching a throughput of 8s per input on 4 H200 GPUs and 4.5s per input on 8 H200 GPUs, with throughput growing approximately linearly with the number of devices on larger systems.
For minimal setup, or if you lack significant computational resources, see our Modal Setup section for serverless inference at roughly $0.035 and 28s per input.
Check out our bioRxiv preprint here!
Also check out the MMseqs2-GPU paper here!
Disclaimer: AlphaFast requires AlphaFold 3 model weights, which are subject to Google DeepMind's Terms of Use. You must apply for and receive weights directly from Google. This is not an officially supported Google product.
Note: Protein MSAs use MMseqs2-GPU. RNA MSAs use CPU MMseqs2 by default when the RNA database is present, and can optionally fall back to nhmmer via the RNA FASTA databases. DNA chains use an empty MSA, matching AlphaFold 3's native behavior.
Request access to AlphaFold 3 model parameters via
this form from Google. Approval typically takes 2-5
business days. You will receive a compressed weights file named af3.bin.zst.
| Environment | Requirements | Jump to |
|---|---|---|
| Local Server | Docker, Sudo Access | Docker Setup |
| HPC Cluster | Singularity, SLURM | HPC Setup |
| Serverless | Modal Billing Account | Modal Setup |
Downloads AlphaFast databases. By default this installs pre-built protein MMseqs2, RNA MMseqs2, and mmCIF data from HuggingFace.
Important: Point `path/to/databases` to a fast data drive (NVMe recommended). The default pre-built install includes protein MMseqs2, RNA MMseqs2, and mmCIF data. Add `--include-nhmmer` only if you want RNA FASTA fallback files for forced `--use_nhmmer` runs. Use `--from-source` only for advanced rebuild workflows.
Prerequisite: Pre-built mode requires `hf`, `zstd`, and `tar`. `--from-source` additionally requires `wget` and `mmseqs`. See docs/building.md for MMseqs2 installation instructions.
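A quick way to confirm those prerequisites before running the setup script — a sketch, where `hf` is the Hugging Face CLI and the tool list is taken from the prerequisites above:

```shell
# Check that the tools the pre-built install needs are on PATH.
missing=0
for tool in hf zstd tar; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing prerequisite: $tool"
    missing=1
  fi
done
echo "missing_flag=$missing"
```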
# Default: protein + RNA MMseqs + mmCIF from HuggingFace
./scripts/setup_databases.sh /path/to/databases
# Add RNA FASTA fallback files for forced nhmmer runs
./scripts/setup_databases.sh /path/to/databases --include-nhmmer
# Protein-only pre-built install
./scripts/setup_databases.sh /path/to/databases --protein-only
# RNA-only pre-built install
./scripts/setup_databases.sh /path/to/databases --rna-only
# Build from Google-hosted source data instead of using pre-built artifacts
./scripts/setup_databases.sh /path/to/databases --from-source
Alternatively, download pre-built databases from HuggingFace (no padded conversion necessary):
# install HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash
# run script
./scripts/setup_databases.sh /path/to/databases --from-prebuilt
Optional: To build the container from source instead, see docs/building.md.
docker pull romerolabduke/alphafast:latest
Note: There are several ways to move the weights to your server, such as direct download from the link provided by the DeepMind team, or SSH transfer via utilities like rsync or scp.
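After transfer, it can be worth verifying the archive is intact before a run. A sketch using `zstd -t` (integrity check without extraction); the file here is a stand-in created on the spot, not the real weights:

```shell
# Create a stand-in .zst file so the check is demonstrable end-to-end.
WEIGHTS="${TMPDIR:-/tmp}/af3_demo.bin.zst"
printf 'demo payload' | zstd -q -f -o "$WEIGHTS"
# -t decompresses in memory and reports corruption without writing output.
zstd -t "$WEIGHTS" && echo "weights archive OK"
```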
cp /path/to/downloaded/af3.bin.zst /path/to/weights/
Create a directory of input .json files. See docs/input_format.md for the full format reference. Minimal example:
{
"name": "2PV7",
"sequences": [
{
"protein": {
"id": ["A", "B"],
"sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGG..."
}
}
],
"modelSeeds": [1,2,3],
"dialect": "alphafold3",
"version": 3
}
RNA-Protein Complex:
{
"name": "rna_protein",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
},
{
"rna": {
"id": ["B"],
"sequence": "GGGGACUGCGUUCGCGCUUUCCCC"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 3
}
Note: Performance gains from AlphaFast scale with input batch size, GPU compute capability/VRAM, and number of GPUs.
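For large batches it helps to syntax-check inputs before launching a run. A sketch (file and directory names are illustrative) that writes one minimal input in the format shown above and validates it with Python's standard library:

```shell
# Write a minimal AlphaFold 3-dialect input into a batch directory.
mkdir -p inputs_demo
cat > inputs_demo/example.json <<'EOF'
{
  "name": "example",
  "sequences": [
    {"protein": {"id": ["A"], "sequence": "MVLSPADKTNVKAAW"}}
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}
EOF
# Fail fast on malformed JSON before submitting the whole batch.
for f in inputs_demo/*.json; do
  python3 -m json.tool "$f" > /dev/null && echo "valid: $f"
done
```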
Single GPU:
./scripts/run_alphafast.sh \
--input_dir /path/to/inputs \
--output_dir /path/to/outputs \
--db_dir /path/to/databases \
--weights_dir /path/to/weights
On HPC systems with slow shared storage, add --temp_dir /scratch/$USER/alphafast_tmp to place MMseqs temporary files on fast node-local storage.
Multi-GPU:
./scripts/run_alphafast.sh \
--input_dir /path/to/inputs \
--output_dir /path/to/outputs \
--db_dir /path/to/databases \
--weights_dir /path/to/weights \
--jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
--gpu_devices 0,1,2,3
Use --jax_compilation_cache_dir to persist JAX/XLA compilations across runs. This reduces repeated cold-start compile time, especially for repeated inference-only batches on the same GPU model.
Force nhmmer for RNA:
./scripts/run_alphafast.sh \
--input_dir /path/to/inputs \
--output_dir /path/to/outputs \
--db_dir /path/to/databases \
--weights_dir /path/to/weights \
--use_nhmmer
This requires the RNA FASTA fallback files to be present, e.g. from ./scripts/setup_databases.sh /path/to/databases --include-nhmmer.
When multiple devices are specified via --gpu_devices, AlphaFast runs a phase-separated parallel pipeline:
- Partition — Inputs are distributed round-robin across GPUs. Identical protein sequences are deduplicated within each partition.
- Phase 1: Parallel MSA — All N GPUs run batched MMseqs2-GPU search simultaneously.
- Phase 2: Parallel Fold — AlphaFast waits until all MSAs are complete, then re-distributes the data files and all N GPUs run inference simultaneously.
At large batch sizes, every GPU is fully utilized in each phase, achieving near-linear scaling.
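The partition step can be illustrated with a small stand-alone sketch. This mimics the round-robin distribution described above with dummy files; it is not the actual AlphaFast implementation:

```shell
NUM_GPUS=4
mkdir -p rr_demo/inputs
for i in 0 1 2 3 4 5 6 7 8 9; do
  touch "rr_demo/inputs/input_$i.json"
done
# Assign inputs to per-GPU partitions in round-robin order.
i=0
for f in rr_demo/inputs/*.json; do
  gpu=$((i % NUM_GPUS))
  mkdir -p "rr_demo/partitions/gpu_$gpu"
  cp "$f" "rr_demo/partitions/gpu_$gpu/"
  i=$((i + 1))
done
# 10 inputs over 4 GPUs: partitions receive 3, 3, 2, 2 inputs.
ls rr_demo/partitions/gpu_0
```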
Important: Point `path/to/databases` to a high-speed volume with fast network transfer. The default `setup_databases.sh` mode downloads pre-built protein MMseqs2, RNA MMseqs2, and mmCIF data from HuggingFace. Add `--include-nhmmer` only if you also want RNA FASTA fallback files. AlphaFast will spend roughly an hour copying the databases to a local NVMe volume (often called `/scratch` on HPC systems). If a local NVMe volume is not available, make sure the databases sit on the partition with the fastest I/O you have. Note: You may need to edit the SLURM directives to match your cluster's specific configuration.
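The SLURM directives you may need to edit typically look like the following; the values are illustrative placeholders, not the shipped defaults of scripts/setup_databases.sbatch:

```shell
#!/bin/bash
#SBATCH --job-name=alphafast_db_setup
#SBATCH --cpus-per-task=8        # adjust to your cluster's partitions,
#SBATCH --mem=32G                # accounts, QOS, and time limits
#SBATCH --time=08:00:00
./scripts/setup_databases.sh /path/to/databases
```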
# Submit as SLURM job (CPU node, no GPU needed)
sbatch scripts/setup_databases.sbatch /path/to/databases
# Or run directly in an interactive session:
./scripts/setup_databases.sh /path/to/databases
# Optional: include RNA FASTA fallback for forced nhmmer runs
sbatch scripts/setup_databases.sbatch /path/to/databases --include-nhmmer
Alternatively, download pre-built databases from HuggingFace (no padded conversion necessary):
# install HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash
# run script
./scripts/setup_databases.sh /path/to/databases --from-prebuilt
Important: Most university HPC systems provide Apptainer or Singularity rather than Docker for permission management. Depending on your HPC setup, your home directory may be very small; ensure your Apptainer or Singularity cache directory is set to a volume of adequate size and speed. See docs/hpc.md for specific guidance.
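Both Apptainer and Singularity honor cache-directory environment variables, so you can redirect the cache away from a small home directory before pulling. A sketch with an illustrative path:

```shell
# Redirect the container cache to a large, fast volume.
export SINGULARITY_CACHEDIR="${TMPDIR:-/tmp}/demo_sif_cache"
export APPTAINER_CACHEDIR="$SINGULARITY_CACHEDIR"   # Apptainer's equivalent variable
mkdir -p "$SINGULARITY_CACHEDIR"
echo "container cache: $SINGULARITY_CACHEDIR"
```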
singularity pull alphafast.sif docker://romerolabduke/alphafast:latest
Note: There are several ways to move the weights to your university HPC system, such as direct download from the link provided by the DeepMind team, or SSH transfer via utilities like rsync or scp. Most university systems have a data transfer node with services like Globus that may be useful.
rsync -avP /local/path/af3.bin.zst user@hpc:/path/to/weights/
Create a directory of input .json files. See docs/input_format.md for the full format reference. Minimal example:
{
"name": "2PV7",
"sequences": [
{
"protein": {
"id": ["A", "B"],
"sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGG..."
}
}
],
"modelSeeds": [1,2,3],
"dialect": "alphafold3",
"version": 1
}
Note: Performance gains from AlphaFast scale with input batch size, GPU compute capability/VRAM, and number of GPUs. On HPC systems specifically, AlphaFast will attempt to transfer almost all code, the container, and the databases to the local `/scratch` directory of a compute node. This transfer can take up to 1-2 hours depending on network speed; AlphaFast is therefore best used for very large input batches. If you notice significant slowdowns, ensure the cache directories for package managers like uv and other system packages are not on a slow filesystem. If all else fails, use our Modal Setup section instead.
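The staging pattern (copy once to node-local scratch, then run against the local copy) looks roughly like this; paths and file names are illustrative, with a dummy directory standing in for the real databases:

```shell
# Stand-in source directory; on a real cluster this is the shared DB path.
SRC="$(mktemp -d)"
touch "$SRC/pdb_mmcif.tar" "$SRC/uniref_mmseqs.db"
DEST="${TMPDIR:-/tmp}/alphafast_db_stage"
mkdir -p "$DEST"
# rsync -a is the usual tool on HPC; cp -a shown here for portability.
cp -a "$SRC/." "$DEST/"
ls "$DEST"
```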
./scripts/run_alphafast.sh \
--input_dir /path/to/inputs \
--output_dir /path/to/outputs \
--db_dir /path/to/databases \
--weights_dir /path/to/weights \
--temp_dir /scratch/$USER/alphafast_tmp \
--container /path/to/alphafast.sif \
--jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
--gpu_devices 0,1,2,3
For best results on repeated multi-GPU runs, place the JAX cache on fast node-local or high-performance shared storage. A fully cold cache can still cause the first parallel wave to compile once per worker process before later runs benefit from the persisted artifacts.
Modal provides serverless GPU inference with pay-per-second billing.
pip install modal && modal token new
# Recommended for HuggingFace-backed database downloads on Modal
modal secret create huggingface HF_TOKEN=hf_your_token_here
modal run modal/upload_weights.py --file /path/to/af3.bin.zst --no-extract
modal run modal/prepare_databases.py
# Optional: include RNA FASTA fallback for nhmmer
modal run modal/prepare_databases.py --include-nhmmer
# Advanced: build on Modal from Google-hosted source data
modal run modal/prepare_databases.py --from-source
# Run predictions
modal run modal/af3_predict.py --input protein.json
`modal/prepare_databases.py` expects a Modal secret named `huggingface` containing `HF_TOKEN`, and passes it through to Hugging Face for authenticated downloads with higher rate limits.
See docs/modal.md for the full CLI reference, batch processing, multi-GPU modes, and cost estimates.
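Using the per-input figures quoted earlier (roughly $0.035 and 28s per input), a quick back-of-envelope estimate for a batch — a sketch, not billing advice:

```shell
# Estimate cost and aggregate GPU time for a hypothetical 1000-input batch.
N_INPUTS=1000
COST=$(awk -v n="$N_INPUTS" 'BEGIN { printf "%.2f", n * 0.035 }')
HOURS=$(awk -v n="$N_INPUTS" 'BEGIN { printf "%.1f", n * 28 / 3600 }')
echo "~\$$COST and ~${HOURS} GPU-hours for $N_INPUTS inputs"
```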
| Flag | Default | Description |
|---|---|---|
| `--input_dir` | (required) | Directory containing input JSON files |
| `--output_dir` | (required) | Output directory for results |
| `--db_dir` | (required) | Database directory (from `setup_databases.sh`) |
| `--weights_dir` | (required) | Directory containing `af3.bin.zst` |
| `--gpu_devices` | `0` | Comma-separated GPU device IDs. Single device = single-GPU mode, multiple = multi-GPU mode. Example: `--gpu_devices 0,1,2,3` |
| `--container` | `romerolabduke/alphafast:latest` | Docker image or `.sif` path |
| `--batch_size` | auto (count of inputs) | MSA batch size |
| `--backend` | auto-detect | Force `docker` or `singularity` |
For advanced flags, see docs/advanced.md.
If you use AlphaFast in your research, please cite our work, AlphaFold 3, and MMseqs2-GPU:
@article{Perry2026.02.17.706409,
author = {Perry, Benjamin C and Kim, Jeonghyeon and Romero, Philip A},
title = {AlphaFast: High-throughput AlphaFold 3 via GPU-accelerated MSA construction},
year = {2026},
doi = {10.64898/2026.02.17.706409},
publisher = {Cold Spring Harbor Laboratory},
abstract = {AlphaFold 3 (AF3) enables accurate biomolecular modeling but is limited by slow, CPU-bound multiple sequence alignment (MSA) generation. We introduce AlphaFast, a drop-in framework that integrates GPU-accelerated MMseqs2 sequence search to remove this bottleneck. AlphaFast achieves a 68.5x speedup in MSA construction and a 22.8x reduction in end-to-end runtime on a single GPU, and delivers predictions in 8 seconds per input on four GPUs while maintaining indistinguishable structural accuracy. A serverless deployment enables structure prediction for as little as $0.035 per input. Code is available at https://github.com/RomeroLab/alphafast.},
URL = {https://www.biorxiv.org/content/early/2026/02/18/2026.02.17.706409},
journal = {bioRxiv}
}
@article{Abramson2024,
author = {Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J. and Bambrick, Joshua and Bodenstein, Sebastian W. and Evans, David A. and Hung, Chia-Chun and O’Neill, Michael and Reiman, David and Tunyasuvunakool, Kathryn and Wu, Zachary and Žemgulytė, Akvilė and Arvaniti, Eirini and Beattie, Charles and Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and Congreve, Miles and Cowen-Rivers, Alexander I. and Cowie, Andrew and Figurnov, Michael and Fuchs, Fabian B. and Gladman, Hannah and Jain, Rishub and Khan, Yousuf A. and Low, Caroline M. R. and Perlin, Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine and Yakneen, Sergei and Zhong, Ellen D. and Zielinski, Michal and Žídek, Augustin and Bapst, Victor and Kohli, Pushmeet and Jaderberg, Max and Hassabis, Demis and Jumper, John M.},
journal = {Nature},
title = {Accurate structure prediction of biomolecular interactions with AlphaFold 3},
year = {2024},
volume = {630},
number = {8016},
pages = {493--500},
doi = {10.1038/s41586-024-07487-w}
}
@article{Kallenborn2025-fd,
title = "{GPU}-accelerated homology search with {MMseqs2}",
author = "Kallenborn, Felix and Chacon, Alejandro and Hundt, Christian and
Sirelkhatim, Hassan and Didi, Kieran and Cha, Sooyoung and
Dallago, Christian and Mirdita, Milot and Schmidt, Bertil and
Steinegger, Martin",
journal = "Nat. Methods",
volume = 22,
number = 10,
pages = "2024--2027",
year = 2025,
doi = "10.1038/s41592-025-02819-8",
}
Source code is licensed under CC-BY-NC-SA 4.0. Model parameters are subject to the AlphaFold 3 Model Parameters Terms of Use. Output is subject to the Output Terms of Use.
This is not an officially supported Google product.
