
AlphaFast

Ultra-high-throughput inference with AlphaFold 3. AlphaFast replaces Jackhmmer with MMseqs2-GPU, achieving over a 68x speedup in homology search and over a 22x speedup in end-to-end inference on a single H200 GPU.

AlphaFast also supports multi-GPU execution, reaching 8 s per input on 4 H200 GPUs and 4.5 s per input on 8 H200 GPUs, with throughput on larger systems scaling approximately linearly with the number of devices.

For minimal setup, or if you lack significant computational resources, see our Modal Setup section for serverless inference at roughly $0.035 and 28 s per input.

Check out our bioRxiv preprint here!

Also check out the MMseqs2-GPU paper here!

Disclaimer: AlphaFast requires AlphaFold 3 model weights, which are subject to Google DeepMind's Terms of Use. You must apply for and receive weights directly from Google. This is not an officially supported Google product.

Note: Protein MSA uses MMseqs2-GPU. RNA MSA uses MMseqs2 on CPU by default when the RNA database is present, with an optional fallback to nhmmer via RNA FASTA databases. DNA chains use an empty MSA, matching AlphaFold 3's native behavior.
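The per-chain-type behavior above can be summarized as a dispatch table. The sketch below is purely illustrative — the function and tool names are assumptions for clarity, not AlphaFast's actual internals:

```python
# Hypothetical sketch of per-chain-type MSA dispatch, mirroring the
# documented defaults. Names here are illustrative, not AlphaFast internals.

def msa_tool_for_chain(chain_type: str, nhmmer_available: bool = False) -> str:
    """Return the MSA strategy used for a given chain type."""
    if chain_type == "protein":
        return "mmseqs2-gpu"        # GPU-accelerated homology search
    if chain_type == "rna":
        # MMseqs2 on CPU by default; nhmmer only as an optional fallback
        return "nhmmer" if nhmmer_available else "mmseqs2-cpu"
    if chain_type == "dna":
        return "empty-msa"          # matches AlphaFold 3's native behavior
    raise ValueError(f"unknown chain type: {chain_type}")
```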

Quick Start

Step 1: Acquire Model Weights

Request access to AlphaFold 3 model parameters via this form from Google. Approval typically takes 2-5 business days. You will receive a file of compressed weights named af3.bin.zst.

Step 2: Choose Your Compute Environment

Environment   | Requirements           | Jump to
Local Server  | Docker, sudo access    | Docker Setup
HPC Cluster   | Singularity, SLURM     | HPC Setup
Serverless    | Modal billing account  | Modal Setup

Docker Setup

Step 3: Download and Convert Databases

The setup script downloads the AlphaFast databases. By default it installs pre-built protein MMseqs2, RNA MMseqs2, and mmCIF data from HuggingFace.

Important: Point path/to/databases to a fast data drive (NVMe recommended). The default pre-built install includes protein MMseqs2, RNA MMseqs2, and mmCIF data. Add --include-nhmmer only if you want RNA FASTA fallback files for forced --use_nhmmer runs. Use --from-source only for advanced rebuild workflows.

Prerequisite: Pre-built mode requires hf, zstd, and tar. --from-source additionally requires wget and mmseqs. See docs/building.md for MMseqs2 installation instructions.

# Default: protein + RNA MMseqs + mmCIF from HuggingFace
./scripts/setup_databases.sh /path/to/databases

# Add RNA FASTA fallback files for forced nhmmer runs
./scripts/setup_databases.sh /path/to/databases --include-nhmmer

# Protein-only pre-built install
./scripts/setup_databases.sh /path/to/databases --protein-only

# RNA-only pre-built install
./scripts/setup_databases.sh /path/to/databases --rna-only

# Build from Google-hosted source data instead of using pre-built artifacts
./scripts/setup_databases.sh /path/to/databases --from-source

Alternatively, download pre-built databases from HuggingFace (no padded conversion necessary):

# install HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# run script
./scripts/setup_databases.sh /path/to/databases --from-prebuilt

Step 4: Pull Container

Optional: To build the container from source instead, see docs/building.md.

docker pull romerolabduke/alphafast:latest

Step 5: Place Weights

Note: There are several ways to move the weights to your server, such as direct download from the link provided by the DeepMind team, or SSH transfer via utilities like rsync or scp.

cp /path/to/downloaded/af3.bin.zst /path/to/weights/

Step 6: Create Input

Create a directory of input .json files. See docs/input_format.md for the full format reference. Minimal example:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGG..."
      }
    }
  ],
  "modelSeeds": [1,2,3],
  "dialect": "alphafold3",
  "version": 3
}
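For large batches, the input directory can be populated programmatically. A minimal sketch, assuming one protein sequence per file and using the field names from the example above (the helper name `write_inputs` is hypothetical):

```python
import json
from pathlib import Path

def write_inputs(sequences: dict[str, str], input_dir: str) -> None:
    """Write one AlphaFold 3-dialect JSON input file per named sequence."""
    out = Path(input_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, seq in sequences.items():
        doc = {
            "name": name,
            "sequences": [{"protein": {"id": ["A"], "sequence": seq}}],
            "modelSeeds": [1],
            "dialect": "alphafold3",
            "version": 3,
        }
        # One file per input keeps the batch easy to partition across GPUs
        (out / f"{name}.json").write_text(json.dumps(doc, indent=2))
```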

RNA-Protein Complex:

{
  "name": "rna_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "rna": {
        "id": ["B"],
        "sequence": "GGGGACUGCGUUCGCGCUUUCCCC"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}

Step 7: Run Inference

Note: Performance gains from AlphaFast scale with input batch size, GPU compute capability and VRAM, and the number of GPUs.

Single GPU:

./scripts/run_alphafast.sh \
    --input_dir /path/to/inputs \
    --output_dir /path/to/outputs \
    --db_dir /path/to/databases \
    --weights_dir /path/to/weights

On HPC systems with slow shared storage, add --temp_dir /scratch/$USER/alphafast_tmp to place MMseqs temporary files on fast node-local storage.

Multi-GPU:

./scripts/run_alphafast.sh \
    --input_dir /path/to/inputs \
    --output_dir /path/to/outputs \
    --db_dir /path/to/databases \
    --weights_dir /path/to/weights \
    --jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
    --gpu_devices 0,1,2,3

Use --jax_compilation_cache_dir to persist JAX/XLA compilations across runs. This reduces repeated cold-start compile time, especially for repeated inference-only batches on the same GPU model.

Force nhmmer for RNA:

./scripts/run_alphafast.sh \
    --input_dir /path/to/inputs \
    --output_dir /path/to/outputs \
    --db_dir /path/to/databases \
    --weights_dir /path/to/weights \
    --use_nhmmer

This requires RNA FASTA fallback files to be present, e.g. from ./scripts/setup_databases.sh /path/to/databases --include-nhmmer.

How Multi-GPU Mode Works

When multiple devices are specified via --gpu_devices, AlphaFast runs a phase-separated parallel pipeline:

  1. Partition — Inputs are distributed round-robin across GPUs. Identical protein sequences are deduplicated within each partition.
  2. Phase 1: Parallel MSA — All N GPUs run batched MMseqs2-GPU search simultaneously.
  3. Phase 2: Parallel Fold — AlphaFast waits until all MSAs are complete, re-distributes the data files, and then all N GPUs run inference simultaneously.

At large batch sizes, each GPU is fully utilized in each phase, yielding near-linear scaling.
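The partition step can be sketched as follows. This is an illustrative sketch, not AlphaFast's actual implementation; here duplicate sequences within a partition are simply collapsed, whereas in practice the one computed MSA would be shared across the duplicate inputs:

```python
def partition_round_robin(inputs: list[dict], n_gpus: int) -> list[list[dict]]:
    """Distribute inputs round-robin across GPUs, then collapse identical
    protein sequences within each partition (illustrative sketch only)."""
    parts: list[list[dict]] = [[] for _ in range(n_gpus)]
    for i, job in enumerate(inputs):
        parts[i % n_gpus].append(job)   # round-robin assignment
    deduped = []
    for part in parts:
        seen: set[str] = set()
        kept = []
        for job in part:
            key = job["sequence"]
            if key not in seen:         # identical sequences share one MSA
                seen.add(key)
                kept.append(job)
        deduped.append(kept)
    return deduped
```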


HPC Setup

Step 3: Install Databases

Important: Point /path/to/databases to a high-speed volume with fast network transfer. The default setup_databases.sh mode downloads pre-built protein MMseqs2, RNA MMseqs2, and mmCIF data from HuggingFace. Add --include-nhmmer only if you also want RNA FASTA fallback files. AlphaFast will spend roughly an hour copying the databases to a local NVMe volume (often called /scratch on HPC systems); if none is available, make sure the databases sit on the fastest I/O partition possible. Note: You may need to edit the SLURM directives to match your cluster's configuration.

# Submit as SLURM job (CPU node, no GPU needed)
sbatch scripts/setup_databases.sbatch /path/to/databases

# Or run directly in an interactive session:
./scripts/setup_databases.sh /path/to/databases

# Optional: include RNA FASTA fallback for forced nhmmer runs
sbatch scripts/setup_databases.sbatch /path/to/databases --include-nhmmer

Alternatively, download pre-built databases from HuggingFace (no padded conversion necessary):

# install HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# run script
./scripts/setup_databases.sh /path/to/databases --from-prebuilt

Step 4: Pull Container

Important: Most university HPC systems provide apptainer or singularity rather than Docker for permission management. Depending on your HPC setup, your home directory may be very small, so ensure your apptainer or singularity cache directory points to a volume with adequate size and speed. See docs/hpc.md for specific guidance.

singularity pull alphafast.sif docker://romerolabduke/alphafast:latest

Step 5: Place Weights

Note: There are several ways to move the weights to your university HPC system, such as direct download from the link provided by the DeepMind team, or SSH transfer via utilities like rsync or scp. Most university systems also have a data transfer node with services like Globus, which may be useful.

rsync -avP /local/path/af3.bin.zst user@hpc:/path/to/weights/

Step 6: Create Input

Create a directory of input .json files. See docs/input_format.md for the full format reference. Minimal example:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGG..."
      }
    }
  ],
  "modelSeeds": [1,2,3],
  "dialect": "alphafold3",
  "version": 3
}

Step 7: Run

Note: Performance gains from AlphaFast scale with input batch size, GPU compute capability and VRAM, and the number of GPUs. On HPC systems specifically, AlphaFast will attempt to transfer the code, the container, and the databases to the local /scratch directory of a compute node. This transfer can take 1-2 hours depending on network speed, so AlphaFast is best suited to very large input batches. If you notice significant slowdowns, ensure that the cache directories for package managers like uv and other system packages are not on a slow filesystem. If all else fails, use our Modal Setup section instead.

./scripts/run_alphafast.sh \
    --input_dir /path/to/inputs \
    --output_dir /path/to/outputs \
    --db_dir /path/to/databases \
    --weights_dir /path/to/weights \
    --temp_dir /scratch/$USER/alphafast_tmp \
    --container /path/to/alphafast.sif \
    --jax_compilation_cache_dir /scratch/$USER/alphafast_jax_cache \
    --gpu_devices 0,1,2,3

For best results on repeated multi-GPU runs, place the JAX cache on fast node-local or high-performance shared storage. A fully cold cache can still cause the first parallel wave to compile once per worker process before later runs benefit from the persisted artifacts.


Modal Setup

Modal provides serverless GPU inference with pay-per-second billing.

pip install modal && modal token new

# Recommended for HuggingFace-backed database downloads on Modal
modal secret create huggingface HF_TOKEN=hf_your_token_here

modal run modal/upload_weights.py --file /path/to/af3.bin.zst --no-extract
modal run modal/prepare_databases.py

# Optional: include RNA FASTA fallback for nhmmer
modal run modal/prepare_databases.py --include-nhmmer

# Advanced: build on Modal from Google-hosted source data
modal run modal/prepare_databases.py --from-source

# Run predictions
modal run modal/af3_predict.py --input protein.json

modal/prepare_databases.py expects a Modal secret named huggingface containing HF_TOKEN, and passes it through to Hugging Face for authenticated downloads with higher rate limits.

See docs/modal.md for the full CLI reference, batch processing, multi-GPU modes, and cost estimates.
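As a back-of-envelope estimate using the per-input figures quoted above ($0.035 and 28 s), batch cost and serial wall time work out as below. This is a sketch under those quoted figures only; actual Modal runs may parallelize across containers, and docs/modal.md has the real cost estimates:

```python
def modal_batch_estimate(n_inputs: int,
                         cost_per_input: float = 0.035,
                         seconds_per_input: float = 28.0) -> tuple[float, float]:
    """Estimated total cost (USD) and serial wall time (hours) for a batch,
    using the per-input figures quoted above. Illustrative arithmetic only."""
    total_cost = n_inputs * cost_per_input
    total_hours = n_inputs * seconds_per_input / 3600.0
    return total_cost, total_hours
```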


Configuration

Flag           | Default                          | Description
--input_dir    | (required)                       | Directory containing input JSON files
--output_dir   | (required)                       | Output directory for results
--db_dir       | (required)                       | Database directory (from setup_databases.sh)
--weights_dir  | (required)                       | Directory containing af3.bin.zst
--gpu_devices  | 0                                | Comma-separated GPU device IDs. One device = single-GPU mode; multiple = multi-GPU mode. Example: --gpu_devices 0,1,2,3
--container    | romerolabduke/alphafast:latest   | Docker image or .sif path
--batch_size   | auto (count of inputs)           | MSA batch size
--backend      | auto-detect                      | Force docker or singularity

For advanced flags, see docs/advanced.md.

Citing This Work

If you use AlphaFast in your research, please cite our work, AlphaFold 3, and MMseqs2-GPU:

AlphaFast Citation

@article{Perry2026.02.17.706409,
  author    = {Perry, Benjamin C and Kim, Jeonghyeon and Romero, Philip A},
  title     = {AlphaFast: High-throughput AlphaFold 3 via GPU-accelerated MSA construction},
  year      = {2026},
  doi       = {10.64898/2026.02.17.706409},
  publisher = {Cold Spring Harbor Laboratory},
  abstract  = {AlphaFold 3 (AF3) enables accurate biomolecular modeling but is limited by slow, CPU-bound multiple sequence alignment (MSA) generation. We introduce AlphaFast, a drop-in framework that integrates GPU-accelerated MMseqs2 sequence search to remove this bottleneck. AlphaFast achieves a 68.5x speedup in MSA construction and a 22.8x reduction in end-to-end runtime on a single GPU, and delivers predictions in 8 seconds per input on four GPUs while maintaining indistinguishable structural accuracy. A serverless deployment enables structure prediction for as little as $0.035 per input. Code is available at https://github.com/RomeroLab/alphafast.},
  url       = {https://www.biorxiv.org/content/early/2026/02/18/2026.02.17.706409},
  journal   = {bioRxiv}
}

AlphaFold 3 Citation

@article{Abramson2024,
  author  = {Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J. and Bambrick, Joshua and Bodenstein, Sebastian W. and Evans, David A. and Hung, Chia-Chun and O’Neill, Michael and Reiman, David and Tunyasuvunakool, Kathryn and Wu, Zachary and Žemgulytė, Akvilė and Arvaniti, Eirini and Beattie, Charles and Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and Congreve, Miles and Cowen-Rivers, Alexander I. and Cowie, Andrew and Figurnov, Michael and Fuchs, Fabian B. and Gladman, Hannah and Jain, Rishub and Khan, Yousuf A. and Low, Caroline M. R. and Perlin, Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine and Yakneen, Sergei and Zhong, Ellen D. and Zielinski, Michal and Žídek, Augustin and Bapst, Victor and Kohli, Pushmeet and Jaderberg, Max and Hassabis, Demis and Jumper, John M.},
  journal = {Nature},
  title   = {Accurate structure prediction of biomolecular interactions with AlphaFold 3},
  year    = {2024},
  volume  = {630},
  number  = {8016},
  pages   = {493--500},
  doi     = {10.1038/s41586-024-07487-w}
}

MMseqs2-GPU Citation

@article{Kallenborn2025-fd,
  title     = "{GPU}-accelerated homology search with {MMseqs2}",
  author    = "Kallenborn, Felix and Chacon, Alejandro and Hundt, Christian and
               Sirelkhatim, Hassan and Didi, Kieran and Cha, Sooyoung and
               Dallago, Christian and Mirdita, Milot and Schmidt, Bertil and
               Steinegger, Martin",
  journal   = "Nat. Methods",
  volume    =  22,
  number    =  10,
  pages     = "2024--2027",
  year      =  2025,
  doi       = "10.1038/s41592-025-02819-8",
}

License

Source code is licensed under CC-BY-NC-SA 4.0. Model parameters are subject to the AlphaFold 3 Model Parameters Terms of Use. Output is subject to the Output Terms of Use.

This is not an officially supported Google product.