AlphaPulldownSnakemake provides a convenient way to run AlphaPulldown using a Snakemake pipeline. This lets you focus entirely on what you want to compute, rather than how to manage dependencies, versioning, and cluster execution.
Helpful links: AlphaPulldown documentation · Precalculated feature databases · Downstream analysis guide
Create and activate the conda environment:
conda env create \
-n snake \
-f https://raw.githubusercontent.com/KosinskiLab/AlphaPulldownSnakemake/2.1.5/workflow/envs/alphapulldown.yaml
conda activate snake
This environment file installs Snakemake and all required plugins via conda and pulls in alphapulldown-input-parser from PyPI in a single step.
That's it, you're done!
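To confirm the environment is ready before moving on, you can check that the tools used in the next steps are on your PATH (a quick sanity check; version output will vary on your system):
snakemake --version     # prints the installed Snakemake version
which snakedeploy       # confirms the deployment helper used below is available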
Create a new processing directory for your project:
snakedeploy deploy-workflow \
https://github.com/KosinskiLab/AlphaPulldownSnakemake \
AlphaPulldownSnakemake \
--tag 2.1.7
cd AlphaPulldownSnakemake
Create a sample sheet folds.txt listing the proteins you want to fold. The simplest format uses UniProt IDs:
P01258+P01579
P01258
P01579
Each line represents one folding job:
- P01258+P01579 - fold these two proteins together as a complex
- P01258 - fold this protein as a monomer
- P01579 - fold this protein as a monomer
Advanced protein specification options
You can also specify:
- FASTA file paths instead of UniProt IDs: /path/to/protein.fasta
- Specific residue regions: Q8I2G6:1-100 (residues 1-100 only)
- Multiple copies: Q8I2G6:2 (a dimer of the same protein)
- Combinations: Q8I2G6:2:1-100+Q8I5K4 (a dimer of residues 1-100 plus another protein; a combined example sheet is shown below)
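Put together, a sample sheet using these notations could look like this (each line is still one folding job; the FASTA path is a placeholder):
/path/to/protein.fasta
Q8I2G6:1-100
Q8I2G6:2
Q8I2G6:2:1-100+Q8I5K4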
Edit config/config.yaml and set the path to your sample sheet:
input_files:
- "folds.txt"If you want to test which proteins from one group interact with proteins from another group, create a second file baits.txt:
Q8I2G6
And update your config:
input_files:
- "folds.txt"
- "baits.txt"This will test all combinations: every protein in folds.txt paired with every protein in baits.txt.
Multi-file pulldown experiments
You can extend this logic to create complex multi-partner interaction screens by adding more input files. For example, with three files:
input_files:
- "proteins_A.txt" # 5 proteins
- "proteins_B.txt" # 3 proteins
- "proteins_C.txt" # 2 proteinsThis will generate all possible combinations across the three groups, creating 5×3×2 = 30 different folding jobs. Each job will contain one protein from each file, allowing you to systematically explore higher-order protein complex formation.
Note: The number of combinations grows multiplicatively, so be mindful of computational costs with many files.
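As a quick sanity check before submitting anything, you can multiply the line counts of your input files to estimate the number of jobs (a rough sketch that assumes one entry per line and no trailing blank lines):
echo $(( $(wc -l < proteins_A.txt) * $(wc -l < proteins_B.txt) * $(wc -l < proteins_C.txt) ))   # prints 30 for the example above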
Run the pipeline locally:
snakemake --profile config/profiles/desktop --cores 8
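Before launching the full run, it can be useful to preview the jobs Snakemake plans to schedule with a dry run; the standard -n (--dry-run) flag only prints the job list without executing anything:
snakemake --profile config/profiles/desktop --cores 8 -n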
Cluster execution
For running on a SLURM cluster, use the executor plugin:
screen -S snakemake_session
snakemake \
--executor slurm \
--profile config/profiles/slurm \
--jobs 200 \
--restart-times 5
Detach with Ctrl + A then D. Reattach later with screen -r snakemake_session.
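While the pipeline runs, you can monitor progress from another shell with standard SLURM and Snakemake tools (a generic example; your site may provide its own monitoring utilities):
squeue -u $USER                                      # SLURM jobs currently queued or running
tail -f "$(ls -t .snakemake/log/*.log | head -n 1)"  # follow the most recent Snakemake log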
After completion, you'll find:
- Predicted structures in PDB/CIF format in the output directory
- Per-fold interface scores in output/predictions/<fold>/interfaces.csv
- Aggregated interface summary in output/reports/all_interfaces.csv when generate_recursive_report: true (see the quick-look example after this list)
- Interactive APLit web viewer (recommended) for browsing all jobs, PAE plots and AlphaJudge scores
- Optional Jupyter notebook with 3D visualizations and quality plots
- Results table with confidence scores and interaction metrics
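For a quick command-line look at the aggregated interface summary (the column set depends on the AlphaJudge version, so treat this as a generic sketch):
head -n 5 output/reports/all_interfaces.csv                   # first few rows
column -s, -t < output/reports/all_interfaces.csv | less -S   # aligned view of all columns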
APLit is a Streamlit-based UI for browsing AlphaPulldown runs (AF2 and AF3) and AlphaJudge metrics.
Install APLit (once):
pip install git+https://github.com/KosinskiLab/aplit.git
Then launch it from your project directory, pointing it to the predictions folder:
aplit --directory output/predictions
This starts a local web server (by default at http://localhost:8501) where you can:
- Filter and sort jobs by ipTM, PAE or AlphaJudge scores
- Inspect individual models in 3D (3Dmol.js)
- View PAE heatmaps and download structures / JSON files
On a cluster, run aplit on the login node and forward the port via SSH:
# on cluster
aplit --directory /path/to/project/output/predictions --no-browser

# on your laptop
ssh -N -L 8501:localhost:8501 user@cluster.example.org
Then open http://localhost:8501 in your browser.
Override default values to match your cluster:
slurm_partition: "gpu" # which partition/queue to submit to
slurm_qos: "normal" # optional QoS if your site uses it
structure_inference_gpus_per_task: 1 # number of GPUs each inference job needs
structure_inference_gpu_model: "3090" # optional GPU model constraint (remove to allow any)
structure_inference_tasks_per_gpu: 0 # <=0 keeps --ntasks-per-gpu unset in the plugin
structure_inference_gpus_per_task and structure_inference_gpu_model are read by the Snakemake Slurm executor plugin and translated into --gpus=<model>:<count> (or --gpus=<count> if no model is specified). We no longer use slurm_gres; requesting GPUs exclusively through these fields keeps the job submission consistent across clusters.
structure_inference_tasks_per_gpu toggles whether the plugin also emits --ntasks-per-gpu. Leaving
the default 0 prevents that flag, which avoids conflicting with the Tres-per-task request on many
systems. Set it to a positive integer only if your site explicitly requires --ntasks-per-gpu.
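To make the mapping concrete, here are two illustrative settings and the GPU request they translate into (other sbatch options added by the plugin are omitted):
# one 3090 GPU per inference job
structure_inference_gpus_per_task: 1
structure_inference_gpu_model: "3090"
# -> the plugin requests --gpus=3090:1

# two GPUs of any model per inference job (structure_inference_gpu_model removed)
structure_inference_gpus_per_task: 2
# -> the plugin requests --gpus=2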
If you have precomputed protein features, specify the directory:
feature_directory:
- "/path/to/directory/with/features/"Note: If your features are compressed, set
compress-features: Truein the config.
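As a rough sketch, a feature directory usually holds one pickle per sequence; the names below are placeholders, and the .xz suffix appears only when features were compressed:
/path/to/directory/with/features/
├── P01258.pkl
├── P01579.pkl
└── Q8I2G6.pkl.xz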
You can tweak the feature-generation step by editing create_feature_arguments (or by running the script manually). Commonly used flags are listed here; an example config block follows the list:
- --data_pipeline {alphafold2,alphafold3} – choose the feature format to emit.
- --db_preset {full_dbs,reduced_dbs} – switch between the full BFD stack or the reduced databases.
- --use_mmseqs2 – rely on the remote MMseqs2 API; skips local jackhmmer/HHsearch database lookups.
- --use_precomputed_msas / --save_msa_files – reuse stored MSAs or keep new ones for later runs.
- --compress_features – zip the generated *.pkl files (.xz extension) to save space.
- --skip_existing – leave existing feature files untouched (safe for reruns).
- --seq_index N – only process the N-th sequence from the FASTA list.
- --use_hhsearch, --re_search_templates_mmseqs2 – toggle template search implementations.
- --path_to_mmt, --description_file, --multiple_mmts – enable TrueMultimer CSV-driven feature sets.
- --max_template_date YYYY-MM-DD – required cutoff for template structures; keeps runs reproducible.
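If you prefer to set these via the workflow config rather than running the script manually, the flags go under create_feature_arguments in config/config.yaml. The snippet below assumes the same flag-to-value style used for structure_inference_arguments; the values are only examples:
create_feature_arguments:
  --data_pipeline: alphafold2
  --db_preset: full_dbs
  --compress_features: True
  --skip_existing: True
  --max_template_date: "2024-05-31"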
Post-inference analysis is enabled by default. You can disable it or add a project-wide summary in config/config.yaml:
enable_structure_analysis: true # skip alphaJudge if set to false
generate_recursive_report: true # disable if you do not need all_interfaces.csv
recursive_report_arguments: # optional extra CLI flags for alphajudge
--models_to_analyse: best
To use AlphaFold3 or other backends:
structure_inference_arguments:
--fold_backend: alphafold3
--<other-flags>
Note: AlphaPulldown supports the alphafold2, alphafold3, and alphalink backends.
You can pass any backend CLI switches through structure_inference_arguments. Common options are listed below; keep or remove lines based on your needs.
AlphaFold2 flags
structure_inference_arguments:
--compress_result_pickles: False # gzip AF2 result pickles
--remove_result_pickles: False # delete pickles after summary is created
--models_to_relax: None # all | best | none
--remove_keys_from_pickles: True # strip large tensors from pickle outputs
--convert_to_modelcif: True # additionally write ModelCIF files
--allow_resume: True # resume from partial runs
--num_cycle: 3
--num_predictions_per_model: 1
--pair_msa: True
--save_features_for_multimeric_object: False
--skip_templates: False
--msa_depth_scan: False
--multimeric_template: False
--model_names: None
--msa_depth: None
--description_file: None
--path_to_mmt: None
--desired_num_res: None
--desired_num_msa: None
--benchmark: False
--model_preset: monomer
--use_ap_style: False
--use_gpu_relax: True
--dropout: False
AlphaFold3 flags
structure_inference_arguments:
--jax_compilation_cache_dir: null
--buckets: ['64','128','256','512','768','1024','1280','1536','2048','2560','3072','3584','4096','4608','5120']
--flash_attention_implementation: triton
--num_diffusion_samples: 5
--num_seeds: null
--debug_templates: False
--debug_msas: False
--num_recycles: 10
--save_embeddings: False
--save_distogram: False
Set the paths to AlphaFold databases and backend weights:
databases_directory: "/path/to/alphafold/databases"
backend_weights_directory: "/path/to/backend/weights"
If AlphaPulldown (or this workflow) contributed to your research, please cite Molodenskiy et al., 2025:
@article{Molodenskiy2025AlphaPulldown2,
author = {Molodenskiy, Dmitry and Maurer, Valentin J. and Yu, Dingquan and
Chojnowski, Grzegorz and Bienert, Stefan and Tauriello, Gerardo and
Gilep, Konstantin and Schwede, Torsten and Kosinski, Jan},
title = {AlphaPulldown2—a general pipeline for high-throughput structural modeling},
journal = {Bioinformatics},
volume = {41},
number = {3},
pages = {btaf115},
year = {2025},
doi = {10.1093/bioinformatics/btaf115}
}