Hello and welcome!
This repository is part of a hands-on workshop on using AlphaFold to predict protein-protein interactions. In this guide, you'll learn how to:
- Collect protein sequences
- Prepare them for multimer predictions
- Run AlphaFold on the server and on the HPC
- Analyze the results and structure quality
Whether you're a beginner in structural biology or just trying to automate your protein interaction screening, this walkthrough is for you!
In this workshop, we will focus on predicting interactions between a protein of interest and a list of candidate partners using structure-based modeling.
The goal is to predict how each candidate might interact directly with the protein of interest. To do this, we'll combine their sequences into pairs, generate 3D models, and assess whether a meaningful interaction is likely.
By the end of the session, you'll understand how to:
- Go from UniProt IDs to structural models
- Prepare sequence pairs for interaction prediction
- Evaluate the predicted structures to identify high-confidence interactions
This approach is useful for generating testable hypotheses about protein binding, interaction networks, and functional partnerships.
To predict protein-protein interactions using AlphaFold, we need to provide both protein sequences. Each protein is represented by its amino acid sequence (in FASTA format). Below you can find the UniProt IDs for the target protein and the partner proteins we are going to test. Feel free to choose one or more partner proteins.
You can get the protein sequences from UniProt.
Target protein:
- Dram2_mouse (Q9CR48)
Partner proteins:
- Q9DC58
- O70404
- Q9CZX7
- Q6GSS7
- P61161
- C0HKE1
- P31996
- P17047
- Q99KI0
- Q9QY73
- Q8R143
- Q922T2
- Q5SRX1
- Q91YT8
- Q91VK4
- O88384
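If you prefer to download the sequences from the command line rather than the UniProt website, the sketch below shows one way to do it. It assumes UniProt's REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta`; the function names and the output file naming are illustrative and not part of the workshop repository.

```python
import urllib.request

# Assumption: UniProt's REST endpoint for a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def parse_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def download_fasta(accession: str, timeout: float = 30.0) -> str:
    """Fetch the FASTA text for one accession (needs network access)."""
    with urllib.request.urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def save_sequences(accessions: list[str]) -> None:
    """Write one <accession>.fasta file per ID into the current directory."""
    for accession in accessions:
        fasta_text = download_fasta(accession)
        with open(f"{accession}.fasta", "w") as handle:
            handle.write(fasta_text)
        _, sequence = parse_fasta(fasta_text)
        print(accession, len(sequence), "residues")
```

For example, `save_sequences(["Q9CR48", "Q9DC58"])` would fetch the target protein and the first partner from the list.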
You need to log in to the AlphaFold Server page online. If you do not have an account, you can quickly create one.
- Copy the target protein FASTA and paste it in the input box
- Select 'Add entity' to add another input box
- Copy and paste the partner protein FASTA in the other input box
- Select 'Continue and preview this job'
- Give a name to your job (recommended: Protein IDs)
- Select 'Confirm and submit this job'
Notes:
- It takes 2-3 min to run a protein-protein prediction, depending on the size of the proteins, so take a sip of coffee!
- You probably have a lot of questions about your model (at least I did the first time).
- Let's try to assess the output together. What do you think about your model?
Before we proceed with folding proteins on the Cluster, we have some tools to install. I hope this will not take long!
MobaXterm is a powerful SSH client and terminal with X11 server support. Here's how to get it:
Steps:
- Go to the official website: MobaXterm
- Choose the version:
  - Download the Home Edition.
  - Choose the Installer edition (so it installs like a normal program) or the Portable edition (just unzip and run).
- Install (if you chose the Installer edition):
  - Run the .exe file.
  - Follow the installation steps.
Use the instructions on ALICE WIKI
PyMOL is a molecular visualization system. You can install the open-source version or the commercial one.
Requirements:
Install Anaconda or Miniconda
Steps (in Anaconda Prompt):
conda create -n pymol-env -c schrodinger pymol
conda activate pymol-env
pymol
This creates a new environment and installs PyMOL from the Schrödinger channel.
- Follow the link: PyMOL
- Sign up or purchase a license. You can also use my license: Download the license
- Download the installer for your OS and follow the instructions.
Let's create your own folder:
mkdir <your_name>
Note:
- Change <your_name> to your own name!
Then move into your folder by running:
cd <your_name>
We will download the files by cloning the GitHub repository onto ALICE. This will generate a new folder, containing all the files you need, that you can work in.
git clone https://github.com/elenikaloudi/AF3_workshop
Note:
You can move around the directory using:
- cd (change directory), followed by either:
  - the directory you want to enter, or
  - .. (double dot) to go back one directory.
- ls (list the contents of a directory), followed by either:
  - the directory you want to look into, or
  - .. (double dot) to look one directory up.
Warning
Please check the box in the bottom left corner that says "Follow terminal folder". This will allow you to click and open files using the mouse.
Run this line of code to get into the workshop folder containing everything you need to follow the workshop:
cd AF3_workshop
You are now at the step of preparing input files to run interaction modeling on an HPC cluster using AlphaFold3.
To run interaction modeling on the cluster, we need to prepare inputs that define which two proteins should be modeled together as a complex. The inputs should be JSON files. This format makes it easy to automate many pairwise runs between a protein of interest and multiple partners.
To Do:
- Upload the FASTA sequences to your directory. You can find the sequences here.
- Convert to .json. Convert your FASTA files to JSON by running the script.
Convert as Bait-Prey Interactions:
python fasta_to_json.py path/to/input/directory --bait path/to/bait.fasta
Notes:
bait.fasta should contain exactly one protein sequence.
Output files will be named like: PreyProtein_with_BAIT.json.
Random model seeds are automatically generated (default = 20). You can change this with --seeds:
python fasta_to_json.py ... --seeds 50
- Move all your .json files into one directory. To create a directory:
mkdir name_of_directory
To move all the .json files at once, run:
mv *.json /path/to/the/directory
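To make the bait-prey conversion less of a black box, here is a minimal sketch of what such a conversion could look like. It is not the workshop's fasta_to_json.py: the function names are hypothetical, the chain IDs A (bait) and B (prey) are an assumption, and the JSON layout follows the AlphaFold3 input format (`name`, `modelSeeds`, `sequences`, `dialect`, `version`) as I understand it.

```python
import json
import random
from pathlib import Path


def read_single_fasta(path: Path) -> tuple[str, str]:
    """Return (name, sequence) from a FASTA file holding exactly one record."""
    lines = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) != 1:
        raise ValueError(f"{path} must contain exactly one sequence")
    sequence = "".join(ln for ln in lines if not ln.startswith(">"))
    return path.stem, sequence


def build_pair_job(prey: tuple[str, str], bait: tuple[str, str], n_seeds: int = 20) -> dict:
    """Build one AlphaFold3-style job pairing the bait (chain A) with a prey (chain B)."""
    prey_name, prey_seq = prey
    bait_name, bait_seq = bait
    return {
        "name": f"{prey_name}_with_{bait_name}",
        # Distinct random seeds; 20 mirrors the default mentioned in the notes.
        "modelSeeds": random.sample(range(1, 2**31), n_seeds),
        "sequences": [
            {"protein": {"id": "A", "sequence": bait_seq}},
            {"protein": {"id": "B", "sequence": prey_seq}},
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


def convert_directory(fasta_dir: Path, bait_path: Path, n_seeds: int = 20) -> list[Path]:
    """Write one <prey>_with_<bait>.json per prey FASTA found in fasta_dir."""
    bait = read_single_fasta(bait_path)
    written = []
    for fasta in sorted(fasta_dir.glob("*.fasta")):
        if fasta.resolve() == bait_path.resolve():
            continue  # do not pair the bait with itself
        job = build_pair_job(read_single_fasta(fasta), bait, n_seeds)
        out = fasta_dir / f"{job['name']}.json"
        out.write_text(json.dumps(job, indent=2))
        written.append(out)
    return written
```

Opening one of the generated files next to this sketch is a quick way to check which chain is the bait and how many seeds were requested.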
After generating the JSON files, you can move each one into its own folder using the script organise_json.sh.
The output directory of the previous step is the input directory for this step, so make sure that all the JSON files are in one directory.
What it does:
For every *.json file in the provided directory:
- Creates a new subdirectory named after the file.
- Moves the file into its corresponding directory.
Usage:
bash organise_json.sh path/to/json/directory
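For readers more comfortable with Python than bash, the same two steps can be sketched as follows. This is an illustrative equivalent, not the contents of organise_json.sh; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def organise_json(json_dir: Path) -> list[Path]:
    """Move every *.json in json_dir into a subdirectory named after the file."""
    moved = []
    # sorted() materialises the file list before anything is moved.
    for json_file in sorted(json_dir.glob("*.json")):
        target_dir = json_dir / json_file.stem  # e.g. PreyProtein_with_BAIT/
        target_dir.mkdir(exist_ok=True)
        destination = target_dir / json_file.name
        shutil.move(str(json_file), str(destination))
        moved.append(destination)
    return moved
```

Calling `organise_json(Path("path/to/json/directory"))` returns the new location of each file, which makes it easy to confirm that nothing was left behind.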
Once your JSON input files are ready, you can launch the interaction prediction jobs by submitting them to the cluster.
Each job will:
- Read one JSON file (defining the protein pair)
- Run AlphaFold3
- Save the predicted complex and confidence scores
You can use the script master_script.sh.
This script goes through each folder inside a main directory. For every folder, it creates a SLURM job script, sets the correct input and output paths, and submits the job to the cluster using sbatch.
Usage
Each subdirectory must contain a valid AlphaFold3 input .json file.
Input directory structure:
master_dir/
├── job1/
│   └── input.json
├── job2/
│   └── input.json
You first need to set the environmental paths as stated above and then run:
bash master_script.sh /path/to/master_dir
The script needs some adjustments before submitting it!
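To show the kind of adjustments involved, here is a rough Python sketch of the per-folder logic: write a SLURM script next to each JSON file and, optionally, submit it with sbatch. It is not master_script.sh. The partition name, time limit, GPU request and launch command are placeholder assumptions that depend on your cluster; `--json_path` and `--output_dir` are the flags used by AlphaFold3's run_alphafold.py, but the actual command on ALICE may be wrapped differently.

```python
import subprocess
from pathlib import Path

# Every value filled into this template is a placeholder assumption.
SLURM_TEMPLATE = """#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --output={job_dir}/slurm_%j.out
#SBATCH --partition={partition}
#SBATCH --gres=gpu:1
#SBATCH --time={time_limit}

# Hypothetical launch line: replace with the AlphaFold3 command used on your cluster.
{af3_command} --json_path={json_path} --output_dir={job_dir}
"""


def write_job_script(
    job_dir: Path,
    partition: str = "gpu",
    time_limit: str = "02:00:00",
    af3_command: str = "python run_alphafold.py",
) -> Path:
    """Create submit.slurm inside job_dir, pointing at its single input JSON."""
    json_files = sorted(job_dir.glob("*.json"))
    if len(json_files) != 1:
        raise ValueError(f"{job_dir} should contain exactly one .json file")
    script = job_dir / "submit.slurm"
    script.write_text(
        SLURM_TEMPLATE.format(
            job_name=job_dir.name,
            job_dir=job_dir,
            partition=partition,
            time_limit=time_limit,
            af3_command=af3_command,
            json_path=json_files[0],
        )
    )
    return script


def submit_all(master_dir: Path, dry_run: bool = True) -> list[Path]:
    """Write one job script per subdirectory; call sbatch unless dry_run is set."""
    scripts = []
    for job_dir in sorted(p for p in master_dir.iterdir() if p.is_dir()):
        script = write_job_script(job_dir)
        scripts.append(script)
        if not dry_run:
            subprocess.run(["sbatch", str(script)], check=True)
    return scripts
```

Running with `dry_run=True` first lets you inspect the generated submit.slurm files before anything reaches the queue.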
Note:
A run takes about 1 h, so to save time we will use the results already available in the output folder.
After generating predicted complexes, we can filter out low-confidence interactions by analyzing the Predicted Aligned Error (PAE) scores between chains.
PAE provides a per-residue-pair estimate of how well the model predicts the relative position of two regions (in our case, the two protein chains). Lower PAE values at the interface indicate a more confident prediction of the interaction.
PAE extraction:
This script processes AlphaFold or similar model outputs by reading summary_confidences.json files to extract between-chain Predicted Aligned Error (PAE) statistics. It filters results based on an optional PAE threshold and outputs a CSV file summarizing the results.
You can find the script here.
The script extracts average, min, and max between-chain PAE values from structured model output directories.
Usage:
Before running the script, edit its last line to set input_dir, output_dir and the PAE threshold. Then run:
python pae_filtered_10.py
This will:
- Traverse the directory tree to locate all *summary_confidences.json files.
- Extract between-chain PAE values.
- Write output rows to a CSV file containing:
  - Directory
  - Average PAE
  - Minimum PAE
  - Maximum PAE
  - All extracted PAE values (comma-separated)
- Skip any .json that doesn't contain chain_pair_pae_min.
Prepare Input Directory
The input should be a master directory structured like this:
input_directory/
├── protein_1/
│   └── model_1/
│       └── result_summary_confidences.json
├── protein_2/
│   └── model_1/
│       └── result_summary_confidences.json
...
Each .json file should contain a "chain_pair_pae_min" matrix.
Once the interaction predictions are complete, you can explore the resulting protein complexes in PyMOL to examine structural details and binding interfaces.
The output directory contains many files. The one you are interested in for visualization is the structure, stored in the .cif file. Click on it to open it on your computer.
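If you would like PyMOL to open the complex with the interface already highlighted, one option is to generate a small PyMOL script (.pml) and run it. The sketch below only builds the script text. It assumes the two chains are named A and B and uses a 4 Å distance cutoff to define the interface; both are illustrative choices, as are the function name and colours.

```python
def make_pymol_script(cif_path: str, cutoff: float = 4.0) -> str:
    """Return PyMOL commands that load a complex and highlight the A/B interface."""
    lines = [
        f"load {cif_path}, complex",
        "hide everything",
        "show cartoon",
        # Assumption: the bait is chain A and the partner is chain B.
        "color marine, chain A",
        "color orange, chain B",
        # Residues of either chain with any atom within `cutoff` angstroms of the other.
        f"select iface, byres ((chain A within {cutoff} of chain B) "
        f"or (chain B within {cutoff} of chain A))",
        "show sticks, iface",
        "zoom iface",
    ]
    return "\n".join(lines) + "\n"
```

Saving the returned text to a file such as view_interface.pml and opening it with `pymol view_interface.pml` should load the model with the two chains coloured differently and the contact residues shown as sticks.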
We hope this workshop gives you the confidence to explore and predict protein interactions.
If you have questions or you want to contact me, send me an email: elenikaloudi1@gmail.com

