
Downstream Analysis

amrismil edited this page Oct 28, 2025 · 1 revision

Jupyter notebook

The Jupyter notebook output.ipynb is:

  • Generated during the Create Jupyter Notebook step.
  • Stored in the output/reports directory for Snakemake AlphaPulldown. Snakemake AlphaPulldown also generates a notebook with all cells executed in the report.html file, which can be copied locally and opened in a browser without JupyterLab.

You can open output.ipynb using JupyterLab, which is installed with AlphaPulldown. To view the notebook, launch it with:

jupyter-lab output.ipynb

Note

If you run AlphaPulldown on a remote computer cluster, you will need a graphical connection, a network mount of the remote directory, or a copy of the entire <models_output_dir> to open the notebook in a browser.

Jupyter Notebook remote access

To connect remotely, first launch JupyterLab on the cluster. If the selected port is already in use, choose a different port number:

jupyter-lab --no-browser --port=8895 output.ipynb
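If you are not sure whether port 8895 is free on the cluster (or on your local machine, for the tunnel), a quick standard-library check can help. This is a minimal heuristic sketch: it only tests whether something already accepts TCP connections on 127.0.0.1, and the helper names port_is_free and first_free_port are hypothetical, not part of AlphaPulldown or JupyterLab.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Heuristic: True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0


def first_free_port(start: int = 8895, attempts: int = 20) -> int:
    """Return the first port from `start` upward that looks free."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    print(f"jupyter-lab --no-browser --port={port} output.ipynb")
```

Whatever port you end up using must also be used in the SSH tunnel command.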

The output of the jupyter-lab command provides a link and a token for connection. The token is a unique string required for authentication when you first access the JupyterLab interface. Here is an example of what the output might look like:

http://localhost:8895/?token=abc123def456
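Both values you need later, the port for the SSH tunnel and the token for the browser prompt, are contained in this link. As a small illustration, the standard library's urllib.parse can pull them out; split_jupyter_url is a hypothetical helper, and the URL is the example shown above.

```python
from urllib.parse import parse_qs, urlparse


def split_jupyter_url(url: str):
    """Return (port, token) from the link printed by jupyter-lab."""
    parsed = urlparse(url)
    token = parse_qs(parsed.query).get("token", [""])[0]
    return parsed.port, token


port, token = split_jupyter_url("http://localhost:8895/?token=abc123def456")
print(port, token)  # 8895 abc123def456
```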

Next, establish an SSH tunnel from your local machine's command line. The port numbers should match the port passed to jupyter-lab with --port. Replace <login> with your EMBL login, or, if you are using a different cluster, provide your login and the address of that cluster in the format <login>@<address>:

ssh -N -f -L localhost:8895:localhost:8895 <login>@login01.cluster.embl.de

After establishing the SSH tunnel, you can access the Jupyter Notebook in your web browser. Open your browser and navigate to the following URL:

http://localhost:8895

You will be prompted to enter the token provided when you launched JupyterLab on the cluster. Copy the token from the command output and paste it into the browser prompt to gain access.

In the JupyterLab window, choose output.ipynb if it does not open automatically. Then, go to Run > Run All Cells. After all cells have been executed for every protein complex, you will see PAE plots, interactive structures colored by pLDDT, and interactive structures colored by chain.


To zoom in on a PAE plot, double-click it. To increase the number of displayed interactive models, pass the models argument to the parse_results() or parse_results_colour_chains() function:

parse_results('./ProteinA_and_ProteinB', models=10)

Warning

If the Jupyter Notebook contains too many proteins, some interactive structures may disappear due to memory limitations. To restore the output of the cell, simply rerun it by selecting the cell and going to Run > Run Selected Cell or pressing Shift + Enter.

Results table


By default, a CSV file named predictions_with_good_interpae.txt is created in your output directory (/path/to/your/output/dir). It reports:

  1. iptm and iptm+ptm scores provided by AlphaFold
  2. mpDockQ score developed by Bryant et al., 2022
  3. PI_score developed by Malhotra et al., 2021

Detailed explanations of these scores can be found in our paper. An example screenshot of the table is shown below.

manuals/example_table_screenshot.png
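Because the table is comma-separated, it can be filtered and sorted with Python's csv module, for example to pull out the highest-scoring complexes. This is a minimal sketch under assumptions: it expects a header row, and the column names used in the example (jobs, iptm) are guesses, so check the header of your own file. rank_predictions is a hypothetical helper, not part of AlphaPulldown.

```python
import csv
from typing import Dict, Iterable, List, Optional


def rank_predictions(lines: Iterable[str], score_column: str, top_n: int = 10,
                     min_score: Optional[float] = None) -> List[Dict[str, str]]:
    """Return up to top_n rows sorted by score_column, highest first.

    `lines` can be an open file handle or any iterable of CSV lines.
    The column name is an assumption: check the header of your own table.
    """
    scored = []
    for row in csv.DictReader(lines):
        try:
            score = float(row.get(score_column))
        except (TypeError, ValueError):
            continue  # skip rows whose score is missing or non-numeric
        if min_score is None or score >= min_score:
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:top_n]]
```

To use it on the real file, open it with open(path, newline="") and pass the handle as lines, together with the name of the score column you want to rank by.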

Results management scripts

AlphaPulldown provides scripts to help optimize data storage and prepare structures for deposition.

Decrease the size of AlphaPulldown output

The most space-consuming parts of the structure prediction results are the pickle files result_model_{1,2,3,4,5}_*.pkl. Please refer to the AlphaFold manual for more details on the output files. Some of the information in these files is needed only for very specialized tasks. The truncate_pickles.py script copies the AlphaPulldown output to a new directory and deletes the specified information from the pickle files, which can decrease the size of the output by up to 100 times.

source activate AlphaPulldown
truncate_pickles.py \
  --src_dir=</path/to/source> \
  --dst_dir=</path/to/destination> \
  --keys_to_exclude=aligned_confidence_probs,distogram,masked_msa \
  --number_of_threads=4 
  • --src_dir=</path/to/source>: Replace </path/to/source> with the path to the structures output directory. This should be the same as the --output_path for the run_multimer_jobs.py script from the Predict Structures step.
  • --dst_dir=</path/to/destination>: Replace </path/to/destination> with the path of the directory to copy the truncated results to.
  • --keys_to_exclude=aligned_confidence_probs,distogram,masked_msa: A comma-separated list of keys that should be excluded from the copied pickle files. The default keys are "aligned_confidence_probs,distogram,masked_msa".
  • --number_of_threads=4: Number of threads to run in parallel.
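The core idea, dropping a few keys from each pickled result dictionary, can be sketched for a single file. This is not the code of truncate_pickles.py: it assumes each result_model_*.pkl holds a top-level dict (as AlphaFold's do), processes one file without threading, and the helper name truncate_pickle is hypothetical. Real AlphaFold pickles contain NumPy arrays, so NumPy must be importable when loading them; only unpickle files you trust.

```python
import os
import pickle
from typing import Iterable

# Default keys named on this page for --keys_to_exclude
DEFAULT_EXCLUDED = ("aligned_confidence_probs", "distogram", "masked_msa")


def truncate_pickle(src: str, dst: str,
                    keys_to_exclude: Iterable[str] = DEFAULT_EXCLUDED) -> None:
    """Copy one result pickle to dst without the excluded top-level keys."""
    with open(src, "rb") as fh:
        result = pickle.load(fh)  # only unpickle files you trust
    excluded = set(keys_to_exclude)
    slim = {key: value for key, value in result.items() if key not in excluded}
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    with open(dst, "wb") as fh:
        pickle.dump(slim, fh, protocol=pickle.HIGHEST_PROTOCOL)
```

To mimic --src_dir and --dst_dir, you would walk the source tree (for example with os.walk), call this function for each result_model_*.pkl, and copy all other files unchanged.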

Convert Models from PDB Format to ModelCIF Format

Because the PDB format is now marked as legacy, AlphaPulldown provides a way to convert the PDB files produced by its pipeline into mmCIF files that include the ModelCIF extension.

In addition to the general mmCIF tables, ModelCIF adds information relevant to a modeling experiment. This includes target-sequence annotation and a modeling protocol describing the process by which a model was created, including the software used and its parameters. To help users assess the reliability of a model, various quality metrics can be stored directly in a ModelCIF file or in associated files registered in the main file. ModelCIF is also the preferred format for ModelArchive.

As AlphaPulldown relies on AlphaFold to produce model coordinates, multiple models may be predicted in a single experiment. To accommodate different needs, convert_to_modelcif.py offers three major modes:

  • Convert all models into ModelCIF in separate files.
  • Only convert a specific single model.
  • Convert a specific model to ModelCIF but keep additional models in a Zip archive associated with the representative ModelCIF formatted model.

1. Convert all models to separate ModelCIF files

The most general call of the conversion script, without any non-mandatory arguments, will create a ModelCIF file and an associated Zip archive for each model of each complex found in the --ap_output directory:

source activate AlphaPulldown
convert_to_modelcif.py \
  --ap_output <output path of run_multimer_jobs.py>
  • --ap_output: Path to the structures directory. This should be the same as the --output_path for the run_multimer_jobs.py script from the Predict Structures step.

The output is stored in the path that --ap_output points to. After running convert_to_modelcif.py, you should find a ModelCIF file and a Zip archive for each model PDB file in the AlphaPulldown output directory:

Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.cif
        |-ranked_1.pdb
        |-ranked_1.zip
        |-ranked_2.cif
        |-ranked_2.pdb
        |-ranked_2.zip
        |-ranked_3.cif
        |-ranked_3.pdb
        |-ranked_3.zip
        |-ranked_4.cif
        |-ranked_4.pdb
        |-ranked_4.zip
        ...
    ...
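To confirm that a conversion covered every model, you could compare the ranked_*.pdb files in each complex directory against the expected .cif and .zip files. A minimal sketch, assuming the directory layout shown above and uncompressed output (no --compress); missing_modelcif is a hypothetical helper, not part of AlphaPulldown.

```python
import os
import re
from typing import Dict, List, Optional

RANKED_PDB = re.compile(r"^ranked_(\d+)\.pdb$")


def missing_modelcif(ap_output: str, selected: Optional[int] = None) -> Dict[str, List[str]]:
    """Map each complex directory to the expected .cif/.zip files it lacks."""
    report: Dict[str, List[str]] = {}
    for complex_name in sorted(os.listdir(ap_output)):
        complex_dir = os.path.join(ap_output, complex_name)
        if not os.path.isdir(complex_dir):
            continue
        files = set(os.listdir(complex_dir))
        missing = []
        for name in sorted(files):
            match = RANKED_PDB.match(name)
            if not match:
                continue
            rank = int(match.group(1))
            if selected is not None and rank != selected:
                continue  # with --model_selected, only that model is converted
            for ext in (".cif", ".zip"):
                expected = f"ranked_{rank}{ext}"
                if expected not in files:
                    missing.append(expected)
        if missing:
            report[complex_name] = missing
    return report
```

Call it with the --ap_output path after a plain run, or pass selected=0 after a run with --model_selected 0; an empty dict means nothing expected is missing.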

2. Only convert a specific single model for each complex

If only a single model should be translated to ModelCIF, use the --model_selected option. Provide the ranking of the model as the value. For example, to convert the model ranked 0:

source activate AlphaPulldown
convert_to_modelcif.py \
  --ap_output <output path of run_multimer_jobs.py> \
  --model_selected 0

This will create only one ModelCIF file and Zip archive in the path pointed at by --ap_output:

Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.pdb
        |-ranked_2.pdb
        |-ranked_3.pdb
        |-ranked_4.pdb
        ...
    ...

Besides --model_selected, the arguments are the same as for scenario 1.

3. Have a representative model and keep associated models

Sometimes you want to focus on a specific model from the AlphaPulldown pipeline without completely discarding the other generated models. For this, convert_to_modelcif.py can translate all models to ModelCIF but store the remaining ones in the Zip archive of the selected model. This is achieved by adding the option --add_associated together with --model_selected.

source activate AlphaPulldown
convert_to_modelcif.py \
  --ap_output <output path of run_multimer_jobs.py> \
  --model_selected 0 \
  --add_associated

Arguments are the same as in scenarios 1 and 2 but include --add_associated.

The output directory looks similar to when only converting a single model:

Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.pdb
        |-ranked_2.pdb
        |-ranked_3.pdb
        |-ranked_4.pdb
        ...
    ...

A peek into ranked_0.zip, however, shows that it stores ModelCIF files and Zip archives for all remaining models of this modeling experiment:

Output
ranked_0.zip
    |-ranked_0_local_pairwise_qa.cif
    |-ranked_1.cif
    |-ranked_1.zip
    |-ranked_2.cif
    |-ranked_2.zip
    |-ranked_3.cif
    |-ranked_3.zip
    |-ranked_4.cif
    |-ranked_4.zip
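You can list and extract the contents of an associated archive with Python's standard zipfile module instead of unpacking the whole archive. A minimal sketch; the helper names are hypothetical and the paths are placeholders.

```python
import zipfile
from typing import List


def list_associated(zip_path: str) -> List[str]:
    """Return the sorted member names of an associated Zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def extract_member(zip_path: str, member: str, dest_dir: str) -> str:
    """Extract one member (e.g. ranked_1.cif) and return its path on disk."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.extract(member, path=dest_dir)
```

For example, list_associated("ranked_0.zip") should return the names shown in the listing above.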

Associated Zip Archives

convert_to_modelcif.py produces two kinds of output for each model: a ModelCIF file and a Zip archive. The latter are called "associated files/archives" in ModelCIF terminology. Associated files are registered in their corresponding ModelCIF file via the categories ma_entry_associated_files and ma_associated_archive_file_details. Historically, this scheme was created to offload AlphaFold's pairwise predicted aligned error (PAE) lists, which drastically increase file size. Nowadays, the Zip archives are used for all kinds of supplementary information on models that ModelCIF does not handle.
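To see which associated files a ModelCIF file registers, you can read those two categories. The sketch below is a deliberately simple text reader, not a real mmCIF parser: it assumes single-line "_category.item value" pairs or loop_ tables with one row per line and simple quoting, and it does not handle multi-line ";" text fields. read_category is a hypothetical helper; for serious work, use a proper mmCIF library.

```python
import shlex
from typing import Dict, List


def read_category(cif_text: str, category: str) -> List[Dict[str, str]]:
    """Return the rows of one mmCIF category as dicts (item name -> value).

    Toy reader: handles key-value pairs and simple loop_ tables with one row
    per line, but not multi-line ';' text fields or values split across lines.
    """
    prefix = f"_{category}."
    lines = cif_text.splitlines()
    rows: List[Dict[str, str]] = []
    pairs: Dict[str, str] = {}
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped == "loop_":
            # Collect the tag names that follow loop_
            i += 1
            tags = []
            while i < len(lines) and lines[i].strip().startswith("_"):
                tags.append(lines[i].strip())
                i += 1
            # Consume the data rows; keep them only for the requested category
            while i < len(lines):
                row = lines[i].strip()
                if not row or row.startswith(("_", "loop_", "#", "data_")):
                    break
                if tags and tags[0].startswith(prefix):
                    values = shlex.split(row)
                    rows.append({t[len(prefix):]: v for t, v in zip(tags, values)})
                i += 1
            continue
        if stripped.startswith(prefix):
            parts = shlex.split(stripped)
            if len(parts) == 2:  # "_category.item value" on one line
                pairs[parts[0][len(prefix):]] = parts[1]
        i += 1
    return ([pairs] if pairs else []) + rows
```

For example, read_category(open("ranked_0.cif").read(), "ma_associated_archive_file_details") should list the files stored in the associated archive, provided the file uses the simple layout assumed above.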

Miscellaneous Options

At this time, there is only one option left unexplained: --compress. It tells the script to compress ModelCIF files using Gzip. In the case of --add_associated, the ModelCIF files in the associated Zip archive are also compressed.
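Gzip-compressed ModelCIF files can be read directly with Python's gzip module. This page does not state the exact names of the compressed files, so the sketch below detects Gzip data by its two magic bytes rather than by a .gz suffix; it works for files on disk and for members of an associated Zip archive. The helper names are hypothetical.

```python
import gzip
import zipfile
from pathlib import Path
from typing import TextIO, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every Gzip stream


def open_modelcif(path: Union[str, Path]) -> TextIO:
    """Open a ModelCIF file as text, decompressing it if it is Gzip-compressed."""
    path = Path(path)
    with path.open("rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else path.open("r")


def read_member_text(zip_path: Union[str, Path], member: str) -> str:
    """Read one file from an associated Zip archive, un-gzipping it if needed."""
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode("utf-8")
```

Detecting the format from content rather than from the file name means the same code works whether or not --compress was used.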