Downstream Analysis
Jupyter notebook output.ipynb is:
- Generated during the Create Jupyter Notebook step.
- Stored in the output/reports directory for Snakemake AlphaPulldown. Snakemake AlphaPulldown also generates a notebook with all cells executed in the report.html file, which can be copied locally and opened in a browser without JupyterLab.
You can open output.ipynb using JupyterLab, which is installed with AlphaPulldown. To view the notebook, launch it with:
jupyter-lab output.ipynb
Note
If you run AlphaPulldown on a remote computer cluster, you will need a graphical connection, network mount of the remote directory, or a copy of the entire <models_output_dir> to open the notebook in a browser.
Jupyter Notebook remote access
To connect remotely, first launch Jupyter Notebook on the cluster. You can choose a different port number if the selected one is already in use:
jupyter-lab --no-browser --port=8895 output.ipynb
The output of this command will provide you with a link and a token for connection. The token is a unique string that is required for authentication when you first access the Jupyter Notebook interface. Here is an example of what the output might look like:
http://localhost:8895/?token=abc123def456
Next, establish an SSH tunnel from your local machine's command line. The port numbers should match those used in the previous command. Replace <login> with your EMBL login, or if you are using a different cluster, provide the address of that cluster and your login in the format <login>@<address>:
ssh -N -f -L localhost:8895:localhost:8895 <login>@login01.cluster.embl.de
After establishing the SSH tunnel, you can access the Jupyter Notebook in your web browser. Open your browser and navigate to the following URL:
http://localhost:8895
You will be prompted to enter the token provided earlier when you launched JupyterLab on the cluster. Copy and paste the token from the command output into the browser prompt to gain access.
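Because the port in the printed URL and the port in the SSH tunnel have to agree, it can help to derive both from the same string. The following is a minimal sketch and not part of AlphaPulldown: the function name `tunnel_from_jupyter_url` is hypothetical, and it assumes the URL has the `http://localhost:<port>/?token=<token>` shape shown in the example above.

```python
from urllib.parse import parse_qs, urlsplit


def tunnel_from_jupyter_url(jupyter_url, login_host):
    """Return (ssh_command, local_url, token) for a URL printed by jupyter-lab.

    Hypothetical helper for illustration; login_host is "<login>@<address>".
    """
    parts = urlsplit(jupyter_url)
    if parts.port is None:
        raise ValueError(f"no port found in {jupyter_url!r}")
    # parse_qs maps each query key to a list of values; take the first token.
    tokens = parse_qs(parts.query).get("token", [])
    token = tokens[0] if tokens else None
    # Forward the same port number locally and remotely, as in the ssh example.
    forward = f"localhost:{parts.port}:localhost:{parts.port}"
    ssh_command = ["ssh", "-N", "-f", "-L", forward, login_host]
    local_url = f"http://localhost:{parts.port}"
    return ssh_command, local_url, token


if __name__ == "__main__":
    command, url, token = tunnel_from_jupyter_url(
        "http://localhost:8895/?token=abc123def456",
        "<login>@login01.cluster.embl.de",
    )
    print(" ".join(command))
    print(url, token)
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting, or joined with spaces and pasted into a terminal.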
In the JupyterLab window, choose output.ipynb if it does not open automatically. Then, go to Run > Run All Cells. After all cells have been executed for every protein complex, you will see PAE plots, interactive structures colored by pLDDT, and interactive structures colored by chain.
To zoom in on PAE plots, double-click on them. To increase the number of displayed interactive models, add the argument models to the parse_results() or parse_results_colour_chains() functions.
parse_results('./ProteinA_and_ProteinB', models=10)
Warning
If the Jupyter Notebook contains too many proteins, some interactive structures may disappear due to memory limitations. To restore the output of the cell, simply rerun it by selecting the cell and going to Run > Run Selected Cell or pressing Shift + Enter.
Results table:
- predictions_with_good_interpae.csv is generated during the Create Results table step.
- analysis.csv is generated in the output/reports directory for Snakemake AlphaPulldown.
By default, a CSV file named predictions_with_good_interpae.csv is created in the directory /path/to/your/output/dir that you gave in the command above. predictions_with_good_interpae.csv reports:
1. iptm and iptm+ptm scores provided by AlphaFold
2. mpDockQ score developed by Bryant et al., 2022
3. PI_score developed by Malhotra et al., 2021
Detailed explanations of these scores can be found in our paper, and an example screenshot of the table is below.
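Once the results table exists, a common first step is to rank the predictions by one of the scores. The sketch below uses only the Python standard library. The function name `top_rows` is hypothetical, and the column name is passed in as a parameter because the exact header names are an assumption here: check the first line of your own table before choosing one.

```python
import csv


def top_rows(csv_path, score_column, n=5):
    """Return the n rows with the highest value in score_column.

    The column name is an assumption: check the header of your own table.
    Rows whose score is empty or not numeric are skipped.
    """
    scored = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if score_column not in (reader.fieldnames or []):
            raise KeyError(f"column {score_column!r} not found in {csv_path}")
        for row in reader:
            try:
                scored.append((float(row[score_column]), row))
            except (TypeError, ValueError):
                # Empty cell or non-numeric text: leave this row out.
                continue
    # Highest score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]
```

For example, `top_rows("predictions_with_good_interpae.csv", "iptm", n=10)` would return the ten rows with the highest value in a column called `iptm`, provided your table has a column with that name.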
AlphaPulldown provides scripts to help optimize data storage and prepare structures for deposition.
The most space-consuming part of the structure prediction results is the set of pickle files, result_model_{1,2,3,4,5}_*.pkl. Please refer to the AlphaFold manual for more details on output files. Some information in these files is needed only for very specialized tasks. The truncate_pickles.py script copies the output of AlphaPulldown to a new directory and deletes the specified information from the pickle files. This can reduce the size of the output by up to a factor of 100.
source activate AlphaPulldown
truncate_pickles.py \
--src_dir=</path/to/source> \
--dst_dir=</path/to/destination> \
--keys_to_exclude=aligned_confidence_probs,distogram,masked_msa \
--number_of_threads=4
- --src_dir=</path/to/source>: Replace </path/to/source> with the path to the structures output directory. This should be the same as the --output_path for the run_multimer_jobs.py script from the Predict Structures step.
- --dst_dir=</path/to/destination>: Replace </path/to/destination> with the path of the directory to copy the truncated results to.
- --keys_to_exclude=aligned_confidence_probs,distogram,masked_msa: A comma-separated list of keys that should be excluded from the copied pickle files. The default keys are "aligned_confidence_probs,distogram,masked_msa".
- --number_of_threads=4: Number of threads to run in parallel.
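To make the idea of excluding keys concrete, the sketch below copies a single pickle while leaving out a list of keys. It is an illustration only and not the implementation of truncate_pickles.py: the function name `copy_pickle_without_keys` is hypothetical, and the sketch assumes that each result pickle holds a Python dictionary. Loading real AlphaFold result pickles may additionally require the libraries whose objects they contain to be installed.

```python
import pickle


def copy_pickle_without_keys(src_path, dst_path, keys_to_exclude):
    """Copy a pickled dict to dst_path, leaving out the listed keys.

    Returns the keys that were actually dropped, in their original order.
    Only load pickle files that you trust: unpickling can execute code.
    """
    with open(src_path, "rb") as handle:
        result = pickle.load(handle)
    if not isinstance(result, dict):
        raise TypeError(f"{src_path} does not contain a dictionary")
    excluded = set(keys_to_exclude)
    # Keep every entry whose key is not on the exclusion list.
    kept = {key: value for key, value in result.items() if key not in excluded}
    with open(dst_path, "wb") as handle:
        pickle.dump(kept, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return [key for key in result if key in excluded]
```

The source file is left untouched, which mirrors the copy-then-truncate behaviour described above: the original output stays available if a removed key turns out to be needed later.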
With PDB files now being marked as a legacy format, here is a way to convert PDB files produced by the AlphaPulldown pipeline into mmCIF files, including the ModelCIF extension.
In addition to the general mmCIF tables, ModelCIF adds information relevant for a modeling experiment. This includes target-sequence annotation and a modeling protocol, describing the process by which a model was created, including software used with its parameters. To help users assess the reliability of a model, various quality metrics can be stored directly in a ModelCIF file or in associated files registered in the main file. ModelCIF is also the preferred format for ModelArchive.
As AlphaPulldown relies on AlphaFold to produce model coordinates, multiple models may be predicted in a single experiment. To accommodate different needs, convert_to_modelcif.py offers three major modes:
- Convert all models into ModelCIF in separate files.
- Only convert a specific single model.
- Convert a specific model to ModelCIF but keep additional models in a Zip archive associated with the representative ModelCIF formatted model.
The most general call of the conversion script, without any non-mandatory arguments, will create a ModelCIF file and an associated Zip archive for each model of each complex found in the --ap_output directory:
source activate AlphaPulldown
convert_to_modelcif.py \
--ap_output <output path of run_multimer_jobs.py>
- --ap_output: Path to the structures directory. This should be the same as the --output_path for the run_multimer_jobs.py script from the Predict Structures step.
The output is stored in the path that --ap_output points to. After running convert_to_modelcif.py, you should find a ModelCIF file and a Zip archive for each model PDB file in the AlphaPulldown output directory:
Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.cif
        |-ranked_1.pdb
        |-ranked_1.zip
        |-ranked_2.cif
        |-ranked_2.pdb
        |-ranked_2.zip
        |-ranked_3.cif
        |-ranked_3.pdb
        |-ranked_3.zip
        |-ranked_4.cif
        |-ranked_4.pdb
        |-ranked_4.zip
        ...
    ...
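To confirm that a default run converted every model, you can compare the PDB files with their expected companions. The following is a minimal sketch, not part of AlphaPulldown: the function name `missing_modelcif_outputs` is hypothetical, and it assumes the layout shown above (one sub-directory per complex, with a `.cif` and a `.zip` next to each `ranked_*.pdb`). It does not apply when only a single model is selected or when the output is compressed.

```python
from pathlib import Path


def missing_modelcif_outputs(ap_output):
    """List expected .cif/.zip files that are absent next to ranked_*.pdb files.

    Assumes one sub-directory per complex directly inside ap_output.
    """
    missing = []
    for pdb in sorted(Path(ap_output).glob("*/ranked_*.pdb")):
        for suffix in (".cif", ".zip"):
            # ranked_0.pdb -> ranked_0.cif and ranked_0.zip
            expected = pdb.with_suffix(suffix)
            if not expected.exists():
                missing.append(expected)
    return missing
```

An empty list means every `ranked_*.pdb` has both companion files; any returned paths point to models that were not converted.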
If only a single model should be translated to ModelCIF, use the --model_selected option. Provide the ranking of the model as the value. For example, to convert the model ranked 0:
source activate AlphaPulldown
convert_to_modelcif.py \
--ap_output <output path of run_multimer_jobs.py> \
--model_selected 0
This will create only one ModelCIF file and Zip archive in the path pointed at by --ap_output:
Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.pdb
        |-ranked_2.pdb
        |-ranked_3.pdb
        |-ranked_4.pdb
        ...
    ...
Besides --model_selected, the arguments are the same as for scenario 1.
Sometimes you want to focus on a certain model from the AlphaPulldown pipeline but don't want to completely discard the other models generated. For this, convert_to_modelcif.py can translate all models to ModelCIF but store the excess in the Zip archive of the selected model. This is achieved by adding the option --add_associated together with --model_selected.
source activate AlphaPulldown
convert_to_modelcif.py \
--ap_output <output path of run_multimer_jobs.py> \
--model_selected 0 \
--add_associated
Arguments are the same as in scenarios 1 and 2 but include --add_associated.
The output directory looks similar to when only converting a single model:
Output
ap_output
    protein1_and_protein2
        |-ranked_0.cif
        |-ranked_0.pdb
        |-ranked_0.zip
        |-ranked_1.pdb
        |-ranked_2.pdb
        |-ranked_3.pdb
        |-ranked_4.pdb
        ...
    ...
But a peek into ranked_0.zip shows that it stored ModelCIF files and Zip archives for all remaining models of this modeling experiment:
Output
ranked_0.zip
    |-ranked_0_local_pairwise_qa.cif
    |-ranked_1.cif
    |-ranked_1.zip
    |-ranked_2.cif
    |-ranked_2.zip
    |-ranked_3.cif
    |-ranked_3.zip
    |-ranked_4.cif
    |-ranked_4.zip
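One way to take that peek without unpacking the archive is Python's standard `zipfile` module. This is a minimal sketch; the function name `list_associated_files` is hypothetical.

```python
import zipfile


def list_associated_files(zip_path):
    """Return the sorted names of the files stored in a Zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Path follows the example layout above; adjust it to your own output.
    for name in list_associated_files("ap_output/protein1_and_protein2/ranked_0.zip"):
        print(name)
```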
convert_to_modelcif.py produces two kinds of output: ModelCIF files and Zip archives for each model. The latter are called "associated files/archives" in ModelCIF terminology. Associated files are registered in their corresponding ModelCIF file by the categories ma_entry_associated_files and ma_associated_archive_file_details. Historically, this scheme was created to offload AlphaFold's pairwise predicted aligned error (PAE) lists, which drastically increase file size. Nowadays, the Zip archives are used for all kinds of supplementary information on models that ModelCIF itself does not handle.
At this time, there is only one option left unexplained: --compress. It tells the script to compress ModelCIF files using Gzip. In the case of --add_associated, the ModelCIF files in the associated Zip archive are also compressed.
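Downstream scripts that read the converted models then need to cope with both plain and Gzip-compressed files. The sketch below is one way to do that with the standard library. The function name `read_modelcif_text` is hypothetical, and the file is recognized by the two Gzip magic bytes rather than by its extension, because the exact name of the compressed output is not assumed here.

```python
import gzip


def read_modelcif_text(path):
    """Read a ModelCIF file as text, whether or not it is Gzip-compressed.

    Detects Gzip by its two magic bytes instead of relying on the file name.
    """
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    # gzip.open and the built-in open accept the same text-mode arguments.
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return handle.read()
```

The returned string can be handed to whichever mmCIF parser you already use.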