Functionalities regarding protein Structures
This workflow offers several functionalities to explore the consequence of protein mutations. It reports features that overlap the mutations, or that are in close physical proximity.
The features reported include protein domains, variants, helices, ligand binding residues, catalytic sites, transmembrane domains, InterPro domains, and known somatic mutations in different types of cancer. This information is extracted from resources such as UniProt, COSMIC, InterPro and Appris. It can also identify mutations affecting the interfaces of protein complexes.
This workflow makes use of PDB files to calculate residues in close proximity. This information is used to find features close to the mutations, at a distance of 5 angstroms, or mutations in residues close to residues in a complex partner, at a distance of up to 8 angstroms.
PDBs are extracted from Interactome3d, which organized thousands of PDBs, for both experimental structures and structure models, of individual proteins and protein complexes.
Pairwise (Smith-Waterman) alignment is used to fix all inconsistencies between protein sequences in PDBs, Uniprot and Ensembl Protein ID.
Vazquez M, Valencia A, Pons T. (2015) Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics (2015); 31(14):2397-2399 (doi: 10.1093/bioinformatics/btv142)
The annotation tasks take either genomic mutations or mutated isoforms. Mutated
isoforms must be in reference to Ensembl Protein IDs (e.g.
ENSP00000449454:E433D). When inputing genomic mutations, the consequence of
this mutations in terms of mutated isoforms is computed automatically. Genomic
mutations can be reported in Chromosome:Position:AlternativeAllele format (e.g.
1:19949995:T) or in VCF format. Normaly genomic mutations are given always with
respect to the Watson or forward strand. This is the default. Sometimes,
however, mutations are given with respect to the strand that encodes de gene;
If this is the case, specify watson to be false. The version of the methods
using mutated isoforms as input have the _mi_ term in their name. The
organism input is used to specify the version of the genome and gene set to
use, leave Hsa for the most recent Homo sapiens data, or Hsa/may2009
for the latest gene set of hg18. The date codes correspond the Ensembl
archives, which are used to download and process all the information
consistently for each version.
When PDBs are required, the PDB code or a URL can be specified using the pdb
parameter. Alternatively, the content of a PDB file can be provided using the
pdbfile parameter.
The result of these tasks are reported in TSV tables. These tables often encode
complex relationships. For instance a genomic mutation might have as
consequence several mutated isoforms, with each protein mutation overlapping
different domains or features. This multiplicity is encoded in the TSV in a
standard way: using tabs to separate fields, vertical bars (|) to separate
diferent values for each field, and semicolons (;) to further separate
multiple options for each result.
The main tasks are: annotate, annotate_neighbours, and interfaces.
Or alternatively for mutated isoforms: annotate_mi, annotate_mi_neighbours,
and mi_interfaces.
All the workflow tasks can be executed from the web interface, programatically,
through the REST interface using wget or curl, or using the rbbt command.
See the rbbt documentation for more
information.
Annotates genomic mutations based on the protein features that are overlapping amino-acid changes
Annotates genomic mutations based on the protein features that are in close physical proximity to amino-acid changes
At a distance of 5 angstroms
Find variants that affect residues in protein-protein interaction surfaces
Residues at a distance of 8 angstroms of a residue from an interaction partner
Annotates mutated isoforms based on the protein features that are overlapping amino-acid changes
Annotates mutated isoforms based on the protein features that are in close physical proximity to amino-acid changes
At a distance of 5 angstroms
Find mutated_isoforms with affected residues in protein-protein interaction sufaces
Residues at a distance of 8 angstroms of a residue from an interaction partner
Finds residues physical proximity to amino-acid changes in mutated isoforms
At a distance of 5 angstroms
For a given PDB, find all pairs of residues in a PDB that fall within a given 'distance' of each other. It uses PDBs from Interactome3d for individual proteins.
Use a pdb to find the residues neighbouring, in three dimensional space, a particular residue in a given sequence.
Find the correspondence between sequence positions in a PDB and in a given
sequence. PDB positions are reported as chain:position.
Translate the positions of amino-acids in a particular chain of the provided PDB into positions inside a given sequence.
Translate the positions inside a given amino-acid sequence to positions in the sequence of a PDB by aligning them
Run a list of variants through all the analysis and produce a combined report.
This analysis is limited to 1000 variants (use the other more granular methods
otherwise). Variants can be expressed as genomic mutations or protein
mutations. When protein mutations are used, the name of the protein can be
Ensembl Protein ID or any other protein or gene identifier, including gene
symbols (e.g. KRAS:G12V)
Score a list of variants based on the report generated by the wizard. The
limitation to 1000 variants still holds.
Produce a small table summarizing the mutation scores and a few of the features