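We record the x, y, and z coordinates of the $\alpha$-carbon of each residue for every frame in the trajectory in the numpy.ndarray coordinates. We then compute the average x, y, and z coordinates of each atom. These are used to construct the atomic position fluctuation (displacement) vectors:
$$\Delta\mathbf{r}_i(t) = \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle$$
or more explicitly:
$$\Delta\mathbf{r}_i(t) = \big(x_i(t) - \langle x_i \rangle,\; y_i(t) - \langle y_i \rangle,\; z_i(t) - \langle z_i \rangle\big)$$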
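As a minimal sketch of this step, the snippet below subtracts the time-averaged position of each atom from every frame. It assumes (this is not stated in the text) that the coordinates array has shape `(n_frames, n_atoms, 3)`; the function name `displacement_vectors` is hypothetical.

```python
import numpy as np

def displacement_vectors(coordinates):
    """Return (mean_positions, displacements) for a trajectory.

    coordinates: array of shape (n_frames, n_atoms, 3) -- an assumed layout.
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # Average x, y, z of each atom over all frames: shape (n_atoms, 3).
    mean_positions = coordinates.mean(axis=0)
    # Broadcasting subtracts the mean from every frame: shape (n_frames, n_atoms, 3).
    displacements = coordinates - mean_positions
    return mean_positions, displacements
```

By construction, the displacements of each atom average to zero over the trajectory.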
We compute the distance matrix, which contains the average distance between each pair of residues.
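One possible way to build such a matrix is sketched below. It assumes that "average distance" means the per-frame Euclidean distance between two atoms averaged over frames, and that the coordinates have shape `(n_frames, n_atoms, 3)`; both are assumptions, and the function name is hypothetical.

```python
import numpy as np

def average_distance_matrix(coordinates):
    """Frame-averaged pairwise distances, shape (n_atoms, n_atoms).

    coordinates: array of shape (n_frames, n_atoms, 3) -- an assumed layout.
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # Pairwise difference vectors per frame: (n_frames, n_atoms, n_atoms, 3).
    diff = coordinates[:, :, None, :] - coordinates[:, None, :, :]
    # Euclidean norm of each difference vector: (n_frames, n_atoms, n_atoms).
    per_frame = np.linalg.norm(diff, axis=-1)
    return per_frame.mean(axis=0)
```

This vectorized form holds all per-frame distance matrices in memory at once, so for long trajectories an accumulating loop over frames may be preferable.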
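We then compute the covariance matrix, defined in terms of the atomic displacement vectors as follows:
$$C_{ij} = \langle \Delta\mathbf{r}_i \cdot \Delta\mathbf{r}_j \rangle$$
where $\Delta\mathbf{r}_i$ is the displacement vector of residue $i$ and $\langle \cdot \rangle$ denotes the average over trajectory frames.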
Finally, we compute the eigenvalues and eigenvectors of the covariance matrix, also known as its "Essential Modes". These can be used for further analysis such as Principal Component Analysis (PCA).
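The covariance and diagonalization steps might look like the sketch below. It assumes a residue-by-residue covariance built from the dot product of displacement vectors averaged over frames, and coordinates of shape `(n_frames, n_atoms, 3)`; the function names are hypothetical.

```python
import numpy as np

def covariance_matrix(coordinates):
    """C[i, j] = mean over frames of (dr_i . dr_j), shape (n_atoms, n_atoms)."""
    coordinates = np.asarray(coordinates, dtype=float)
    disp = coordinates - coordinates.mean(axis=0)
    n_frames = disp.shape[0]
    # Sum the dot products over frames (t) and Cartesian components (a).
    return np.einsum("tia,tja->ij", disp, disp) / n_frames

def essential_modes(cov):
    """Eigenvalues (descending) and matching eigenvectors (as columns)."""
    # eigh is appropriate because the covariance matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]
```

Sorting in descending order puts the modes with the largest variance first, which is the usual convention in PCA.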
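We estimate the degree of dynamical correlation in terms of the Mutual Information, defined as:
$$I[\mathbf{x}_i, \mathbf{x}_j] = \iint p(\mathbf{x}_i, \mathbf{x}_j) \, \ln \frac{p(\mathbf{x}_i, \mathbf{x}_j)}{p(\mathbf{x}_i)\, p(\mathbf{x}_j)} \, d\mathbf{x}_i \, d\mathbf{x}_j$$
where $\mathbf{x}_i$ denotes the atomic fluctuation of residue $i$. This can be expressed in terms of marginal and joint Shannon Entropies:
$$I[\mathbf{x}_i, \mathbf{x}_j] = H[\mathbf{x}_i] + H[\mathbf{x}_j] - H[\mathbf{x}_i, \mathbf{x}_j]$$
where
$$H[\mathbf{x}_i] = -\int p(\mathbf{x}_i) \, \ln p(\mathbf{x}_i) \, d\mathbf{x}_i$$
and
$$H[\mathbf{x}_i, \mathbf{x}_j] = -\iint p(\mathbf{x}_i, \mathbf{x}_j) \, \ln p(\mathbf{x}_i, \mathbf{x}_j) \, d\mathbf{x}_i \, d\mathbf{x}_j$$
are the marginal and joint Shannon Entropies, obtained as ensemble averages over atomic fluctuations, with the marginal and joint probability distributions computed over the thermal fluctuations sampled by MD simulations of the system at equilibrium.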
The Mutual Information is a measure of information shared between residues, that is, the information about one residue that can be gained from knowledge of the other residue.
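To illustrate the entropy decomposition of the Mutual Information, here is a deliberately simplified sketch for two scalar time series using histogram binning. This is a swapped-in toy estimator, not the estimator used for the three-dimensional displacement vectors in this workflow; the function names and the `bins` parameter are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # terms with p = 0 contribute nothing
    return float(-np.sum(p * np.log(p)))

def mutual_information_hist(x, y, bins=10):
    """Histogram estimate of I = H[x] + H[y] - H[x, y] for scalar series."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()  # joint probabilities
    p_x = p_xy.sum(axis=1)        # marginal of x
    p_y = p_xy.sum(axis=0)        # marginal of y
    return shannon_entropy(p_x) + shannon_entropy(p_y) - shannon_entropy(p_xy)
```

Histogram estimates are biased for short series and depend on the bin count, so the values should be read as qualitative.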
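The generalized correlation coefficient is calculated from the Mutual Information according to the following equation:
$$r_{MI}[\mathbf{x}_i, \mathbf{x}_j] = \left(1 - \exp\left(-\frac{2}{d}\, I[\mathbf{x}_i, \mathbf{x}_j]\right)\right)^{1/2}$$
where $d = 3$ is the dimensionality of the displacement vectors. The generalized correlation coefficients, $r_{MI}$, range from 0 for uncorrelated motions to 1 for fully correlated motions.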
They will also be used to compute the so-called "direct communication coefficients", which allow us to trace the paths of direct information transfer within the protein network in a later step.
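The conversion from Mutual Information to a generalized correlation coefficient can be sketched as follows. It assumes the commonly used form $r_{MI} = [1 - \exp(-2I/d)]^{1/2}$, with $d$ the dimensionality of the variables; the function name and the `dim` parameter are hypothetical.

```python
import numpy as np

def generalized_correlation(mutual_info, dim=3):
    """Map mutual information (in nats) to a coefficient in [0, 1).

    dim is the dimensionality of each variable (3 for Cartesian displacements).
    """
    # Estimators can return slightly negative values; clip them to zero.
    mi = np.maximum(np.asarray(mutual_info, dtype=float), 0.0)
    return np.sqrt(1.0 - np.exp(-2.0 * mi / dim))
```

With this form, two one-dimensional Gaussian variables with Pearson correlation $\rho$ give $r_{MI} = |\rho|$, which is why it is regarded as a generalization of the linear correlation coefficient.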
Once the generalized correlation coefficient matrix has been computed, we compute its eigenvalues and eigenvectors. These can be used for the computation of Eigenvector Centrality (in conjunction with the average distances computed above).
One of the simplest concepts when computing node-level measures is that of centrality, i.e. how central a node or edge is in the graph.
DEGREE CENTRALITY
The degree centrality counts the number of edges adjacent to a node. The degree of node $i$ is the number of existing edges that connect it to other nodes $j$ in the graph.
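As a small illustration, the degree of every node can be read off an adjacency matrix by counting its nonzero off-diagonal entries. The sketch assumes an undirected graph stored as a square array; the function name is hypothetical.

```python
import numpy as np

def degree_centrality(adjacency):
    """Number of edges attached to each node of an undirected graph."""
    # Any nonzero entry counts as an edge; the comparison creates a new array.
    edges = np.asarray(adjacency) != 0
    np.fill_diagonal(edges, False)  # ignore self-loops
    return edges.sum(axis=1)
```

For example, in a three-node path graph the middle node has degree 2 and the two end nodes have degree 1.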
EIGENVECTOR CENTRALITY
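The eigenvector centrality $c_i$ of a node $i$ is the weighted sum of the centralities of all nodes connected to it by an edge, as defined by the adjacency matrix $\mathbf{A}$:
$$c_i = \frac{1}{\varepsilon} \sum_{j} A_{ij}\, c_j$$
where $\varepsilon$ is the largest eigenvalue of $\mathbf{A}$, so that the centralities are the components of the corresponding eigenvector.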
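For this calculation, the adjacency matrix is defined as follows:
$$A_{ij} = \begin{cases} 0 & \text{if } i = j \\ r_{MI}[\mathbf{x}_i, \mathbf{x}_j] \, \exp\left(-\dfrac{d_{ij}}{\lambda}\right) & \text{if } i \neq j \end{cases}$$
where $r_{MI}$ is the generalized correlation coefficient, $d_{ij}$ is the average distance between residues $i$ and $j$, and $\lambda$ is the locality factor.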
The locality factor provides a way of probing the effect of physical distance between residues, restricting our focus to those interactions that occur within this distance.
The centrality values can be normalized such that the nodes with the largest centralities have values closer to 1.
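The eigenvector centrality calculation can be sketched as below. The exponential distance damping `exp(-d_ij / locality)` is an assumed form for the locality factor, the normalization by the largest component is one possible choice, and the function and parameter names are hypothetical.

```python
import numpy as np

def damped_adjacency(r_mi, distances, locality):
    """Correlations damped by distance, with a zero diagonal (assumed form)."""
    r_mi = np.asarray(r_mi, dtype=float)
    distances = np.asarray(distances, dtype=float)
    adjacency = r_mi * np.exp(-distances / locality)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

def eigenvector_centrality(adjacency):
    """Leading eigenvector of a symmetric adjacency matrix, scaled to max 1."""
    # eigh returns eigenvalues in ascending order, so the last column
    # belongs to the largest eigenvalue.
    _, eigenvectors = np.linalg.eigh(adjacency)
    # The overall sign of an eigenvector is arbitrary; take magnitudes.
    leading = np.abs(eigenvectors[:, -1])
    return leading / leading.max()
```

Small values of `locality` suppress long-range couplings, so the centralities then mostly reflect correlations between spatially close residues.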
BETWEENNESS CENTRALITY
The betweenness centrality of a node or edge in a network measures the extent to which it lies on shortest paths. A higher betweenness indicates that the node or edge lies on more shortest paths and hence has more influence on the flow of information in the graph. A shortest path (geodesic path) between two nodes in a graph is a path with the fewest edges; there may be more than one such path between a given pair of nodes.
The geodesic betweenness $B_n(i)$ of a node $i$ in a network is defined as:
$$B_n(i) = \sum_{s \neq i \neq t} \frac{\Psi_{s,t}(i)}{\Psi_{s,t}}$$
where:
- $\Psi_{s,t}$ denotes the number of shortest paths (geodesics) between nodes $s$ and $t$.
- $\Psi_{s,t}(i)$ denotes the number of shortest paths (geodesics) between nodes $s$ and $t$ that pass through node $i$.
- The geodesic betweenness $B_n$ of a network is the mean of $B_n(i)$ over all nodes $i$.
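The definitions above can be sketched in pure Python with a breadth-first search that counts shortest paths, following the approach of Brandes' algorithm. The sketch assumes an unweighted, undirected graph given as a dictionary of neighbour lists, and counts each unordered pair of endpoints once (conventions differ by a factor of two); the function names are hypothetical.

```python
from collections import deque

def geodesic_betweenness(adjacency):
    """Node betweenness for an unweighted, undirected graph.

    adjacency: dict mapping each node to an iterable of its neighbours.
    """
    betweenness = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair {s, t} was counted from both endpoints.
    return {v: b / 2.0 for v, b in betweenness.items()}

def network_betweenness(adjacency):
    """Mean of the node betweenness values over all nodes."""
    values = geodesic_betweenness(adjacency)
    return sum(values.values()) / len(values)
```

On a four-node ring, each pair of opposite nodes is joined by two geodesics, so every node receives a betweenness of one half from the single pair it sits between.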