

@anupam-banerjee
Contributor

This module implements an MSA-based evolutionary conservation workflow for protein sequences. It reads a wild-type sequence plus either a BLAST XML file or a precomputed MSA, optionally maps residues to a PDB structure, and computes per-position conservation metrics at both the residue and residue-class levels.

Wild-type-based scores:
- identity fraction (identity)
- Shannon entropy (entropy)
- normalized conservation, 1 − normalized entropy (norm_entropy)
- consensus residue frequency (consensus_freq)
- gap fraction (gap_fraction)
- mean BLOSUM62 similarity to the wild-type residue (blosum_wt)
- mean pairwise BLOSUM62 score within the column (blosum_pairwise)

Type-based scores, using the H/P/N/B/X residue classes:
- fraction of sequences in the wild-type residue's class (type_freq)
- class entropy (type_entropy)
- consensus class frequency (type_consensus_freq)

Outputs are a tab-delimited text file of the computed metrics, Matplotlib heatmaps for any selected scores, and optional PDB files with chosen metrics encoded in the B-factor column. A high-level API, computeConservationFromMSA, supports both command-line use and programmatic access to all metrics, consensus sequences, MSA details, and generated output paths.
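As a rough illustration of the wild-type-based column scores listed above, the sketch below computes five of them for a single alignment column using only the standard library. It is not the module's code: the gap character ('-'), natural-log entropy over non-gap residues, normalization by log(20), the handling of fully gapped columns, and the helper name wt_column_scores are all assumptions, and the two BLOSUM62 scores are left out.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character in the alignment
N_AMINO_ACIDS = 20  # assumed alphabet size used to normalize entropy


def wt_column_scores(column, wt_residue):
    """Hypothetical helper: wild-type-based scores for one MSA column.

    column: aligned characters at this position, one per sequence.
    wt_residue: the wild-type residue at this position.
    """
    n = len(column)
    residues = [c for c in column if c != GAP]
    n_res = len(residues)
    counts = Counter(residues)

    # Fraction of all sequences (gapped ones included) matching the wild type.
    identity = column.count(wt_residue) / n
    # Fraction of sequences with a gap at this position.
    gap_fraction = (n - n_res) / n

    if n_res == 0:
        # Fully gapped column: treated as unconserved (an assumption).
        return {"identity": identity, "entropy": 0.0, "norm_entropy": 0.0,
                "consensus_freq": 0.0, "gap_fraction": gap_fraction}

    # Shannon entropy (natural log) of the non-gap residue distribution,
    # written as sum of p * log(1/p) with p = k / n_res.
    entropy = sum((k / n_res) * math.log(n_res / k) for k in counts.values())
    # 1 - normalized entropy: 1.0 for a fully conserved column.
    norm_entropy = 1.0 - entropy / math.log(N_AMINO_ACIDS)
    # Frequency of the most common non-gap residue.
    consensus_freq = counts.most_common(1)[0][1] / n_res

    return {"identity": identity, "entropy": entropy,
            "norm_entropy": norm_entropy, "consensus_freq": consensus_freq,
            "gap_fraction": gap_fraction}
```

For example, wt_column_scores("AAAV-", "A") gives an identity fraction of 0.6 and a gap fraction of 0.2 under these conventions.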

@anupam-banerjee
Contributor Author

anupam-banerjee commented Dec 11, 2025

@AnthonyBogetti you can test this with the following commands (the required files are listed in the next comment):

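For MSA-file-based computation and plotting:

```python
from prody import computeConservationFromMSA

# Compute every wild-type and residue-class score from a precomputed MSA,
# encode identity / type_freq in the B-factor column of the PDB output,
# and draw all heatmaps.
nseq = computeConservationFromMSA(
    sequence_file="pbp2a.seq",       # wild-type sequence
    pdb_file="pbp2a_closed.pdb",     # structure used for B-factor encoding
    msa_file="pbp2a.seqmsa",         # precomputed MSA
    wt_scores="identity,entropy,norm_entropy,consensus_freq,gap_fraction,blosum_wt,blosum_pairwise",
    type_scores="type_freq,type_entropy,type_consensus_freq",
    wt_bfactor="identity",
    type_bfactor="type_freq",
    wt_heatmaps="all",
    type_heatmaps="all",
    residues_per_row=50,
    out_prefix="pbp2a_full",
)
# Without return_data=True, the call returns the number of MSA sequences used.
print("MSA sequences used:", nseq)
```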
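The example above encodes identity and type_freq in the B-factor column of the PDB output. As a sketch of what that encoding involves at the file level, the hypothetical helper encode_bfactors below rewrites columns 61-66 of ATOM/HETATM records from a {(chain, residue number): score} mapping. It assumes standard fixed-column PDB records, ignores insertion codes, and leaves residues without a score unchanged; the PR does not say how the module handles these cases, and within ProDy the same effect is usually achieved with setBetas() followed by writePDB().

```python
def encode_bfactors(pdb_lines, scores):
    """Hypothetical helper: write per-residue scores into PDB B-factor fields.

    pdb_lines: iterable of PDB-format text lines.
    scores: dict mapping (chain_id, residue_number) to a float score.
    Returns a new list of lines.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            has_newline = line.endswith("\n")
            # Pad short records so that columns 61-66 exist.
            record = line.rstrip("\n").ljust(66)
            chain = record[21]           # column 22: chain identifier
            resnum = int(record[22:26])  # columns 23-26: residue sequence number
            score = scores.get((chain, resnum))
            if score is not None:
                # Columns 61-66 hold the temperature (B) factor as %6.2f.
                record = record[:60] + f"{score:6.2f}" + record[66:]
                line = record + ("\n" if has_newline else "")
        out.append(line)
    return out
```

Usage would be along the lines of reading pbp2a_closed.pdb into a list of lines, passing it through encode_bfactors with per-residue identity fractions, and writing the result out with writelines().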

For BLAST XML-based computation, returning all results as a dictionary:

```python
from prody import computeConservationFromMSA

result = computeConservationFromMSA(
    sequence_file="pbp2a.seq",
    xml_file="pbp2a_blast.xml",      # BLAST XML instead of a precomputed MSA
    wt_scores="identity,norm_entropy,consensus_freq",
    type_scores="type_freq,type_entropy",
    wt_bfactor="none",               # no B-factor encoding
    type_bfactor="none",
    wt_heatmaps="identity",          # heatmap for identity only
    type_heatmaps="none",
    out_prefix="pbp2a_xml_tables",
    return_data=True,                # return a dict of results
)

print("MSA sequences used:", result["n_sequences"])
print("WT scores:", result["wt_scores"])
print("TYPE scores:", result["type_scores"])
print("Scores TXT:", result["output_files"]["scores_txt"])
print("WT heatmaps:", result["output_files"]["wt_heatmaps"])

i = 9  # 0-based index, i.e. residue 10
print("Residue 10 WT metrics:", result["wt_metrics"][i])
print("Residue 10 TYPE metrics:", result["type_metrics"][i])
```
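The residue-class (TYPE) metrics printed for residue 10 follow the same counting idea, applied after each residue is mapped to a class. The PR names the classes H/P/N/B/X but not their residue membership, so the mapping HYPOTHETICAL_CLASSES below is an illustrative guess, as is the helper name class_column_scores; characters outside the mapping, including gaps, are assumed to fall into X.

```python
import math
from collections import Counter

# HYPOTHETICAL residue-to-class mapping for illustration only; the
# module's actual H/P/N/B/X assignments are not given in the PR text.
HYPOTHETICAL_CLASSES = {
    **dict.fromkeys("AVLIMFWC", "H"),  # assumed: hydrophobic
    **dict.fromkeys("STNQYH", "P"),    # assumed: polar
    **dict.fromkeys("DE", "N"),        # assumed: negatively charged
    **dict.fromkeys("KR", "B"),        # assumed: basic
}


def class_column_scores(column, wt_residue, classes=HYPOTHETICAL_CLASSES):
    """Hypothetical helper: class-level scores for one MSA column."""
    # Anything not in the mapping (gaps, G, P, unknowns) is assumed to be X.
    labels = [classes.get(c, "X") for c in column]
    n = len(labels)
    counts = Counter(labels)
    wt_class = classes.get(wt_residue, "X")

    # Fraction of sequences whose residue is in the wild type's class.
    type_freq = counts[wt_class] / n
    # Shannon entropy (natural log) of the class distribution.
    type_entropy = sum((k / n) * math.log(n / k) for k in counts.values())
    # Frequency of the most common class.
    type_consensus_freq = counts.most_common(1)[0][1] / n

    return {"type_freq": type_freq, "type_entropy": type_entropy,
            "type_consensus_freq": type_consensus_freq}
```

Under this mapping, a column LIVDK with wild-type L gives type_freq 0.6, since three of the five residues share the wild type's class.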

@anupam-banerjee
Contributor Author

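Files needed:
- evo2msa.ipynb
- pbp2a_blast.xml
- pbp2a_seq.txt
- pbp2a_closed_pdb.txt
- pbp2a_30000_seqmsa.txt

Rename the files to match the names used in the commands above (e.g. pbp2a_seq.txt to pbp2a.seq). A Jupyter notebook (evo2msa.ipynb) is also provided.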
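The renaming can also be scripted. The sketch below assumes the attachments were downloaded into a single directory and that each one maps to a name used in the commands as listed in RENAMES; that mapping is inferred from the file names rather than stated in the PR, and rename_attachments is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from attachment names to the names the commands expect.
RENAMES = {
    "pbp2a_seq.txt": "pbp2a.seq",
    "pbp2a_closed_pdb.txt": "pbp2a_closed.pdb",
    "pbp2a_30000_seqmsa.txt": "pbp2a.seqmsa",
}


def rename_attachments(directory="."):
    """Rename downloaded attachments in place; skip any that are missing.

    Returns the list of new file names that were created.
    """
    base = Path(directory)
    renamed = []
    for old, new in RENAMES.items():
        src = base / old
        if src.exists():
            src.rename(base / new)
            renamed.append(new)
    return renamed
```

pbp2a_blast.xml already matches the name used in the second command, so it is not in the mapping.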

