Add MSA-based conservation analysis, and plotting module #2195

anupam-banerjee · 2025-12-11T20:56:22Z

This module implements an MSA-based evolutionary conservation workflow for protein sequences. It reads a wild-type sequence plus either a BLAST XML file or a precomputed MSA, optionally maps residues onto a PDB structure, and computes per-position conservation metrics at both the residue and the residue-class level.

The wild-type-based scores are: identity fraction, Shannon entropy, normalized conservation (1 − normalized entropy), consensus residue frequency, gap fraction, and two BLOSUM62-based scores (mean similarity to the wild-type residue and mean pairwise BLOSUM62 score within the column). The type-based scores, which use the H/P/N/B/X residue classes, are: fraction of sequences in the wild-type's class, class entropy, and consensus class frequency.

Outputs are a tab-delimited text file with all computed metrics, Matplotlib heatmaps for any selected scores, and optional PDB files with chosen metrics encoded in the B-factor column. A high-level API, computeConservationFromMSA, supports both CLI use and programmatic access to all metrics, consensus sequences, MSA details, and generated output paths.
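As a rough illustration of the residue-level scores, the sketch below computes identity fraction, Shannon entropy, normalized conservation, consensus frequency and gap fraction for every wild-type position of a toy alignment. It is not the module's implementation: it assumes the first sequence is the wild type, gaps are written as "-", columns where the wild type is gapped are skipped, gaps are excluded from the frequency-based scores, and entropy uses the natural log normalized by log(20). The helpers column_metrics and conservation_profile are hypothetical, and the two BLOSUM62 scores are omitted because they need the full substitution matrix.

```python
import math
from collections import Counter

GAP = "-"
ALPHABET_SIZE = 20  # standard amino acids, used to normalize the entropy


def column_metrics(column, wt_residue):
    """Residue-level scores for one MSA column (one character per sequence)."""
    n = len(column)
    residues = [c for c in column if c != GAP]
    m = len(residues)
    scores = {"gap_fraction": (n - m) / n}
    if m == 0:  # all-gap column: no residue statistics to compute
        scores.update(identity=0.0, entropy=0.0, norm_conservation=0.0, consensus_freq=0.0)
        return scores
    counts = Counter(residues)
    freqs = [count / m for count in counts.values()]
    entropy = -sum(p * math.log(p) for p in freqs)
    scores["identity"] = counts[wt_residue] / m          # fraction matching the wild type
    scores["entropy"] = entropy                          # Shannon entropy, natural log
    scores["norm_conservation"] = 1.0 - entropy / math.log(ALPHABET_SIZE)
    scores["consensus_freq"] = max(counts.values()) / m  # most common residue
    return scores


def conservation_profile(msa):
    """Score every column where the wild type (msa[0]) has a residue."""
    wt = msa[0]
    profile = []
    for j, wt_residue in enumerate(wt):
        if wt_residue == GAP:
            continue  # insertion relative to the wild type: no WT position
        column = "".join(seq[j] for seq in msa)
        profile.append(column_metrics(column, wt_residue))
    return profile


profile = conservation_profile(["MKV-L", "MKVAL", "MRV-I", "-KVAL"])
for position, scores in enumerate(profile, start=1):
    print(position, scores)
```

For this toy alignment the profile has four entries, one per non-gap wild-type position; the second position (column "KKRK") has identity and consensus frequency 0.75.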

anupam-banerjee · 2025-12-11T20:58:05Z

@AnthonyBogetti you can test this with the following commands (you will need the files listed below):

from prody import *

For MSA-file-based computation and plotting:

# Without return_data=True, the return value is the number of MSA sequences used
nseq = computeConservationFromMSA(
    sequence_file="pbp2a.seq",
    pdb_file="pbp2a_closed.pdb",
    msa_file="pbp2a.seqmsa",
    wt_scores="identity,entropy,norm_entropy,consensus_freq,gap_fraction,blosum_wt,blosum_pairwise",
    type_scores="type_freq,type_entropy,type_consensus_freq",
    wt_bfactor="identity",       # metric written to the B-factor column of the output PDB
    type_bfactor="type_freq",
    wt_heatmaps="all",
    type_heatmaps="all",
    residues_per_row=50,
    out_prefix="pbp2a_full",
)
print("MSA sequences used:", nseq)
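The wt_bfactor and type_bfactor options write a chosen metric into the PDB B-factor column. The sketch below shows that encoding at the text level using the fixed-column PDB format, where the chain ID is column 22, the residue number columns 23-26, and the temperature factor columns 61-66 (F6.2). It is a hypothetical helper, not the module's code, and assumes a single chain, scores keyed by integer PDB residue number, and no insertion codes.

```python
def encode_bfactors(pdb_lines, scores, chain="A", default=0.0):
    """Write per-residue scores into the B-factor field of ATOM/HETATM records.

    pdb_lines: PDB file lines without trailing newlines.
    scores: {residue_number: value}; residues missing from the dict get `default`.
    Insertion codes and alternate locations are ignored in this sketch.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            record = line.ljust(80)  # pad short lines so the fixed columns exist
            if record[21] == chain:  # column 22: chain identifier
                resnum = int(record[22:26])  # columns 23-26: residue number
                value = scores.get(resnum, default)
                # columns 61-66 hold the temperature factor, format F6.2
                record = record[:60] + f"{value:6.2f}" + record[66:]
            line = record.rstrip()
        out.append(line)
    return out


with_scores = encode_bfactors(
    ["ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N", "END"],
    {1: 0.75},
)
print(with_scores[0])
```

Inside ProDy itself, the more natural route is to set per-atom values with AtomGroup.setBetas() and save the structure with writePDB(); the text-level version only makes the column layout explicit.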

For BLAST-XML-based computation:

# return_data=True returns a dict with scores, per-residue metrics and output paths
result = computeConservationFromMSA(
    sequence_file="pbp2a.seq",
    xml_file="pbp2a_blast.xml",
    wt_scores="identity,norm_entropy,consensus_freq",
    type_scores="type_freq,type_entropy",
    wt_bfactor="none",
    type_bfactor="none",
    wt_heatmaps="identity",
    type_heatmaps="none",
    out_prefix="pbp2a_xml_tables",
    return_data=True,
)

print("MSA sequences used:", result["n_sequences"])
print("WT scores:", result["wt_scores"])
print("TYPE scores:", result["type_scores"])
print("Scores TXT:", result["output_files"]["scores_txt"])
print("WT heatmaps:", result["output_files"]["wt_heatmaps"])

i = 9  # zero-based index, i.e. residue 10
print("Residue 10 WT metrics:", result["wt_metrics"][i])
print("Residue 10 TYPE metrics:", result["type_metrics"][i])
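The type-level metrics collapse residues into classes before counting, so a substitution between two residues of the same class still counts as conserved. The sketch below illustrates this with a hypothetical class map (H hydrophobic, P polar uncharged, N negatively charged, B basic, X everything else, including G and P) that may not match the module's actual H/P/N/B/X assignments. The dictionary keys borrow the module's score names, but the formulas and the exclusion of gaps are assumptions.

```python
import math
from collections import Counter

GAP = "-"

# Hypothetical residue classes; the module's own H/P/N/B/X map may differ.
CLASS_GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "STNQY",     # polar, uncharged
    "N": "DE",        # negatively charged
    "B": "KRH",       # basic
}
RESIDUE_CLASS = {aa: cls for cls, group in CLASS_GROUPS.items() for aa in group}


def residue_class(aa):
    """Map a one-letter residue code to its class; unlisted codes fall into X."""
    return RESIDUE_CLASS.get(aa.upper(), "X")


def type_metrics(column, wt_residue):
    """Class-level scores for one MSA column (gaps excluded)."""
    classes = [residue_class(c) for c in column if c != GAP]
    m = len(classes)
    if m == 0:  # all-gap column
        return {"type_freq": 0.0, "type_entropy": 0.0, "type_consensus_freq": 0.0}
    counts = Counter(classes)
    freqs = [count / m for count in counts.values()]
    return {
        # fraction of residues sharing the wild type's class
        "type_freq": counts[residue_class(wt_residue)] / m,
        # Shannon entropy (natural log) over class frequencies
        "type_entropy": -sum(p * math.log(p) for p in freqs),
        # frequency of the most common class
        "type_consensus_freq": max(counts.values()) / m,
    }


# K and R share the basic class, D is acidic, L is hydrophobic; the gap is ignored
print(type_metrics("KRDL-", "K"))
```

In this example both type_freq and type_consensus_freq are 0.5, even though only one of the four residues is identical to the wild-type K.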

anupam-banerjee · 2025-12-12T20:31:52Z

Files needed:
evo2msa.ipynb
pbp2a_blast.xml
pbp2a_seq.txt
pbp2a_closed_pdb.txt
pbp2a_30000_seqmsa.txt

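Rename the files to match the commands above (pbp2a_seq.txt to pbp2a.seq, pbp2a_closed_pdb.txt to pbp2a_closed.pdb, pbp2a_30000_seqmsa.txt to pbp2a.seqmsa). A Jupyter notebook (evo2msa.ipynb) is also provided.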

Commit 799c6a2: Add MSA-based conservation analysis, and plotting module

AnthonyBogetti self-requested a review December 12, 2025 20:11

AnthonyBogetti self-assigned this Dec 12, 2025

AnthonyBogetti added the Awaiting review label Dec 12, 2025

AnthonyBogetti added the New feature label Dec 15, 2025
