metamorphic_multifunction_search is a systematic protocol for the large-scale detection of structural metamorphisms and protein multifunctionality, built on top of the Protein Information System (PIS).
The project combines structural alignments, functional GO annotations, and protein language models to uncover hidden relationships between structure and function across model and non-model organisms.
- Aligns 3D protein structures with high sequence identity.
- Detects divergent conformations (i.e., metamorphisms) using metrics such as RMSD or FC-score.
- Uses large-scale filtering (e.g., CD-HIT) and pairwise structural comparison.
- Extracts Gene Ontology (GO) annotations per protein.
- Computes semantic distances between GO terms within each namespace (MF, BP, CC).
- Identifies the most divergent pair of terms per protein to quantify multifunctionality.
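The last two steps can be illustrated with a minimal sketch. It assumes a toy `is_a` hierarchy with made-up term IDs and uses a Jaccard distance over ancestor sets as the semantic distance. Both are illustrative assumptions: the protocol's actual metric and ontology loading are not specified here, and the names `TOY_PARENTS`, `ancestors`, `jaccard_distance`, and `most_divergent_pair` are hypothetical.

```python
from itertools import combinations

# Toy is_a hierarchy (child -> parents). Term IDs are invented for illustration
# and stand in for the GO terms of a single namespace (e.g. MF).
TOY_PARENTS = {
    "T:root": [],
    "T:binding": ["T:root"],
    "T:catalysis": ["T:root"],
    "T:dna_binding": ["T:binding"],
    "T:rna_binding": ["T:binding"],
    "T:kinase": ["T:catalysis"],
}


def ancestors(term, parents=TOY_PARENTS):
    """Return the term itself plus every term reachable through its parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def jaccard_distance(a, b, parents=TOY_PARENTS):
    """Assumed semantic distance: 1 - |shared ancestors| / |all ancestors|."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return 1.0 - len(anc_a & anc_b) / len(anc_a | anc_b)


def most_divergent_pair(terms, parents=TOY_PARENTS):
    """Return (term_a, term_b, distance) for the most distant pair, or None."""
    unique = sorted(set(terms))
    if len(unique) < 2:
        return None  # a single annotation cannot indicate multifunctionality
    scored = (
        (a, b, jaccard_distance(a, b, parents)) for a, b in combinations(unique, 2)
    )
    return max(scored, key=lambda item: item[2])
```

Calling `most_divergent_pair(["T:dna_binding", "T:rna_binding", "T:kinase"])` picks a pair that spans the binding and catalysis branches, because those terms share only the root as a common ancestor. The same function would be applied once per namespace so that MF, BP, and CC terms are never compared with each other.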
- Python 3.11.6
- RabbitMQ
- PostgreSQL with the pgvector extension
- Docker (optional but recommended for deployment)
- Start PostgreSQL with pgvector:

      docker run -d --name pgvectorsql \
        -e POSTGRES_USER=user \
        -e POSTGRES_PASSWORD=password \
        -e POSTGRES_DB=BioData \
        -p 5432:5432 \
        pgvector/pgvector:pg16

- Start RabbitMQ:

      docker run -d --name rabbitmq \
        -p 15672:15672 \
        -p 5672:5672 \
        rabbitmq:management

- Run the main protocol:

      python main.py

This command executes the full pipeline: data extraction, structural filtering, alignment, functional analysis, and metric computation.
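Before launching the pipeline, it can help to confirm that both containers accept connections. The following is a minimal sketch using only the Python standard library. It assumes the services run on `localhost` with the ports published by the Docker commands in the setup steps; the helper names `port_is_open` and `check_services` are hypothetical and not part of the project.

```python
import socket

# Ports published by the Docker commands in the setup steps.
SERVICES = {
    "PostgreSQL": 5432,
    "RabbitMQ (AMQP)": 5672,
    "RabbitMQ management UI": 15672,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port is reachable."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        status = "reachable" if reachable else "NOT reachable"
        print(f"{name}: {status}")
```

A successful TCP connection only shows that the port is open. It does not verify the database credentials or that the pgvector extension is enabled.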
You can tailor the pipeline by editing the config.yaml file or modifying main.py to:
- Switch embedding models
- Apply taxonomy-based filters
- Add new annotation types or similarity metrics
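As an illustration of the second option, a taxonomy-based filter could be sketched as follows. Everything here is hypothetical: the `ProteinRecord` type, its fields, the example accessions, and the `filter_by_taxonomy` function are assumptions made for the example and do not reflect the actual data model or the keys in config.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    """Hypothetical minimal record; the real pipeline's schema may differ."""
    accession: str
    taxonomy_id: int  # NCBI Taxonomy identifier


def filter_by_taxonomy(records, allowed_taxonomy_ids):
    """Keep only records whose taxonomy ID is in the allowed set.

    An empty allow-list is treated as "no filter" and returns every record.
    """
    allowed = set(allowed_taxonomy_ids)
    if not allowed:
        return list(records)
    return [record for record in records if record.taxonomy_id in allowed]


# Usage with placeholder accessions (9606 = Homo sapiens, 10090 = Mus musculus).
records = [
    ProteinRecord("PROT_A", 9606),
    ProteinRecord("PROT_B", 10090),
    ProteinRecord("PROT_C", 9606),
]
human_only = filter_by_taxonomy(records, [9606])
```

Treating an empty allow-list as "no filter" keeps the default run unchanged, so the filter only takes effect when taxonomy IDs are supplied explicitly.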