A simple Python script to detect PROSITE-like motifs in protein sequences (FASTA format) using the Biopython library. This project demonstrates the use of regular expressions and Biopython to locate biologically meaningful sequence patterns, shown here with a real example related to the ProSeC project at Forschungszentrum Jülich.
- Reading and parsing FASTA files with Biopython
- Searching for PROSITE-style sequence motifs with regular expressions
- Applying a biologically meaningful motif (A-x-A) linked to protein secretion in Corynebacterium glutamicum
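
PROSITE writes motifs such as A-x-A in its own pattern syntax, while Python's `re` module expects a regular expression. The sketch below shows one way to translate between the two. The function name `prosite_to_regex` is hypothetical and not part of this project or of Biopython, and it covers only the common syntax elements (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`); it assumes a well-formed pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Hypothetical helper for illustration; it handles only a subset of the
    PROSITE pattern syntax and assumes the input is well formed.
    """
    parts = []
    # PROSITE patterns may end with a period and separate elements with '-'.
    for element in pattern.rstrip(".").split("-"):
        anchored_start = element.startswith("<")  # '<' anchors to the N-terminus
        anchored_end = element.endswith(">")      # '>' anchors to the C-terminus
        element = element.strip("<>")

        # Split an element such as 'x(2,4)' into its core and its repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, repeat = match.group(1), match.group(2)

        if core == "x":
            core = "."                            # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"        # any amino acid except these

        regex = core + ("{" + repeat + "}" if repeat else "")
        parts.append(("^" if anchored_start else "") + regex + ("$" if anchored_end else ""))
    return "".join(parts)


print(prosite_to_regex("A-x-A"))
```

For A-x-A this yields `A.A`, which matches the same strings as the signature `A.{1}A` entered when running the script.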
Requirements:
- Python 3.8+
- Biopython. Install it with this line:
pip install biopython
To run the file:
python prosite_regex_find.py
This program relies on two main modules:
- re: handles regular expressions and is used to search for motifs.
- Bio.SeqIO: reads and parses FASTA files, giving access to sequence IDs and amino acid strings.
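
How the two modules fit together can be sketched roughly as follows. This is a minimal illustration and not necessarily how `prosite_regex_find.py` itself is written. The function names `find_motif` and `scan_fasta` are hypothetical. Biopython is imported inside `scan_fasta` only so that the regex helper can be tried without it installed.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, matched text) for each non-overlapping match.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, sequence)]


def scan_fasta(path, pattern):
    """Yield (sequence ID, list of matches) for every record in a FASTA file."""
    from Bio import SeqIO  # requires Biopython (pip install biopython)

    for record in SeqIO.parse(path, "fasta"):
        # record.seq is a Seq object; convert it to str before using re.
        yield record.id, find_motif(str(record.seq), pattern)


# Quick check on a made-up sequence, not a real protein.
print(find_motif("MKAQATT", "A.{1}A"))
```

With a FASTA file on disk, `for seq_id, hits in scan_fasta("your_file.fasta", "A.{1}A"): print(seq_id, hits)` would list the hits per record; the file name here is a placeholder.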
The example sequence used here is the putative L,D-transpeptidase LppS (UniProt ID: Q8NMT9) from Corynebacterium glutamicum, the model organism used in the ProSeC project for studying protein secretion via the Sec pathway.
A characteristic sequence element of Sec-type signal peptides is the A-x-A motif (A = Alanine, x = any amino acid).
It represents the (–3, –1) rule at the cleavage site where the signal peptidase cuts the peptide before export, a feature described in:
- von Heijne G. (1983) Patterns of amino acids near signal-sequence cleavage sites. (DOI)
- PROSITE entry PS00013 (Signal peptide cleavage site).
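
Because A-x-A is only three residues long, candidate sites can overlap, and a plain `re.finditer` search reports only non-overlapping matches. If every candidate position is of interest, one option is to wrap the pattern in a lookahead, as in the sketch below. The function name `find_overlapping` is hypothetical, and the test sequence is made up for illustration.

```python
import re


def find_overlapping(sequence, pattern):
    """Return (1-based start, matched text) for every match, including overlaps."""
    # A lookahead consumes no characters, so the search resumes one residue
    # after each hit instead of after the end of the matched text.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


sequence = "MASAVAQ"  # made-up sequence containing ASA and AVA, which share an A
print(re.findall("A.A", sequence))        # non-overlapping search finds only ASA
print(find_overlapping(sequence, "A.A"))  # lookahead search finds ASA and AVA
```

Whether overlapping hits are wanted depends on the question being asked; a match to A-x-A alone does not show that a position is an actual cleavage site.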
- Enter the signature: A.{1}A
- Enter the path of your FASTA file.
- The output would be:

