
A lightweight Biopython tool that scans FASTA sequences for PROSITE-style motifs using regular expressions. Includes a real biological example (A-x-A signal-peptide motif in Corynebacterium glutamicum).


r98b/PrositeSignatureFinder

PROSITE Signature Finder

A simple Python script that detects PROSITE-like motifs in protein sequences (FASTA format) using the Biopython library. The project demonstrates how regular expressions and Biopython can be combined to locate biologically meaningful sequence patterns, illustrated with a real example related to the ProSeC project at Forschungszentrum Jülich.

Purpose:

  • Reading and parsing FASTA files with Biopython

  • Searching for PROSITE-style sequence motifs with regular expressions

  • Applying a biologically meaningful motif (A-x-A) linked to protein secretion in Corynebacterium glutamicum
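Searching for a PROSITE-style motif with `re` means the PROSITE notation first has to be expressed as a regular expression. The sketch below shows one way this translation could look for a small subset of the syntax (`x`, `[ABC]`, `{ABC}`, repeat counts, and the `<`/`>` anchors). The function name `prosite_to_regex` is hypothetical and is not taken from this repository, which asks for the signature directly in regex form.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'A-x-A' into a Python regex.

    Hypothetical helper covering only a subset of the PROSITE syntax.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as '(2)' or '(2,4)'.
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.groups()
        prefix = suffix = ""
        if core.startswith("<"):  # '<' anchors the match at the N-terminus
            prefix, core = "^", core[1:]
        if core.endswith(">"):  # '>' anchors the match at the C-terminus
            suffix, core = "$", core[:-1]
        if core == "x":  # 'x' stands for any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # '{P}' means any residue except P
        # Single letters and '[ABC]' classes are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


print(prosite_to_regex("A-x-A"))
print(prosite_to_regex("A-x(1)-A"))
```

Under these assumptions, `A-x(1)-A` translates to `A.{1}A`, and `A-x-A` to the equivalent `A.A`.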


How to run

Requirements:

  • Python 3.8+
  • Biopython, which can be installed with:
pip install biopython

To run the file:

python prosite_regex_find.py

Modules used

This program relies on two main modules:

  • re: handles regular expressions and is used to search for motifs.
  • Bio.SeqIO: reads and parses FASTA files, giving access to sequence IDs and amino acid strings.
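A minimal sketch of how the two modules can work together is shown below. It is not the code of `prosite_regex_find.py`. The function names `find_motif` and `scan_fasta` are hypothetical, and reporting positions as 1-based residue numbers is an assumption.

```python
import re


def find_motif(sequence: str, pattern: str):
    """Return (start, end, matched_text) for each non-overlapping hit.

    Positions are 1-based and inclusive, as in residue numbering.
    """
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(pattern, sequence)]


def scan_fasta(path: str, pattern: str):
    """Yield (record_id, hits) for every record in a FASTA file."""
    # Imported here so find_motif can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        # record.seq is a Seq object; convert it to str before using re.
        yield record.id, find_motif(str(record.seq), pattern)


# Made-up toy sequence, not a real protein.
print(find_motif("MKAQAL", "A.{1}A"))
```

To scan a file, something like `for rec_id, hits in scan_fasta("example.fasta", "A.{1}A"): print(rec_id, hits)` could be used, where `example.fasta` is a placeholder path.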

Example: Signal Peptide Motif in Corynebacterium glutamicum

The example sequence used here is the putative L,D-transpeptidase LppS (UniProt ID: Q8NMT9) from Corynebacterium glutamicum, the model organism used in the ProSeC project for studying protein secretion via the Sec pathway.

A characteristic sequence element of Sec-type signal peptides is the A-x-A motif (A = alanine, x = any amino acid).

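It reflects the (–3, –1) rule: the residues at positions –3 and –1 relative to the cleavage site, where the signal peptidase removes the signal peptide, are typically small and uncharged, most often alanine. This feature is described in: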


Example input/output

1) Occurrences found

  • Enter the signature: A.{1}A
  • Enter the path of your FASTA file.
  • The output would be:

OccurenceFound Example

2) Occurrences not found

OccurenceNotFound Example
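One detail worth keeping in mind when reading such output: `re.finditer` reports non-overlapping matches only, so two A-x-A occurrences that share an alanine are counted once. Whether the script reports overlapping hits is not stated here. The sketch below shows the difference on a made-up toy sequence, using a lookahead as one possible way to recover overlapping hits. The helper name `overlapping_hits` is hypothetical.

```python
import re


def overlapping_hits(sequence: str, pattern: str):
    """Return 1-based start positions of all hits, including overlapping ones."""
    # A zero-width lookahead consumes no residues, so the scan can
    # advance one position at a time and see hits that share residues.
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(sequence)]


toy = "MAKAQAL"  # made-up sequence, not Q8NMT9

# Plain finditer resumes after the end of the first hit (AKA),
# so the second hit (AQA), which shares an alanine, is skipped.
plain = [m.start() + 1 for m in re.finditer("A.{1}A", toy)]

print(plain)
print(overlapping_hits(toy, "A.{1}A"))
```

On this toy sequence the plain search reports position 2 only, while the lookahead variant reports positions 2 and 4.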
