ProtDomRetrieverSuite builds on ProtDomRetriever, adding a comprehensive graphical interface and extended functionality for protein domain analysis. It retains core features, such as retrieving domain information from InterPro, while introducing support for AlphaFold structure downloads and domain-specific PDB structure processing.
Created by Nicolas-Frédéric Lipp, PhD
- ProtDomRetriever Illustration: A visual representation of ProtDomRetrieverSuite’s functionality, generated using AI tools. The exact AI prompt is available in assets/ai_prompt_example.txt.
- GUI Screenshot: An example of the graphical user interface of ProtDomRetrieverSuite, illustrating the input configuration, optional steps, and progress tracking.
Protein Domain Analysis • Bioinformatics Tools • InterPro • UniProt • AlphaFold • Structural Bioinformatics
- Features
- System Requirements
- Quick Installation
- Quick Start
- Usage
- Performance Notes
- Examples
- Support
- Author
- License
- Acknowledgments
- Development Notes
- Retrieve domain information for multiple UniProtKB accessions
- Filter domains based on specified InterPro entries
- Select longest domains when multiple entries overlap
- Generate TSV output with domain ranges
- Create FASTA files for the retrieved protein domains
- Modern graphical user interface with dark mode
- Real-time progress tracking and logging
- AlphaFold structure download integration
- PDB structure trimming based on domain ranges
- Improved error handling and recovery
- Multi-threaded processing for better performance
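
The "select longest domains when multiple entries overlap" feature can be pictured with the short sketch below. It is only an illustration under stated assumptions, not the suite's actual code: the function name `select_longest_domains`, the `(start, end)` tuple representation, and the greedy strategy are all hypothetical.

```python
from typing import List, Tuple

Domain = Tuple[int, int]  # (start, end), 1-based inclusive residue positions


def overlaps(a: Domain, b: Domain) -> bool:
    """Return True when two inclusive ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def select_longest_domains(domains: List[Domain]) -> List[Domain]:
    """Greedily keep the longest range from each group of overlapping ranges."""
    kept: List[Domain] = []
    # Visit the longest ranges first so shorter overlapping ranges are dropped.
    for dom in sorted(domains, key=lambda d: d[1] - d[0], reverse=True):
        if not any(overlaps(dom, k) for k in kept):
            kept.append(dom)
    # Return the survivors ordered by start position.
    return sorted(kept)


if __name__ == "__main__":
    hits = [(10, 120), (15, 100), (200, 260), (210, 300)]
    print(select_longest_domains(hits))
```

With the example hits, the two shorter ranges are discarded because each overlaps a longer one, leaving one range per region of the protein.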
- Display resolution: Minimum 870x800 pixels
- RAM: 4GB minimum (≥ 8GB recommended for large datasets)
- Storage: Space requirements depend on dataset size and features used:
  - Basic analysis: < 100MB
  - With AlphaFold/PDB structures: ~300KB per structure
  - With trimmed structures: additional ~300KB per structure
- Python 3.8 or newer
- Internet connection for API access (InterPro, UniProt, AlphaFold)
- Graphics system capable of supporting tkinter GUI
- macOS (Sequoia 15.0+ supported)
- Linux
- Windows
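
For planning disk space, the storage figures above can be turned into a rough back-of-the-envelope estimate. The helper below is a hypothetical illustration (the function name and the choice to treat 100MB as an upper bound for the basic analysis are assumptions); actual usage depends on the proteins involved.

```python
def estimate_storage_mb(n_structures: int, trimmed: bool = False) -> float:
    """Rough upper-bound estimate of disk usage in MB.

    Assumes 100 MB for the basic analysis and ~300 KB per downloaded
    structure, doubled when trimmed copies are also written.
    """
    base_mb = 100.0
    per_structure_kb = 300.0
    copies = 2 if trimmed else 1
    return base_mb + n_structures * per_structure_kb * copies / 1024.0


if __name__ == "__main__":
    # 1024 structures with trimming enabled
    print(f"{estimate_storage_mb(1024, trimmed=True):.0f} MB")
```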
To get started, make sure you have Python 3.8+ installed. Open a terminal/command prompt and install ProtDomRetrieverSuite using either method:

Option 1: install directly from GitHub

    pip install git+https://github.com/NicoFrL/protdomretrieversuite.git

Option 2: clone the repository and install locally

    # Clone the repository
    git clone https://github.com/NicoFrL/protdomretrieversuite.git
    # Navigate to the cloned repository
    cd protdomretrieversuite
    # Install the package using pip
    pip install .

For detailed installation instructions, including system-specific setup and troubleshooting, see INSTALL.md.
For instance, on macOS Sequoia 15.0+, the terminal may print "Python[XXXXX:XXXXX] +[IMKInputSession subclass]: chose IMKInputSession_Legacy". This message is harmless and can be ignored.
The application automatically saves your last used configuration (input/output paths,
selected options) to config.json and restores it on next launch for a smoother
workflow. An example configuration file config.json.example.json is provided in the repository.
Configuration options:
- input_file: Path to input file containing UniProtKB accessions
- output_dir: Directory where analysis results will be saved
- enable_fasta_retrieval: Download FASTA sequences with domain positions
- enable_af_download: Download AlphaFold structures
- enable_pdb_trimming: Enable domain-based PDB structure trimming
- accept_custom_pdbs: Allow using custom PDB files
- custom_pdb_strict: Strict validation for custom PDB files
- pdb_source_dir: Directory containing custom PDB files
- interpro_entries: InterPro entries for domain filtering (comma-separated)
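
To make the save-and-restore behavior concrete, here is a minimal sketch of reading and writing a file with the option names listed above. The key names come from this README; the default values, value types, and the helper names `load_config` and `save_config` are assumptions for illustration and may differ from what the application does internally.

```python
import json
from pathlib import Path

# Key names are taken from the README; the default values are assumptions.
DEFAULTS = {
    "input_file": "",
    "output_dir": "",
    "enable_fasta_retrieval": False,
    "enable_af_download": False,
    "enable_pdb_trimming": False,
    "accept_custom_pdbs": False,
    "custom_pdb_strict": True,
    "pdb_source_dir": "",
    "interpro_entries": "",  # comma-separated, e.g. "IPR018159, SSF46966"
}


def load_config(path):
    """Return the defaults, overridden by any known keys found in the file."""
    config = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        with p.open(encoding="utf-8") as fh:
            loaded = json.load(fh)
        # Ignore unknown keys so a stale file cannot inject unexpected options.
        config.update({k: v for k, v in loaded.items() if k in DEFAULTS})
    return config


def save_config(config, path):
    """Write the configuration as indented JSON."""
    with Path(path).open("w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

Falling back to defaults when the file is missing means a first launch and a later launch can share one code path.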
- Launch the application in your terminal/command prompt by running protdomretrieversuite
- Select the example input file tests/seed_test/input_test1.txt
- Select an output folder, for example tests/seed_test/output/
- Enter the example entries from tests/seed_test/entries_test1.txt
- Press "▶ Run Analysis"!
- Create an input file: Prepare a .txt file containing a list of UniProtKB accessions in one column with no header. These accessions should correspond to the proteins you want to analyze. UniProtKB accessions (Swiss-Prot/TrEMBL) provide a universal protein numbering system to ensure accurate identification.
- Select InterPro features: Decide which type of protein features you want to analyze, either from the InterPro database or from its consortium member databases.

Below are examples of protein classification databases and the entry formats ProtDomRetriever accepts. Use these as a reference when specifying InterPro entries to analyze:
| Database | Entry Format (Example) |
|---|---|
| InterPro | IPR000001 |
| CATH-Gene3D | G3DSA:1.10.10.10 |
| CDD | cd00001 |
| HAMAP | MF_00001 |
| PANTHER | PTHR10000 |
| Pfam | PF00001 |
| PIRSF | PIRSF000005 |
| PRINTS | PR00001 |
| PROSITE Patterns | PS00001 |
| PROSITE Profiles | PS01031 |
| SMART | SM00002 |
| SFLD | SFLDF00001 |
| SUPERFAMILY | SSF100879 |
| NCBIfam | NF000124 |
For more information about protein classification (family, domain, sequence feature) and protein signatures (patterns, profiles, fingerprints, hidden Markov models (HMMs)), see the EMBL-EBI tutorial.
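
As a quick sanity check before running an analysis, entry identifiers can be matched against the example formats in the table above. The sketch below infers its patterns only from those examples, so they are assumptions rather than the official grammar of each database (some databases use additional formats not shown here), and `guess_database` is a hypothetical helper, not part of the suite.

```python
import re
from typing import Optional

# Patterns inferred from the example identifiers in the table (assumptions).
ENTRY_PATTERNS = {
    "InterPro": r"IPR\d{6}",
    "CATH-Gene3D": r"G3DSA:\d+(\.\d+){3}",
    "CDD": r"cd\d{5}",
    "HAMAP": r"MF_\d{5}",
    "PANTHER": r"PTHR\d{5}",
    "Pfam": r"PF\d{5}",
    "PIRSF": r"PIRSF\d{6}",
    "PRINTS": r"PR\d{5}",
    "PROSITE": r"PS\d{5}",  # patterns and profiles share this format
    "SMART": r"SM\d{5}",
    "SFLD": r"SFLD[A-Z]\d{5}",
    "SUPERFAMILY": r"SSF\d+",
    "NCBIfam": r"NF\d{6}",
}


def guess_database(entry: str) -> Optional[str]:
    """Return the database whose example format matches, or None."""
    entry = entry.strip()
    for name, pattern in ENTRY_PATTERNS.items():
        if re.fullmatch(pattern, entry):
            return name
    return None


if __name__ == "__main__":
    for example in ["IPR018159", "SSF46966", "G3DSA:1.10.10.10", "not_an_entry"]:
        print(example, "->", guess_database(example))
```

Using `re.fullmatch` keeps similar prefixes apart, so PIRSF000005 is not mistaken for a PRINTS entry.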
Launch the application from your terminal/command prompt:

    protdomretrieversuite

- Select an input file containing UniProtKB accessions (one per line)

      # Example input file:
      Q02201
      P12345
      A0AA96LI61

- Choose an output directory for results (wherever you want on your computer)
- Enter InterPro entries for domain filtering (as indicated, one per line or separated by commas)

      # Example InterPro entries:
      IPR018159
      SSF46966
      # or
      IPR018159, SSF46966

- Select optional processing steps:
  - FASTA sequence retrieval
    - Download one FASTA file with domain positions in the headers
  - AlphaFold structure download
  - PDB structure trimming
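
The two input formats described above (one accession per line, and entries given one per line or comma-separated) are simple to parse. The following sketch shows one way to do it; the function names are hypothetical and the suite's own parsing may apply additional validation.

```python
import re
from typing import List


def parse_accessions(text: str) -> List[str]:
    """Return one UniProtKB accession per non-empty line, whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_entries(text: str) -> List[str]:
    """Split InterPro entries given one per line or separated by commas."""
    tokens = (tok.strip() for tok in re.split(r"[,\n]", text))
    return [tok for tok in tokens if tok]


if __name__ == "__main__":
    print(parse_accessions("Q02201\nP12345\nA0AA96LI61\n"))
    print(parse_entries("IPR018159, SSF46966"))
    print(parse_entries("IPR018159\nSSF46966\n"))
```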
The suite generates several output files, depending on the selected options:
| File Name | Description |
|---|---|
| domain_analysis.tsv | Comprehensive domain information for all input proteins in a tab-separated file. |
| domain_ranges.txt | Text file listing the start and end ranges of the detected domains. |
| domain_sequences.fasta | Contains FASTA sequences of domains if the retrieval option is enabled. |
| alphafold_structures/ | A directory storing AlphaFold-predicted structures downloaded during analysis. |
| trimmed_structures/ | Stores PDB files trimmed to match the specific domain ranges. |
| trimming_summary.json | Trimming information, including timestamps, sources, the number of processed files, and their paths. |
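
To illustrate what trimming a PDB file to a domain range involves, the sketch below keeps only the atom records whose residue number falls inside a given range. It relies on the fixed-column PDB format, where columns 23-26 hold the residue sequence number. This is a simplified illustration, not the suite's implementation: `trim_pdb_lines` is a hypothetical name, and chains, insertion codes, and header records are ignored.

```python
from typing import Iterable, List


def trim_pdb_lines(lines: Iterable[str], start: int, end: int) -> List[str]:
    """Keep ATOM/HETATM records whose residue number is within [start, end]."""
    kept: List[str] = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # drop non-coordinate records in this simplified sketch
        try:
            # Columns 23-26 (1-based) are the residue sequence number.
            res_num = int(line[22:26])
        except ValueError:
            continue  # malformed or truncated record
        if start <= res_num <= end:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    # Build ten synthetic CA records, one per residue, with correct column layout.
    demo = ["ATOM  %5d  CA  ALA A%4d" % (i, i) for i in range(1, 11)]
    for record in trim_pdb_lines(demo, 3, 5):
        print(record)
```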
- Multi-threaded processing for efficient API requests
- Rate limiting implemented to respect API guidelines
- Memory usage scales with input size
- For large datasets (>1000 proteins), consider:
  - Breaking input into smaller batches
  - Ensuring stable internet connection
  - Having sufficient disk space for structure files
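
The combination of multi-threaded requests, rate limiting, and batching noted above can be sketched as follows. Everything here is an illustrative assumption: the `RateLimiter` class, the `batches` and `fetch_all` helpers, the worker count, and the request interval are hypothetical, and the `fetch` callable stands in for whatever API call is being made.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


class RateLimiter:
    """Space calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_all(accessions: Sequence[str], fetch: Callable[[str], str],
              limiter: RateLimiter, max_workers: int = 4) -> List[str]:
    """Run `fetch` for each accession in a thread pool, honoring the limiter."""
    def task(acc: str) -> str:
        limiter.wait()
        return fetch(acc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, accessions))  # results keep input order


if __name__ == "__main__":
    limiter = RateLimiter(min_interval=0.1)
    accs = ["Q02201", "P12345", "A0AA96LI61"]
    for batch in batches(accs, 2):
        # A stand-in fetch function; a real one would call an API.
        print(fetch_all(batch, lambda a: f"fetched {a}", limiter))
```

Reserving time slots under a lock and sleeping outside it lets several worker threads queue up without blocking each other longer than needed.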
Example datasets are provided in the tests directory:
- Test Dataset 1 (input_test1.txt, entries_test1.txt)
- Test Dataset 2 (input_test2.txt, entries_test2.txt)
- Test Dataset 3 (input_test3.txt, entries_test3.txt)
If you encounter any issues or have questions, you can:
- Check the log files in your output directory for detailed debugging information.
- Open an issue on the GitHub repository.
- Contact the developer directly through GitHub.
Nicolas-Frédéric Lipp, PhD
https://github.com/NicoFrL
This project is distributed under a Custom Academic and Non-Commercial License.
It is free to use for educational, research, and non-profit purposes.
For commercial use, please refer to the LICENSE file or contact the author for more information.
- InterPro database and API: Includes rate limiting guidelines and API details on GitHub.
- AlphaFold DB: Provides access to predicted protein structures with API documentation.
- UniProt database: Comprehensive protein information with programmatic access details.
This project was developed with assistance from AI language models to enhance code structure, adhere to best practices, and improve documentation. The scientific approach and core algorithm were entirely designed and implemented by the author.

