ARGPrism is a deep learning-based pipeline for predicting and annotating Antibiotic Resistance Genes (ARGs) from protein sequences using transformer embeddings and neural networks.
- Deep Learning Classification: ProtAlbert transformer embeddings + neural network classifier
- GPU Accelerated: Fast processing with CUDA support
- Reference Mapping: DIAMOND BLAST alignment to ARG databases
- Simple Interface: Easy-to-use command line tool
- Flexible Deployment: CPU or GPU execution
- Linux operating system (Ubuntu 20.04+)
- Conda, Miniconda, or Mamba installed (Mamba recommended)
- 8+ GB RAM (16 GB recommended)
- NVIDIA GPU with CUDA 11.8+ or 12.x (optional, for acceleration)
# Install from Bioconda
mamba install -c bioconda argprism
# Verify installation
argprism --version

# Clone repository
git clone https://github.com/haseebmanzur/ARGPrism.git
cd ARGPrism
# Create environment
mamba env create -f environment.yml
# Activate environment
mamba activate argprism
# Verify installation
argprism --version

# Activate environment
mamba activate argprism
# Run on test data
argprism Test_dataset/Test_data.faa --output-dir results/

argprism INPUT_FILE.faa [OPTIONS]

| Option | Description | Default |
|---|---|---|
| `-o`, `--output-dir` | Output directory | `argprism_output` |
| `--device` | Force CPU or CUDA execution | Auto-detect |
| `--quiet` | Reduce output verbosity | `False` |
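
When ARGPrism is called from a script, the options above can be assembled into an argument list before the command is run. The sketch below is only an illustration: `build_argprism_command` is a hypothetical helper, not part of the package, and the `--device` values `cpu` and `cuda` are an assumption about how the flag spells its choices.

```python
import shlex


def build_argprism_command(input_fasta, output_dir=None, device=None, quiet=False):
    """Assemble an argv list for the argprism CLI from the documented options."""
    cmd = ["argprism", input_fasta]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if device is not None:
        # Assumption: accepted values are spelled "cpu" and "cuda".
        cmd += ["--device", device]
    if quiet:
        cmd.append("--quiet")
    return cmd


cmd = build_argprism_command(
    "Test_dataset/Test_data.faa", output_dir="results/", quiet=True
)
# Print a copy-pasteable shell line; pass `cmd` to subprocess.run(cmd, check=True)
# to actually execute it.
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.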
from argprism import run_pipeline
# Run pipeline
result = run_pipeline(
input_fasta="input.faa",
output_dir="results/",
verbose=True
)
print(f"Predictions: {len(result.predictions)}")
print(f"ARGs found: {result.predicted_fasta}")

ARGPrism processes protein sequences through the following steps:
Input FASTA → ProtAlbert Embeddings → Neural Classifier → ARG Prediction → DIAMOND Mapping → Report
- Embedding Generation: ProtAlbert generates 4096-dimensional embeddings
- Classification: Neural network predicts ARG/Non-ARG for each sequence
- Reference Mapping: DIAMOND aligns predicted ARGs to reference database
- Report Generation: Creates annotated CSV with ARG names and drug classes
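
The control flow of the first two steps, and how they feed the third, can be sketched roughly as follows. This is not ARGPrism's implementation: `predict_args`, `toy_embed`, and `toy_classify` are hypothetical names, and the toy stand-ins replace ProtAlbert and the trained classifier only so that the example runs.

```python
from typing import Callable, Dict, List, Sequence


def predict_args(
    sequences: Dict[str, str],
    embed: Callable[[str], Sequence[float]],
    classify: Callable[[Sequence[float]], bool],
) -> List[str]:
    """Return the IDs of sequences that the classifier labels as ARG."""
    predicted = []
    for seq_id, seq in sequences.items():
        vector = embed(seq)      # one fixed-length vector per protein
        if classify(vector):     # binary ARG / non-ARG decision
            predicted.append(seq_id)
    # Only these sequences would go on to the reference-mapping step.
    return predicted


# Toy stand-ins with no biological meaning (assumptions for illustration).
def toy_embed(seq: str) -> List[float]:
    return [float(len(seq)), float(seq.count("K"))]


def toy_classify(vector: Sequence[float]) -> bool:
    return vector[0] > 5


demo = {"protA": "MKKLLPTAAAG", "protB": "MKT"}
print(predict_args(demo, toy_embed, toy_classify))
```

The point of the sketch is the ordering: alignment against the reference database is applied only to sequences that pass the classifier, not to the whole input.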
- FASTA file: Protein sequences to analyze
- Built-in models and databases are included
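
Because the input must contain protein sequences, it can help to sanity-check a FASTA file before running the tool. The following is a minimal sketch using only the standard library. `check_protein_fasta` is a hypothetical helper, and the 0.9 threshold for flagging nucleotide-like records is an arbitrary heuristic, not something ARGPrism defines.

```python
import io

# IUPAC amino-acid letters plus ambiguity codes and the stop symbol.
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_protein_fasta(handle, nucleotide_fraction=0.9):
    """Return (header, problem) pairs for records that look suspicious."""
    problems = []
    for header, seq in read_fasta(handle):
        seq = seq.upper()
        if not seq:
            problems.append((header, "empty sequence"))
        elif set(seq) - PROTEIN_LETTERS:
            problems.append((header, "unexpected characters"))
        elif sum(c in NUCLEOTIDE_LETTERS for c in seq) / len(seq) >= nucleotide_fraction:
            problems.append((header, "looks like nucleotide sequence"))
    return problems


example = io.StringIO(">ok\nMKTAYIAKQR\n>dna\nATGGCGTACGTTAGC\n>bad\nMKT1AY\n")
print(check_protein_fasta(example))
```

The nucleotide test is needed because A, C, G, and T are also valid amino-acid letters, so an alphabet check alone would not catch a DNA file passed by mistake.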
All results are saved to the output directory:

- `predicted_ARGs.fasta` - Sequences classified as ARGs
- `predicted_ARGs_vs_ref.tsv` - DIAMOND alignment results
- `final_ARG_prediction_report.csv` - Annotated predictions with ARG names and drug classes
- `diamond_arg_db.dmnd` - DIAMOND database index
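
The alignment table can be post-processed with a few lines of Python. The sketch below assumes that `predicted_ARGs_vs_ref.tsv` uses DIAMOND's default 12-column tabular layout (the BLAST `outfmt 6` column order). If ARGPrism writes a different set of columns, `COLUMNS` would need adjusting. `best_hits` is a hypothetical helper and the two sample rows are made up.

```python
import csv
import io

# Default BLAST tabular column order, which DIAMOND also uses by default.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row}, keeping the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > float(best[query]["bitscore"]):
            best[query] = row
    return best


# Made-up rows standing in for a real alignment file.
sample = io.StringIO(
    "protA\trefX\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410.0\n"
    "protA\trefY\t80.0\t190\t38\t1\t5\t194\t3\t192\t1e-60\t220.5\n"
)
for query, hit in best_hits(sample).items():
    print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

For a real run, the `io.StringIO` object would be replaced by the opened file, for example `open("argprism_output/predicted_ARGs_vs_ref.tsv", newline="")`.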
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or support, please open an issue on GitHub.
Project PI: Dr. Masood Ur Rehman
Email: m.kayani@sines.nust.edu.pk
Author: Haseeb Manzoor
GitHub: @haseebmanzur
Package Maintainer: Muhammad Muneeb Nasir
GitHub: @muneebdev7
- ProtAlbert - Protein language model
- DIAMOND - Sequence alignment tool
If you use ARGPrism in your research, please cite: