Welcome! This guide shows you how to extract protein sequences from many genomes in just a few minutes, with no bioinformatics expertise required. The tool:
- Finds genes in your bacterial genomes using coordinate tables
- Translates DNA to protein (amino acid) sequences
- Saves each protein as a FASTA file, ready for BLAST, alignment, etc.
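
For readers curious what "coordinates in, protein out" means in practice, here is a minimal sketch of the idea. It is not the tool's actual code: the function names, the 1-based inclusive coordinates, and the `+`/`-` strand convention are assumptions made for illustration, and it uses the standard genetic code without any special handling of alternative start codons.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_protein(genome, start, end, strand="+"):
    """Slice a gene out of a genome and translate it.

    Assumes 1-based, inclusive coordinates (hypothetical convention).
    Translation stops at the first stop codon; unknown codons become 'X'.
    """
    dna = genome[start - 1:end].upper()
    if strand == "-":
        dna = reverse_complement(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(extract_protein("CCATGAAATAGCC", 3, 11))        # MK
print(extract_protein("GGCTATTTCATGG", 3, 11, "-"))   # MK
```

The second call shows a gene on the reverse strand: the slice is reverse-complemented before translation, so both calls yield the same protein.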

To install:

    pip install -r requirements.txt
    pip install -e .

Step 1: Place all your genome files (.fasta or .fa) in one folder, and all coordinate files (.tsv) in another folder. Make sure filenames match:

    AP018572.2.fasta → AP018572.2.tsv
    CP029242.fasta → CP029242.tsv

Step 2: Run the tool:

    fasta_aa_extractor --genome-dir path/to/genomes/ --coords-dir path/to/coords/ --genes "acrA,acrB,tolC" --parallel --output-dir results/

Step 3: Check your results in the output folder. You'll see files like:

    AP018572.2_acrA.faa
    AP018572.2_acrB.faa
    CP029242_tolC.faa

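
To get a quick overview of what landed in the output folder, a small script like the following can list each `.faa` file with its sequence headers and protein lengths. This is a sketch that assumes the outputs are ordinary FASTA files; `read_fasta` and `summarize_results` are hypothetical helper names, not part of the tool.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one first.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_results(output_dir):
    """Map each .faa filename to a list of (header, protein length)."""
    return {
        path.name: [(h, len(seq)) for h, seq in read_fasta(path)]
        for path in sorted(Path(output_dir).glob("*.faa"))
    }


if __name__ == "__main__":
    for name, records in summarize_results("results").items():
        for header, length in records:
            print(f"{name}\t{header}\t{length} aa")
```

An empty dictionary from `summarize_results` means no `.faa` files were written, which is a good cue to go through the troubleshooting checks.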
- Extract all resistance genes from 100+ genomes in one go
- Use a gene list file: --genes @genes.txt
- Works on Windows, Mac, Linux
- No output? Check that genome and coordinate filenames match exactly
- Missing genes? Make sure gene names in your coordinate files match what you requested
- Need help? Run fasta_aa_extractor --help or ask on GitHub
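
The filename check in the first troubleshooting tip can be automated. The sketch below assumes that "matching" means the same name with the extension swapped (`.fasta` or `.fa` versus `.tsv`), as in the Step 1 example; `find_unpaired` is a hypothetical helper, not a command provided by the tool.

```python
from pathlib import Path


def find_unpaired(genome_dir, coords_dir):
    """Return (genomes without a .tsv, .tsv files without a genome).

    Files are compared by name with the final extension removed, so
    AP018572.2.fasta pairs with AP018572.2.tsv.
    """
    genomes = {
        p.stem for p in Path(genome_dir).iterdir()
        if p.suffix in {".fasta", ".fa"}
    }
    coords = {
        p.stem for p in Path(coords_dir).iterdir()
        if p.suffix == ".tsv"
    }
    return sorted(genomes - coords), sorted(coords - genomes)


if __name__ == "__main__":
    missing_coords, missing_genomes = find_unpaired(
        "path/to/genomes", "path/to/coords"
    )
    for name in missing_coords:
        print(f"No coordinate file for genome: {name}")
    for name in missing_genomes:
        print(f"No genome file for coordinates: {name}")
```

The comparison is case-sensitive, so `cp029242.tsv` would not pair with `CP029242.fasta`.
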
Simple. Fast. No CSVs. Just point and extract!