This is a temporary repository for the hydrogenase data and HMMs compiled by the Greening lab (http://www.greeninglab.com/) and used in the HydDB webserver. It exists because the previous site hosted by Aarhus University (http://services.birc.au.dk/hyddb) is no longer maintained.
To run the original HydDB webserver locally, please see our HydDB-webserver repository.
Sequences for [NiFe]-, [FeFe]-, and [Fe]-hydrogenases were previously compiled in the following two papers:
- Søndergaard D*, Pedersen CNS, Greening C* (2016) HydDB: a web tool for hydrogenase classification and analysis. Scientific Reports 6, 34212
- Greening C*, Biswas A, Carere CR, Jackson CJ, Taylor MC, Stott MB, Cook GM, Morales SE* (2016). Genomic and metagenomic surveys of hydrogenase distribution indicate H2 is a widely-utilised energy source for microbial growth and survival. The ISME Journal 10, 761-777
To identify highly divergent hydrogenases, the HMM profiles in this repo can be used with HMMER:
# search profile against a database
hmmsearch \
-o output.txt \
-A output_alignment.aln \
--tblout tblout.tsv \
--domtblout domains-tblout.tsv \
--pfamtblout pfamtblout.pfam \
--acc \
--domT <BIT_SCORE_CUTOFF> \
HydDB_all_MM2022.hmm \
your_query_sequences.fasta
Please keep in mind that while HMMER is a sensitive tool for finding remote homologs, it is also prone to returning false-positive hits. To reduce false positives, we recommend the following steps:
- Using bit score cutoffs in HMMER (see below)
- Cross-comparing hits with those from sequence-based searches such as BLAST or MMseqs2
- Checking the predicted structure of a HMMER hit with AlphaFold
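As a rough illustration of the cross-comparison step, the sketch below lists the sequence IDs that appear both in an hmmsearch `--tblout` file and in a tabular DIAMOND result such as the one produced later in this README. The function name `shared_hits` is hypothetical and not part of this repository. It assumes that column 1 of the tblout file is the target sequence name, and that the DIAMOND file has one header line with the query ID in column 1.

```shell
# Sketch only: print sequence IDs reported by both hmmsearch and diamond blastp.
# ASSUMPTION: $1 is an hmmsearch --tblout file (column 1 = target sequence name),
#             $2 is a diamond --outfmt 6 file with one header line (column 1 = qseqid).
# Usage: shared_hits tblout.tsv output_hits.tsv > shared_ids.txt
shared_hits() {
  local tblout="$1" diamond_tsv="$2"
  awk '
    # First file: remember every hmmsearch target name, skipping "#" comment lines
    FILENAME == ARGV[1] { if ($0 !~ /^#/) hmm[$1] = 1; next }
    # Second file: skip the header, print each query ID once if hmmsearch also hit it
    FNR > 1 && ($1 in hmm) && !seen[$1]++ { print $1 }
  ' "$tblout" "$diamond_tsv"
}
```

Sequences found by only one of the two methods are not printed, so they can be inspected separately, for example with a structure prediction.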
We recommend the following work-in-progress (WIP) bit score cutoffs for each class, passed to hmmsearch with the --domT flag:
- [FeFe]: 15.9
- [NiFe]: 34.5
- [Fe]: 54.4
If you want to reduce the frequency of false-positive hits further, we recommend these more conservative WIP bit score cutoffs for each class, again passed with the --domT flag:
- [FeFe]: 50
- [NiFe]: 120
- [Fe]: n/a
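Because --domT takes a single value per run, one way to apply a different cutoff to each class is to filter the `--domtblout` file afterwards. The sketch below is a minimal example, not a script shipped with this repository. The function name `filter_domtblout_by_class` is hypothetical, and it assumes that the HMM profile name (column 4 of the domtblout file) contains the string `FeFe`, `NiFe`, or `Fe`; check the profile names in your own output before relying on it.

```shell
# Sketch only: keep domtblout rows whose per-domain bit score (column 14)
# meets the cutoff for the hydrogenase class of the matching profile.
# ASSUMPTION: the profile name in column 4 contains "FeFe", "NiFe", or "Fe".
# Usage: filter_domtblout_by_class domains-tblout.tsv [FeFe_cutoff NiFe_cutoff Fe_cutoff]
filter_domtblout_by_class() {
  local domtblout="$1"
  local fefe="${2:-15.9}" nife="${3:-34.5}" fe="${4:-54.4}"
  awk -v fefe="$fefe" -v nife="$nife" -v fe="$fe" '
    /^#/ { next }                          # skip comment lines
    {
      # Test the two-metal classes first, since "Fe" also matches "FeFe" and "NiFe"
      if      ($4 ~ /FeFe/) cutoff = fefe
      else if ($4 ~ /NiFe/) cutoff = nife
      else if ($4 ~ /Fe/)   cutoff = fe
      else next                            # unrecognised profile name: drop the row
      if ($14 + 0 >= cutoff + 0) print
    }
  ' "$domtblout"
}
```

With no extra arguments the function uses the first set of cutoffs above. The conservative set can be passed explicitly, for example `filter_domtblout_by_class domains-tblout.tsv 50 120 54.4`; since no conservative [Fe] value is listed, reusing 54.4 there is a choice made for this example only.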
Use DIAMOND for quick and convenient BLASTing of your query sequences against the HydDB sequences.
First, install the diamond binary tool:
# download DIAMOND
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.11/diamond-linux64.tar.gz
# extract binary
tar xzf diamond-linux64.tar.gz
Second, download the HydDB sequences provided in this repository (the HydDB_all_hydrogenases.faa file), and create a DIAMOND database (.dmnd file):
# create a DIAMOND formatted sequence database of the HydDB to search against
./diamond makedb --in HydDB_all_hydrogenases.faa -d hyddb
Now you can search your query sequences against the hyddb.dmnd database:
# search the HydDB DIAMOND database with your query protein sequences
./diamond blastp \
-q query_sequences.fasta \
-d hyddb.dmnd \
-o output_hits.tsv \
--max-target-seqs 1 \
--header simple \
--outfmt 6 qseqid sseqid pident evalue bitscore full_sseq
This produces a .tsv file of the diamond blastp results. To extract just your query IDs with their corresponding best predicted hydrogenase group:
awk 'NR>1{split($2,arr,/\|/); print $1, arr[3]}' output_hits.tsv | sed '1i query_id\tclosest_hydrogenase_group' > output_hyd_classification.tsv
This repository also provides a simple bash script to run diamond blastp and the subsequent awk step in one go. Make sure that 1) diamond has been installed, 2) it has been added to your PATH, and 3) the .dmnd database of HydDB sequences has been set up. Then run:
./hydclassify.sh [YOUR_QUERY.faa]
To reduce false-positive hits, we suggest the following percent sequence identity cutoffs for each class. These can be provided to diamond blastp using the --id flag:
- [NiFe] = >50% for group 4, >30% for all other groups
- [FeFe] = >45%
- [Fe] = >50%
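Since --id accepts one value per run, the class-specific cutoffs above can also be applied after the search by filtering the tabular output. The following is a sketch under stated assumptions rather than part of this repository. The function name `filter_hits_by_identity` is hypothetical. It assumes the output columns used earlier (pident in column 3, one header line), and that the third `|`-separated field of the subject ID contains `FeFe`, `NiFe`, or `Fe`, with group 4 [NiFe] labels containing something like `Group_4` or `Group 4`. Adjust the patterns to the labels you actually see in output_hits.tsv.

```shell
# Sketch only: keep diamond hits whose percent identity exceeds the class cutoff.
# ASSUMPTION: tab-separated input with one header line, sseqid in column 2,
#             pident in column 3, and a class/group label in the third
#             "|"-separated field of sseqid (e.g. containing "NiFe" and "Group_4").
# Usage: filter_hits_by_identity output_hits.tsv > output_hits.filtered.tsv
filter_hits_by_identity() {
  awk -F'\t' '
    NR == 1 { print; next }                # keep the header line
    {
      split($2, arr, /\|/); grp = arr[3]
      # Test "FeFe" and "NiFe" before "Fe", which would match all three
      if      (grp ~ /FeFe/) min = 45
      else if (grp ~ /NiFe/) min = (grp ~ /[Gg]roup[ _-]?4/) ? 50 : 30
      else if (grp ~ /Fe/)   min = 50
      else next                            # unrecognised label: drop the row
      if ($3 + 0 > min) print
    }
  ' "$1"
}
```

The filtered file keeps the same columns, so the awk step that extracts query IDs and hydrogenase groups can be run on it unchanged.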
If you use the HydDB in your research, please cite the following:
@article{Søndergaard_2016,
author={Søndergaard, Dan and Pedersen, Christian N. S. and Greening, Chris},
title={HydDB: A web tool for hydrogenase classification and analysis},
year={2016},
journal={Scientific Reports},
DOI={10.1038/srep34212},
}
@article{Greening_2015,
author={Greening, Chris and Biswas, Ambarish and Carere, Carlo R and Jackson, Colin J and Taylor, Matthew C and Stott, Matthew B and Cook, Gregory M and Morales, Sergio E},
title={Genomic and metagenomic surveys of hydrogenase distribution indicate H2 is a widely utilised energy source for microbial growth and survival},
year={2015},
journal={The ISME Journal},
DOI={10.1038/ismej.2015.153},
}