ZoliQua/Ortholog-Finder-Tool

Ortholog Finder Tool v1.6

License: GPL v2 Version PHP jQuery DataTables

A unified bioinformatics web tool for exploring evolutionarily conserved proteins across model organisms. Combines multi-database ortholog search with Gene Ontology annotation extension via ortholog-based Venn diagram analysis.

PhD Thesis: Dul, Z. (2019). A system-level approach to identify novel cell size regulators. King's College London. Download from the KCL Pure website.



🔬 Overview

The Ortholog Finder Tool was developed as part of a PhD project at King's College London (2013–2018) to systematically identify conserved regulators of cell size across eukaryotic model organisms. It integrates orthologous protein relationships from multiple databases and cross-references them with pathway annotations and genome-wide functional screen data.

Version 1.6 unifies two previously separate tools into a single application:

| Previously | Now |
| --- | --- |
| Ortholog Finder Tool v1.5 | Mode A: Ortholog Search |
| GeneOntology Extension Tool v1.0 | Mode B: GO Extension |

🧬 Two Modes

Mode A β€” Ortholog Search

Multi-database ortholog lookup with pathway annotations and cell-size screen hit data.

  • 5 species: AT, DM, HS, SC, SP
  • 6 ortholog databases: HomoloGene, orthoMCL v5, InParanoid v8, eggNOG v4, COG/KOG, PomBase
  • Pathway data: KEGG and Reactome annotations per protein
  • Screen data: Published cell-size regulatory gene lists (Jorgensen, Moretto, Hayles, Björklund, Neumann)
  • Query levels: orth (all orthologs), path (with pathways), same (shared pathways), size_mut1–6 (screen hit filters)
  • Output: Interactive sortable/searchable DataTable + CSV download
  • Data: 225,000 ortholog pairs across 40 MB of pre-processed flat files
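
The query levels listed above can be pictured as successive filters over ortholog pairs. The following is a minimal Python sketch, not the tool's PHP implementation: the function name `filter_pairs`, the protein IDs, the pathway labels, and the reading of `path` as "both proteins have at least one pathway annotation" are all assumptions made for illustration. The `size_mut1–6` screen filters are left out because the README does not define them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory data. The real tool reads pre-processed flat files
# whose column layout is not documented in this README.
OrthologPair = Tuple[str, str]  # (query protein ID, ortholog protein ID)


def filter_pairs(
    pairs: List[OrthologPair],
    pathways: Dict[str, Set[str]],
    level: str,
) -> List[OrthologPair]:
    """Filter ortholog pairs by query level: 'orth', 'path', or 'same'."""
    if level == "orth":
        # All ortholog pairs, no pathway requirement.
        return list(pairs)
    if level == "path":
        # Assumed reading: both proteins carry at least one pathway annotation.
        return [(a, b) for a, b in pairs if pathways.get(a) and pathways.get(b)]
    if level == "same":
        # Pairs whose two proteins share at least one pathway label.
        return [
            (a, b)
            for a, b in pairs
            if pathways.get(a, set()) & pathways.get(b, set())
        ]
    raise ValueError(f"unsupported query level: {level}")


# Illustrative IDs and pathway labels only.
pairs = [("P1", "Q1"), ("P2", "Q2"), ("P3", "Q3")]
pathways = {
    "P1": {"cell cycle"},
    "Q1": {"cell cycle", "MAPK signaling"},
    "P2": {"MAPK signaling"},
    "Q2": {"TOR signaling"},
}
shared = filter_pairs(pairs, pathways, "same")
```

Each level is a subset of the previous one in this sketch, which matches the way the README orders them, but the exact semantics in the tool may differ.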

Mode B: GO Extension

Gene Ontology annotation extension via eggNOG ortholog groups, visualized with Venn diagrams.

  • 7 species: AT, CE, DM, DR, HS, SC, SP
  • 85 GO Slim Generic terms (from GO Consortium)
  • Venn diagram types: Classic Venn (2–7 sets) and Edwards-Venn (2–7 sets)
  • Configurable threshold: Minimum species co-occurrence (2–7)
  • 4 result tables: Filtered hits, expanded list, conserved core, novel annotation suggestions
  • Data: 75 eggNOG export CSVs + 12 SVG diagram templates
  • Backend: MySQL queries against orthology_databases and geneontology tables
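
To illustrate how a species co-occurrence threshold and Venn regions relate, here is a small Python sketch. It assumes each ortholog group is reduced to the set of species codes that have members in it; the function name `venn_region_counts` and the group IDs are hypothetical, and the tool's own SQL and PHP logic may count differently.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def venn_region_counts(
    group_species: Dict[str, Set[str]],
    selected: Iterable[str],
    threshold: int = 2,
) -> Counter:
    """Count ortholog groups per Venn region.

    A region is the exact subset of the selected species present in a group.
    Groups covering fewer than `threshold` selected species are skipped.
    """
    selected_set = set(selected)
    counts: Counter = Counter()
    for species in group_species.values():
        region = frozenset(species & selected_set)
        if len(region) >= threshold:
            counts[region] += 1
    return counts


# Hypothetical group IDs and memberships, for illustration only.
groups = {
    "group_1": {"HS", "SC", "SP"},
    "group_2": {"HS", "SC"},
    "group_3": {"HS"},
    "group_4": {"HS", "SC", "DM"},
}
regions = venn_region_counts(groups, selected=["HS", "SC", "SP"], threshold=2)
```

Using a `frozenset` as the region key keeps the regions mutually exclusive, which is what a Venn diagram needs: a group present in HS, SC and SP is counted only in the three-way intersection, not in the HS/SC overlap as well.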

Method: Homology/Membership (H/M) Ratio

The GO Extension mode identifies potential gaps in Gene Ontology annotations by computing an H/M ratio for each GO term within each orthologous group:

  1. Ortholog group resolution: Proteins are mapped to COG/KOG orthologous groups via the eggNOG database (v4). UniProt KB accessions serve as the common identifier across all 7 species.
  2. GO annotation lookup: For each orthologous group, the tool retrieves GO Slim annotations for all member proteins from the Gene Ontology database.
  3. H/M score calculation: For each GO term within a group, the tool computes the ratio of species carrying the annotation (Homology) to the total number of species with members in the group (Membership). A high H/M ratio combined with a missing annotation in one species suggests a candidate for annotation extension.
  4. Visualization: Results are presented in interactive DataTables and Edwards-Venn diagrams (2–7 species) showing the overlap and gaps of annotations across the selected species.
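
Step 3 can be sketched in a few lines of Python. This is an illustration of the ratio as described above, not the code of the `QueryGO` class: the function names, the `min_ratio` cutoff of 0.5, and the example group are assumptions, since the README does not state which cutoff the tool applies.

```python
from typing import Dict, List, Set, Tuple


def hm_ratio(annotated_species: Set[str], member_species: Set[str]) -> float:
    """Share of member species that carry the annotation (H divided by M)."""
    if not member_species:
        raise ValueError("group has no member species")
    return len(annotated_species & member_species) / len(member_species)


def extension_candidates(
    group_annotations: Dict[str, Set[str]],
    go_term: str,
    min_ratio: float = 0.5,  # hypothetical cutoff, not taken from the tool
) -> Tuple[float, List[str]]:
    """Return the H/M ratio and the species lacking `go_term` in one group.

    `group_annotations` maps each species with members in the group to the
    set of GO terms annotated to those members.
    """
    members = set(group_annotations)
    annotated = {sp for sp, terms in group_annotations.items() if go_term in terms}
    ratio = hm_ratio(annotated, members)
    # Species are only suggested when enough other species carry the term.
    missing = sorted(members - annotated) if ratio >= min_ratio else []
    return ratio, missing


# Illustrative group: three of four member species carry GO:0007049 (cell cycle).
example_group = {
    "HS": {"GO:0007049"},
    "SC": {"GO:0007049"},
    "SP": {"GO:0007049"},
    "DM": set(),
}
ratio, candidates = extension_candidates(example_group, "GO:0007049")
```

In this example H = 3 and M = 4, so the ratio is 0.75 and DM is the only species flagged as a candidate for annotation extension.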

🧫 Model Organisms

Organism Code Taxonomy ID Mode A Mode B
Arabidopsis thaliana AT 3702 βœ… βœ…
Caenorhabditis elegans CE 6239 β€” βœ…
Drosophila melanogaster DM 7227 βœ… βœ…
Danio rerio DR 7955 β€” βœ…
Homo sapiens HS 9606 βœ… βœ…
Saccharomyces cerevisiae SC 559292 βœ… βœ…
Schizosaccharomyces pombe SP 4896 βœ… βœ…

🌐 Live Versions

Tool URL Status
Ortholog Search & GO Tool orthologfindertool.com Online

🚀 Installation

Requirements

  • PHP 5.6 or higher
  • MySQL 5.7 or higher
  • Apache or compatible web server

Setup

  1. Clone the repository:

    git clone https://github.com/ZoliQua/Ortholog-Finder-Tool.git
    cd Ortholog-Finder-Tool
  2. Create .env file from the template:

    cp .env.example .env
  3. Edit .env with your database credentials:

    DB_HOST=localhost
    DB_USER=root
    DB_PASS=your_password
    DB_NAME=ortholog
    DB_SOCKET=
    DB_PORT=3306
  4. Set up MySQL database:

    For Mode A (Ortholog Search): No MySQL tables are required for the core query; all data is file-based (CSV/TSV). The MySQL connection is only used for GeoIP visitor logging (ip2c table).

    For Mode B (GO Extension): Requires the orthology_databases and geneontology tables in the MySQL database named by DB_NAME. These contain the ortholog mappings and GO annotation data used by the QueryGO class.

  5. Point your web server to the project root and open index.php in a browser.


⚙️ Configuration

Environment Variables (.env)

| Variable | Description | Default |
| --- | --- | --- |
| DB_HOST | MySQL hostname | localhost |
| DB_USER | MySQL username | root |
| DB_PASS | MySQL password | (empty) |
| DB_NAME | MySQL database name | ortholog |
| DB_SOCKET | Unix socket path (optional, for MAMP/XAMPP) | (empty) |
| DB_PORT | MySQL port (0 = default) | 3306 |
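
As an illustration of how these variables and their defaults fit together, the following Python sketch parses `KEY=value` lines and falls back to the defaults from the table. The tool itself loads `.env` in PHP (`includes/mysql.php`); the function name `parse_env` and the parsing rules (blank lines and `#` comments skipped, no quoting support) are assumptions for this example.

```python
from typing import Dict

# Defaults as listed in the table above.
DEFAULTS: Dict[str, str] = {
    "DB_HOST": "localhost",
    "DB_USER": "root",
    "DB_PASS": "",
    "DB_NAME": "ortholog",
    "DB_SOCKET": "",
    "DB_PORT": "3306",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, keeping the documented default for missing keys."""
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = "# local setup\nDB_HOST=127.0.0.1\nDB_PASS=your_password\nDB_SOCKET=\n"
config = parse_env(sample)
```

An empty `DB_SOCKET=` line yields an empty string, which matches the documented default of connecting over TCP rather than a Unix socket.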

📁 Project Structure

orthologfindertool-v1.1/
│
├── index.php                → Redirect to main.php
├── main.php                 → Unified router (mode switching)
├── download.php             → CSV export handler
├── dumper.php               → GO batch export (all 85 terms)
│
├── includes/
│   ├── mysql.php            → Database connection (.env based)
│   ├── mylog.php            → Visitor logging (CSV + GeoIP)
│   ├── ip2country.php*      → GeoIP country detection
│   │
│   ├── functions.php        → FajlBeolvas + Lekeres classes (Mode A core)
│   ├── inc_analyzer.php     → QueryGO class (Mode B core)
│   ├── inc_analyzer_dumper.php → QueryGOExport (batch CSV)
│   ├── inc_functions.php    → VennDiagram + SVG_File + helpers
│   ├── inc_variables.php    → Species, GO terms, POST handling
│   │
│   ├── page_landing.php     → Mode selector (landing page)
│   ├── page_ortholog_form.php    → Mode A: query form
│   ├── page_ortholog_results.php → Mode A: results + DataTable
│   ├── page_2_analysis_{species}.php → Mode A: per-species table templates
│   ├── page_go_analyzer.php → Mode B: GO form + results + 4 DataTables
│   ├── page_sources.php     → References page
│   └── page_aboutus.php     → About page
│
├── _dataset/                → Mode A data files (40 MB)
│   ├── ALL_ortholog_dbs_merged.csv   → 225K ortholog pairs (6 DBs)
│   ├── kegg_pathways_uniprot.tsv     → KEGG pathway annotations
│   ├── reactome_pathways_uniprot.tsv → Reactome pathway annotations
│   ├── regular_names.txt             → UniProt ID → gene name mapping
│   └── {AT,DM,HS,SC,SP}_interact_deg2_exp.csv → Cell-size screen data
│
├── source/                  → Mode B source data (15 MB)
│   ├── eggNOG-export-*-7.csv (×75) → eggNOG ortholog group exports
│   └── ortholog_*_sample.svg (×12)  → Venn diagram SVG templates
│
├── output/                  → Mode B generated output (15 MB)
│   ├── eggNOG-export-*-7.csv (×75) → Processed results
│   └── GO*-Venn-Diagram-*.svg (×49) → Generated Venn diagrams
│
├── _query/                  → Mode A pre-computed results (23 MB)
│   └── jsonquery_{species}.txt      → DataTables JSON (per species)
│
├── media/
│   ├── css/unified.css      → Merged stylesheet
│   ├── js/                  → jQuery 1.11.2 + DataTables 1.10.5
│   └── images/              → Logos, icons, organism photos
│
├── _log/                    → Visitor logs (not tracked)
├── .env.example             → Environment template
├── .gitignore
└── LICENSE.md               → GNU GPL v2

URL Routing

| URL | Page |
| --- | --- |
| main.php | Landing page (mode selector) |
| main.php?mode=ortholog | Ortholog Search form |
| main.php?mode=go | GO Extension form |
| main.php?page=source | References |
| main.php?page=about | About Us |
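
The routing table can be read as a whitelist lookup on the `mode` and `page` query parameters. The Python sketch below shows that idea only; the real router is `main.php`, and both the mapping of URLs to files under `includes/` (inferred from the project structure listing) and the precedence of `mode` over `page` are assumptions.

```python
from typing import Mapping

# Assumed mapping from query parameters to include files, inferred from the
# project structure listing; main.php may resolve pages differently.
MODE_PAGES = {
    "ortholog": "page_ortholog_form.php",
    "go": "page_go_analyzer.php",
}
STATIC_PAGES = {
    "source": "page_sources.php",
    "about": "page_aboutus.php",
}
LANDING_PAGE = "page_landing.php"


def resolve_page(params: Mapping[str, str]) -> str:
    """Pick the page file for a request, falling back to the landing page."""
    mode = params.get("mode")
    if mode in MODE_PAGES:
        return MODE_PAGES[mode]
    page = params.get("page")
    if page in STATIC_PAGES:
        return STATIC_PAGES[page]
    # Unknown or missing parameters never reach the filesystem.
    return LANDING_PAGE


target = resolve_page({"mode": "go"})
```

Looking values up in fixed dictionaries, rather than building a file name from user input, means an unexpected parameter value can only ever lead to the landing page.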

📊 Data Sources

Ortholog Databases (Mode A)

| Database | Version | Date |
| --- | --- | --- |
| BioGRID | 3.2.112 | April 2014 |
| eggNOG | 4.0 | December 2013 |
| InParanoid | 8.0 | December 2013 |
| orthoMCL | 5 | 2013 |
| HomoloGene | - | 2014 |
| COG/KOG | - | 2014 |

Pathway Databases (Mode A)

| Database | URL |
| --- | --- |
| KEGG | genome.jp/kegg |
| Reactome | reactome.org |

Additional Databases

| Database | Version | Purpose |
| --- | --- | --- |
| UniProt | 2014_04 | ID mapping & protein names |
| PomBase | V2.19 | Pombe↔Human/Cerevisiae curated orthologs |
| IntAct | 2013-11-20 | Protein interaction data |
| MitoCheck | ens73 | H. sapiens phenotype data |

GO Annotation (Mode B)

| Source | Details |
| --- | --- |
| Gene Ontology | 85 GO Slim Generic terms |
| eggNOG v4 | Ortholog group → protein mapping |

📚 Academic References

Thesis

Dul, Z. (2019). A system-level approach to identify novel cell size regulators. PhD thesis, King's College London. KCL Pure

Cell-Size Screen Publications

| Species | Publication | Journal | Year |
| --- | --- | --- | --- |
| S. cerevisiae | Jorgensen P et al.: Systematic identification of pathways that couple cell growth and division in yeast | Science | 2002 |
| S. cerevisiae | Moretto F et al.: A pharmaco-epistasis strategy reveals a new cell size controlling pathway in yeast | Mol Syst Biol | 2013 |
| S. pombe | Hayles J et al.: A genome-wide resource of cell cycle and cell shape genes of fission yeast | Open Biology | 2013 |
| S. pombe | Graml V et al.: A genomic multiprocess survey of machineries that control and link cell shape, microtubule organization, and cell-cycle progression | Dev Cell | 2014 |
| D. melanogaster | Björklund M et al.: Identification of pathways regulating cell size and cell-cycle progression by RNAi | Nature | 2006 |
| H. sapiens | Neumann B et al.: Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes | Nature | 2010 |

Gene Ontology

Ashburner M et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25–29. doi:10.1038/75556

Ortholog & Protein Databases

Powell S, Forslund K, Szklarczyk D, et al. (2014). eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Research, 42(D1), D231–D239. PubMed: 24297252

The UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Research, 43(D1), D99–D106. PubMed: 25348405

External Code

| Library | Version | License |
| --- | --- | --- |
| jQuery | 1.11.2 | MIT |
| jQuery DataTables | 1.10.5 | MIT |
| ip2country by Blagoj Janevski | - | GNU GPL v2 |

📜 History

This tool was originally published as two separate web applications during 2015–2018:

  1. Ortholog Finder Tool (orthologfindertool.com): Multi-database ortholog search with pathway and screen data integration. Archived source: ZoliQua/Ortholog-Finder-Tool-Draft

  2. GeneOntology Extension Tool (go.orthologfindertool.com): GO annotation extension via eggNOG ortholog groups with Venn diagram visualization. Archived source: ZoliQua/Ortholog-Finder-Tool-GO

Version 1.6 (November 2025) unifies both tools into a single application with a shared navigation and mode-selection interface. No data or backend logic was changed; only the UI was consolidated.

The git history of this repository preserves the full commit histories of both original repositories.


👀 Author

Dr. Zoltán Dul, dentist and PhD graduate (2014–2019), King's College London

Supervisors & Groups

  • Prof. N. Shaun B. Thomas, Cell Cycle & Epigenetics Team, Division of Cancer Studies, King's College London
  • Dr. Attila Csikász-Nagy, Csikász-Nagy Group, Randall Division of Cell and Molecular Biophysics, King's College London
  • Dr. Azeddine Si Ammour, Genomics and Biology of Fruit Crop, Fondazione Edmund Mach, San Michele all'Adige

📄 License

This project is licensed under the GNU General Public License v2.0.


Built at King's College London & Fondazione Edmund Mach
