This repository was archived by the owner on Mar 16, 2026. It is now read-only.

ZoliQua/Ortholog-Finder-Tool-Draft


Ortholog Finder Tool - Draft (Archived)

This repository is archived. The unified version of this tool is available at: Ortholog-Finder-Tool

Overview

A web-based bioinformatics tool for querying and analyzing orthologous proteins across five eukaryotic model organisms. The tool integrates ortholog relationships from multiple databases with pathway annotations and genome-wide cell-size screen data to identify evolutionarily conserved regulators.

This is the original draft/prototype version (2014–2017) that preceded the published Ortholog Finder Tool v1.1. It served as the development platform for the multi-database ortholog query engine (Lekeres class) and the flat-file data integration pipeline.

Model Organisms

| Abbreviation | Species | Common name |
|---|---|---|
| AT | Arabidopsis thaliana | Thale cress |
| DM | Drosophila melanogaster | Fruit fly |
| HS | Homo sapiens | Human |
| SC | Saccharomyces cerevisiae | Budding yeast |
| SP | Schizosaccharomyces pombe | Fission yeast |

Method

  1. Multi-database ortholog resolution: Proteins are queried across 6 ortholog databases (HomoloGene, orthoMCL v5, InParanoid v8, eggNOG v4, COG/KOG, PomBase) using UniProt KB accessions as the common identifier.
  2. Pathway annotation: KEGG and Reactome pathway memberships are retrieved for each protein, enabling identification of shared pathway context across orthologous groups.
  3. Screen data integration: Published genome-wide cell-size regulatory gene lists (Jorgensen, Moretto, Hayles, Björklund, Neumann) are cross-referenced with ortholog results.
  4. Query levels: Progressive filtering from all orthologs (orth) through pathway-annotated (path), shared-pathway (same), to screen-hit subsets (size_mut1–6).
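The progressive filtering in step 4 can be sketched as follows. This is a minimal Python illustration, not the tool's PHP `Lekeres` implementation: the `Hit` record, the `filter_level` function, and all sample values are hypothetical. What distinguishes the six `size_mut` subsets is not described here, so the sketch collapses them into a single `size_mut` level and assumes each level narrows the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One protein (hypothetical record; field names are assumptions)."""
    accession: str
    pathways: set = field(default_factory=set)  # KEGG/Reactome pathway names
    screens: set = field(default_factory=set)   # cell-size screens listing it


def filter_level(query, hits, level):
    """Return the orthologs of `query` that pass the given query level."""
    if level == "orth":        # every ortholog
        return list(hits)
    if level == "path":        # orthologs with any pathway annotation
        return [h for h in hits if h.pathways]
    if level == "same":        # orthologs sharing a pathway with the query
        return [h for h in hits if h.pathways & query.pathways]
    if level == "size_mut":    # shared pathway and reported in a screen
        return [h for h in hits
                if h.pathways & query.pathways and h.screens]
    raise ValueError(f"unknown query level: {level}")


# Placeholder data, not taken from the real dataset.
query = Hit("Q0", pathways={"cell cycle"})
hits = [
    Hit("H1"),
    Hit("H2", pathways={"DNA replication"}),
    Hit("H3", pathways={"cell cycle"}),
    Hit("H4", pathways={"cell cycle"}, screens={"screen_A"}),
]

for level in ("orth", "path", "same", "size_mut"):
    print(level, [h.accession for h in filter_level(query, hits, level)])
```

Set intersection keeps the shared-pathway check to one expression per hit, which is enough for a sketch; a real query engine would probably index pathways once rather than intersect per hit.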

Data Sources

Ortholog Databases

| Database | Version | Date |
|---|---|---|
| HomoloGene | - | 2014 |
| orthoMCL | 5 | 2013 |
| InParanoid | 8.0 | December 2013 |
| eggNOG | 4.0 | December 2013 |
| COG/KOG | - | 2014 |
| PomBase | V2.19 | 2014 |
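One way to combine the databases listed above is to key every ortholog pair by its two UniProt accessions and record which databases report it. The Python sketch below illustrates that idea only. The three-column layout (`query`, `ortholog`, `database`), the helper names `load_pairs` and `orthologs_of`, and the placeholder accessions are all assumptions; the actual format of `ALL_ortholog_dbs_merged.csv` is not documented in this README.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with an assumed column layout; the accessions are
# made up and do not correspond to real UniProt entries.
SAMPLE = """query,ortholog,database
SP0001,HS0001,InParanoid
SP0001,HS0001,eggNOG
SP0001,HS0001,PomBase
SP0001,SC0001,orthoMCL
HS0001,DM0001,HomoloGene
"""


def load_pairs(handle):
    """Map each unordered accession pair to the databases reporting it."""
    support = defaultdict(set)
    for row in csv.DictReader(handle):
        pair = frozenset((row["query"], row["ortholog"]))
        support[pair].add(row["database"])
    return support


def orthologs_of(accession, support, min_dbs=1):
    """Return {partner accession: sorted database names} for one protein."""
    hits = {}
    for pair, dbs in support.items():
        # Skip unrelated pairs, self-pairs, and weakly supported pairs.
        if accession not in pair or len(pair) != 2 or len(dbs) < min_dbs:
            continue
        (partner,) = pair - {accession}
        hits[partner] = sorted(dbs)
    return hits


support = load_pairs(io.StringIO(SAMPLE))
print(orthologs_of("SP0001", support))
print(orthologs_of("SP0001", support, min_dbs=2))
```

Using a `frozenset` as the key treats A-B and B-A as the same pair, and `min_dbs` gives a simple way to require agreement between several databases before accepting an ortholog.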

Pathway Databases

| Database | Purpose |
|---|---|
| KEGG | Metabolic and signaling pathway annotations |
| Reactome | Curated biological pathway annotations |

Protein Identifiers

| Database | Version | Purpose |
|---|---|---|
| UniProt | 2014_04 | Common identifier (KB accession) and gene name mapping |

Project Structure

.
├── index.php / main.php        # Entry points and routing
├── _includes/                  # PHP core logic
│   ├── mysql.php               # Database connection
│   ├── functions.php           # FajlBeolvas + Lekeres classes (query engine)
│   ├── mylog.php               # Visitor logging (CSV + GeoIP)
│   └── page_*.php              # Page templates
├── _dataset/                   # Ortholog and pathway data (CSV/TSV, ~40 MB)
│   ├── ALL_ortholog_dbs_merged.csv   # 225K ortholog pairs (6 DBs)
│   ├── kegg_pathways_uniprot.tsv     # KEGG annotations
│   ├── reactome_pathways_uniprot.tsv # Reactome annotations
│   └── *_interact_deg2_exp.csv       # Cell-size screen data per species
├── _media/                     # Frontend assets (JS, CSS, images)
├── _query/                     # Pre-computed JSON for DataTables
├── _download/                  # Downloadable data files
├── tests/                      # PHPUnit tests (32 tests)
└── work/                       # Development resources

Technology Stack

  • Backend: PHP 5.x (updated for PHP 8.2 compatibility in 2025)
  • Frontend: jQuery 1.11.2, jQuery DataTables 1.10.5
  • Database: MySQL 5.7+ (for GeoIP logging)
  • Testing: PHPUnit (32 tests: CSV parsing, config integrity, routing)

Testing

# Install dev dependencies
php composer.phar install

# Run all tests
php vendor/bin/phpunit

# Run with verbose output
php vendor/bin/phpunit --testdox

32 tests covering:

  • FajlBeolvasTest: CSV/TSV file parsing (faj, path, db, reg types)
  • IncludeValuesTest: Configuration values and static data integrity
  • PageRoutingTest: URL routing logic and include file mapping

Thesis

This tool was developed by Zoltán Dul as part of his PhD research at King's College London (2013–2018):

"A system level approach to identify novel cell size regulators"

The thesis describes a systems biology strategy combining ortholog analysis, Gene Ontology annotation, protein-protein interaction networks, and high-throughput cell size screening data to identify novel regulators of cell size across eukaryotes.

Thesis: https://kclpure.kcl.ac.uk/portal/en/studentTheses/a-system-level-approach-to-identify-novel-cell-size-regulators/

References

  • Ashburner M, Ball CA, Blake JA, et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29. doi:10.1038/75556
  • Powell S, Forslund K, Szklarczyk D, et al. (2014). eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Research, 42(D1):D231–D239. PubMed: 24297252
  • The UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Research, 43(D1):D99–D106. PubMed: 25348405
  • Li L, Stoeckert CJ Jr, Roos DS (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research, 13(9):2178–2189. PubMed: 12952885

History

Originally developed in 2014 at King's College London & Fondazione Edmund Mach as the first prototype of the ortholog query engine. Updated for PHP 8.2 compatibility in 2025. Superseded by the unified Ortholog Finder Tool v1.1.

Author

Zoltán Dul, King's College London, Randall Centre for Cell and Molecular Biophysics

License

Copyright (C) 2014–2018 Zoltán Dul

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License v2 as published by the Free Software Foundation.

About

💻 Archive of my bioinformatics tool for exploring evolutionarily conserved 🧬 proteins across five model organisms using orthology databases (eggNOG, InParanoid, HomoloGene, etc.). 🎓 Part of my PhD project, King's College London (2013-2019).
