An R interface to i-ADHoRe 3.0 for detecting collinear (syntenic) regions within and between genomes.
i-ADHoRe identifies conserved gene order across chromosomes — evidence of ancient whole-genome duplications or shared ancestry between species. iadhoreR handles the full workflow from raw annotation files to parsed results:
GFF + FASTA → parse_gff()
run_diamond() → blast_to_families()
write_iadhore_config()
run_iadhore()
read_iadhore_output()
Follow these steps in order.
Download and install R (≥ 4.0) from https://cran.r-project.org.
RStudio is recommended as an IDE but not required.
If you do not already have conda, install Miniconda (lightweight) or Mambaforge (faster solver, recommended for bioinformatics).
All three tools — i-ADHoRe, DIAMOND, and MCL — can be installed via conda:
Option A: dedicated environment (recommended)
conda env create -f https://raw.githubusercontent.com/lizhencmb/iadhoreR/main/inst/conda/environment.yml
conda activate iadhoreR
Option B: install into an existing environment
conda activate my_existing_env
conda install -c lizhencmb -c bioconda -c conda-forge i-adhore diamond mcl
Important: always activate the conda environment before opening R or RStudio, so that the tools are on the PATH. On macOS/Linux, open a terminal, activate the environment, then launch R or RStudio from that same terminal:
conda activate iadhoreR
open -a RStudio   # macOS
rstudio &         # Linux
With the conda environment active, open R and run:
install.packages("remotes")
remotes::install_github("lizhencmb/iadhoreR", build_vignettes = TRUE)
library(iadhoreR)
check_tools()
#> External tool status:
#> i-adhore [OK] /path/to/conda/envs/iadhoreR/bin/i-adhore
#> diamond [OK] /path/to/conda/envs/iadhoreR/bin/diamond
#> mcl [OK] /path/to/conda/envs/iadhoreR/bin/mcl
#> mcxload [OK] /path/to/conda/envs/iadhoreR/bin/mcxload
#> mcxdump [OK] /path/to/conda/envs/iadhoreR/bin/mcxdump
#> All tools found. You are ready to use iadhoreR.
If any tool shows [MISSING], run setup_instructions() in R for
troubleshooting guidance.
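If the package itself failed to install, a rough stand-in for this check can be sketched in base R. This is not how check_tools() is implemented; it only assumes the five executable names shown in the output above, looks them up with Sys.which(), and reads CONDA_DEFAULT_ENV, the variable conda sets when an environment is activated. The helper name find_tools() is hypothetical.

```r
# Hypothetical helper, not part of iadhoreR: report which executables are on PATH.
find_tools <- function(tools = c("i-adhore", "diamond", "mcl", "mcxload", "mcxdump")) {
  paths <- Sys.which(tools)  # returns "" for any executable that is not found
  data.frame(
    tool  = tools,
    found = unname(nzchar(paths)),
    path  = unname(paths),
    stringsAsFactors = FALSE
  )
}

status <- find_tools()
print(status)

# conda sets CONDA_DEFAULT_ENV on activation; an empty value suggests no active environment.
active_env <- Sys.getenv("CONDA_DEFAULT_ENV")
if (!nzchar(active_env)) {
  message("No conda environment appears to be active in this R session.")
}
```

If a tool is reported as not found although it is installed, the most likely cause is that R was started outside the activated environment.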
i-ADHoRe and MCL do not have native Windows builds. The recommended approach is Windows Subsystem for Linux 2 (WSL2), which runs a full Linux environment inside Windows and is fully supported.
Set up WSL2:
- Open PowerShell as Administrator and run:
  wsl --install
  This installs WSL2 with Ubuntu. Restart your computer when prompted.
- Open the Ubuntu app and install Miniconda inside WSL2:
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  bash Miniconda3-latest-Linux-x86_64.sh
  # follow the prompts, then restart the shell
- Install the tools and R inside WSL2:
  conda env create -f https://raw.githubusercontent.com/lizhencmb/iadhoreR/main/inst/conda/environment.yml
  conda activate iadhoreR
  conda install -c conda-forge r-base
- Inside WSL2 R, install iadhoreR:
  install.packages("remotes")
  remotes::install_github("lizhencmb/iadhoreR", build_vignettes = TRUE)
Using RStudio on Windows with WSL2:
Install RStudio Desktop on Windows. RStudio automatically detects WSL2 and can use R installed inside it — see the Posit WSL2 guide for setup instructions.
Before parsing, inspect your annotation file to find the right feature type and attribute key for your data:
library(iadhoreR)
# See all feature types and attribute keys in the file
inspect_gff("species1.gff3")
# Auto-match GFF IDs against your protein FASTA and get recommended parameters
recommend_parse_gff("species1.gff3", "species1_proteins.fasta")
#> Best match: feature_type = "mRNA", id_attribute = "ID"
#> Matched: 27655 / 27655
Use the feature_type and id_attribute values returned in the examples below.
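To make the idea behind this inspection concrete, here is a minimal base R sketch of the same kind of check on a toy GFF3 file. It is not the package's implementation: the file contents, the helper get_attr(), and the stand-in FASTA IDs are all made up for illustration. It tabulates feature types (column 3), lists attribute keys (column 9), and counts how many mRNA IDs match a set of protein names.

```r
# Toy GFF3 content, invented for illustration only.
gff_lines <- c(
  "##gff-version 3",
  "Chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=G1",
  "Chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=gene1.1;Parent=gene1",
  "Chr1\tsrc\tgene\t2000\t2600\t.\t-\t.\tID=gene2;Name=G2",
  "Chr1\tsrc\tmRNA\t2000\t2600\t.\t-\t.\tID=gene2.1;Parent=gene2"
)
gff_file <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_file)

# GFF3 has nine tab-separated columns; lines starting with "#" are comments.
gff <- read.delim(gff_file, header = FALSE, comment.char = "#", quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

# Feature types present in the file.
type_counts <- table(gff$type)

# Attribute keys: split column 9 on ";" and keep the text before "=".
fields <- unlist(strsplit(gff$attributes, ";", fixed = TRUE))
attr_keys <- sort(unique(sub("=.*$", "", fields)))

# Hypothetical helper: extract one attribute value per row (NA if absent).
get_attr <- function(attributes, key) {
  vapply(strsplit(attributes, ";", fixed = TRUE), function(f) {
    hit <- f[startsWith(f, paste0(key, "="))]
    if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
  }, character(1))
}

mrna_ids  <- get_attr(gff$attributes[gff$type == "mRNA"], "ID")
fasta_ids <- c("gene1.1", "gene2.1")  # stand-in for names read from a protein FASTA
n_matched <- sum(mrna_ids %in% fasta_ids)
```

A feature type and attribute key are a good choice when the extracted IDs match all, or nearly all, of the protein names.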
work <- "my_analysis"
# 1. Parse GFF into gene lists (one file per chromosome)
sp1_lists <- parse_gff("species1.gff3",
                       output_dir   = file.path(work, "sp1_lists"),
                       genome_name  = "sp1",
                       feature_type = "mRNA",
                       id_attribute = "ID")
# 2. All-vs-all protein similarity search
run_diamond("species1_proteins.fasta",
            output_file = file.path(work, "sp1.blast"),
            threads     = 8)
# 3. Cluster into gene families
blast_to_families(
  blast_file      = file.path(work, "sp1.blast"),
  output_file     = file.path(work, "families.txt"),
  gene_list_files = unname(sp1_lists)
)
# 4. Write config and run i-ADHoRe
write_iadhore_config(
  genomes     = list(sp1 = sp1_lists),
  blast_table = file.path(work, "families.txt"),
  table_type  = "family",
  output_path = file.path(work, "output"),
  file        = file.path(work, "config.ini")
)
run_iadhore(file.path(work, "config.ini"))
# 5. Read results
results <- read_iadhore_output(file.path(work, "output"))
head(results$multiplicons) # syntenic regions
head(results$anchorpoints) # homologous gene pairs
work <- "my_analysis"
# 1. Parse GFF for each species
sp1_lists <- parse_gff("species1.gff3",
                       output_dir   = file.path(work, "sp1_lists"),
                       genome_name  = "sp1",
                       feature_type = "mRNA",
                       id_attribute = "ID")
sp2_lists <- parse_gff("species2.gff3",
                       output_dir   = file.path(work, "sp2_lists"),
                       genome_name  = "sp2",
                       feature_type = "mRNA",
                       id_attribute = "ID")
# 2. All-vs-all search across both species (pass both FASTAs)
run_diamond(c("species1_proteins.fasta", "species2_proteins.fasta"),
            output_file = file.path(work, "all_vs_all.blast"),
            threads     = 8)
# 3. Cluster into gene families (include both species' gene lists)
blast_to_families(
  blast_file      = file.path(work, "all_vs_all.blast"),
  output_file     = file.path(work, "families.txt"),
  gene_list_files = c(unname(sp1_lists), unname(sp2_lists))
)
# 4. Write config and run i-ADHoRe
write_iadhore_config(
  genomes     = list(sp1 = sp1_lists, sp2 = sp2_lists),
  blast_table = file.path(work, "families.txt"),
  table_type  = "family",
  output_path = file.path(work, "output"),
  file        = file.path(work, "config.ini")
)
run_iadhore(file.path(work, "config.ini"))
# 5. Read results
results <- read_iadhore_output(file.path(work, "output"))
head(results$multiplicons) # syntenic regions
head(results$anchorpoints) # homologous gene pairs
A step-by-step vignette using bundled Arabidopsis and Vitis example data is available after installation:
vignette("iadhoreR-tutorial", package = "iadhoreR")

| Function | Description |
|---|---|
| check_tools() | Verify all external tools are on PATH |
| setup_instructions() | Print conda installation commands |
| inspect_gff() | Explore feature types and attributes in a GFF file |
| recommend_parse_gff() | Auto-detect best GFF parameters for your FASTA |
| parse_gff() | Create i-ADHoRe gene list files from a GFF |
| run_diamond() | All-vs-all protein similarity search |
| blast_to_families() | Cluster BLAST results into gene families via MCL |
| parse_blast() | Filter BLAST results into a gene-pair table |
| write_iadhore_config() | Write i-ADHoRe configuration file |
| run_iadhore() | Run i-ADHoRe |
| read_iadhore_output() | Read all output tables into a named list |
| colinear_portions() | Per-list colinearity percentages between genomes |
| multiplicated_portions() | Per-list duplication level breakdown |
| iadhore_summary() | Print a text summary of collinearity and duplication |
| plot_dotplot() | Synteny dot plot coloured by multiplicon/basecluster |
| plot_multiplicon() | Segment track diagram for a single multiplicon |
| plot_genome_overview() | Genome-wide stacked overview of multiplicon segments |
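
The table describes parse_blast() as filtering BLAST results into a gene-pair table. As a rough illustration of what such a filter can look like, the base R sketch below works on the standard 12-column BLAST tabular layout (outfmt 6), which is also DIAMOND's default output format. It is not the package's code: the toy hits, the e-value threshold, and the output column names gene_a and gene_b are assumptions made for this example, and whether run_diamond() writes exactly this layout is not confirmed here.

```r
# Column names of the standard 12-column BLAST tabular format (outfmt 6).
blast_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Toy hits, invented for illustration only.
blast_text <- paste(
  "g1\tg1\t100\t300\t0\t0\t1\t300\t1\t300\t0\t600",
  "g1\tg2\t85\t290\t40\t2\t1\t290\t5\t294\t1e-80\t310",
  "g2\tg1\t85\t290\t40\t2\t5\t294\t1\t290\t1e-80\t310",
  "g1\tg3\t30\t60\t40\t3\t10\t70\t20\t80\t0.5\t25",
  sep = "\n"
)
hits <- read.delim(text = blast_text, header = FALSE, col.names = blast_cols,
                   stringsAsFactors = FALSE)

max_evalue <- 1e-5  # illustrative threshold, not a package default

# Drop self-hits and weak hits.
keep  <- hits$qseqid != hits$sseqid & hits$evalue <= max_evalue
pairs <- hits[keep, c("qseqid", "sseqid")]

# Sort each pair alphabetically so that A-B and B-A collapse into one row.
gene_a <- pmin(pairs$qseqid, pairs$sseqid)
gene_b <- pmax(pairs$qseqid, pairs$sseqid)
gene_pairs <- unique(data.frame(gene_a, gene_b, stringsAsFactors = FALSE))
print(gene_pairs)
```

Here the self-hit and the weak g1/g3 hit are discarded, and the two reciprocal g1/g2 hits reduce to a single pair.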
MIT