dN/dS Analysis Pipeline

A Nextflow workflow for comprehensive evolutionary analysis using CodeML to detect selection pressure through dN/dS ratio calculations and likelihood ratio tests.

Overview

This NEXTFLOW workflow performs custom dN/dS analysis by building codon-aware alignments from orthologous sequences and testing multiple evolutionary models using CodeML. It implements branch-specific and subtree-specific models inspired by Zhang et al. 2019 to detect positive selection across phylogenetic trees.

Input Parameters

outdir = name of the results directory
A = focal nucleotide FASTA file with the query sequences
B = subject nucleotides FASTA file with the orthologs
includeA = boolean flag (default: false, unnecessary)
A_seqlist = a subset list of the focal FASTA queries
ortho = a directory with 2-column TSV files containing pairs of orthologs (should be named "{A}_vs_{anyB}_orthologs.tsv")
tree = a NEWICK tree of the genomes
ctl_file = should be "${projectDir}/ctl/codeml_one_omega_model.ctl" (others not tested)
style = should be 'branch_models' (others not tested)
fasta_batch_size = batch size for FASTA processing (default: 10000)
alignment_batch_size = batch size for alignment processing (default: 100)
codeml_batch_size = batch size for CodeML analysis (default: 10)

Usage

nextflow run <repository>/dNdS -profile <SINGULARITY-APPTAINER-DOCKER> --outdir <OUTDIR> --A <FOCAL_FASTA> --B <SUBJECT_FASTA> --ortho <ORTHO_DIR> --tree <NEWICK_TREE>

Container Requirements

The workflow is expected to be run with a container manager (Singularity, Apptainer, or Docker).

Workflow Strategy

Alignment Building

The pipeline builds codon alignments using the following strategy:

Dual alignment with MAFFT: Align both nucleotides and amino acids using translatorX.pl with '-p F' option
Codon alignment: Generate codon-aligned sequences using pal2nal.pl respecting amino acid alignment
Gap removal: Remove mostly gapped positions using remove_mostly_gapped_positions.py
Format conversion: Convert FASTA alignment to PHYLIP format for CodeML compatibility

# Align both nucleotides and AAs with MAFFT
translatorX.pl -i $ortho_fna -p F -o $orf

# Get codons aligned with respect to amino acids 
pal2nal.pl ${orf}.aa_ali.fasta ${orf}.nt_ali.fasta -output fasta > ${orf}_pal2nal.aln

# Remove mostly gapped positions
remove_mostly_gapped_positions.py ${orf}_pal2nal.aln ${orf}_clean.aln

# Convert to PHYLIP format for CodeML
FASTAtoPHYL.pl ${orf}_clean.aln $(grep -c ">" ${orf}_clean.aln) $(awk 'FNR==2 {print length}' ${orf}_clean.aln)

Evolutionary Model Testing

For each alignment, the workflow tests multiple evolutionary models:

Basic Models

1omega: Single omega (dN/dS) across entire tree
1omega_fixed1: Single omega fixed to 1

Branch-Specific Models (Zhang et al. 2019 approach)

2omega: Two-omega model with foreground branch having different omega
2omega_fixed1: Two-omega model with foreground omega fixed to 1

Advanced Models

free_omega: Free omega model allowing different omega for each branch

Statistical Analysis

Model Comparison

Likelihood ratio tests: Compare nested models to detect significant differences
AIC/BIC criteria: Evaluate model fit and complexity
Parameter estimation: Extract dN/dS ratios and confidence intervals

Selection Detection

P-value thresholds: Default significance level of 0.05
Bonferroni correction: Multiple testing correction available
Branch-specific selection: Identify specific lineages under selection

Output Files

Analysis Results

*_models.tsv_bonferroni_corrected
under_selection_bonferroni_corrected.txt

Dependencies

Nextflow >= 23.10.0
Singularity OR Apptainer OR Docker

Methodology Reference

The branch-specific and subtree-specific analysis approach is inspired by: Zhang et al. 2019. "The evolution of gene expression in cis and trans." Nature Ecology & Evolution. https://doi.org/10.1038/s41559-019-0822-5

Citation

If you use this pipeline in your research, please cite the associated publication and the Zhang et al. 2019 methodology reference.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
bin		bin
conf		conf
ctl		ctl
modules		modules
subworkflows		subworkflows
workflows		workflows
LICENCE		LICENCE
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config
nextflow_schema.json		nextflow_schema.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

dN/dS Analysis Pipeline

Overview

Input Parameters

Usage

Container Requirements

Workflow Strategy

Alignment Building

Evolutionary Model Testing

Basic Models

Branch-Specific Models (Zhang et al. 2019 approach)

Advanced Models

Statistical Analysis

Model Comparison

Selection Detection

Output Files

Analysis Results

Dependencies

Methodology Reference

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

Proginski/dNdS

Folders and files

Latest commit

History

Repository files navigation

dN/dS Analysis Pipeline

Overview

Input Parameters

Usage

Container Requirements

Workflow Strategy

Alignment Building

Evolutionary Model Testing

Basic Models

Branch-Specific Models (Zhang et al. 2019 approach)

Advanced Models

Statistical Analysis

Model Comparison

Selection Detection

Output Files

Analysis Results

Dependencies

Methodology Reference

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages