Impact of Data Error on Phylogenetic Network Inference from Gene Trees Under the Multispecies Network Coalescent
This script is used to simulate phylogenetic data, including gene trees and sequence alignments, using external tools such as ms, PhyloNet, and INDELible.
Before using the script, make sure the following tools are installed and accessible from your system path:
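- ms: simulates gene trees under the coalescent model
- PhyloNet: performs species network simulations and phylogenetic network inference
- INDELible: simulates sequence evolution along trees, including insertions and deletions (indels)
- MAFFT: performs multiple sequence alignment
- IQ-TREE: infers gene trees by maximum likelihood
- FastSP: measures alignment error by comparing an estimated alignment with the reference (true) alignment, e.g. the SP-score
- nw_reroot: reroots Newick trees (from Newick Utilities)
- nw_ed: edits Newick trees (from Newick Utilities)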
Make sure all executables are installed and their paths are added to your $PATH, or adjust your script accordingly.
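A quick way to confirm that the command-line tools are reachable is to look each one up on $PATH before running the pipeline. This is a minimal sketch: the executable names below (for example `indelible` and `iqtree2`) are assumptions about how the tools are installed on your system, and the JAR-based tools (PhyloNet, FastSP) are only covered indirectly by checking for `java`.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ["ms", "indelible", "mafft", "iqtree2", "nw_reroot", "nw_ed", "java"]


def missing_tools(names):
    """Return the entries of `names` that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Tools that are referenced by absolute path in the scripts (for example ms via `msdir/ms`) do not need to be on $PATH, so a "not found" entry for them is not necessarily an error.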
This repository includes a custom PhyloNet.jar file that should be used instead of the official release. The original version of PhyloNet does not support computing the pseudo-likelihood of a true species tree given a set of gene trees, which is required for our experiments.
Make sure to use the provided PhyloNet.jar to ensure compatibility with all scripts and methods used in this study.
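PhyloNet is typically run as `java -jar PhyloNet.jar <file>.nex`, where the NEXUS file holds the gene trees in a TREES block and the commands in a PHYLONET block. The sketch below builds such a file using PhyloNet's standard `InferNetwork_MPL` (maximum pseudo-likelihood) command as an example. The helper names and the input file name are hypothetical, and the extra pseudo-likelihood scoring provided by the custom PhyloNet.jar is not shown because its command is not documented here.

```python
import subprocess
from pathlib import Path


def build_phylonet_nexus(gene_trees, max_reticulations=1):
    """Return a PhyloNet NEXUS script: gene trees plus an InferNetwork_MPL command."""
    lines = ["#NEXUS", "", "BEGIN TREES;"]
    for i, newick in enumerate(gene_trees):
        # Label trees gt0, gt1, ...; strip any trailing ';' so exactly one is added.
        lines.append(f"Tree gt{i} = {newick.strip().rstrip(';')};")
    lines += ["END;", "", "BEGIN PHYLONET;"]
    # "(all)" selects every tree defined in the TREES block.
    lines.append(f"InferNetwork_MPL (all) {max_reticulations};")
    lines += ["END;", ""]
    return "\n".join(lines)


def run_phylonet(jar_path, gene_trees, nexus_path="phylonet_input.nex"):
    """Write the NEXUS script to disk, run PhyloNet on it, and return standard output."""
    Path(nexus_path).write_text(build_phylonet_nexus(gene_trees))
    result = subprocess.run(
        ["java", "-jar", str(jar_path), str(nexus_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `run_phylonet("PhyloNet.jar", ["((A,B),C);", "((A,C),B);"])` would infer a network with at most one reticulation from two gene trees, using the provided jar.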
To install all required Python packages, run the following command from the project directory:
```bash
pip install -r requirements.txt
```

You need to configure the script to point to the correct locations of the required tools. In generate_data.py, set the root_folder variable to the absolute path of the directory that contains the tool directories and files. Then make sure the following paths are set correctly:
```python
# Placeholder: absolute path to the directory holding the tools. It must end
# with "/" because the paths below are built by string concatenation.
root_folder = "/absolute/path/to/tools/"

indelible_control_folder = root_folder + "INDELibleV1.03/"
phylonet = root_folder + "PhyloNet.jar"
ms_address = root_folder + "msdir/ms"
iqtree_folder = root_folder + "iqtree/"
iqtree_pkg = root_folder + "iqtree-2.3.5-Linux-intel/bin/iqtree2"
# Machine-specific absolute path (not under root_folder); change it to your MAFFT binary.
mafft_pkg = "/shared/mt100/ml_env/bin/mafft"
mafft_result_addr = root_folder + "iqtree/result_mafft.txt"
iqtree_result_addr = root_folder + "iqtree/result.txt"
FastS_addr = root_folder + "FastSP/FastSP.jar"
```

To run the inference, call the main_inference function in main.py.
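Before calling main_inference, it can help to confirm that every configured tool path actually exists, so that a mistyped root_folder is caught up front rather than midway through a run. This is a minimal sketch; the `check_paths` helper is hypothetical and not part of the repository. The result files (mafft_result_addr, iqtree_result_addr) are left out on the assumption that the pipeline creates them.

```python
import os


def check_paths(paths):
    """Return {name: path} for every entry in `paths` that does not exist on disk."""
    return {name: p for name, p in paths.items() if not os.path.exists(p)}


if __name__ == "__main__":
    root_folder = "/absolute/path/to/tools/"  # placeholder, as in generate_data.py
    # Only inputs are checked; output files are assumed to be written by the pipeline.
    tool_paths = {
        "indelible_control_folder": root_folder + "INDELibleV1.03/",
        "phylonet": root_folder + "PhyloNet.jar",
        "ms_address": root_folder + "msdir/ms",
        "iqtree_pkg": root_folder + "iqtree-2.3.5-Linux-intel/bin/iqtree2",
        "mafft_pkg": "/shared/mt100/ml_env/bin/mafft",
        "FastS_addr": root_folder + "FastSP/FastSP.jar",
    }
    for name, p in check_paths(tool_paths).items():
        print(f"{name} not found: {p}")
```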
To plot the results, call the main_plotting function in main.py.
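Both steps can also be driven from a short script. This sketch assumes that main_inference and main_plotting can be called without arguments and that it is run from the project directory so that main.py is importable; check the function signatures in main.py before relying on it. The `run_steps` helper is hypothetical.

```python
import importlib


def run_steps(step_names, module_name="main"):
    """Import `module_name` and call each named function in order, without arguments."""
    module = importlib.import_module(module_name)
    for name in step_names:
        print(f"Running {module_name}.{name} ...")
        getattr(module, name)()


# Usage (from the project directory):
# run_steps(["main_inference", "main_plotting"])
```

Running the steps in sequence keeps plotting from starting before the inference results have been written.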
Because these functions run in parallel, the paths to nw_reroot, nw_ed, result_addr, and FastSP are set inside the functions run_iqtree_bootstrap, run_iqtree, and run_mafft; make sure they are correct there as well.
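One way to keep these paths consistent across parallel workers is to pass them in explicitly rather than hardcoding them in each function. The sketch below uses a hypothetical `ToolPaths` container to build command lines: `nw_reroot <tree file> <outgroup>` reroots each tree on the given outgroup label, and `java -jar FastSP.jar -r <reference> -e <estimated>` compares an estimated alignment with the reference alignment. The default values are placeholders, not the repository's settings.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPaths:
    """Hypothetical container for tool locations; the defaults are placeholders."""
    nw_reroot: str = "nw_reroot"
    fastsp_jar: str = "FastSP/FastSP.jar"


def reroot_cmd(tools, tree_file, outgroup):
    # nw_reroot <newick file> <outgroup label>: reroot each tree on the outgroup.
    return [tools.nw_reroot, tree_file, outgroup]


def fastsp_cmd(tools, reference_aln, estimated_aln):
    # FastSP compares an estimated alignment (-e) against a reference alignment (-r).
    return ["java", "-jar", tools.fastsp_jar, "-r", reference_aln, "-e", estimated_aln]


def run(cmd):
    """Run a command and return its standard output; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

A worker that takes a `ToolPaths` argument can then be dispatched with, for example, `functools.partial(worker, tools)`, so every process uses the same locations. Because the dataclass is frozen, workers cannot modify the shared paths by accident.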
All results are collected in final_result.csv, which is saved together with the generated plots in the /results folder.
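To inspect the combined results programmatically, final_result.csv can be read with Python's standard csv module. The column names are not documented here, so this sketch only loads the rows and reports the header; the relative path in the usage line is an assumption about where the results folder sits relative to your working directory.

```python
import csv
from pathlib import Path


def load_results(path):
    """Read a CSV file into (header, rows), where each row is a dict keyed by column name."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames or [], rows


# Usage:
# header, rows = load_results("results/final_result.csv")
# print(header, len(rows))
```

All values are returned as strings, so numeric columns need to be converted (for example with `float`) before any analysis.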