Mixtum: a graphical tool for two-way admixture analysis in population genetics based on $f$ -statistics
Mixtum is a Python-based code that estimates ancestry contributions in a hybrid derived from a two-way admixture based on bi-allelic genotype data. The outcomes of Mixtum come from the geometric interpretation of the
- J.-A. Oteo and G. Oteo-García. The geometry of admixture in population genetics: the blessing of dimensionality. Genetics, 228(2):iyae134, 08 2024. ISSN 1943-2631. doi: 10.1093/genetics/iyae134. URL https://doi.org/10.1093/genetics/iyae134
- G. Oteo-García and J.-A. Oteo. A Geometrical Framework for f-Statistics. Bulletin of Mathematical Biology, 83:14, 2021. ISSN 1522-9602. doi: 10.1007/s11538-020-00850-8. URL https://doi.org/10.1007/s11538-020-00850-8
Mixtum has been developed as a graphical user interface (GUI) and as standalone Python script. Both versions use the same computation routines. Before running Mixtum, please ensure the required dependencies are met. Instructions to install them on Linux and Windows systems are given below. Optionally, use the Python package manager uv, available on Linux, Windows and macOS, to skip installing dependencies or creating virtual environments, and directly run Mixtum's GUI.
A GUI has been developed with Qt's PySide6 library. Please, get the source code and run Mixtum's GUI with the following command:
python mixtum_gui.py
Optionally, run it via uv:
uv run --with numpy --with pyside6 --with matplotlib mixtum_gui.py
To be run from the command line, this script needs some arguments which inform it where the needed input files are located, where to save the resulting output files and the number of parallel processes with which to perform the computations, among others. To replicate the computation done on the GUI, one can export the arguments required to run the analogous command-line script, along with the populations used.
To get help, please run the following command:
python mixtum.py --help
We give instructions to install the dependencies for Linux and Windows systems using a virtual environment and pip. We assume that Python 3 is available on your system.
First create a virtual environment under the .venv directory within Mixtum's location, where the required libraries will be installed, and activate it. Note that you can create the virtual environment on any location of your filesystem, and name it as you wish.
On Linux and Windows, open a terminal and enter Mixtum's directory. Then create the virtual environment:
python -m venv .venv
Afterwards, you should activate it. On Linux, this is done as follows:
source .venv/bin/activate
On Windows, the activation script to be run depends on the employed shell. If using Command Prompt:
.venv\Scripts\activate.bat
Whereas PowerShell requires running the following script:
.venv\Scripts\Activate.ps1
In case the execution policy of PowerShell does not allow the execution of scripts, you can enable it with the following command on a PowerShell run as administrator:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
The standalone script depends on numpy and matplotlib. The graphical user interface also depends on pyside6. Please, install these libraries in the virtual environment with pip as follows.
pip install pyside6 numpy matplotlib
Mixtum is organized as a set of tabs, each of which contains a step of the workflow. Below these tabs, a log console displays information relevant to each step. The workflow may be described as follows.
It is compulsory to select the triad '*.geno, *.ind, *.snp'. They are in EIGENSTRAT or PACKEDANCESTRYMAP format. Optionally, you can select and load a plain text file with a list of populations of interest, in a column. Then, parse and check the input files. This may take several minutes depending on the size of the dataset and the CPU.
The left table contains all the population names in '*.ind'. They may be ordered alphabetically by clicking on the table's header, and selected/deselected with the mouse left button. Note that clicking while pressing the SHIFT key allows you to select a range of populations, or pick specific ones with the CTRL key. The combination CTRL+A allows the selection of all populations.
After searching and choosing populations of interest, select the number of computation processes to parallelize the frequencies computation. Check how many cores your CPU has to tune this parameter. Then compute the table of allele frequencies on which all the f-statistics are computed or, equivalently, on which all the scalar products are carried out.
Choose the admixture model and the set of auxiliary populations (at least 4, or 8 if bootstrap is enabled) on the left side of the tab. Then compute the results. Ensure a reasonable stability of the outcomes with respect to the cardinal of the set. You can then examine the linear regression plot of the admixture model, and identify the auxiliary pairs by clicking the corresponding plot points. A glimpse may already inform you about the quality of the admixture model. Examine the angles, too. A good ancestral admixture reconstruction conveys the Post-JL angle is close to 180 degrees. A decreasing of the angle from Pre-JL to Post-JL points towards a bad admixture model.
Note that the bootstrap process performs random selections of auxiliary populations, so slight variations are expected for the error bar for different executions.
Choose populations of interest to visualize their allele frequencies in terms of the Principal Components content in 2D and 3D. You can identify which points on the plots correspond to which population by clicking on them, or alternatively, by selecting populations on the second table, which contains those used in the PCA computation.
Compute f2, f3 and f4 for specific combinations of populations. The statistics f3 and f4 can be assigned to an angle in the interval [0,180] deg. which gives them a scale for comparison purposes between different combinations of populations.
A step-by-step, how-to tutorial is available with an example of usage with detailed instructions. You can find it here.
The notes below are meant for developers of Mixtum. If you are a user you can safely ignore them.
We can make use of pyinstaller to build self-contained standalone executables for several operating systems. Note that the executable for each operating system must be built under that particular operating system, no cross-compiling is possible with this tool.
First install pyinstaller in the virtual environment that contains the dependencies of Mixtum.
pip install pyinstaller
Then, if under Linux run it in the project's root directory as follows.
pyinstaller --onefile mixtum_gui.py
Or if under Windows,
pyinstaller --onefile --windowed mixtum_gui.py
to avoid providing a console window.
If everything works fine, the executable can be found in the dist subdirectory.
