Skip to content

GADMA slow for 5 population model #103

@dmacguigan

Description

@dmacguigan

Hello,

I'm attempting to run a 5 population model using the Moments engine with SMAC_BO_combination Bayesian optimization. GADMA has been running for 50 hours but still reports No models yet. I'm running this on a cluster compute node with 8 cores and 640 GB of RAM.

Is this long runtime normal for a larger model? Or is there something I can do to help speed things up?

Custom model file:

import moments
import numpy as np


def model_func(params, ns):
    """
    Five population demographic history for the species complex
    populations:
    A - Acin Elk 
    B - Acin upper Tennessee 
    C - Aana
    D - Amay Cumberland basin
    E - Amay Big South Fork

    All pops have constant population size.
    Order of splits in this model are based on prior phylogenetic results
    Ancestral population was split (t1 + t2) time ago into two new
    populations: AB and CDE. Next, CDE population was split into C and DE.
    Then AB then split into two populations, A and B. Finally, DE splits
    into two populations, D and E. There is symmetrical migration between pops A, B,
    C, D, and E after the final split. But there is no migration between
      ancestral populations.

    :param nAB: Size of ancestral population AB after earliest split
    :param nDCE: Size of ancestral population DCE after earliest split
    :param nA: Size of extant population A
    :param nB: Size of extant population B
    :param nC: Size of extant population C
    :param nDE: Size of ancestral population DE
    :param nD: Size of extant population D
    :param nE: Size of extant population E
    :params mAB, mAC, mAD, mAE, mBC, mBD, mBE, mCD, mCE, mDE: migration rates between each pair of extant populations
    :param t1: Time between ancestral population split to C - DE split 
    :param t2: Time between C - DE split and A - B split
    :param t3: Time between A - B split and D - E split
    :param t4: Time between D - E split to the present
    
    """

    nAB, nDCE, nA, nB, nC, nDE, nD, nE, mAB, mAC, mAD, mAE, mBC, mBD, mBE, mCD, mCE, mDE, t1, t2, t3, t4 = params
    sts = moments.LinearSystem_1D.steady_state_1D(sum(ns))
    fs = moments.Spectrum(sts)

    # first split: AB - CDE
    fs = fs.split(idx = 0, n0 = ns[0] + ns[3], n1 = ns[1] + ns[2] + ns[4])
    fs.integrate(Npop=[nAB, nDCE], tf=t1)                                 

    # second split: C - DE
    fs = fs.split(idx = 1, n0 = ns[1], n1 = ns[2] + ns[4])
    fs.integrate(Npop=[nAB, nC, nDE], tf=t2)

    # third split: A - B
    fs = fs.split(idx = 0, n0 = ns[0], n1 = ns[3])
    fs.integrate(Npop=[nA, nC, nDE, nB], tf=t3)

    # fourth split: D - E
    fs = fs.split(idx = 2, n0 = ns[2], n1 = ns[4])

    # Full 5x5 migration matrix
    mig_mat = [
        [0,   mAC, mAD, mAB, mAE],
        [mAC, 0,   mCD, mBC, mCE],
        [mAD, mCD, 0,   mBD, mDE],
        [mAB, mBC, mBD, 0,   mBE],
        [mAE, mCE, mDE, mBE, 0  ]
    ]

    fs.integrate(Npop=[nA, nC, nD, nB, nE], tf=t4, m=mig_mat)

    return fs(base)

Parameters file:

Input file: easySFS/output/dadi/A-C-D-B-E.sfs
Output directory: results/model_5pops_01_quick
Custom filename: models/model_5pops_01.py

Num init const: 2
Global optimizer: SMAC_BO_combination
Global maxeval: 200

Number of repeats: 8
Number of processes: 8

Linked SNP's: False

Engine: moments

# NEED TO REFINE THIS
Mutation rate: 2.35e-08

# NEED TO FIND REFERENCE FOR THIS
Time for generation: 2.0

#   Effective length of the sequence used to build the SFS data.
#   This should be used together with the Mutation rate and can be replaced
#   by the Theta0 setting.
#   Default: None
# https://gadma.readthedocs.io/en/latest/faq.html#how-can-i-get-effective-length-of-sequence-l
# Assume total length of sequence that was used for SFS building is equal to Nseq.
# From this data total number of X SNP’s were received.
# But not all of them were used for SFS: some of them (e.g. total number of Y SNP’s) were filtered out.
# Then we should count filtered SNP’s and take L value the following way:
# L = (X - Y) / X * Nseq
# in our case:
# X = 171832 [total number of variable sites in m80p _stats.txt file]
# 16823 SNPs in the m80p unlinked VCF, so 
# Y = 155009
# so L = ((171832 - 155009) / 171832) * 3995963) = 391220
Sequence length: 391220

#    Engine that will draw demographic model plots.
#    Can be moments or demes.
#    Default: moments
Model plot engine: demes

Units of time in drawing: thousand years

Thanks for your help,
Dan

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions