Skip to content

interCM/evolution

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Evolution package includes files:
evolution.py - main script performing simulation of the evolution
monitor.py - auxiliary module responsible for logging
config.cfg - template configuration file
produceplots.py - a module for visualization of the simulation results

Package is written for python 3 and will not work with python 2.
Testing was done with python 3.5, however versions >= 3.3 should also work.


Next sections describe how to
> configure
> start
an evolution experiment
> and visualize
its outcome.


Configuration.
==============
The first thing you need to do after you came up with the idea of the
experiment is setting up of a corresponding configuration file. For this
purpose you can either edit an existing config.cfg file which is a part of the
package or create a new configuration file using config.cfg as a template.
All simulation related parameters should be included into configuration file.
The structure and syntax of configuration file should conform with the
specification of the python's configparser module.
Sections and parameters in the configuration file (see config.cfg):

[DEFAULT]
- This section is optional. If some required parameter from one of the next
  sections is absent the program will try to take this parameter from the
  DEFAULT section. If the program fails to find a required parameter in the
  configuration file it will be terminated with error.
  
[common]
interations - required (integer).
    Number of iterations to perform. Each iteration represents Moran-like
    reproduction round: one organism is eleminated from the population (either
    random or the oldest) and one organism is added to the population.
run_id - required (string).
    Name of the directory inside {runs_base_dir}, where all results
    corresponding to this run will be stored as well as runs's starting
    configuration, so you can easily repeat the simulation with the same
    setup.
runs_base_dir - required (string).
    Name of the directory where results of all runs are stored. If absent it
    will be created.
                
[environment]
primary_pos_base_weights - required (format:
                    1 : A 0.65, G 0.25, C 0.1, T 0
                    2-4,6,8 : A 0.7, G 0.1, C 0.1, T 0.1
                    5,7 : A 0.55, G 0.2, C 0.2, T 0
                     |    |   |
             positions  base  weight).
    The sum of positional base weights (pbw) over all positions determine the
    weight (fitness) of the organism in the population.
    Primary pbw will be applied to initialize all organisms in all populations
    and then will be used until the smallest number from the
    pos_base_weights_switch_at_iter is reached. After that pbw of all
    organisms in all populations are switched to the
    alternative_pos_base_weights. When the second smallest number from the
    pos_base_weights_switch_at_iter is reached all pbw are swetched back to
    the primary_pos_base_weights and so on.
    Provided positions of the pbw should cover a range of integers from 1 to n
    (n = 1, 2, ...) without gaps. If n is less than the length of genome
    (genome_len parameter) then weight to positions will be assigned
    circulary: pbw(i*n+k) = pbw(k) (k<n, i = 0,1, ...). So if only a base
    weights for position 1 is indicated then all position will recieve this
    weights.
update_origin_at_iter - optional (space delimited list of integers).
    Iterations at which origin genome of each organism in all populations
    will be updated to the current state of genome. Origin genome is used
    when the number of accumulated mutations is counted (i.e. the number of
    mutations for each organism is equal to the number of mismatches between
    current state of genome and origin genome). The parameter can be left
    empty, in this case initial state of genome (random sequence) will be used
    as an origin during the whole simulation.
pos_base_weights_switch_at_iter - optional (space delimited list of integers).
    Iterations at which positional base weights of all organisms in all
    populations will be swiched between primary and alternative positional
    base weights. If this parameter is not given only primary_pos_base_weights
    will be used during the simulation. 
alternative_pos_base_weights - optional (same as primary_pos_base_weights).

[population_1]
name - required (string, unique for each population).
    Name that will be used to refer to the population in the run_log.txt
    file and in the figure.
pop_size - required (integer).
    Number of organisms in the population.
genome_len - required (integer).
    Length of organisms' genome (the same for all organisms in the
    population).
mut_prob - required (0 <= float <= 1).
    Probability of mutation per positinon.
ti_prob - required (0 <= float <= 1).
    Probability that occured mutation will be a transition mutation.
sel_strength - required (0 <= float <= 1).
    Determines which top-weighted fraction of the populaiton is considered to
    select an organism that will reproduce at the current reproduction round.
    If recombination does not take place during the current reproduction round
    (whether it will happen or not is determined in the beginning of each
    round based on the rec_freq parameter) then it will be the only
    reproducing organism, otherwise a mating partner is required that will be
    selected based on the partner_sel_strength parameter.
children_number - required (integer).
    Number of offsprings generated per replication round. Only an offspring
    with the highest weight is retained in the population, others are ignored.
rec_freq - required (0 <= float <= 1).
    Probability that recombination event will take place during a round of
    replication. If recombination takes place then two organisms are selected
    from the population (the first organism is selected based on the
    sel_strength parameter, the second is selected based on the
    partner_sel_strength parameter and a sex of the first organisms if it is a
    sexual population) to produce children_number of offsprings. Each
    offspring is generating by random recombination and mutation of parantal
    genomes. If no recombination occurs then a single organism is selected
    based on the sel_strength parameter and offsprings are generated by
    introducing random mutations into the parantal genome. It is also worth
    noting that recombination can take place in the population with no sexual
    differentiation (i.e. has_sex = False).
    E.g. if rec_freq = 1, recombination will occure at each reproduction
    round. Other way round if rec_freq = 0 then recombination will never
    happen.
rec_rate - required (0 <= float <= 1).
    Probability of crossover event per base. If recombination takes place this
    parameter determines how frequently an offsprings' genome will be switched
    between genomes of his parents.
    E.g. if rec_rate = 1 then each next base in the genome of the offspring
    will be taken from different paren. If rec_rate = 0 then the whole genome
    of the offspring will be taken from the first parent (equivalent to
    setting rec_freq = 0).
partner_sel_strength - required (0 <= float <= 1)
has_sex - required (boolean).
    Determines whether there will be different sexes in the population or it
    will be asexual. This parameter is only important if recombination takes
    place, if so then the mating partner of the opposite sex is selected for
    reproduction. If has_sex = True but recombination is not happaning during
    the current reproduction round then any organism from the population,
    regardless of its sex, is allowed to produce offsprings.
male_number - required if has_sex is True otherwise will be ignored (integer).
    The number of males in the population (the rest pop_size - male_number
    will be females).
    Sexual compound of the population remains constant in the course of the
    simulation while the sex of retained offspring is selected equal to the
    sex of eliminated organism.
eliminate_oldest - required (boolean).
    If true the oldest organism is eliminated from the population after each
    reproduction round, otherwise random organism is eliminated. An age of the
    organism is equal to the number of iterations (reproduction rounds) it
    exists in the population.
    
[population_2]
Settings for the second population.
...

Theoretically nothing prevents you from running 100 different populations
simultaneously. However, it is not recommended to start simulations with more
than 5 different populations.
Detailed description of all simulation related parameters can be found in the
evolution.py file.


Starting simulation.
====================
When configuration file is constructed you can start simulation with:
$python3 evolution.py config.cfg
If everything goes fine [runs_base_dir]/[run_id] folder will be created. This
folder will contain a file with starting configuration (start.cfg) and
run_log.txt file, where data produced by the simulation will be saved. These
data contains normalized average weight, GI and number of mutations compared
to origin genome as well as weight after mutation, recombination and selection
for all offsprings generated during the iteration for each population at each
iteration.
Simulation progress will be shown in the command line.


Visualization of the results.
=============================
When simulation run is finished the data from the corresponding run_log.txt
file can be easily visualized with:
$python3 produceplots.py [runs_base_dir]/[run_id]
This will show you a figure representing the change of population parameters
in the course of the evolution. This figure will be also automatically saved
to the corresponding run directory ([runs_base_dir]/[run_id]).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages