diff --git a/docs/Algorithms.rst b/docs/Algorithms.rst index 68f5f63..3190752 100644 --- a/docs/Algorithms.rst +++ b/docs/Algorithms.rst @@ -117,6 +117,7 @@ The "star" topology is the only implementation currently available in pyDarwin. sort of star shape, up to the number of neighbors specified in :ref:`neighbor_number `. +.. _moo-label: .. _MOO_desc: **************************** @@ -136,9 +137,36 @@ This set is then presented to the user, who selects one or more preferred models considerations such as biological plausibility and diagnostic plots. A detailed case study of applying pyDarwin to PopPK model selection using these methods is provided in [#f8]_. +Imagine you are deciding what car to buy. You have two objectives: you would like it to be fast (measured as lap time around the +Nürburgring track) and inexpensive (measured as cost). In addition to these numerical (objective) criteria, you also have subjective +criteria such as comfort and how fun the car is to drive. The subjective criteria cannot easily be captured numerically, and even for +the objective criteria it can be difficult to combine them into a single scalar "goodness" value using penalties, for example: + +.. math:: + + Car \enspace \textit{"goodness"} = Nürburgring \enspace time \enspace (seconds) + \frac{Cost}{500} + +In this example the penalty on cost is cost divided by 500. The value 500 is subjective and somewhat arbitrary; another person would +likely choose a different value, and you might choose a different value after your yearly bonus. If that penalty changes after an +optimization has been run, the optimization would need to be repeated. Multi-objective optimization avoids this arbitrariness by +optimizing several objectives simultaneously and presenting a set of "non-dominated" (Pareto optimal) solutions to the user. For each +solution on the Pareto front there is no other solution that is better in every objective, and dominated solutions (worse in all +objectives than some other solution) can be discarded. + +The MOO results visualize the trade-offs between the objectives, allowing users to revisit and discuss preferred solutions as +preferences change. A representative plot for the trade-off between two objectives (speed and cost) is depicted below for car +performance and cost: + +.. image:: ParetoFrontCars.jpeg + :alt: Car performance vs cost + :width: 700px + :align: center + pyDarwin implements multi-objective optimization using genetic algorithms. The currently available algorithms are -MOGA (based on NSGA-II [#f6]_) and MOGA3 (based on NSGA-III [#f7]_), described in more detail below. These algorithms are configured -through the :ref:`Options List` (for example, via the ``algorithm`` option and user-specified objective definitions). +MOGA (based on NSGA-II [#f6]_) and MOGA3 (based on NSGA-III [#f7]_), described in more detail below. Additional background on +MOGA3 post-processing with user-defined R/Python code is provided in :ref:`MOGA3 post-processing `. +These algorithms are configured through the :ref:`Options List` (for example, via the ``algorithm`` option and +user-specified objective definitions). .. _MOGA_desc: diff --git a/docs/Example10.rst b/docs/Example10.rst new file mode 100644 index 0000000..aea2217 --- /dev/null +++ b/docs/Example10.rst @@ -0,0 +1,740 @@ +.. _startmoga10: + +########################################################### +Example 10: MOGA3 +########################################################### +MOGA3 requires user defined code. The template.txt and tokens.json files are identical to those used for other single +objective algorithms. For additional background on general principles of MOGA3 post-processing and user-defined R/Python code, +see :ref:`MOGA3 post-processing `. + + + +Partitions and population size +-------------------------------- + +In theory, with more than two or more continuous objective, there can be an infinite number of non-dominated +solutions. Keep in mind that to be dominated requires that ALL objective in some other solution are better. +One can imagine a set of solutions with 2 continuous objectives with a list of objectives with values +alternating with decriments of 0.000001, such that all are non-dominated. The solution to this problem, as +well as to improve performance, is to partition the search space such that the direction of the search +is divided into a finite number of directions. The number of parititions will influence the number of +non-dominated returned by the algorithm, with the number of non-dominated being (roughly) equal to or +less than the number of partitions. The non-dominated solutions returned will typically be distributed +throughout the search space. That is, the you are unlikely to get all 2 compartment models with additive +residual error, and just varying by covariates. More likely there will be a mix of 1 and 2 compartment models, +with and without ALAG, different residual errors and a range of covariates. + +The number of partitions possible will be limited by the population size. A useful (although very technical) +guide can be found here: +`Cheng et al `_. + +Practically, the population size will be 50-100 (potentially larger with more complex searchs). The number of +partitions will then be driven by the number of non-dominated models that the user would like returned for +further consideration. Perhaps 6-12 partitions might typically be used. + +The number of paritions is set in the options.json file (`Options.json `_). +An example is below: + + +.. code-block:: JSON + + { + "MOGA" : { + "attribute_mutation_probability" : 0.1, + "constraints" : 1, + "crossover" : "single", + "crossover_rate" : 0.95, + "mutation_rate" : 0.95, + "names" : [ + "OFV", + "Nparms", + "NFailedRSEs" + ], + "objectives" : 3, + "partitions" : 6 + }, + "algorithm" : "MOGA3", + "postprocess": { + "use_r": false, + "use_python": true, + "post_run_python_code": "{project_dir}/RSE3penalty.py", + "r_timeout": 30 + }, + "author" : "Certara", + "downhill_period" : 3, + "engine_adapter" : "nlme", + "final_downhill_search" : true, + "gcc_dir" : "C:\\Program Files\\Certara\\mingw64", + "keep_extensions" : ["csv"], + "keep_files" : ["res.csv","residuals.csv"], + "local_2_bit_search" : false, + "model_run_timeout" : 1200, + "niche_radius" : 2, + "nlme_dir" : "C:\\Program Files\\Certara\\NLME_Engine", + "num_generations" : 6, + "num_niches" : 2, + "num_parallel" : 1, + "population_size" : 30, + + "project_name" : "MOGA3", + "random_seed" : 51424319, + "rscript_path" : "C:/Program Files/R/R-4.5.1/bin/Rscript.exe", + "working_dir" : "{project_dir}" + } + + + + +Post Run R code +--------------- +It is important to insure that R returns (to standard output) the correct number of value in all cases. +This requires using the suppres? function for all other output, e.g., to load the xpose package, the +syntax would be: + +.. code-block:: R + + suppressPackageStartupMessages(library(xpose)) + + +and to load the NONMEM output into an xpose data base: + +.. code-block:: R + + suppressWarnings(xpdb <- xpose_data(file = lstfile, quiet = TRUE)) + +NONMEM +--------------- + +NONMEM Post Run R code +~~~~~~~~~~~~~~~~~~~~~~~~ +Below is R code to return 3 objectives (OFV, n parameters and the number of parameter estimates with RSE > 0.5), +and a constraint if anything goes wrong (0 = feasible solution, 1 = infeasible solution). +Note again that this is a script, not a function. + +.. code-block:: R + + suppressPackageStartupMessages(library(dplyr)) + suppressPackageStartupMessages(library(xpose)) + suppressPackageStartupMessages(library(stringr)) + suppressWarnings({ + suppressMessages({ + OFV <- nparms <- nfailedRSEs <- 99999 + tryCatch({ + lstfile <- list.files(path = ".", + pattern = "lst$") + if(length(lstfile) == 0){ + print(c(OFV, nparms, nfailedRSEs)) + print(c(1)) + } else{ + suppressWarnings({ + xpdb <- xpose_data(file = lstfile, quiet = TRUE) + }) + res <- get_summary(xpdb) + OFVpos <- which(res$descr == 'Objective function value') + OFV <- res$value[OFVpos] + if(OFV == "na") OFV <- 99999 + res <- tryCatch( + prm <- get_prm(xpdb, quiet = TRUE), + error = function(e) e + ) + if (inherits(res, "error")) { + print(c(OFV, nparms, nfailedRSEs)) + print(c(1)) + } else{ + nparms <- length(prm$name) + values <- prm$value + ses <- prm$se + if(!is.na(ses[1])){ + rses <- ses/values + nfailedRSEs <- sum(rses > 0.5) + }else{ + nfailedRSEs <- nparms + } + print(c(OFV, nparms, nfailedRSEs)) + print(c(0)) + } + } + }, error = function(){ + print(c(OFV, nparms, nfailedRSEs)) + print(c(1)) + }) + }) + }) + + + +NONMEM Post Run python code - post_process +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Note that the pharmpy-core package is required for post run NONMEM code. + + +.. code-block:: python + + import math + import numpy as np + import os + import re + from pharmpy.tools import * + import warnings + import sys + def read_nmoutput(rundir): + """ + uses pharmpy function to read nonmem outputd return OFV, nparms and n failed RSEs. + Parameters + ---------- + rundir : pyDarwin model run directory + + Returns + ------- + OFV, nparms, nfailedRSEs + + """ + + try: + ofv = nparms = nfailedRSEs = 99999 + mod_file = None + for filename in os.listdir(rundir): + if filename.endswith(".mod") and filename.startswith("NM_") and os.path.isfile and filename[3].isdigit(): + os.path.join(rundir, filename) + mod_file = os.path.join(rundir, filename) + break + if mod_file is None: + return [ofv, nparms, nfailedRSEs], [1] + else: + warnings.filterwarnings("ignore") #pharmpy function throws warning about parameter name THETA etc duplicated + res = read_modelfit_results(mod_file) + warnings.filterwarnings("default") + ofv = getattr(res, "ofv", 9999999) + if np.isnan(ofv): + ofv = 99999 + parms = getattr(res, "parameter_estimates", 999999) + nparms = len(parms) + if getattr(res, 'covstep_successful'): + rses = getattr(res,'relative_standard_errors') + nfailedRSEs = sum(1 for item in rses if item > 0.5) + else: + nfailedRSEs = nparms + return ofv, nparms, nfailedRSEs + except: + return ofv, nparms, nfailedRSEs + + + + + def post_process(rundir): + """post run processing for MOGA3 in python. post_process2 takes the run directory as the argument, return + ofv, number of parameters and number of parameters with RSE > 0.5. Excepteion return available values + and constraint = 1 + + Parameters + ---------- + rundir : pyDarwin model run directory + + Returns + ------- + objectives, constraint [float, float, float],[constraint] + + + Examples + -------- + post_process(rundir) + [10.2, 8, 1],[0] + """ + + try: + ofv, nparms, nfailedRSEs = read_nmoutput(rundir) + return [ofv, nparms, nfailedRSEs], [0] + except Exception as e: + return [ofv, nparms, nfailedRSEs], [1] + + if __name__ == '__main__': + print(post_process(sys.argv[1])) + + +the + +.. code-block:: python + + if __name__ == '__main__': + print(post_process(sys.argv[1])) + + +code is again included so that it can be run from command line for checking. An example of code +to run from command line is: + + +.. code-block:: console + + python RSE3penalty.py temp/1/01 + ([np.float64(5668.41945648538), 11, 11], [0]) + + +NONMEM Post Run python code - post_process2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + +The pharmpy-core package is required for the python code below. while some results are available in the run +object (OFV, number of parameters), others are not (RSEs in this case). The run directory (run_dir in the results +object) is then used to identify the NM_*.mod file. The read_modelfit_results function in pharmpy-tools is then +called with the .mod file to return the requred RSEs. It is not possible to run the post_process2 code from command +line. + + +.. code-block:: python + + import math + import numpy as np + import os + import re + from pharmpy.tools import * + import warnings + def read_nmoutput(rundir): + """ + uses pharmpy function to read nonmem outputd return OFV, nparms and n failed RSEs. + Parameters + ---------- + rundir : pyDarwin model run directory + + Returns + ------- + OFV, nparms, nfailedRSEs + + """ + + try: + ofv = nparms = nfailedRSEs = 99999 + mod_file = None + for filename in os.listdir(rundir): + if filename.endswith(".mod") and filename.startswith("NM_") and os.path.isfile and filename[3].isdigit(): + os.path.join(rundir, filename) + mod_file = os.path.join(rundir, filename) + break + if mod_file is None: + return [ofv, nparms, nfailedRSEs], [1] + else: + warnings.filterwarnings("ignore") #pharmpy function throws warning about he parameter name THETA is duplicated. + res = read_modelfit_results(mod_file) + warnings.filterwarnings("default") + ofv = getattr(res, "ofv", 9999999) + if np.isnan(ofv): + ofv = 99999 + parms = getattr(res, "parameter_estimates", 999999) + nparms = len(parms) + if getattr(res, 'covstep_successful'): + rses = getattr(res,'relative_standard_errors') + nfailedRSEs = sum(1 for item in rses if item > 0.5) + else: + nfailedRSEs = nparms + return ofv, nparms, nfailedRSEs + except: + return ofv, nparms, nfailedRSEs + + + + + def post_process2(run): + """post run processing for MOGA3 in python. post_process2 takes the run object as the argument, return + ofv, number of parameters and number of parameters with RSE > 0.5. Excepteion return available values + and constraint = 1 + + Parameters + ---------- + run : pyDarwin model + + Returns + ------- + objectives, constraint [float, float, float],[constraint] + + + Examples + -------- + post_process(run) + [10.2, 8, 1],[0] + """ + + try: + # read ofv from run, just to demonstrate + results = getattr(run, "result") + ofv = getattr(results, 'ofv') + # but rses are not in run, need to use pharmpy to read output + rundir = getattr(run, "run_dir") + ofv, nparms, nfailedRSEs = read_nmoutput(rundir) + return [ofv, nparms, nfailedRSEs], [0] + except Exception as e: + return [ofv, nparms, nfailedRSEs], [1] + + + + +NLME +--------------- + + +NLME Post Run R code +~~~~~~~~~~~~~~~~~~~~~~~~ +For R post processing code in NLME, the key output is the dmp.txt file. This file, when +sourced in R (e.g., source("dmp.txt")) generates a dmp.txt object that contains most of the +key output for very easy access. +The code below returns 3 objectives and one constraint. As for the NONMEM examples, the +first two objectives are OFV and number of parameters. The third objective is a CmaxPenalty +for the mean absolute percent bias is Cmax, abs(%(Observed - predicted)) * 10. +This script also returns a constraint of 1 (infeasible) if any RSE values are > 0.5. + +.. code-block:: R + + suppressPackageStartupMessages(suppressMessages(library(dplyr))) + suppressPackageStartupMessages(suppressWarnings(library(stringr))) + suppressPackageStartupMessages(suppressWarnings(library(readtext))) + suppressPackageStartupMessages(suppressWarnings(library(tidyr))) + + GetCmaxPenalty <- function(CrashPenalty){ + tryCatch({ + source("dmp.txt") + preds <- dmp.txt$residuals %>% + select(ID5, IPRED, DV) %>% + group_by(ID5) %>% + summarise(ObsCmax = max(DV), PredCmax = max(IPRED)) %>% + mutate(Diff = 100*(ObsCmax-PredCmax)/ObsCmax) %>% + summarise(MeanAbsDiff = mean(Diff)) + return(10*abs(preds$MeanAbsDiff)) + + }, error = function(){ + return(CrashPenalty) + }) + } + + OFV <- nParms <- CmaxPenalty <- CrashPenalty <- 99999 + tryCatch({ + if(!file.exists("dmp.txt")){ + print(c(CrashPenalty,CrashPenalty,CrashPenalty)) + print(c(1)) + }else{ + source("dmp.txt") + OFV <- dmp.txt$logLik * (-2) + CmaxPenalty <- GetCmaxPenalty(CrashPenalty) + nParms <- dmp.txt$nParm + if(!file.exists("nlme7engine.log")){ + print(c(OFV, nParms, CmaxPenalty)) + print(c(1)) + }else{ + log <- readLines("nlme7engine.log") + ScoresuccessLine <- grep("external_coords", log) + if (length(ScoresuccessLine) == 0){ + print(c(OFV, nParms, CmaxPenalty)) + print(c(1)) + }else{ + parms <- data.frame(matrix(-99, nrow = nParms, ncol= 5)) + ScoresuccessLine <- ScoresuccessLine[length(ScoresuccessLine)] + line <- str_trim(log[ScoresuccessLine[1]]) + colnames(parms) <- strsplit(line, "\\s{2,}")[[1]] + start <- ScoresuccessLine[1] + 1 + for(this_parm in 1:nParms){ + line <- str_trim(log[start + this_parm]) + parms[this_parm,] <- strsplit(line, "\\s{2,}")[[1]] + } + parms <- parms %>% mutate(rel_std_err= as.numeric(rel_std_err)) + if(any(parms$rel_std_err > 0.5)){ + print(c(OFV, nParms, CmaxPenalty)) + print(c(1)) + }else{ + print(c(OFV,nParms, CmaxPenalty)) + print(c(0)) + } + } + } + } + }, + error = function(a){ + print(c(OFV, nParms, CmaxPenalty)) + print(c(1)) + } + ) + + +.. _moga3-nlme_post_process: + + +NLME Post Run python code - post_process +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Using a post_process function is the most practical way to develop post run python code. The argument +to the post_process function is the nlme run folder, e.g., ./temp/1/01. + + +Most of the typically need information is in the err1.txt file. Unfortunately, there isn't an +equivalent to the dmp.txt file for R in python. The err1.txt must be read and parsed to extract +the model information. Below is example code to return objectives of OFV, number of estimated +parameters, and number of failed RSEs and one constraint for exceptions (1). + + +.. code-block:: python + + import os + import time + import re + import pandas as pd + import sys + + def get_external_coords(rundir): + """read the err1.txt file in the run directory, find the parameter values and SEs and return the number of + failed RSE (se/value> 0.5). + + Parameters + ---------- + rundir : string + run directory + + Returns + ------- + int + number of parameters with RSE > 0.5 + + Examples + -------- + get_external_coords('c:/pyDarwinRun/temp/1/01) + 4 + """ + try: + ## se and parameter estimates in err1.txt + ## use sandwich + ## start read at " Standard errors of estimated parameters" + logfile = os.path.join(rundir, 'err1.txt') + with open(logfile) as file: + lines = file.readlines() + for index, line in enumerate(lines): + if " Standard errors of estimated parameters" in line: + start = index + break + ## find next " external_coords" + lines = lines[index:] + for index, line in enumerate(lines): + if " external_coords" in line: + start = index + break + lines = lines[(index + 2):] + # get end at " std error time=" + + for index, line in enumerate(lines): + if re.search(" std error time=", line): + end = index + break + lines = lines[:end] + external_coor = [] + nFailedRSEs = 0 + nparms = 0 + # Iterate through each string in the array, test if se/value > 0.5 + for line in lines: + split_words = line.split() + external_coor.append(split_words) + value = float(split_words[1]) + se = float(split_words[3]) + nparms += 1 + if se / value > 0.5: + nFailedRSEs += 1 + + return nFailedRSEs, nparms + + except Exception as e: + return 999999 + + def get_ofv(rundir): + """read the err1.txt file in the run directory, find the parameter values and SEs and return the number of + failed RSE (se/value> 0.5). + + Parameters + ---------- + rundir : string + run directory + + Returns + ------- + float OFV values + int number of parameters + + Examples + -------- + >>> get_ofv_nparms('c:/pyDarwinRun/temp/1/01) + [2345.43, 8] + """ + try: + ## se and parameter estimates in err1.txt + ## use sandwich + ## start read at " Standard errors of estimated parameters" + logfile = os.path.join(rundir, 'err1.txt') + with open(logfile) as file: + lines = file.readlines() + for index, line in enumerate(lines): + if " -2*Loglikelihood=" in line: + start = index + break + ofv = float(lines[start + 1]) + return ofv + except Exception as e: + return 999999 + + + def post_process(rundir): + """post run processing for MOGA3 in python. post_process2 takes the run directory as the argument, return + ofv, number of parameters and number of parameters with RSE > 0.5. Excepteion return available values + and constraint = 1 + + Parameters + ---------- + rundir : pyDarwin model run directory + + Returns + ------- + objectives, constraint [float, float, float],[constraint] + + + Examples + -------- + post_process2(run) + [10.2, 8, 1],[0] + """ + ofv = 999999 + nparms = 999999 + nfailedRSEs = 999999 + try: + ofv = get_ofv(rundir) + nfailedRSEs, nparms = get_external_coords(rundir) + return [ofv, nparms, nfailedRSEs], [0] + except Exception as e: + return [ofv, nparms, nfailedRSEs], [1] + + if __name__ == '__main__': + print(post_process(sys.argv[1])) + +The last section of code: + +.. code-block:: python + + if __name__ == '__main__': + print(post_process(sys.argv[1])) + + +is to permit calling the python code from command line. +The command line for calling from the project directory and return values, assuming the temp folder is called temp is: + + +.. code-block:: console + + python RSE3penalty.py .\temp\1\01 + ([8438.103, 7, 3], [0]) + + +.. _moga3-nlme_post_process2: + +NLME Post Run python code - post_process2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Below is python code for OFV, nparms and nFailedRSE using post_process2. +Note that is it not possible to run post_process2 code from command line as the argument is the pyDarwin run object. + + +.. code-block:: python + + import os + import time + import re + import pandas as pd + + + def get_external_coords(rundir): + """read the err1.txt file in the run directory, find the parameter values and SEs and return the number of + failed RSE (se/value> 0.5). + + Parameters + ---------- + rundir : string + run directory + + Returns + ------- + int + number of parameters with RSE > 0.5 + + Examples + -------- + >>> get_external_coords('c:/pyDarwinRun/temp/1/01) + 4 + """ + try: + ## se and parameter estimates in err1.txt + ## use sandwich + ## start read at " Standard errors of estimated parameters" + logfile = os.path.join(rundir, 'err1.txt') + with open(logfile) as file: + lines = file.readlines() + for index, line in enumerate(lines): + if " Standard errors of estimated parameters" in line: + start = index + break + ## find next " external_coords" + lines = lines[index:] + for index, line in enumerate(lines): + if " external_coords" in line: + start = index + break + lines = lines[(index + 2):] + # get end at " std error time=" + + for index, line in enumerate(lines): + if re.search(" std error time=", line): + end = index + break + lines = lines[:end] + external_coor = [] + nFailedRSEs = 0 + # Iterate through each string in the array, test if se/value > 0.5 + for line in lines: + split_words = line.split() + external_coor.append(split_words) + value = float(split_words[1]) + se = float(split_words[3]) + if se / value > 0.5: + nFailedRSEs += 1 + + return nFailedRSEs + + except Exception as e: + return 999999 + + + def post_process2(run): + """post run processing for MOGA3 in python. post_process2 takes the run as the argument, return + ofv, number of parameters and number of parameters with RSE > 0.5. Excepteion return available values + and constraint = 1 + + Parameters + ---------- + run : pyDarwin run + + Returns + ------- + objectives, constraint [float, float, float],[constraint] + + + Examples + -------- + >>> post_process2(run) + [10.2, 8, 1],[0] + """ + ofv = nparms = nfailedRSEs = 999999 + try: + result = getattr(run, 'result') + model = getattr(run, 'model') + ntheta = getattr(model, 'theta_num') + nomega = getattr(model, 'omega_num') + nsigma = getattr(model, 'sigma_num') + nparms = ntheta + nomega + nsigma + ofv = getattr(result, 'ofv') + nfailedRSEs = get_external_coords(getattr(run, 'run_dir')) + return [ofv, nparms, nfailedRSEs], [0] + except Exception as e: + return [ofv, nparms, nfailedRSEs], [1] + + + + diff --git a/docs/Example9.rst b/docs/Example9.rst new file mode 100644 index 0000000..2525e76 --- /dev/null +++ b/docs/Example9.rst @@ -0,0 +1,89 @@ +.. _startmoga9: + +########################################################### +Example 9: MOGA +########################################################### +The MOGA option, specified in the options.json file directs pyDarwin to use the NSGA-II algorithm for selecting +non-dominated models. No user defined code is required or accepted. Two objectives are used, the OFV and the +total number of parameters (esimtated thetas, omega elements and, for NONMEM, sigma). The template.txt and +tokens.json file are idential to those for other algorithms. Note that there are no penalties, as there isn't +a composite fitness function. + +Options.json +------------ +The options.json file specified the MOGA algorithm and options. NONMEM and NLME are specified identically +with regard to the MOGA options. An example for NONMEM is given below: + + .. code-block:: JSON + + { + "MOGA" : { + "attribute_mutation_probability" : 0.1, + "crossover" : "single", + "crossover_rate" : 0.95, + "mutation_rate" : 0.95, + "partitions" : 6 + }, + "algorithm" : "MOGA", + "author" : "Certara", + "downhill_period" : 3, + "engine_adapter" : "nonmem", + "final_downhill_search" : true, + "keep_extensions" : [], + "keep_files" : [], + "local_2_bit_search" : false, + "model_run_timeout" : 1200, + "niche_radius" : 2, + "local_grid_search" : false, + "num_generations" : 6, + "num_niches" : 2, + "num_parallel" : 4, + "population_size" : 30, + "nmfe_path": "c:/nm75g64/util/nmfe75.bat", + "project_name" : "MOGA-nonmem", + "random_seed" : 51424319, + "working_dir" : "{project_dir}" + } + +Note that the path to rscript.exe is not required for MOGA with NONMEM. +A comparable options.json file for NLME is given below + +.. code-block:: JSON + + { + "MOGA" : { + "attribute_mutation_probability" : 0.1, + "crossover" : "single", + "crossover_rate" : 0.95, + "mutation_rate" : 0.95, + "partitions" : 6 + }, + "algorithm" : "MOGA", + "author" : "Certara", + "downhill_period" : 3, + "engine_adapter" : "nlme", + "final_downhill_search" : true, + "gcc_dir" : "C:\\Program Files\\Certara\\mingw64", + "keep_extensions" : [], + "keep_files" : [], + "local_2_bit_search" : false, + "model_run_timeout" : 1200, + "niche_radius" : 2, + "nlme_dir" : "C:\\Program Files\\Certara\\NLME_Engine", + "num_generations" : 6, + "num_niches" : 2, + "num_parallel" : 4, + "population_size" : 30, + + "project_name" : "MOGA-nlme", + "random_seed" : 51424319, + "rscript_path" : "C:/Program Files/R/R-4.5.1/bin/Rscript.exe", + "working_dir" : "{project_dir}" + } + +The path to Rscript.exe is required for all NLME searches, regardless of whether post-run +R code is used. + + + + \ No newline at end of file diff --git a/docs/Examples.rst b/docs/Examples.rst index 9d871f7..9570a02 100644 --- a/docs/Examples.rst +++ b/docs/Examples.rst @@ -8,21 +8,39 @@ The following examples are provided in |ReleaseVersion| to help you get started. `Examples `_ are organized into folders with all required files available for execution. -* :ref:`Example 1: PK Model, Trivial Exhaustive Search `. +.. toctree:: + :hidden: -* :ref:`Example 2: PK Model 2, Simulation model by GP with Python code `. + Example1 + Example2 + Example3 + Example4 + Example5 + Example6 + Example7 + Example8 + Example9 + Example10 -* :ref:`Example 3: PK Model, ODE model `. +* :ref:`Example 1: PK Model, Trivial Exhaustive Search ` -* :ref:`Example 4: PK Model, DMAG by GA with post-run R code `. +* :ref:`Example 2: PK Model 2, Simulation model by GP with Python code ` -* :ref:`Example 5: PK Model, DMAG by GP `. +* :ref:`Example 3: PK Model, ODE model ` -* :ref:`Example 6: PK Model, DMAG by RF with post-run Python code `. +* :ref:`Example 4: PK Model, DMAG by GA with post-run R code ` -* :ref:`Example 7: PK Model, Exhaustive Omega Search `. +* :ref:`Example 5: PK Model, DMAG by GP ` -* :ref:`Example 8: Emax Model, PSO `. +* :ref:`Example 6: PK Model, DMAG by RF with post-run Python code ` + +* :ref:`Example 7: PK Model, Exhaustive Omega Search ` + +* :ref:`Example 8: Emax Model, PSO ` + +* :ref:`Example 9: MOGA ` + +* :ref:`Example 10: MOGA3 ` .. _examples_target: diff --git a/docs/MOGA3Postprocessing.rst b/docs/MOGA3Postprocessing.rst new file mode 100644 index 0000000..16a7bcb --- /dev/null +++ b/docs/MOGA3Postprocessing.rst @@ -0,0 +1,140 @@ +:orphan: + +.. _general_principles: +.. _moga3_postprocessing: + +MOGA3 Post-processing +====================== +In all cases, the post processing code is called from the directory where NONMEM or NLME has run, and before +any files are removed. Therefore, all output from NONMEM (i.e, $TABLE files, .ext, .xml, .phi etc) file and +all NLME (i.e. dmp.txt, err?, nlme?engine.log, tables etc) will be available in the working directory for R +or python to read. In addition, you can be confident that only files that pyDarwin/NONMEM/NLME put there +will be there, there will be no additional files that might cause problems, if, for examples, you looked for +the .xml NONMEM output (e.g., NM_1_01.xml) file in R with + +.. code-block:: R + + list.files(path = ".", pattern = "xml$") + +this will return a list of files with the file extension .xml. Unless NONMEM put some additional files +(e.g., from $TABLE) it will be the only .xml file present, and can then be used to extract model information. +In NONMEM, the file name stem for output files such as the .lst, .ext and .xml file will +be the same as the folder name, specifially NM_{Generation}_{model}, e.g., NM_1_01.xml for the xml file +from generation 1, model 1 (if there are 10-99 models in the population). + + +The MOGA algorithm option in pyDarwin does not accept user defined code. MOGA3 requires user defined code. +In MOGA3, in all cases the code must return the required number of variables. python code must return +2 arrays, one of objectives and one of constraints, from a function. For R, the same values +are passed back to python by outputing them to the standard output in R with two print() statement. +As for post processing code with other algorithms, the R code will be a script and the python code will +be a function. +The lengths of the vectors/arrays must match the number of objectives and constraints specified +in the options.json file. Any other return values will result in a crash in pyDarwin. +Practically, this means the use of error trapping and the return of "crash" values for exceptions with +required structure; 2 vectors or arrays of length defined in the options.json file as, +`objectives and constraints `_. +Further, in the case of R, other output to the standard output (e.g., messages from loading packages) must +be suppressed. +In general post run R and python code is not compatible between NONMEM and NLME, as the output files are very +different, e.g., the xml for NONMEM and the dmp.txt file for NLME. The exception is when the post_process2 +function in python is used, and only information available in the pyDarwin run object is used. +(:ref:`moga3-nlme_post_process2`). + +In general, there is little reason to have more than one constraint in MOGA3, as the results from number +of infeasible results can be used to set a single contraint to a value > 0. + + + +R +------------ +In all cases the R code will include a script. The script may call any defined functions. For MOGA3 the +script should print to the standard output two vectors, the first being the objectives, and the second the constraint(s). +Both outputs may be numeric or character, but character values must be convertable to numeric. In MOGA3 the use of +the print() statement is required, as there will be two outputs to the standard error, and just + +.. code-block:: R + + c(OFV1, OFV2, OFV3) + c(constraint1) + +Would only output the final to the standard output. The correct syntax is: + +.. code-block:: R + + print(c(OFV1, OFV2, OFV3)) + print(c(constraint1)) + +and the code can be tested by calling rscript with the code file as an argument, from the run directory. It should return the +required output the the console, e.g. for DOS command line: + +.. code-block:: console + + c:\git\mogaexamples\moga3NLMER\temp\1\01>"C:/Program Files/R/R-4.5.1/bin/Rscript.exe" ..\..\..\RSE3penaltyb.R + [1] 8438.10000 7.00000 61.60081 + [1] 1 + + + +python +------------ +In contrast to R, post run code in python is a function. In fact, syntax is available for two functions. The first (and +likely the most useful) is to write a function called post_process +(:ref:`moga3-nlme_post_process`) and takes the argument run_dir. The function signature is + + +.. code-block:: python + + def post_process(rundir): + . + . + . + + return [OFV1, OFV2, OFV3], [constraint1] + +where rundir is a string. The function must return 2 vectors with the lenghth of the first described by the +value of "objectives" in the options.json file: + +.. code-block:: JSON + + "MOGA" : { + + "objectives" : 3, + }, + +and the length of the 2nd described by the value of "constraints" + +.. code-block:: JSON + + "MOGA" : { + + "constraints" : 1, + }, + + +If the search is to be unconstrained, it is most stable to still use 1 contraint, but always return [0] in the constraint vector. + +The second syntax is for post_process2, with the function signature below: + +.. code-block:: python + + def post_process2(run): + . + . + . + + return [OFV1, OFV2, OFV3], [constraint1] + +where "run" is the python object model run object. Developing code for post_process2 is challenging in that it must be +developed within a full pyDarwin run in order to create the run objects. This is likely non-trivial for the causal user. +However, Certara is happy to provide support for interested user. In contrast with post_process, one can simply send +the run path as a argument, e.g., + + +.. code-block:: console + + + +or use your favorite IDE, with the NLME run folder as the working directory. + + diff --git a/docs/ParetoFrontCars.jpeg b/docs/ParetoFrontCars.jpeg new file mode 100644 index 0000000..0abf290 Binary files /dev/null and b/docs/ParetoFrontCars.jpeg differ diff --git a/docs/Partitions_popsize.png b/docs/Partitions_popsize.png new file mode 100644 index 0000000..d6ae8ab Binary files /dev/null and b/docs/Partitions_popsize.png differ diff --git a/examples/NONMEM/user/Example4/template.txt b/examples/NONMEM/user/Example4/template.txt index 2fffa6f..5767ab8 100644 --- a/examples/NONMEM/user/Example4/template.txt +++ b/examples/NONMEM/user/Example4/template.txt @@ -52,16 +52,8 @@ $OMEGA $SIGMA {RESERR[2]} -$EST METHOD=COND INTER MAX = 9999 MSFO=MSF1 PRINT = 10 +$EST METHOD=0 MAX = 9999 MSFO=MSF1 PRINT = 10 $COV UNCOND PRINT=E PRECOND=1 PRECONDS=TOS MATRIX = R $TABLE REP ID TIME DV EVID NOPRINT FILE = ORG.DAT ONEHEADER NOAPPEND - -$PROB SIMULATION FOR CMAX - -$INPUT ID TIME AMT {D1LAG[1]} DV MDV EVID WT AGE SEX BUN SCR OCC CRCL COV1 COV2 -$DATA {data_dir}/dataWithPeriod.csv IGNORE=@ REWIND -$MSFI = MSF1 -$SIMULATION (1) ONLYSIM -$TABLE REP ID TIME IOBS EVID NOAPPEND NOPRINT FILE = SIM.DAT ONEHEADER NOAPPEND - \ No newline at end of file + \ No newline at end of file diff --git a/src/darwin/options.py b/src/darwin/options.py index cfb586a..88a7a51 100644 --- a/src/darwin/options.py +++ b/src/darwin/options.py @@ -223,9 +223,9 @@ def _init_options(self, options_file: str, folder): self.effect_limit = opts.get('effect_limit', -1) self.use_effect_limit = self.effect_limit > 0 - if (options.engine_adapter != 'nonmem' or options.algorithm not in ['GA', 'MOGA', 'MOGA3']) and self.use_effect_limit: - log.warn('Can only use effect_limit with GA/MOGA/MOGA3 and NONMEM, turned off') - self.use_effect_limit = False + #if (options.engine_adapter != 'nonmem' or options.algorithm not in ['GA', 'MOGA', 'MOGA3']) and self.use_effect_limit: + # log.warn('Can only use effect_limit with GA/MOGA/MOGA3 and NONMEM, turned off') + # self.use_effect_limit = False self.remove_temp_dir = opts.get('remove_temp_dir', False) self.remove_run_dir = opts.get('remove_run_dir', False)