Testing and Comparing Your Model
SDEverywhere includes extensive QA (quality assurance) packages and tools that are collectively known as "model-check". The model-check tool can run as you develop your model, either locally on your machine or in the cloud in a continuous integration environment (or both).
With model-check, there are two kinds of tests:
- Checks are objective tests of a model's behavior.
- These are "objective" in the sense that they always provide a yes/no or right/wrong answer.
- Check tests are good for verifying that a model conforms to some expectations or ground truths.
- They can help catch bugs and unintentional changes that might otherwise go undetected.
- Here are a few examples of useful checks defined for En-ROADS (but there are countless other examples that will vary from model to model):
- Stocks should never be negative
- The population variable values should be within +/- 5% of the historical population data for the years 1900-2025
- The population variable values should be between 8 billion and 12 billion for all defined input scenarios
- The temperature variable values should always be lower with input scenario X (e.g., with a carbon tax) than with input scenario Y (e.g., a baseline scenario with no carbon tax)
- Comparisons are subjective tests of the behavior of two versions of the same model.
- These are "subjective" in the sense that they don't usually provide a right/wrong answer and are subject to interpretation by the modelers.
- Comparison tests are good for making sense of how a change to the model impacts the output values of that model under a wide variety of input scenarios.
- Comparison tests allow for exercising a model under many different scenarios in a short amount of time.
- The model-check report orders the results so that the most significant changes are at the top, and the results are color coded to help you see at a glance what outputs have changed the most compared to the base/reference/previous version of the model.
- Here are a few examples of useful comparisons defined for En-ROADS (and as with check tests, there are countless other examples depending on your model):
- Baseline scenario (all inputs at default)
- All inputs at their {min,max}imum values (all at once)
- All main sliders at the {min,max}imum values (all at once)
- Each individual input at its {min,max}imum value (while others are at default)
- Low, medium, and high carbon price (for testing values between "min" and "max")
- Fossil fuel phase out (multiple "reduce new infrastructure sliders" set together)
Both checks and comparisons are typically defined in text files in YAML format, though it is possible to define them in JSON format or in TypeScript/JavaScript code if needed.
YAML files are designed to be read and edited by a human, but note that indentation is significant, so you need to be careful. We recommend using VS Code to edit these files and installing the YAML extension. The YAML files that are provided in the Quick Start templates are set up with a reference to the schema at the top (which the YAML extension uses) so that you will get some syntax highlighting (and red squiggles to indicate when the syntax is incorrect).
If you follow the Quick Start instructions, the generated template will include sample `checks.yaml` and `comparisons.yaml` files to get you started. Refer to the Creating a Web Application page for an overview of where these files reside in the recommended project structure.
Read the following two subsections for more details on how to define checks and comparisons.
The following is an example of a group of check tests, taken from the SIR example project.
```yaml
- describe: Population Variables
  tests:
    - it: should be between 0 and 10000 for all input scenarios
      scenarios:
        - preset: matrix
      datasets:
        - name: Infectious Population I
        - name: Recovered Population R
        - name: Susceptible Population S
      predicates:
        - gte: 0
          lte: 10000
```

- The `describe` and `it` naming convention comes from unit testing frameworks in the software development world. This convention encourages naming tests in natural language that describes how the model should behave. For example, the test above is basically saying "population variables should be within a certain range across all input scenarios".
- A group of tests starts with a `describe` field. This is used to group related tests together.
- A `describe` group should contain one or more items in the `tests` field.
- Each test starts with an `it` field that describes the expected behavior in plain language. The text usually begins with "should" (for example, this variable "should always be positive" or "should be close to historical values").
- Each test includes three essential parts: `scenarios`, `datasets`, and `predicates`.
- You are not limited to a single `describe` group or a single `yaml` file. You can put multiple `describe` groups in a single file, or you can spread out and define many `yaml` files under your `checks` folder (for example, you can have `population.yaml` and `temperature.yaml` and more).
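For instance, a single file under the `checks` folder could hold two `describe` groups side by side. The following is a minimal sketch; the file name, variable names, and value ranges are hypothetical and would vary from model to model:

```yaml
# checks/population.yaml (hypothetical file and variable names)
- describe: Population Variables
  tests:
    - it: should never be negative
      scenarios:
        - preset: matrix
      datasets:
        - name: Total Population
      predicates:
        - gte: 0

- describe: Temperature Variables
  tests:
    - it: should stay between -10 and 60 for all input scenarios
      scenarios:
        - preset: matrix
      datasets:
        - name: Surface Temperature
      predicates:
        - gte: -10
          lte: 60
```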
The `scenarios` field should contain one or more input scenarios for which the expectations hold true.
Examples:
- A single scenario that includes a single input at a specific value:

```yaml
scenarios:
  - with: Input A
    at: 50
```
- A single scenario that includes a single input at its defined extreme (minimum or maximum) value:

```yaml
scenarios:
  - with: Input A
    at: max
```
- A single scenario that includes multiple input values set at the same time:

```yaml
scenarios:
  - with:
      - input: Input A
        at: 50
      - input: Input B
        at: 20
```
- Multiple (distinct) scenarios that have the same expected behavior:

```yaml
scenarios:
  - with: Input A
    at: max
  - with: Input B
    at: max
```
- A special "matrix" preset that will execute the test once for each input variable at its minimum, and again at its maximum:

```yaml
scenarios:
  - preset: matrix
```
The `datasets` field should contain one or more datasets (output variables or external datasets) for which the expectations hold true.
Examples:
- A single dataset referenced by name:

```yaml
datasets:
  - name: Output X
```
- Multiple datasets referenced by name (one model output and one external dataset):

```yaml
datasets:
  - name: Output X
  - name: Historical Y
    source: HistoricalData
```
- Multiple datasets in a predefined group:

```yaml
datasets:
  - group: Key Outputs
```
The `predicates` field should contain one or more predicates, i.e., the behavior you expect to be true for the given scenario/dataset combinations.
Examples:
- A predicate that says "greater than 0":

```yaml
predicates:
  - gt: 0
```
- A predicate that says "greater than 10 and less than 20 in the year 1900":

```yaml
predicates:
  - gt: 10
    lt: 20
    time: 1900
```
- A predicate that says "approximately 5 in the years between 1900 and 2000":

```yaml
predicates:
  - approx: 5
    tolerance: .01
    time: [1900, 2000]
```
- A predicate that says "approximately 5 for the year 2000 and beyond":

```yaml
predicates:
  - approx: 5
    tolerance: .01
    time:
      after_incl: 2000
```
- A predicate that says "within the historical data bounds for all years up to and including the year 2000":

```yaml
predicates:
  - gte:
      dataset:
        name: Historical X confidence lower bound
    lte:
      dataset:
        name: Historical X confidence upper bound
    time:
      before_incl: 2000
```
For more examples of different kinds of check tests (including various predicates, combinations of inputs, time ranges, etc.), refer to the `checks.yaml` file in the `sample-check-tests` example.
The following is a screenshot of the "Checks" tab in a sample model-check report, which shows two expanded test results, one that is failing (note the red X's) and one that is passing (note the green checkmarks).
The following is an example of a comparison scenario definition, taken from the SIR example project.
```yaml
- scenario:
    title: Custom scenario
    subtitle: with avg duration=4 and contact rate=2
    with:
      - input: Average Duration of Illness d
        at: 4
      - input: Initial contact rate
        at: 2
```

- A `comparisons.yaml` file will typically have at minimum one or more `scenario` definitions, but you can also have `scenario_group`, `graph_group`, and `view_group` definitions in the same file.
- You are not limited to a single `yaml` file to hold your comparisons. You can put multiple definitions in a single file, and you can spread out and define many `yaml` files under your `comparisons` folder (for example, you can have `renewables.yaml` and `economy.yaml` and more).
A scenario definition represents an input scenario for which each output variable for the two models will be compared.
The format of a scenario is similar to that of a check test (see above), except that it can contain:
- a `title` and `subtitle` (for keeping similar scenarios grouped together in the model-check report)
- an optional `id` (that allows for the scenario to be referenced in a `scenario_group` or `view_group` definition)

Examples:
- A scenario that includes a single input at a specific value:

```yaml
- scenario:
    title: Input A
    subtitle: at medium growth
    with: Input A
    at: 50
```
- A scenario that includes a single input at its defined extreme (minimum or maximum) value:

```yaml
- scenario:
    title: Input A
    subtitle: at maximum
    with: Input A
    at: max
```
- A scenario that includes multiple input values set at the same time:

```yaml
- scenario:
    title: Inputs A+B
    subtitle: at medium growth
    with:
      - input: Input A
        at: 50
      - input: Input B
        at: 20
```
- A "baseline" scenario that sets all inputs to their default values:

```yaml
- scenario:
    title: All inputs
    subtitle: at default
    with_inputs: all
    at: default
```
- A special "matrix" preset that will generate comparisons for each input variable at its minimum, and again at its maximum:

```yaml
- scenario:
    preset: matrix
```
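To make a scenario referenceable from a `scenario_group` or `view_group` definition, you can give it the optional `id` field described above. The following is a minimal sketch; the `id` value and input name are hypothetical:

```yaml
- scenario:
    id: input_a_at_max
    title: Input A
    subtitle: at maximum
    with: Input A
    at: max
```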
TODO: This section is under construction. See "More Examples" below for a link to an example of scenario groups.
TODO: This section is under construction. See "More Examples" below for a link to an example of view groups.
For more examples of different kinds of comparison definitions (including different ways to define scenarios, scenario groups, views, etc.), refer to the `comparisons.yaml` file in the `sample-check-tests` example.
The model-check report includes two separate tabs for viewing comparisons.
The "Comparisons by scenario" tab summary view lists all the input scenarios that were compared:

Clicking on a scenario will take you to a detail view that shows graphs of all output variables under that input scenario:

The "Comparisons by dataset" tab summary view lists all the datasets (output variables and external datasets) that were compared:

Clicking on a dataset will take you to a detail view that shows graphs of that dataset for each tested input scenario:

Every model-check report includes a table summarizing the size and run time (speed) of the two versions of your generated model being compared.
For example:

If you click on the blue and red "heat map" to the right side of that table, it will open a performance testing page:

Click on the "Run" button a few times to get a sense of how the run times compare for the two versions of the model. (Note that it's currently a somewhat hidden feature, so the UI is not fully polished.)
The heat map display is useful for seeing the average time and distribution of outlying samples. To ensure consistent results, it is recommended to run performance tests when your computer is "quiet" (idle).
You may encounter situations in which your SDEverywhere-generated model produces results that differ from your expectations. For example:
- Your generated model produces different results than Vensim. This can happen when:
  - Your sliders are defined in `inputs.csv` with different default values than those in your `.mdl` file.
  - Or, you've stumbled upon an issue where SDEverywhere has different behavior than Vensim (due to a bug or some known limitation).
- Your generated model produces results that fail under certain scenarios (as defined in your check tests).
For these situations, there is a model-check feature called "Trace View" that can help you diagnose model behavior for each variable and time step. Trace View supports the following kinds of comparisons:
- Compare how two model versions behave for a given scenario.
- Compare how a model version behaves for one scenario versus another scenario.
- Compare how a model version differs from Vensim outputs (provided in a `.dat` file).
There are a few ways to enter Trace View:
- Press the "T" key from any of the home/summary views.
- Right-click on a model check test scenario and select "Open Scenario in Trace View".
- Right-click on a model comparison scenario and select "Open Scenario in Trace View".
When you open Trace View, by default it runs the model(s) and compares them for the selected scenarios.
You can use the dropdown selectors to change the source (i.e., the model or `.dat` file being compared) or the scenario being tested.
When the run completes, Trace View will show every variable for which a non-zero difference was detected.
Each square represents the value of that variable at a given time step.
- A green square indicates that the two sources produced identical values at that time step.
- A yellow square indicates that the two sources produced slightly different values (greater than zero but less than the defined threshold) at that time step.
- A red square indicates that the two sources produced significantly different values (greater than or equal to the defined threshold) at that time step.
Note that variables are grouped and sorted according to the order that they are evaluated in the generated model. Variables that are lower on the page depend on variables that are higher on the page. This ordering can help you pinpoint for which variable and time step the two models begin to produce differing values.
By default, the first non-green square will be selected. You can hover over any square to see a full behavior-over-time graph and other details for that variable. You can also click any square to select it.
- Press the Right Arrow key to select the next non-green square.
- Press the Left Arrow key to select the previous non-green square.
- Press the Home key (Fn+Left Arrow on macOS) to select the first non-green square.
- Press the End key (Fn+Right Arrow on macOS) to select the last non-green square.
- Press the "N" key to select the next non-green square associated with an "output" variable.