|
some thoughts - we do need to be able to treat initial conditions (or at least S(0) and R(0)) as special in the sense that they're not independent, so we should think carefully about how to handle this. I(0) and H(0) are also clearly interlinked; really only V(0) can probably be argued to be fully independent. We can probably get around that to an extent with some binomials, but the Dirichlet is the natural distribution if we want to do it "right". |
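A minimal sketch of the Dirichlet idea, assuming hypothetical compartment names, concentration parameters, and population size chosen only for illustration. It draws one Dirichlet sample by normalizing independent gamma draws (a standard construction), so the initial compartment sizes are jointly constrained to sum to the population rather than being drawn independently.

```python
import random


def draw_initial_fractions(concentrations, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in concentrations]
    total = sum(draws)
    return [draw / total for draw in draws]


# Hypothetical compartments and concentration parameters, for illustration only.
compartments = ["S", "I", "H", "R"]
concentrations = [80.0, 2.0, 1.0, 17.0]
population = 10_000

rng = random.Random(42)
fractions = draw_initial_fractions(concentrations, rng)

# Scaling by the population keeps S(0) + I(0) + H(0) + R(0) equal to the total.
y0 = {name: frac * population for name, frac in zip(compartments, fractions)}
```

Because the fractions are normalized together, raising one compartment's share necessarily lowers the others, which is the dependence between S(0) and R(0) described above.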
|
This PR has some size to it so I can see how it might be helpful to break it up. Clear breaks into separate PRs from what I can see are: |
pearsonca
left a comment
some work in progress notes here.
|
|
||
| # Initial Conditions vs Seeding | ||
|
|
||
| Both initial conditions and seeding can be used to create similar behaviors. While the previous pages cover in detail each component's functionality, below is a brief table highlighting the major differences to help you decide what to use for your situation. |
This needs a bit of a tune-up. More like:
| Both initial conditions and seeding can be used to create similar behaviors. While the previous pages cover in detail each component's functionality, below is a brief table highlighting the major differences to help you decide what to use for your situation. | |
| Infectious disease models require some infections to kick off transmission dynamics. These can be present in the starting state for the simulation (as `initial_conditions`) or introduced over time (via `seeding`). Other sections (links?) provide a detailed breakdown, but this table highlights the major differences between these approaches to help you decide what to use for your situation. |
also, are initial conditions and seeding mutually exclusive?
also-also-aside, do we need to think about the parametrization-based approach for seeding as well?
(i see from text below, yes, seeding still needs to enable parametrization)
also-also-aside, do we need to think about the parametrization-based approach for seeding as well?
I think at some point we might be interested in unifying the two concepts into one? Might be outside the scope of this PR though.
also, are initial conditions and seeding mutually exclusive?
No, these can be used in conjunction. They are both provided to gempyor.steps_rk4.rk4_integration and that function uses both.
|
|
||
| | | Initial Conditions | Seeding | | ||
| |-------------------------|-----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------| | ||
| | Main purpose | Specifying compartment sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. | |
to double check, but does seeding really instantly add to the infectious (or whatever) compartment? or is it generally taking an S and making an I? I think most often it should be modeling a force of exposure, which would mean that (for example) a more R population would experience fewer seeding introductions.
| | Main purpose | Specifying compartment sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. | | |
| | Main purpose | Specifying compartment sizes at the initial time. | Instantaneously transferring between compartments at arbitrary times. | |
You're correct, I did not realize the difference between the two was important and got loose with the language. This is the code that takes care of the instantaneous transition:
flepiMoP/flepimop/gempyor_pkg/src/gempyor/steps_rk4.py
Lines 230 to 249 in ad62f2d
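To make the "transfer" versus "change" distinction concrete, here is a toy sketch. It is not the code from `steps_rk4.py`; the function name, the dictionary state, and the cap at the source compartment's size are all assumptions made for illustration.

```python
def apply_seeding_event(state, source, destination, amount):
    """Move up to `amount` individuals from `source` to `destination` in place."""
    moved = min(amount, state[source])  # assumed cap: cannot move more than exist
    state[source] -= moved
    state[destination] += moved
    return moved


# Hypothetical single-subpopulation state, for illustration only.
state = {"S": 990.0, "I": 10.0, "R": 0.0}
total_before = sum(state.values())
moved = apply_seeding_event(state, "S", "I", 5.0)
total_after = sum(state.values())
```

A transfer like this conserves the total population, whereas simply adding to the destination compartment would not.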
| | Main purpose | Specifying compartment sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. | | ||
| | Default functionality | Each subpopulation's entire population is placed in the first compartment. | No seeding events occur. | | ||
| | Config section needed | Optional, results in warning that default functionality used. | Optional. | | ||
| | Required input files | Depending on method could be a parquet or CSV file(s). | Yes, a CSV describing seeding events with a format dependent on method used. | |
Bit confusing - shouldn't we also be mentioning plugins here? Roughly, we support 1) a precalculated file (csv or parquet) that can be read into the necessary format OR 2) a script to calculate that necessary format, using model structure + parameters.
aside: i could also imagine supporting a simple functional interface, as with other parameters.
Does this cover that:
documentation/gitbook/gempyor/model-implementation/specifying-initial-conditions.md
| Would place the entire subpopulation into the 'S', 'child', 'unvaxxed' compartment. While 'S' usually makes sense for a model, declaring the entire subpopulation to be children is likely not reflective of reality. | ||
|
|
||
| #### SetInitialConditions | ||
| ### 'SetInitialConditions' Method |
new minor issue: supporting both this and FromFile seems wasteful (and confusing?)
If we want to support two schemas for the files, and then load based on what the column headers are, that seems fine. But different config methods seem likely to confuse users.
Yeah, I see that, but this is just maintaining the current behavior. Would we want to change that?
| * `subpop_1`, `subpop_2`, etc. – One column for each different subpopulation, containing the value of the number of individuals in the described compartment in that subpopulation at the given date. Note that these are named after the node names defined by the user in the `geodata` file. | ||
| * `date` – The calendar date in the simulation, in YYYY-MM-DD format. Only values with a date that matches to the simulation `start_date` will be used. | ||
|
|
||
| ### 'SetInitialConditionsFolderDraw' and 'InitialConditionsFolderDraw' Methods |
related to collapsing the other file types, feels like this should also be introspected. if the user offers a folder instead of a specific file, do folder draw mode.
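One way the introspection suggested above might look, as a sketch only. The function name and the returned mode labels are hypothetical and do not correspond to anything in `gempyor`.

```python
from pathlib import Path


def infer_read_mode(path):
    """Return 'folder_draw' for a directory and 'single_file' for a file."""
    path = Path(path)
    if path.is_dir():
        return "folder_draw"
    if path.is_file():
        return "single_file"
    raise FileNotFoundError(f"No file or folder found at '{path}'.")
```

With a check like this, one config key could accept either a file or a folder, and the separate folder-draw method names would no longer need to be chosen by the user.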
```python
from typing import Literal

from gempyor.compartments import Compartments
from gempyor.initial_conditions import (
    InitialConditionsABC,
    register_initial_conditions_plugin,
)
from gempyor.parameters import Parameters
from gempyor.subpopulation_structure import SubpopulationStructure
import numpy as np
import numpy.typing as npt
from pydantic import Field


class TwoCompartmentInitialConditions(InitialConditionsABC):
    method: Literal["TwoCompartment"] = "TwoCompartment"
    weight: float = Field(gt=0.0, lt=1.0)

    def create_initial_conditions(
        self,
        sim_id: int,
        compartments: Compartments,
        subpopulation_structure: SubpopulationStructure,
    ) -> npt.NDArray[np.float64]:
        y0 = np.zeros((len(compartments.compartments), subpopulation_structure.nsubpops))
        y0[0, :] = self.weight * subpopulation_structure.subpop_pop
        y0[1, :] = (1.0 - self.weight) * subpopulation_structure.subpop_pop
        return y0


register_initial_conditions_plugin(TwoCompartmentInitialConditions)
```
this might be a tall ask, but: is there any way to make this more like "define a method, with this name + this signature", and then have our machinery parse that on the fly into the proper plugin structure?
as written, this is asking a lot of people. any way to use importlib here? à la https://docs.python.org/3/library/importlib.html#importing-programmatically
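A rough sketch of the importlib approach from that link, assuming a user file that defines a plain function. The helper name `load_function_from_file` is hypothetical; only the `importlib.util` calls are standard library API.

```python
import importlib.util
from pathlib import Path


def load_function_from_file(path, function_name):
    """Import the Python file at `path` and return the named callable from it."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Could not build an import spec for '{path}'.")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's file
    function = getattr(module, function_name, None)
    if not callable(function):
        raise AttributeError(f"'{path}' does not define a callable '{function_name}'.")
    return function
```

The machinery could then wrap the returned function in whatever plugin structure it needs, so the user only writes a function with an agreed name and signature.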
though, i do like the ability to specify multiple methods once and then switch between them depending upon config key.
I could look into that, although this seems more appropriate for when we get to the stage of a general purpose plugin section? Might be a bit beyond the scope of this PR.
| register_initial_conditions_plugin(TwoCompartmentInitialConditions) | ||
| ``` | ||
|
|
||
| Note that in the above plugin an additional argument has been added to `create_initial_conditions` whose name, `alpha`, corresponds to the SEIR parameter name. You can finally use this plugin in your configuration file: |
nice. also feels like we should be telling people here what happens if they typo alfa instead - presumably some "could not find parameter ..." error?
That's here:
flepiMoP/flepimop/gempyor_pkg/src/gempyor/parameters.py
Lines 416 to 421 in ac61e06
I can add it to the todo list to add this to the documentation with an example.
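For the documentation example, the general mechanism could be illustrated with something like the sketch below. This is not the code in `parameters.py`; the helper name, the `reserved` argument, and the error wording are assumptions. It only shows how argument names can be read from a function signature and checked against the known parameter names.

```python
import inspect


def collect_requested_parameters(function, available, reserved=()):
    """Map each non-reserved argument name of `function` to its value in `available`."""
    requested = [
        name for name in inspect.signature(function).parameters if name not in reserved
    ]
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(
            f"Could not find requested parameter(s) {missing}; "
            f"available parameters are {sorted(available)}."
        )
    return {name: available[name] for name in requested}


def create_initial_conditions(sim_id, alpha):
    """Hypothetical plugin function requesting the `alpha` parameter."""
    return alpha


def create_with_typo(sim_id, alfa):
    """Same function with the parameter name misspelled."""
    return alfa
```

Under these assumptions, passing `create_with_typo` raises a `ValueError` that names `alfa` and lists the parameters that do exist.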
| initial_conditions: | ||
| method: TwoCompartment | ||
| module: model_input/my_initial_conditions.py | ||
| weight: 0.5 |
weight is a weird special parameter.
i need to think a bit about this one - is it holdover from old interface?
rather than a class field, should weight be handled more like ...you can put any variables that only your initial conditions script uses in this config section and they'll get passed in just like normal parameters?
I think if we want to pass the configuration as arguments rather than having them as attributes of the class that would require a rethink of the current pydantic usage. I'll add it to the todo list to investigate.
The downside to switching to arguments would be that you lose any config parse time validations; instead, errors like weight being -0.1 would only be discovered at runtime.
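A small sketch of that trade-off. To stay dependency-free it uses a standard library dataclass as a stand-in for the pydantic model, so the class below is not the PR's implementation; the bounds simply mirror the `gt=0.0, lt=1.0` constraint on `weight` from the plugin example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoCompartmentConfig:
    """Stand-in for a validated config section; not the pydantic model from the PR."""

    weight: float

    def __post_init__(self):
        # Validation happens as soon as the config section is parsed into the class.
        if not 0.0 < self.weight < 1.0:
            raise ValueError(f"'weight' must be in (0, 1), got {self.weight}.")


def split_population(population, weight):
    """Argument-style alternative: nothing checks `weight` before this runs."""
    return weight * population, (1.0 - weight) * population
```

With the class, `weight: -0.1` fails when the configuration is loaded. With the plain function, the same value silently produces a negative compartment size unless the plugin author adds their own check.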
| initial_conditions: | ||
| method: plugin | ||
| plugin_file_path: model_input/my_initial_conditions.py | ||
| method: Default |
meant to change this example? or should there be another example that shows default vs plugin vs ...?
Yes, meant to change temporarily since the way plugins are used is different.
Added a custom warning for potential configuration file issues. This warning is designed to indicate configuration issues that do not prevent flepiMoP from running but may produce unexpected behavior, or that the user made a mistake of some kind.
Added an internal helper, `_invert_into_dict`, to `gempyor.utils` to invert a single value into multiple keys for a dictionary whose values are lists of some type. This helper does this efficiently by modifying the dictionary in place.
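A guess at the shape of such a helper, based only on the description above. The real signature of `_invert_into_dict` is not shown in this conversation, so the name, argument order, and example data here are assumptions.

```python
def invert_into_dict(mapping, value, keys):
    """Append `value` to the list stored under each of `keys`, mutating `mapping`."""
    for key in keys:
        # setdefault creates the list on first use, so no copy of the dict is made.
        mapping.setdefault(key, []).append(value)


# Hypothetical data, for illustration only.
labels_by_compartment = {"S": ["unvaxxed"]}
invert_into_dict(labels_by_compartment, "vaxxed", ["S", "I"])
```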
Added the `Compartments.subset_dataframe` method to easily filter a provided pandas DataFrame with specific column structure in a loop.
Made a first pass refactoring initial conditions by addressing most, but not all, of the pylint suggestions. Includes reordering of logic, addition of docstrings, code formatting, etc. but importantly no changes to logic. Other larger changes include:
* Renamed the `InitialConditionsFactory` function to `initial_conditions_factory` to fit pythonic coding style.
* Unit tests and minor refactoring of the `check_population` internal validation utility, namely simplifying input arguments.
* Added unit tests for the `read_initial_condition_from_tidydataframe` helper function for parsing ICs from a "tidy" data frame.
Moved initial conditions from being a submodule of `gempyor` to being a `subpackage`. Results in no net interface changes, opens the door for more invasive refactoring.
Simplified the internals of `check_population` by leaning on numpy's functionality more.
First pass at modular initial conditions, enabled by the `InitialConditionsABC` that defines the interface for an initial conditions module. Also implemented `DefaultInitialConditions` as an example of how to use said ABC.
Added `gempyor.model_meta.ModelMeta` class as a data container for core model metadata and providing basic filesystem operations independent of the `ModelInfo` mega class. Use `ModelMeta` class to validate model metadata for `ModelInfo` and do some light filesystem setup. Currently reassigning attributes from `ModelMeta` back to `ModelInfo` for backwards compatibility.
Extracted the `TimeSetup` class from `gempyor.model_info` and placed it into the new `gempyor.time_setup` module and also refactored to use a pydantic representation instead of the previous plain class representation.
First draft of `gempyor.initial_conditions.FileOrFolderDrawInitialConditions`. Limited on documentation and unit tests, but provides the 'SetInitialConditions', 'SetInitialConditionsFolderDraw', 'FromFile', and 'InitialConditionsFolderDraw' initial condition methods. Also had to take modified versions of helper functions from `gempyor._readers` that need to be heavily refactored.
Replaced the previous `InitialConditions` monolithic class with modular classes that subclass `InitialConditionsABC`. These subclasses then register themselves as plugins via `register_initial_conditions_plugin`. These are then accessible via the `initial_conditions_from_plugin` function. Initial condition plugins can then be reset using the `_reset_initial_conditions_plugins` helper, mostly for unit testing purposes. Furthermore, this has been integrated directly into the rest of `gempyor`. The `get_initial_conditions_data` method was added to the model info object to more easily access initial conditions, similar to how seeding is done.
Removed an unused helper module that contained utilities for the prior initial conditions implementation. Those utilities have been moved into `gempyor.initial_conditions._file_or_folder_draw` as internal helpers. Also made corresponding moves/edits to the unit tests.
In particular testing the outputs of the `create_initial_conditions` method provided by `gempyor.initial_conditions.DefaultInitialConditions`.
Added unit tests for the function used to generate an `InitialConditionsABC` instance, namely:
* Checking that `DefaultInitialConditions` is returned, with a warning, when 'method' is not specified, and
* Checking that a `ValueError` is raised when the specified 'method' is not found in the registered initial condition plugins.
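The behavior being tested can be pictured with a toy registry like the one below. It is a simplified sketch, not the `gempyor` implementation: the names `register_plugin` and `plugin_from_config`, the plain dictionary registry, and the lambda factories are all hypothetical.

```python
import warnings

_PLUGINS = {}


def register_plugin(method, factory):
    """Record `factory` under `method` so configs can select it by name."""
    _PLUGINS[method] = factory


def plugin_from_config(config):
    """Build the plugin named by config['method'], defaulting with a warning."""
    method = config.get("method")
    if method is None:
        warnings.warn("No 'method' given, using 'Default' initial conditions.")
        method = "Default"
    if method not in _PLUGINS:
        raise ValueError(f"Unknown method '{method}', known: {sorted(_PLUGINS)}.")
    # Every other key in the config section is passed to the plugin as an option.
    options = {k: v for k, v in config.items() if k != "method"}
    return _PLUGINS[method](**options)


register_plugin("Default", lambda: "default-initial-conditions")
register_plugin("TwoCompartment", lambda weight: ("two-compartment", weight))
```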
Rather than requiring an explicit path prefix value be provided make use of the 'path_prefix' attribute from `gempyor.model_meta.ModelMeta` for filesystem interactions.
Initial unit tests for the `gempyor.initial_conditions.FileOrFolderDrawInitialConditions` class. Began primarily with initialization errors and warnings.
* Added `subset_dataframe` to `gempyor.compartments.Compartments` for looping over a dataframe subsetting by compartments,
* Added internal utility `_invert_into_dict` for efficiently inserting one value to many keys in a dict,
* Allow 'rest' allocations to work when using the 'Set*' methods for initial conditions, and
* Refactored `read_initial_condition_from_seir_output` to also use `Compartments.subset_dataframe`.
Made modifications to the `InitialConditionsABC.create_initial_conditions` abstract method to allow it to accept an instance of `Parameters` (represents information about SEIR parameters in the abstract) and a numpy array of actualized parameter values. Subsequent changes were made to:
* The two implementations of `InitialConditionsABC`,
* The `InitialConditionsABC.get_initial_conditions` wrapper method,
* The `ModelInfo.get_initial_conditions_data` wrapper method, and
* The various calls to `get_initial_conditions_data` in the seir unit tests and `gempyor.seir/inference` modules.
Other minor test adjustments were made to `tests/initial_conditions/` to provide dummy values for those parameters. Unused by current implementations.
Added `gempyor.parameters._inspect_requested_parameters` helper to inspect requested parameters from a provided function. This function will then extract those requested parameters from the given `pdata` (structured like `Parameters.pdata` dict attribute) and `p_draw` numpy array (like the return of `Parameters.parameters_quick_draw`). Also added corresponding documentation and unit tests.
Updated the `create_initial_conditions` method, implemented by subclasses of `InitialConditionsABC`, to accept `*args` for parsed parameters instead of `parameters` and `p_draw` directly. These requested parameters get parsed out in the `get_initial_conditions` method used by `gempyor` to access initial conditions. Minor code adjustments were made to `DefaultInitialConditions` and `FileOrFolderDrawInitialConditions` to accommodate and a small test was added for a custom plugin that requests a parameter argument.
When using custom initial conditions plugins, like ones that can be inferred, the module has to be reloaded when the multiprocessing start method is 'spawn' or 'forkserver' because the child process does not get a copy of the parent process' memory and therefore does not have the dynamically loaded plugin available at start. This can lead to an unusual `ModuleNotFoundError` exception with a traceback relating to deserializing the pickled memory state. Luckily, the fix is simple: reload the custom plugin when the child process starts. This should be kept in mind, and likely refactored when addressing #421.
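The general pattern can be sketched with a pool initializer, shown below. This is an illustration under assumptions rather than the change made in this commit: `load_plugin_module` and `make_pool` are hypothetical names, and the sketch has not been run against `gempyor`.

```python
import importlib.util
import multiprocessing
import sys


def load_plugin_module(module_name, path):
    """Import the file at `path` as `module_name` and cache it in `sys.modules`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Could not build an import spec for '{path}'.")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # lets unpickling find the module by name
    spec.loader.exec_module(module)
    return module


def make_pool(module_name, path, processes=2, start_method="spawn"):
    """Create a pool whose workers each re-import the plugin when they start."""
    context = multiprocessing.get_context(start_method)
    return context.Pool(
        processes, initializer=load_plugin_module, initargs=(module_name, path)
    )
```

The initializer runs once in each worker before it receives any tasks, so objects whose classes live in the plugin module can be unpickled there.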
Made minor adjustments to the `inference` R package to support initial condition plugins. Mostly adjusting logic around which code path to go down based on the `method` field of the `initial_conditions` section. Also fixed a bug in the `flepimop-inference-main` command where it was trying to call the `flepimop-inference-slot` command with `R` rather than directly invoking it.
Modified the example contained in `examples/simple_usa_statelevel/` to infer the initial conditions with a custom plugin. This custom initial conditions plugin is very simple, just figuring out what proportion of the population to place into the susceptible vs the infected compartments. However, this is a great way to introduce users to inferable initial conditions. Also added configuration file `examples/tutorials/config_sample_2pop_inference_with_initial_conditions.yml` to demonstrate how to do a similar procedure with R inference.
Updated the initial conditions documentation by:
* Splitting the initial conditions vs seeding section into its own page,
* Heavily refactoring the existing initial conditions guide,
* Adding a table overview of the builtin methods, and
* Documenting how to implement a plugin and how to request parameters.
Describe your changes.
This pull request...
* `initial_conditions` submodule to a subpackage.
* `ModelMeta` class to represent the core metadata of a configuration file and do basic filesystem operations.
* `ModelMeta` in `ModelInfo` to validate meta & setup filesystem.
* `ModelMeta` to `InitialConditions.from_confuse_config` to enable filesystem access.
* `FileOrFolderDrawInitialConditions`.
* `DefaultInitialConditions`.
* Unit test/document `FileOrFolderDrawInitialConditions`. Follow up in [Feature request]: Simplify Initial Condition Options For Reading From Files/Directory #593.
* `gempyor` codebase. (might be better for a follow up PR).
* `_read_initial_condition_from_tidydataframe`.
* `_read_initial_condition_from_tidydataframe`.
* `_read_initial_condition_from_seir_output`. (Kinda, compartment filtering functionality needs to be extracted).
* `_read_initial_condition_from_seir_output`.
* Integrate with flu configuration writer script. @MacdonaldJoshuaCaleb to point to relevant script(s) here. Moved to [Feature request]: Update Initial Conditions Handling In `flepiconfig` #597.
* Add initial conditions method for specifying directly in a configuration file. Moved to [Feature request]: Method For Defining Static Initial Conditions In Configuration File #594.
* `path_prefix` handling, should be obtainable from the `meta.path_prefix` attribute.
* `nlqmtrru`) into the correct commit, ultimately it's a mistake.
* `create_initial_conditions` so parameters are provided in a more user friendly way.
* Extract the initial conditions module loading from `initial_conditions_from_plugin` into an internal helper (thinking ahead to extracting/generalizing the plugin system for the rest of gempyor). Best handled by follow up work in [Feature request]: Extract And Generalize Plugin System #421.
* `_inspect_requested_parameters` docstring.
* Rethink current pydantic usage, users would prefer to receive configuration as arguments rather than as attributes to a class (Refactor initial conditions #572 (comment)). Outside the scope of this PR, likely best for a general purpose plugin system.
* Allow seeding to also be parameterized similarly to initial conditions in this PR (Refactor initial conditions #572 (comment)). Likely best for follow up work in [Feature request]: Consolidate Initial Conditions And Seeding Into One Concept #587.
* Investigate functional definitions for initial condition plugins rather than the current subclassing approach (Refactor initial conditions #572 (comment)). Best handled by follow up work in [Feature request]: Extract And Generalize Plugin System #421.
Does this pull request make any user interface changes? If so please describe.
There are no user interface changes for builtin methods, but there are minor changes to how plugins work, along with some additional features.
Those are reflected in updates to the documentation in
documentation/gitbook/gempyor/model-implementation/specifying-initial-conditions.md.