Skip to content

Refactor initial conditions#572

Open
TimothyWillard wants to merge 28 commits intodevfrom
refactor-initial-conditions
Open

Refactor initial conditions#572
TimothyWillard wants to merge 28 commits intodevfrom
refactor-initial-conditions

Conversation

@TimothyWillard
Copy link
Contributor

@TimothyWillard TimothyWillard commented Jun 11, 2025

Describe your changes.

This pull request...

Does this pull request make any user interface changes? If so please describe.

The user interface changes are none for builtin methods, but minor changes to how plugins work with additional features.

Those are reflected in updates to the documentation in documentation/gitbook/gempyor/model-implementation/specifying-initial-conditions.md.

@TimothyWillard TimothyWillard self-assigned this Jun 19, 2025
@TimothyWillard TimothyWillard added enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. high priority High priority. next release Marks a PR as a target to include in the next release. labels Jun 27, 2025
@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch from 84435b5 to 930b6bc Compare June 27, 2025 18:26
@MacdonaldJoshuaCaleb
Copy link
Collaborator

some thoughts - we do need to be able to treat initial conditions (or at least S(0) and R(0)) as special in the sense that they're not independent, so we should think carefully about how to handle this. I(0) and H(0) are also clearly interlinked, really only V(0) can probably be argued to be fully independent - we can probably get around that to an extent with some binomials, but dirchlet is the natural distribution if we want to do it "right".

@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch from 414e4f6 to c8dcc36 Compare July 8, 2025 15:01
@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch 9 times, most recently from 0243519 to b379114 Compare July 21, 2025 14:34
@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch 4 times, most recently from fbd7e29 to 16e29e6 Compare July 21, 2025 20:41
@TimothyWillard TimothyWillard marked this pull request as ready for review July 22, 2025 18:47
@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch from 4ebf48b to 2f640f4 Compare July 22, 2025 18:50
@TimothyWillard
Copy link
Contributor Author

TimothyWillard commented Jul 22, 2025

This PR has some size to it so I can see how it might be helpful to break it up. Clear breaks into separate PRs from what I can see are:

  1. Misc utilities: https://github.com/HopkinsIDD/flepiMoP/pull/572/files/74b255dd60d3cbb720ebd44294602e61abe1fabe
  2. ModelMeta, TimeSetup: https://github.com/HopkinsIDD/flepiMoP/pull/572/files/c1c88b361fdf7f0afd445b5ee5fa78d187d0530c..6a5aa03b4323aad44a20ba2b58c320197f1d7a41

Copy link
Contributor

@pearsonca pearsonca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some work in progress notes here.


# Initial Conditions vs Seeding

Both initial conditions and seeding can be used to create similar behaviors. While the previous pages cover in detail each component's functionality, below is a brief table highlighting the major differences to help you decide what to use for your situation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit of a tune-up. More like:

Suggested change
Both initial conditions and seeding can be used to create similar behaviors. While the previous pages cover in detail each component's functionality, below is a brief table highlighting the major differences to help you decide what to use for your situation.
Infectious disease models require some infections to kick off transmission dynamics. These can be present in the starting state for the simulation (as `initial_conditions`) or introduced over time (via `seeding`). Other sections (links?) provide a detailed breakdown, but this table highlights the major differences between these approaches to help you decide what to use for your situation.

also, are initial conditions and seeding mutually exclusive?

also-also-aside, do we need to think about the parametrization-based approach for seeding as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i see from text below, yes, seeding still needs to enable parametrization)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also-also-aside, do we need to think about the parametrization-based approach for seeding as well?

I think at some point we might be interested in unifying the two concepts into one? Might be outside the scope of this PR though.

Copy link
Contributor Author

@TimothyWillard TimothyWillard Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, are initial conditions and seeding mutually exclusive?

No, these can be used in conjunction. They are both provided to gempyor.steps_rk4.rk4_integration and that function uses both.


| | Initial Conditions | Seeding |
|-------------------------|-----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| Main purpose | Specifying compartments sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to double check, but does seeding really instantly add to the infectious (or whatever) compartment? or is it generally taking an S and making an I? I think most often it should be modeling a force of exposure, which would mean that (for example) a more R population would experience fewer seeding introductions.

Suggested change
| Main purpose | Specifying compartments sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. |
| Main purpose | Specifying compartments sizes at the initial time. | Instantaneously transferring between compartments at arbitrary times. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, I did not realize the difference between the two was important and got loose with the language. This is the code that takes care of the instantaneous transition:

if is_a_new_day:
# Prevalence is saved at the begining of the day, while incidence is during the day
states[today, :, :] = states_next
for seeding_instance_idx in range(
seeding_data["day_start_idx"][today],
seeding_data["day_start_idx"][min(today + int(np.ceil(dt)), len(seeding_data["day_start_idx"]) - 1)],
):
this_seeding_amounts = seeding_amounts[seeding_instance_idx]
seeding_subpops = seeding_data["seeding_subpops"][seeding_instance_idx]
seeding_sources = seeding_data["seeding_sources"][seeding_instance_idx]
seeding_destinations = seeding_data["seeding_destinations"][seeding_instance_idx]
# this_seeding_amounts = this_seeding_amounts < states_next[seeding_sources] ? this_seeding_amounts : states_next[seeding_instance_idx]
states_next[seeding_sources][seeding_subpops] -= this_seeding_amounts
states_next[seeding_sources][seeding_subpops] = states_next[seeding_sources][seeding_subpops] * (
states_next[seeding_sources][seeding_subpops] > 0
)
states_next[seeding_destinations][seeding_subpops] += this_seeding_amounts
# ADD TO cumulative, this is debatable,
states_daily_incid[today][seeding_destinations][seeding_subpops] += this_seeding_amounts

| Main purpose | Specifying compartments sizes at the initial time. | Instantaneously changing compartment sizes at arbitrary times. |
| Default functionality | Each subpopulations entire population is placed in the first compartment. | No seeding events occur. |
| Config section needed | Optional, results in warning that default functionality used. | Optional. |
| Required input files | Depending on method could be a parquet or CSV file(s). | Yes, a CSV describing seeding events with a format dependent on method used. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit confusing - shouldn't we also be mentioning plugins here? Roughly, we support 1) a precalculated file (csv or parquet) that can be read into the necessary format OR 2) a script to calculate that necessary format, using model structure + parameters.

aside: i could also imagine supporting a simple funcational interface, as with other parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover that:

| Inference integration | Yes, initial condition plugins can request model parameters. | None, seeding does not integrate well with inference yet. |

Would place the entire subpopulation into the 'S', 'child', 'unvaxxed' compartment. While 'S' usually makes sense for a model, declaring the entire subpopulation to be children is likely not reflective of reality.

#### SetInitialConditions
### 'SetInitialConditions' Method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new minor issue: supporting both this and FromFile seems wasteful (and confusing?)

If we want to support two schema for the files, and then load based on what the column headers are, that seems fine. But different config methods seems likely to confuse user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see that, but this is just maintaining the current behavior. Would we want to change that?

* `subpop_1`, `subpop_2`, etc. – One column for each different subpopulation, containing the value of the number of individuals in the described compartment in that subpopulation at the given date. Note that these are named after the node names defined by the user in the `geodata` file.
* `date` – The calendar date in the simulation, in YYYY-MM-DD format. Only values with a date that matches to the simulation `start_date` will be used.

### 'SetInitialConditionsFolderDraw' and 'InitialConditionsFolderDraw' Methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to collapsing the other file types, feels like this should also be introspected. if the user offers a folder instead of a specific file, do folder draw mode.

Comment on lines +189 to +219
```python
from typing import Literal

from gempyor.compartments import Compartments
from gempyor.initial_conditions import (
InitialConditionsABC,
register_initial_conditions_plugin,
)
from gempyor.parameters import Parameters
from gempyor.subpopulation_structure import SubpopulationStructure
import numpy as np
import numpy.typing as npt
from pydantic import Field

class TwoCompartmentInitialConditions(InitialConditionsABC):
method: Literal["TwoCompartment"] = "TwoCompartment"
weight: float = Field(gt=0.0, lt=1.0)

def create_initial_conditions(
self,
sim_id: int,
compartments: Compartments,
subpopulation_structure: SubpopulationStructure,
) -> npt.NDArray[np.float64]:
y0 = np.zeros((len(compartments.compartments), subpopulation_structure.nsubpops))
y0[0, :] = self.weight * subpopulation_structure.subpop_pop
y0[1, :] = (1.0 - self.weight) * subpopulation_structure.subpop_pop
return y0

register_initial_conditions_plugin(TwoCompartmentInitialConditions)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a tall-ask, but: is there anyway to make this more like "define a method, with this name + this signature", and then have our machinery parse that on the fly into the proper plugin structure?

as written, this is asking a lot for people. anyway to use importlib here? ala, https://docs.python.org/3/library/importlib.html#importing-programmatically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though, i do like the ability to specify multiple methods once and then switch between them depending upon config key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could look into that, although this seems more appropriate for when we get to the stage of a general purpose plugin section? Might be a bit beyond the scope of this PR.

register_initial_conditions_plugin(TwoCompartmentInitialConditions)
```

Note that in the above plugin an additional argument has been added to `create_initial_conditions` whose name, `alpha`, corresponds to the SEIR parameter name. You can finally use this plugin in your configuration file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. also feels like we should be telling people here what happens if they typo alfa instead - presumably some "could not find parameter ..." error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's here:

msg = (
f"The requested parameter, '{param_name}', not "
f"found in the arguments of {f.__name__}. The "
f"available parameters are: {pdata.keys()}."
)
raise ValueError(msg)

I can add it to the todo list to add this to the documentation with an example.

initial_conditions:
method: TwoCompartment
module: model_input/my_initial_conditions.py
weight: 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight is a weird special parameter.

i need to think a bit about this one - is it holdover from old interface?

rather than a class field, should weight be handled more like ...you can put any variables that only your initial conditions script uses in this config section and they'll get passed in just like normal parameters?

Copy link
Contributor Author

@TimothyWillard TimothyWillard Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we want to pass the configuration as arguments rather than having them as attributes of the class that would require a rethink of the current pydantic usage. I'll add it to the todo list to investigate.

Downside to switching to arguments would be you loose any config parse time validations, instead errors like weight being -0.1 would only be discovered at runtime.

initial_conditions:
method: plugin
plugin_file_path: model_input/my_initial_conditions.py
method: Default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meant to change this example? or should there be another example that shows default vs plugin vs ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, meant to change temporarily since the way plugins are used is different.

@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch from f976a9d to 606f901 Compare August 18, 2025 13:04
Added a custom warning for potential configuration file issues. This
warning is desirned to indicate configuration issues that do not prevent
flepiMoP from running but may produce unexpected behavior or that the
user made a mistake of some kind.
Added an internal helper, `_invert_into_dict`, to `gempyor.utils` to
invert a single value into multiple keys for a dictionary whose values
are a list of some type. This helper does this efficiently by modifying
the dictionary in place.
Added the `Compartments.subset_dataframe` method to easily filter a
provided pandas DataFrame with specific column structure in a loop.
Made a first pass refactoring initial conditions by address most, but
not all, of the pylint suggestions. Includes reordering of logic,
addition of docstrings, code formatting, etc. but importantly no changes
to logic. Other larger changes include:

* Renamed the `InitialConditionsFactory` function to 
  `initial_conditions_factory` to fit pythonic coding style.
* Unit tests and minor refactoring of the `check_population` internal
  validation utility, namely simplifying input arguments.
* Added unit tests for the `read_initial_condition_from_tidydataframe`
  helper function for parsing ICs from a "tidy" data frame.
Moved initial conditions from being a submodule of `gempyor` to being a
`subpackage`. Results in no net interface changes, opens the door for
more invasive refactoring.
Simplifed the internals of `check_population` by leaning on numpy's
functionality more.
First pass at modular initial conditions, enabled by the
`InitialConditionsABC` that defines the interface for an initial
conditions module. Also implmented `DefaultInitialConditions` as an
example of how to use said ABC.
Added `gempyor.model_meta.ModelMeta` class as a data container for core
model metadata and providing basic filesystem operations independent of
the `ModelInfo` mega class.

Use `ModelMeta` class to validate model metadata for `ModelInfo` and do
some light filesystem setup. Currently reassigning attributes from
`ModelMeta` back to `ModelInfo` for backwards compatibility.
Extracted the `TimeSetup` class from `gempyor.model_info` and placed it
into the new `gempyor.time_setup` module and also refactored to use a
pydantic representation instead of the previous plain class
representation.
First draft of
`gempyor.initial_conditions.FileOrFolderDrawInitialConditions`. Limited
on documentation and unit tests, but provides the
'SetInitialConditions', 'SetInitialConditionsFolderDraw', 'FromFile',
and 'InitialConditionsFolderDraw' initial condition methods. Also had to
take modified versions of helper functions from `gempyor._readers` that
need to be heavily refactored.
Replaced the previous `InitialConditions` monolithic class with modular
classes that subclass `InitialConditionsABC`. These subclasses then
register themselves as plugins via `register_initial_conditions_plugin`.
These are then accessible via the `initial_conditions_from_plugin`
function. Initial condition plugins can then be reset using the
`_reset_initial_conditions_plugins` helper, mostly for unit testing
purposes.

Furthermore, this has been integrated directly into the rest of
`gempyor`. The `get_initial_conditions_data` method was added to the
model info object to more easily access initial conditions, similar to
how seeding is done.
Removed an unused helper module that contained utilities for the prior
initial conditions implementation. Those utilities have been moved into
`gempyor.initial_conditions._file_or_folder_draw` as internal helpers.
Also made corresponding moves/edits to the unit tests.
In particular testing the outputs of the `create_initial_conditions`
method provided by
`gempyor.initial_conditions.DefaultInitialConditions`.
Added unit tests for the function used to generate an
`InitialConditionsABC` instance, namely:
* Checking that `DefaultInitialConditions` is returned when 'method' is
  not specified, with a warning, and
* A `ValueError` is raised when the 'specified' method is not found in
  the registered initial condition plugins.
Rather than requiring an explicit path prefix value be provided make use
of the 'path_prefix' attribute from `gempyor.model_meta.ModelMeta` for
filesystem interactions.
Initial unit tests for the
`gempyor.initial_conditions.FileOrFolderDrawInitialConditions` class.
Began primarily with initialization errors and warnings.
* Added `subset_dataframe` to `gempyor.compartments.Compartments` for
  looping over a dataframe subsetting by compartments,
* Added internal utility `_invert_into_dict` for efficiently inserting
  one value to many keys in a dict,
* Allow 'rest' allocations to work when using the 'Set*' methods for
  initial conditions, and
* Refactored `read_initial_condition_from_seir_output` to also use
  `Compartments.subset_dataframe`.
Made modifications to the
`InitialConditionsABC.create_initial_conditions` abstract method to
allow it to accept an instance of `Paramters` (represents information
about SEIR parameters in the abstract) and a numpy array of actualized
parameter values. Subsequent changes were made to:

* The two implementations of `InitialConditionsABC`,
* The `InitialConditionsABC.get_initial_conditions` wrapper method,
* The `ModelInfo.get_initial_conditions_data` wrapper method, and
* The various calls to `get_initial_conditions_data` in the seir unit
  tests and `gempyor.seir/inference` modules.

Other minor test adjustments were made to `tests/initial_conditions/` to
provide dummy values for those parameters. Unused by current
implementations.
Added `gempyor.parameters._inspect_requested_parameters` helper to
inspect requested parameters from a provided function. This function
will then extract those requested parameters from the given `pdata`
(structured like `Parameters.pdata` dict attribute) and `p_draw` numpy
array (like the return of `Parameters.parameters_quick_draw`). Also
added corresponding documentation and unit tests.
Updated the `create_initial_conditions` method, implemented by
subclasses of `InitialConditionsABC`, to accept `*args` for parsed
parameters instead of `parameters` and `p_draw` directly. These
requested parameters get parsed out in the `get_initial_conditions`
method used by `gempyor` to access initial conditions. Minor code
adjustments were made to `DefaultInitialConditions` and
`FileOrFolderDrawInitialConditions` to accommodate and a small test was
added for a custom plugin that requests a parameter argument.
When using initial conditions custom plugins, like ones that can be
inferred, the module has to be related when the multiprocessing start
method is 'spawn' or 'forkserver' because the child process does not get
a copy of the parent process' memory and therefore does not have the
dynamically loaded plugin available at start. This can lead to an unsual
'ModuleNotFound' exception with a traceback relating to deserializing
the pickled memory state. Luckily, the fix is simple, just need to
reload the custom plugin when the child process starts. This should be
kept in mind, and likely refactored when addressing #421.
Made minor adjustments to the `inference` R package to support initial
condition plugins. Mostly adjusting logic around which code path to go
down based on the `method` field of the `initial_conditions` section.

Also fixed a bug in the `flepimop-inference-main` command where it was
trying to call the `flepimop-inference-slot` command with `R` rather
than directly invoking it.
Modified the example contained in `examples/simple_usa_statelevel/` to
infer the initial conditions with a custom plugin. This custom initial
conditions plugin is very simple, just figuring out what proportion of
the population to place into the susceptible vs the infected
compartments. However, this is a great way to introduce users into
inferable initial conditions.

Also added configuration file
`examples/tutorials/config_sample_2pop_inference_with_initial_conditions.yml`
to demonstrate how to do a similar procedure with R inference.
Updated the initial conditions documentation by:
* Split initial conditions vs seeding section into its own page,
* Heavily refactored existing initial condiditions guide,
* Added a table overview of the builtin methods,
* Documented how to implement a plugin and how to request parameters.
@TimothyWillard TimothyWillard force-pushed the refactor-initial-conditions branch from 606f901 to c8eafd3 Compare August 22, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. high priority High priority. next release Marks a PR as a target to include in the next release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature request]: gempyor simulate / inference refactoring to enable fitting initial conditions

4 participants