expand model specification possibilities for structured populations #14
MacdonaldJoshuaCaleb wants to merge 22 commits into main from …
Conversation
…DDA/op_system into expand_specifcation_options
Added a complex structure section with YAML configuration for a vaccination model, detailing states, aliases, and equations.
…DDA/op_system into expand_specifcation_options
Add transitions version section with detailed YAML structure for vaccination and infection states.
This reverts commit 7d94512.
TimothyWillard
left a comment
A couple of observations:
- I am concerned about the lack of documentation/unit tests for private helpers. The tests cover a broad surface area through the public entry points, but given the amount of bespoke parsing I would feel much more confident in the behavior if the individual parsing components were tested directly. It seems like we're on track to end up in a similar state as we were in with https://github.com/HopkinsIDD/flepiMoP, where we were unconfident in making changes and parsing anomalies/errors were hard to diagnose and correct.
- What's the point of the `_raise_*` indirection? If those helpers really do provide some formatting help, why are they not custom exceptions? Do we expect users to be able to diagnose issues more quickly by adding to the traceback?
- I think we could leverage types here to minimize the amount of custom validation that has to be done. Could `NormalizedRhs` be enhanced by using pydantic? I do not think it is unreasonable to ask users to provide a spec as a structured object rather than an unstructured dict. We could even provide a class method to do the parsing later on, but this seems separate from the process of building a RHS.
- It seems like there are some consolidation opportunities between `normalize_transitions_rhs` and `normalize_expr_rhs`? Looks like there are some overlapping code paths.
These observations are to some extent outside the scope of this PR, but just wanted to note them.
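The structured-spec idea from the third observation could be sketched as follows. This is an illustrative stand-in using a standard-library `dataclass` (pydantic would add declarative parsing/coercion on top of the same shape); the field names `states`/`aliases`/`equations` are assumptions, not the package's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NormalizedRhsSpec:
    """Hypothetical structured RHS spec; field names are illustrative."""

    states: list[str]
    equations: dict[str, str]
    aliases: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validation lives on the type, not scattered through parsing code.
        missing = set(self.equations) - set(self.states)
        if missing:
            raise ValueError(f"equations reference unknown states: {sorted(missing)}")


spec = NormalizedRhsSpec(
    states=["S", "I", "R"],
    equations={"S": "-beta*S*I", "I": "beta*S*I - gamma*I", "R": "gamma*I"},
)
```

Handing users a type like this (plus a `from_dict`/`model_validate`-style class method) would let most of the bespoke dict validation disappear.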
```toml
[[tool.mypy.overrides]]
module = ["flepimop2.typing"]
ignore_missing_imports = true
```
What's the issue here? I think ideally we would like external package providers to reliably use flepimop2.typing for typing their own packages. What's the error you get?
```python
    _raise_invalid_expression(detail=f"invalid expression syntax: {exc.msg}")


def _validate_call(node: ast.Call, *, expr: str) -> None:
```
Why not provide `node.func` directly?
```python
if y_arr.ndim != 1:
    _raise_state_shape_error(expected="1D array", got=y_arr.shape)
if y_arr.size != n_state:
    _raise_state_shape_error(expected=f"(n_state={n_state},)", got=y_arr.shape)
```
Couldn't this be consolidated down to one error with y_arr.shape != (n_state,)?
```python
try:
    val = eval(codeobj, {"__builtins__": _SAFE_BUILTINS}, env)  # noqa: S307
except NameError as exc:
    _raise_parameter_error(detail=f"unknown symbol in equation: {exc!s}")
except (ValueError, TypeError, ArithmeticError) as exc:
    _raise_invalid_expression(detail=f"equation evaluation failed: {exc!r}")
```
Why not let it fail? Is there additional value that we provide to the user in terms of debugging or catching the error by catching and reraising?
```diff
@@ -360,32 +419,42 @@ def _make_eval_fn(
     alias_code = _collect_alias_code(aliases)
     eq_code = _collect_eq_code(equations)

-    def eval_fn(t: np.float64, y: Float64Array, **params: object) -> Float64Array:
-        y_arr = np.asarray(y, dtype=np.float64)
-        if y_arr.ndim != 1:
-            _raise_state_shape_error(expected="1D array", got=y_arr.shape)
-        if y_arr.size != n_state:
-            _raise_state_shape_error(expected=f"(n_state={n_state},)", got=y_arr.shape)
+    def _sum_state(env: Mapping[str, object]) -> np.float64:
+        values = [
+            np.float64(float(cast("Any", v)))
+            for k, v in env.items()
+            if k in name_to_idx
+        ]
+        return np.float64(sum(values))
+
+    def _sum_prefix(prefix: str, env: Mapping[str, object]) -> np.float64:
+        values = [
+            np.float64(float(cast("Any", v)))
+            for k, v in env.items()
+            if k.startswith(prefix) and k in name_to_idx
+        ]
+        return np.float64(sum(values))
+
+    def _build_env(
+        t: np.float64, y_arr: Float64Array, params: Mapping[str, object]
+    ) -> dict[str, object]:
+        env: dict[str, object] = {"np": np, "t": np.float64(t)}
+        for s, i in name_to_idx.items():
+            env[s] = np.float64(y_arr[i])
+        env.update(params)
+        env["sum_state"] = lambda: _sum_state(env)
+        env["sum_prefix"] = lambda prefix: _sum_prefix(str(prefix), env)
+        return env
+
+    def eval_fn(t: np.float64, y: Float64Array, **params: object) -> Float64Array:
+        y_arr = _validate_state_vector(np.asarray(y, dtype=np.float64), n_state=n_state)
+
+        env = _build_env(np.float64(t), y_arr, params)

         if alias_code:
             env.update(_resolve_aliases(alias_code, base_env=env))

-        out = np.empty((n_state,), dtype=np.float64)
-        for i, codeobj in enumerate(eq_code):
-            try:
-                val = eval(codeobj, {"__builtins__": _SAFE_BUILTINS}, env)  # noqa: S307
-            except NameError as exc:
-                _raise_parameter_error(detail=f"unknown symbol in equation: {exc!s}")
-            except (ValueError, TypeError, ArithmeticError) as exc:
-                _raise_invalid_expression(detail=f"equation evaluation failed: {exc!r}")
-            out[i] = np.float64(val)
-
-        return out
+        return _evaluate_equations(eq_code=eq_code, env=env, n_state=n_state)

     return eval_fn
```
I am concerned about the scalability of this approach. Adding additional custom helpers would require modifying this function directly to add them. I understand currently there are only sum_state and sum_prefix but is there a structural change that could be done to extract these out and make this more maintainable?
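One structural change along these lines would be a registry of helper factories. The sketch below is hypothetical (not code from the PR): each factory builds a helper closed over the evaluation environment, so adding a new helper means registering it here rather than editing `_build_env`/`eval_fn` directly. The names `install_helpers` and `HelperFactory` are assumptions for illustration.

```python
from collections.abc import Callable

import numpy as np

# Hypothetical registry: maps a helper name to a factory that builds the
# helper from the evaluation env and the state-name -> index mapping.
HelperFactory = Callable[[dict, dict], Callable[..., np.float64]]
_HELPER_FACTORIES: dict[str, HelperFactory] = {}


def register_helper(name: str) -> Callable[[HelperFactory], HelperFactory]:
    def deco(factory: HelperFactory) -> HelperFactory:
        _HELPER_FACTORIES[name] = factory
        return factory
    return deco


@register_helper("sum_state")
def _make_sum_state(env: dict, name_to_idx: dict) -> Callable[..., np.float64]:
    def sum_state() -> np.float64:
        # Sum every state variable currently bound in the env.
        return np.float64(sum(float(env[k]) for k in name_to_idx if k in env))
    return sum_state


@register_helper("sum_prefix")
def _make_sum_prefix(env: dict, name_to_idx: dict) -> Callable[..., np.float64]:
    def sum_prefix(prefix: str) -> np.float64:
        # Sum state variables whose names start with the given prefix.
        return np.float64(
            sum(
                float(env[k])
                for k in name_to_idx
                if k in env and k.startswith(str(prefix))
            )
        )
    return sum_prefix


def install_helpers(env: dict, name_to_idx: dict) -> None:
    """Add every registered helper to the evaluation environment."""
    for name, factory in _HELPER_FACTORIES.items():
        env[name] = factory(env, name_to_idx)
```

With this shape, `_build_env` would end with a single `install_helpers(env, name_to_idx)` call, and downstream packages could register their own helpers without touching `_make_eval_fn`.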
| if "[" not in entry or "]" not in entry: | ||
| expanded.append(entry) | ||
| continue | ||
| m = re.fullmatch(r"\s*([A-Za-z_][A-Za-z0-9_]*)\[(.+)\]\s*", entry) |
Non-dynamic regexes, especially along hot paths, can be extracted out to a constant via re.compile
This pull request significantly expands and refactors the expression compilation and evaluation logic in the `op_system` module, making it more extensible, robust, and easier to maintain. The main improvements include broadening the set of allowed NumPy functions, introducing helper functions for state aggregation, centralizing validation and evaluation logic, and enhancing test coverage for new features and edge cases.

**Core logic improvements:**

- Broadened the set of allowed NumPy functions to include `expm1`, `log2`, `log10`, `sin`, `cos`, `tan`, `sinh`, `cosh`, `tanh`, `hypot`, and `arctan2`. Also added a whitelist for helper functions (`sum_state`, `sum_prefix`).
- Refactored the `_validate_call` function, improving clarity and making it easier to extend allowed function sets in the future.
- Centralized validation and evaluation logic in `_validate_state_vector` and `_evaluate_equations`, ensuring consistent error handling and reducing code duplication.

**New features:**

- Introduced `sum_state` and `sum_prefix` in equations and aliases, allowing aggregation over state variables directly in user expressions.

**Testing and configuration:**

- Updated `pyproject.toml` to ignore missing imports for the `flepimop2.typing` module in mypy, preventing unnecessary type checking errors during development.

See the README.MD for an updated API demonstration.
@shauntruelove I have added an asymmetric vaccination example to the README as requested. @pearsonca the ability to have flexible rate specification has also been added. Finally, I have removed the need to have `[I1, I2, I3, ...]` in both `State:` and `Chain:`. Now `State` can simply be `State: [S, I, R]`, and then you just use the chain helper.
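A spec along the lines described might look roughly like the following. This is an illustrative sketch only: the `State:` and `Chain:` keys come from the comment above, but the exact chain-helper syntax shown here is hypothetical, not taken from the package.

```yaml
State: [S, I, R]   # no need to enumerate I1, I2, I3 here anymore
Chain:
  I: 3             # hypothetical syntax: the chain helper expands I into I1..I3
```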