expand model specification possibilities for structured populations #14

Open
MacdonaldJoshuaCaleb wants to merge 22 commits into main from expand_specifcation_options
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb commented Feb 12, 2026

This pull request significantly expands and refactors the expression compilation and evaluation logic in the op_system module, making it more extensible, robust, and easier to maintain. The main improvements include broadening the set of allowed NumPy functions, introducing helper functions for state aggregation, centralizing validation and evaluation logic, and enhancing test coverage for new features and edge cases.

Core logic improvements:

  • Expanded the set of allowed NumPy functions in expressions to include additional mathematical, trigonometric, and geometry-related functions such as expm1, log2, log10, sin, cos, tan, sinh, cosh, tanh, hypot, and arctan2. Also added a whitelist for helper functions (sum_state, sum_prefix).
  • Refactored call validation logic into a dedicated _validate_call function, improving clarity and making it easier to extend allowed function sets in the future. [1] [2]
  • Centralized state vector validation and equation evaluation into _validate_state_vector and _evaluate_equations, ensuring consistent error handling and reducing code duplication. [1] [2]
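
As a rough illustration of the whitelist-based call validation described above, here is a minimal sketch. It assumes calls are written either as `np.<name>(...)` or as bare helper names. The names `ALLOWED_NP_FUNCS`, `ALLOWED_HELPERS`, and `validate_calls` are hypothetical and are not the PR's actual `_validate_call` implementation, and the whitelist shown is only a subset.

```python
import ast

# Hypothetical whitelists for illustration; the PR's actual sets may differ.
ALLOWED_NP_FUNCS = frozenset(
    {"exp", "expm1", "log2", "log10", "sin", "cos", "tan", "hypot", "arctan2"}
)
ALLOWED_HELPERS = frozenset({"sum_state", "sum_prefix"})


def validate_calls(expr: str) -> None:
    """Raise ValueError if `expr` calls anything outside the whitelists."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept the np.<name>(...) form when <name> is whitelisted.
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "np"
            and func.attr in ALLOWED_NP_FUNCS
        ):
            continue
        # Accept bare helper calls such as sum_prefix("I").
        if isinstance(func, ast.Name) and func.id in ALLOWED_HELPERS:
            continue
        raise ValueError(
            f"disallowed call in expression {expr!r}: {ast.unparse(func)}"
        )
```

Keeping the allowed names in two sets means extending the whitelist is a one-line change, which seems to be the intent of the refactor.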

New features:

  • Added support for reducer helper functions sum_state and sum_prefix in equations and aliases, allowing aggregation over state variables directly in user expressions.
  • Updated the environment-building logic to inject these new helper functions for use in compiled expressions.
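
To make the reducer idea concrete, here is a small sketch of how such helpers could be injected into an evaluation environment. It uses plain floats rather than NumPy scalars, and `make_reducers`, the state names, and `beta` are assumptions for illustration only; the PR's own helpers live inside `_make_eval_fn`.

```python
from collections.abc import Mapping


def make_reducers(state: Mapping[str, float]) -> dict:
    """Build reducers closed over the current state values (hypothetical helper)."""

    def sum_state() -> float:
        # Total over every state variable.
        return float(sum(state.values()))

    def sum_prefix(prefix: str) -> float:
        # Total over state variables whose name starts with `prefix`.
        return float(sum(v for k, v in state.items() if k.startswith(prefix)))

    return {"sum_state": sum_state, "sum_prefix": sum_prefix}


state = {"S": 90.0, "I1": 4.0, "I2": 6.0, "R": 0.0}
env = {**state, "beta": 0.3, **make_reducers(state)}

# Frequency-dependent force of infection summed over all infectious stages.
foi = eval("beta * sum_prefix('I') / sum_state()", {"__builtins__": {}}, env)
```

Note that a prefix match on `"I"` would also pick up any other state whose name begins with `I`, so naming conventions matter with this kind of helper.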

Testing and configuration:

  • Added comprehensive tests for the expanded NumPy function whitelist, new helper functions, and stricter function call validation, ensuring correct behavior and preventing unauthorized function usage. [1] [2]
  • Updated the pyproject.toml to ignore missing imports for the flepimop2.typing module in mypy, preventing unnecessary type checking errors during development.

See the README.MD for an updated API demonstration.

@shauntruelove, I have added an asymmetric vaccination example to the README as requested. @pearsonca, the ability to have flexible rate specification has also been added. Finally, I have removed the need to have [I1, I2, I3, ...] in both State: and Chain:. Now State can simply be State: [S, I, R], and you then just use the chain helper.

@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb self-assigned this Feb 12, 2026
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb added the enhancement New feature or request label Feb 12, 2026
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb marked this pull request as draft February 12, 2026 15:54
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb marked this pull request as ready for review February 17, 2026 18:07
@TimothyWillard TimothyWillard left a comment

A couple of observations:

  1. I am concerned about the lack of documentation/unit tests for private helpers. It seems like the tests cover a broad surface area by accessing the public entry points, but given the amount of bespoke parsing I would feel much more confident in the behavior if individual parsing components were better tested. Seems like we're on track to end up in a similar state as we were in with https://github.com/HopkinsIDD/flepiMoP, where we were not confident in making changes and parsing anomalies/errors were hard to diagnose and correct.
  2. What's the point of the _raise_* indirection? If those helpers really do provide some formatting help, why are they not custom exceptions? Do we expect users to be able to diagnose issues more quickly by adding to the traceback?
  3. I think we could leverage types here to minimize the amount of custom validation that has to be done. Could the NormalizedRhs be enhanced by using pydantic? I do not think it is unreasonable to ask users to provide a spec as a structured object rather than an unstructured dict. Could even provide a class method to do the parsing later on, but this seems separate from the process of building a RHS.
  4. It seems like there are some consolidation opportunities between normalize_transitions_rhs and normalize_expr_rhs. Looks like there are some overlapping code paths.

These observations are to some extent outside the scope of this PR, but just wanted to note them.
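
For reference, a minimal sketch of what observation 2 might look like if the `_raise_*` helpers became custom exceptions. The class names and fields below are hypothetical and are not taken from the PR; the idea is only that the formatting lives in the exception itself.

```python
from __future__ import annotations


class ExpressionError(ValueError):
    """Hypothetical base class for expression compilation/evaluation failures."""

    def __init__(self, detail: str, *, expr: str | None = None) -> None:
        self.detail = detail
        self.expr = expr
        message = detail if expr is None else f"{detail} (in expression {expr!r})"
        super().__init__(message)


class StateShapeError(ExpressionError):
    """Hypothetical error for a state vector of the wrong shape."""

    def __init__(self, *, expected: str, got: tuple[int, ...]) -> None:
        self.expected = expected
        self.got = got
        super().__init__(f"state vector has shape {got}, expected {expected}")
```

With this shape, a call site would read `raise StateShapeError(expected="(3,)", got=y_arr.shape)`, so the traceback points at the failing line rather than at a helper, and callers can catch a specific type.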

Comment on lines +85 to +87
[[tool.mypy.overrides]]
module = ["flepimop2.typing"]
ignore_missing_imports = true

What's the issue here? I think ideally we would like external package providers to reliably use flepimop2.typing for typing their own packages. What's the error you get?

_raise_invalid_expression(detail=f"invalid expression syntax: {exc.msg}")


def _validate_call(node: ast.Call, *, expr: str) -> None:

Why not provide node.func directly?

Comment on lines +381 to +384
if y_arr.ndim != 1:
    _raise_state_shape_error(expected="1D array", got=y_arr.shape)
if y_arr.size != n_state:
    _raise_state_shape_error(expected=f"(n_state={n_state},)", got=y_arr.shape)

Couldn't this be consolidated down to one error with y_arr.shape != (n_state,)?
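
A sketch of the consolidated check suggested here. To keep it runnable without NumPy, the function takes a shape tuple; since `ndarray.shape` is a tuple of ints, one would pass `y_arr.shape` in the PR's context. The name `check_state_shape` and the use of `ValueError` are assumptions.

```python
def check_state_shape(shape: tuple, *, n_state: int) -> None:
    """Single comparison covering both dimensionality and length."""
    expected = (n_state,)
    if shape != expected:
        raise ValueError(f"state vector has shape {shape}, expected {expected}")


# A 1D vector of the right length passes silently.
check_state_shape((3,), n_state=3)
```

The trade-off is that the message no longer distinguishes "wrong number of dimensions" from "wrong length", although the reported shape makes both cases easy to read off.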

Comment on lines +401 to +406
try:
    val = eval(codeobj, {"__builtins__": _SAFE_BUILTINS}, env)  # noqa: S307
except NameError as exc:
    _raise_parameter_error(detail=f"unknown symbol in equation: {exc!s}")
except (ValueError, TypeError, ArithmeticError) as exc:
    _raise_invalid_expression(detail=f"equation evaluation failed: {exc!r}")

Why not let it fail? Is there additional value that we provide to the user in terms of debugging or catching the error by catching and reraising?
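
One possible middle ground, sketched under assumptions: if the wrapper adds context the raw error lacks (for example, which equation failed), it can re-raise with `from exc` so the original exception stays attached. `EquationEvaluationError` and `evaluate_equation` are hypothetical names, not part of the PR.

```python
from __future__ import annotations


class EquationEvaluationError(RuntimeError):
    """Hypothetical wrapper that records which equation failed."""


def evaluate_equation(index: int, source: str, env: dict[str, object]) -> float:
    """Evaluate one equation, adding its index and source to any failure."""
    code = compile(source, f"<equation {index}>", "eval")
    try:
        return float(eval(code, {"__builtins__": {}}, env))
    except (NameError, ValueError, TypeError, ArithmeticError) as exc:
        # `from exc` keeps the original exception reachable as __cause__,
        # so the underlying traceback is not lost.
        raise EquationEvaluationError(
            f"equation {index} ({source!r}) failed: {exc!r}"
        ) from exc
```

If the wrapper cannot add anything beyond what the original exception already says, letting it propagate unchanged is the simpler option.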

Comment on lines 411 to 459
@@ -360,32 +419,42 @@ def _make_eval_fn(
    alias_code = _collect_alias_code(aliases)
    eq_code = _collect_eq_code(equations)

    def eval_fn(t: np.float64, y: Float64Array, **params: object) -> Float64Array:
        y_arr = np.asarray(y, dtype=np.float64)
        if y_arr.ndim != 1:
            _raise_state_shape_error(expected="1D array", got=y_arr.shape)
        if y_arr.size != n_state:
            _raise_state_shape_error(expected=f"(n_state={n_state},)", got=y_arr.shape)

    def _sum_state(env: Mapping[str, object]) -> np.float64:
        values = [
            np.float64(float(cast("Any", v)))
            for k, v in env.items()
            if k in name_to_idx
        ]
        return np.float64(sum(values))

    def _sum_prefix(prefix: str, env: Mapping[str, object]) -> np.float64:
        values = [
            np.float64(float(cast("Any", v)))
            for k, v in env.items()
            if k.startswith(prefix) and k in name_to_idx
        ]
        return np.float64(sum(values))

    def _build_env(
        t: np.float64, y_arr: Float64Array, params: Mapping[str, object]
    ) -> dict[str, object]:
        env: dict[str, object] = {"np": np, "t": np.float64(t)}
        for s, i in name_to_idx.items():
            env[s] = np.float64(y_arr[i])
        env.update(params)
        env["sum_state"] = lambda: _sum_state(env)
        env["sum_prefix"] = lambda prefix: _sum_prefix(str(prefix), env)
        return env

    def eval_fn(t: np.float64, y: Float64Array, **params: object) -> Float64Array:
        y_arr = _validate_state_vector(np.asarray(y, dtype=np.float64), n_state=n_state)

        env = _build_env(np.float64(t), y_arr, params)

        if alias_code:
            env.update(_resolve_aliases(alias_code, base_env=env))

        out = np.empty((n_state,), dtype=np.float64)
        for i, codeobj in enumerate(eq_code):
            try:
                val = eval(codeobj, {"__builtins__": _SAFE_BUILTINS}, env)  # noqa: S307
            except NameError as exc:
                _raise_parameter_error(detail=f"unknown symbol in equation: {exc!s}")
            except (ValueError, TypeError, ArithmeticError) as exc:
                _raise_invalid_expression(detail=f"equation evaluation failed: {exc!r}")
            out[i] = np.float64(val)

        return out
        return _evaluate_equations(eq_code=eq_code, env=env, n_state=n_state)

    return eval_fn

I am concerned about the scalability of this approach. Adding additional custom helpers would require modifying this function directly to add them. I understand currently there are only sum_state and sum_prefix but is there a structural change that could be done to extract these out and make this more maintainable?

if "[" not in entry or "]" not in entry:
    expanded.append(entry)
    continue
m = re.fullmatch(r"\s*([A-Za-z_][A-Za-z0-9_]*)\[(.+)\]\s*", entry)

Non-dynamic regexes, especially along hot paths, can be extracted out to a constant via re.compile
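
A short sketch of that suggestion, reusing the pattern from the snippet above. The constant name `_BRACKET_ENTRY_RE` and the wrapper `split_bracket_entry` are hypothetical.

```python
from __future__ import annotations

import re

# Compiled once at import time; same pattern as in the snippet under review.
_BRACKET_ENTRY_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\[(.+)\]\s*")


def split_bracket_entry(entry: str) -> tuple[str, str] | None:
    """Return (name, bracket contents) for entries like 'I[1:3]', else None."""
    m = _BRACKET_ENTRY_RE.fullmatch(entry)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

The `re` module caches recently compiled patterns internally, so the runtime gain is likely modest; the main benefits are a named, documented pattern and skipping the cache lookup on each call.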

Labels

enhancement New feature or request

3 participants