
add vectorized ops for solver logic and testing to Dev #592

Open
MacdonaldJoshuaCaleb wants to merge 49 commits into dev from dev_add_vectorization

Conversation

@MacdonaldJoshuaCaleb
Collaborator

@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb commented Jul 30, 2025

Describe your changes

This pull request introduces a fully vectorized and Numba-accelerated backend for epidemic simulation logic, located in vectorization_experiments.py. These functions serve as a performance-oriented alternative to the current solver implementation, which relies heavily on nested for-loops and dynamic memory access. The new backend explicitly separates the following components for modular testing and future optimization:

  • Proportion sum and exponentiation logic
  • Transition rate computation from parameter arrays (including symbolic expressions)
  • Transition amount calculation using deterministic rules
  • Flux assembly
  • Solver stepping with fixed dt

All operations are performed over preallocated NumPy arrays and are compatible with fastmath=True in Numba. This enables large performance gains while maintaining model fidelity.
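To make the fixed-dt stepping over preallocated arrays concrete, here is a minimal pure-NumPy sketch (names, transition layout, and the constant per-transition rates are illustrative stand-ins for the real backend, not its actual signatures):

```python
import numpy as np

def step_fixed_dt(y0, rates, transitions, dt, n_steps):
    """Fixed-dt stepping over preallocated arrays (illustrative sketch;
    structure simplified from the real vectorized backend).
    """
    src = transitions[:, 0]
    dst = transitions[:, 1]
    y = y0.copy()
    out = np.empty((n_steps + 1, y.size))  # preallocated trajectory
    out[0] = y
    for step in range(n_steps):
        # Per-transition amounts via the exponential Euler rule.
        flux = y[src] * (1.0 - np.exp(-dt * rates))
        np.subtract.at(y, src, flux)   # drain source compartments
        np.add.at(y, dst, flux)        # fill target compartments
        out[step + 1] = y
    return out

traj = step_fixed_dt(
    np.array([990.0, 10.0, 0.0]),   # S, I, R (toy 3-compartment model)
    np.array([0.3, 0.1]),           # per-transition rates (assumed constant)
    np.array([[0, 1], [1, 2]]),     # S->I, I->R
    dt=0.1,
    n_steps=50,
)
```

Because each flux is subtracted from its source and added to its target, total population is conserved at every step.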

A corresponding test suite is added under tests/steps_rk4/test_vector_build.py. It validates shape agreements, parameter–transition consistency, correct handling of symbolic expressions, and deterministic solver behavior. All tests now pass with real 2-population configuration files as input.

The backend is intended to be drop-in compatible with the current model parsing logic and configuration structure, with no required changes to existing user-facing YAML or CSV files.

A follow-up step will involve benchmarking this vectorized backend against the legacy solver to quantify speedup and memory improvements. We should also discuss where and how to surface this backend in the main package interface, e.g., as a solver="vectorized" option in run_simulation.

RHS function architecture notes:

  • This code constructs a time-dependent RHS function rhs(t, y) by capturing the static structure of the simulation (transitions, mobility, proportions, etc.) into a closure.
  • This function is built once via build_rhs(...) and reused across all time steps — it is not rebuilt at each time point.

Static components:

The following inputs are assumed time-invariant and can be precomputed during simulation object initialization (or deserialization) and reused:

  • transitions: structure of source/target compartments and parameters
  • proportion_info, transition_sum_compartments: model structure
  • mobility_data, mobility_row_indices, mobility_data_indices: CSR mobility matrix
  • population, proportion_who_move: demographic structure
  • param_expr_lookup: symbolic parameter expressions (optional)
  • Time- and state-varying inputs (t, y) are passed at evaluation time only.
  • In a future class-based refactor (e.g., Simulation), these would naturally be stored as attributes
    (self.transitions, self.mobility_data, etc.), and rhs_fn could be built once and cached in the constructor for reuse.
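The build-once closure pattern can be sketched as follows (pure NumPy; `build_rhs` here takes simplified stand-ins for the real static structure, not the actual argument list):

```python
import numpy as np

def build_rhs(transitions, rates):
    """Capture static structure once; return a reusable rhs(t, y).

    `transitions` is an (n_transitions, 2) array of (source, target)
    compartment indices and `rates` a per-transition rate vector;
    both are illustrative stand-ins for the real model structure.
    """
    src = transitions[:, 0]
    dst = transitions[:, 1]

    def rhs(t, y):
        dy = np.zeros_like(y)
        flux = rates * y[src]      # per-transition flow out of each source
        np.add.at(dy, src, -flux)  # drain sources
        np.add.at(dy, dst, flux)   # fill targets
        return dy

    return rhs

# Built once, then reused across all time steps:
transitions = np.array([[0, 1], [1, 2]])  # S->I, I->R (toy example)
rhs = build_rhs(transitions, np.array([0.3, 0.1]))
dy = rhs(0.0, np.array([990.0, 10.0, 0.0]))
```

Only `t` and `y` vary between calls; everything else lives in the closure, exactly as it would live in attributes after a class-based refactor.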

Does this pull request make any user interface changes?

No changes to user-facing files or CLI behavior are introduced. This PR is strictly additive: it introduces a new experimental backend and test suite without modifying the current simulation pipeline.

@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb self-assigned this Jul 30, 2025
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb added gempyor Concerns the Python core. medium priority Medium priority. performance Performance related improvements, either runtime or memory. labels Jul 30, 2025
@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb changed the title add vecotirezed ops for solver logic and testing to Dev add vectorized ops for solver logic and testing to Dev Jul 30, 2025
@TimothyWillard TimothyWillard force-pushed the dev_add_vectorization branch from 6b58c58 to c24fe30 Compare July 31, 2025 13:10
Contributor

@pearsonca pearsonca left a comment


A few broad questions to start out with + some software engineering asks (type annotation + some docstring)

# === 3. Binomial Stochastic (NumPy) ===


def compute_transition_amounts_numpy_binomial(source_numbers, total_rates, dt):
Contributor


does this deal with competing rates? i.e. same source, different sink

Collaborator Author


This function does not handle competing rates correctly when multiple transitions share the same source state. It draws each transition independently based on its own binomial probability, without adjusting for overlapping source pools.

I'm currently assuming non-overlapping transitions or trusting that the input transitions are already disjoint per source. If we want to enforce probabilistic consistency under shared-source competition, we’d need to replace this with a grouped multinomial draw per source–node pair.
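A grouped multinomial draw per source–node pair might look like this (sketch for a single node; the function name and argument layout are illustrative, and splitting the total exit probability proportionally to rates is one common convention, not necessarily the one we would adopt):

```python
import numpy as np

def competing_multinomial_draws(source_count, rates, dt, rng):
    """Draw counts for several transitions competing for one source pool
    at a single node, so the draws can never exceed the source.

    Illustrative sketch: the total exit probability is split among the
    competing transitions in proportion to their rates, and one
    multinomial draw is taken per (source, node) pair.
    """
    total_rate = rates.sum()
    if total_rate == 0.0:
        return np.zeros_like(rates, dtype=np.int64)
    p_exit = 1.0 - np.exp(-dt * total_rate)   # P(leaving the source at all)
    p = (rates / total_rate) * p_exit         # per-transition shares
    pvals = np.append(p, 1.0 - p.sum())       # last slot = stay put
    draw = rng.multinomial(source_count, pvals)
    return draw[:-1]                          # drop the "stayed" count

rng = np.random.default_rng(0)
counts = competing_multinomial_draws(100, np.array([0.2, 0.4]), 0.5, rng)
```

Unlike independent binomial draws, the multinomial guarantees the transitions jointly never remove more individuals than the source holds.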

Contributor


i think eventually we want to deal with competing rates.

in the interim, a fast-fail => informative error when there are competing rates would be ideal
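An interim fail-fast check could be as simple as this (sketch; `assert_no_competing_sources` and the per-transition source-index array are illustrative names, to be adapted to the real transitions layout):

```python
import numpy as np

def assert_no_competing_sources(sources):
    """Fail fast with an informative error if two transitions share a
    source compartment, since the independent binomial draw is only
    valid for disjoint source pools.
    """
    uniq, counts = np.unique(sources, return_counts=True)
    dupes = uniq[counts > 1]
    if dupes.size:
        raise ValueError(
            f"Competing transitions share source compartment(s) {dupes.tolist()}; "
            "independent binomial draws are not valid here."
        )

assert_no_competing_sources(np.array([0, 1, 2]))  # disjoint sources: passes
```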

Comment on lines 324 to 348
@njit(parallel=True, fastmath=True)
def compute_transition_amounts_parallel(
    source_numbers: np.ndarray, total_rates: np.ndarray, dt: float
) -> np.ndarray:
    """
    Compute deterministic transition amounts in parallel using Euler integration logic.

    This is used when the workload exceeds a defined threshold and parallelization
    is expected to provide a speedup.

    Args:
        source_numbers: Array of shape (n_transitions, n_nodes), source population counts.
        total_rates: Array of shape (n_transitions, n_nodes), transition rates per unit time.
        dt: Time step size.

    Returns:
        Array of shape (n_transitions, n_nodes) with computed transition amounts.
    """
    n_transitions, n_nodes = total_rates.shape
    amounts = np.zeros((n_transitions, n_nodes))
    for t_idx in prange(n_transitions):
        for node in range(n_nodes):
            rate = 1 - np.exp(-dt * total_rates[t_idx, node])
            amounts[t_idx, node] = source_numbers[t_idx, node] * rate
    return amounts
Contributor


This is more of a question than a comment per se since I'm not familiar with numba, but is there a reason why this function body could not just be:

source_numbers * (1.0 - np.exp(-dt * total_rates))

At least, that's how I would write this with just numpy. Does numba do some sort of optimization of the looping here? And if so is there documentation somewhere describing that? Or does numba not play nice with numpy's builtin vectorization?

Contributor


might need to switch to an element-wise op, but i'd expect so

Contributor


numpy.exp is element wise, if that's what you're concerned about. scipy.linalg.expm computes a matrix exponential.

Contributor


i meant the * op - or is that by default the by-element op?

Contributor


i meant the * op - or is that by default the by-element op?

Yes, that's also element wise. Matrix multiplication is done via either numpy.matmul or the @ operator.

Collaborator Author


yeah, unlike MATLAB, where operations are matrix-wise by default, most Python ops are element-wise by default. Regarding @TimothyWillard's question: yes, this has to do with how numba optimizes looping

Contributor


yeah, unlike MATLAB, where operations are matrix-wise by default, most Python ops are element-wise by default. Regarding @TimothyWillard's question: yes, this has to do with how numba optimizes looping

Gotcha, also might make sense to note that in this function's docstring along with an example asserting equivalence?
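A docstring example asserting equivalence could look like this (pure-NumPy sketch: the real function is numba-jitted, which is omitted here, and the function names are illustrative):

```python
import numpy as np

def compute_transition_amounts_loop(source_numbers, total_rates, dt):
    # Explicit loops, mirroring the jitted implementation's structure.
    n_transitions, n_nodes = total_rates.shape
    amounts = np.zeros((n_transitions, n_nodes))
    for t_idx in range(n_transitions):
        for node in range(n_nodes):
            rate = 1.0 - np.exp(-dt * total_rates[t_idx, node])
            amounts[t_idx, node] = source_numbers[t_idx, node] * rate
    return amounts

def compute_transition_amounts_vectorized(source_numbers, total_rates, dt):
    # Single broadcast expression; * and np.exp are both element-wise.
    return source_numbers * (1.0 - np.exp(-dt * total_rates))

rng = np.random.default_rng(1)
src = rng.uniform(0, 1000, size=(4, 3))
rates = rng.uniform(0, 2, size=(4, 3))
assert np.allclose(
    compute_transition_amounts_loop(src, rates, 0.1),
    compute_transition_amounts_vectorized(src, rates, 0.1),
)
```

In plain NumPy the one-liner is the idiomatic form; the explicit loops exist because numba compiles them to tight machine code (and parallelizes the outer loop under `prange`).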

Comment on lines 19 to 55
@pytest.fixture(scope="module")
def modelinfo_from_config(tmp_path_factory):
    tmp_path = tmp_path_factory.mktemp("model_input")

    original_dir = Path(__file__).resolve()
    while original_dir.name != "flepimop":
        original_dir = original_dir.parent
    tutorial_dir = original_dir.parent / "examples/tutorials"

    input_dir = tmp_path / "model_input"
    input_dir.mkdir()

    input_files = [
        "geodata_sample_2pop.csv",
        "ic_2pop.csv",
        "mobility_sample_2pop.csv",
    ]
    for fname in input_files:
        shutil.copyfile(tutorial_dir / "model_input" / fname, input_dir / fname)

    config_file = "config_sample_2pop_modifiers.yml"
    config_path = tmp_path / config_file
    shutil.copyfile(tutorial_dir / config_file, config_path)

    config = confuse.Configuration("TestModel", __name__)
    config.set_file(str(config_path))

    return (
        ModelInfo(
            config=config,
            config_filepath=str(config_path),
            path_prefix=str(tmp_path),
            setup_name="sample_2pop",
            seir_modifiers_scenario="Ro_all",
        ),
        config,
    )
Contributor

@TimothyWillard TimothyWillard Jul 31, 2025


Nice use of pytest's fixtures for DRY, could this be moved into gempyor.testing (contains testing utilities)? Maybe also could wrap setup_example_from_tutorials to get rid of input file setup. Same applies to the other fixture.

}


def test_fastmath_equivalence_prod(model_and_inputs):
Contributor


I think it's unnecessary to fully document unit tests (args, return, etc.), but a one line comment of the goal plus typing is helpful.

fn_safe = njit(fastmath=False)(fn_fast.py_func)
out1 = fn_safe(arr)
out2 = fn_fast(arr)
assert np.allclose(out1, out2, rtol=1e-5, atol=1e-7)
Contributor


More of a question than a comment, but why are we adjusting numpy's default tolerances? I know there's some guidance in the documentation, https://numpy.org/doc/2.1/reference/generated/numpy.allclose.html, on regimes where the defaults aren't suitable. Is this one of those?

Collaborator Author


I loosened the atol slightly here to accommodate floating-point differences arising from fastmath=True, particularly under accumulation or exponential transforms. The goal is to catch meaningful deviations while allowing for safe reordering-related variance. The chosen values (rtol=1e-5, atol=1e-7) strike a balance: they're stricter than common simulation tolerances, but loose enough to avoid false positives due to fused operations or reassociation. One might argue they should be loosened even further; even at this tolerance, would it make a meaningful difference to actual simulations? Thoughts here @pearsonca?

Contributor


Ah, makes sense. In that case it might be helpful to use @pytest.mark.parametrize to scale the array being tested, to make sure these tolerances hold up across input sizes. Also, we can probably do away with the model_and_inputs fixture request because we just need some two-dimensional array as an input for prod_along_axis0.
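A parametrized version might look like this (sketch: a reversed-order `np.prod` stands in for fastmath reassociation; the real test would compare `prod_along_axis0` compiled with and without `fastmath=True`):

```python
import numpy as np
import pytest

@pytest.mark.parametrize("n_rows,n_cols", [(2, 3), (16, 64), (128, 512)])
def test_prod_tolerances_hold_across_sizes(n_rows, n_cols):
    # Any 2-D array works here; no model fixture is needed.
    rng = np.random.default_rng(42)
    arr = rng.uniform(0.5, 1.5, size=(n_rows, n_cols))
    expected = np.prod(arr, axis=0)
    # Reversed accumulation order mimics the reordering that
    # fastmath=True is allowed to perform.
    reordered = np.prod(arr[::-1], axis=0)
    assert np.allclose(reordered, expected, rtol=1e-5, atol=1e-7)
```

Scaling `n_rows` is the interesting axis: longer accumulations are where reassociation error grows, so that is where the tolerances are actually stressed.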

Comment on lines 254 to 256
probs = 1.0 - np.exp(-dt * total_rates)
draws = np.random.binomial(source_numbers.astype(np.int32), probs)
return draws.astype(np.float32)
Contributor


What's up with the casting to float/int 32 here?

Collaborator Author


it's unneeded, removed

Contributor


Looks like it's still present?



@njit(fastmath=True)
def prod_along_axis0(arr_2d: np.ndarray) -> np.ndarray:

Collaborator Author

@MacdonaldJoshuaCaleb MacdonaldJoshuaCaleb Jul 31, 2025


numpy's prod isn't compatible with numba @pearsonca

Contributor


Ah, makes sense. Maybe a note in the documentation would be helpful?


Contributor


(entirely possible i'm misunderstanding what "supported" here means)

Collaborator Author


When I tried to use it numba threw an error. I'll double check the documentation and make sure it wasn't just me being dumb

Contributor


i do see "supported without any arguments" which suggests the axis part doesn't work - i.e. can't just do it at top level - but could still use for the innermost loop, tho?
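That pattern, using the no-argument form of np.prod on each innermost slice, might look like this (pure-NumPy sketch; in the real code the function would carry the @njit(fastmath=True) decorator, omitted here):

```python
import numpy as np

def prod_along_axis0(arr_2d):
    """Column-wise product. numba supports np.prod only without
    arguments, so the axis reduction uses an explicit loop while
    np.prod handles each innermost 1-D slice.
    """
    n_rows, n_cols = arr_2d.shape
    out = np.empty(n_cols)
    for col in range(n_cols):
        out[col] = np.prod(arr_2d[:, col])  # no-argument form: numba-friendly
    return out

arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.5]])
result = prod_along_axis0(arr)  # column products: [15.0, 4.0]
```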

@MacdonaldJoshuaCaleb
Collaborator Author

@pearsonca @TimothyWillard

ModelBuilder Overview

The ModelBuilder class is a central configuration object for assembling all core inputs to a compartmental modeling run. It is designed to incrementally replace the legacy ModelInfo class and its spaghetti-like dependencies.

Responsibilities

  • Initial Condition Parsing
    Loads a tidy-format CSV and produces a (n_compartments, n_nodes) initial_array with aggregation of duplicate entries and configurable handling of unknown labels.

  • Compartment Mappings
    Exposes:

    • compartment_to_index: maps compartment names to integers
    • index_to_compartment: reverse mapping
  • Validation via Pydantic

    • Uses pydantic.BaseModel for strict typing, error reporting, and defaulting
    • The @model_validator(mode="after") hook is used to construct derived fields like initial_array automatically after basic input validation

Current Fields

  • config_path: Path to YAML config (used for metadata)
  • ic_file: Path to initial condition CSV
  • compartment_labels: Ordered list like ["S", "E", "I", "R"]
  • n_nodes: Number of subpopulations
  • node_col, comp_col, count_col: CSV column names
  • allow_missing_*: Flags for skipping vs raising on unknowns
  • allowed_node_labels: Optional whitelist of valid node labels
  • proportional_ic: Stub for future support (not implemented yet)
  • initial_array: Constructed output, excluded from serialization
  • compartment_to_index, index_to_compartment: Bidirectional lookup tables

Already Tested

  • Shape, type, and content of the initial_array
  • Validation of missing compartments and unknown node labels
  • Proper aggregation of duplicate (node, compartment) entries
  • Pydantic validation behavior
  • Support for string- and int-labeled nodes
  • Flexibility in column naming

Roadmap: Replacing ModelInfo

ModelBuilder is the foundation for a broader refactor that will decompose and replace the monolithic ModelInfo class and its tangled dependencies. This modularization aims to make the entire system:

  • Simpler to test
  • More declarative/configurable
  • Easier to extend for new models/scenarios

Planned Object Graph

ModelBuilder
├── initial_array
├── compartment_to_index
├── mobility_matrix (future)
├── population_vector (future)
├── CompartmentModel (future)
├── ParameterBuilder (future)
└── TimeGrid (future)

This approach also removes the need to pass around confuse.Configuration objects, breaking the brittle runtime config dependency tree.
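The core derived-field logic (label-to-index mapping plus aggregation of duplicate tidy rows into the initial array) can be sketched like this. This is an illustrative stand-in for ModelBuilder, not its actual API; the real class wraps equivalent logic in pydantic validation:

```python
import numpy as np

def build_initial_array(rows, compartment_labels, node_labels):
    """Aggregate tidy (node, compartment, count) rows into an
    (n_compartments, n_nodes) array, summing duplicate entries and
    raising an informative error on unknown labels.
    """
    compartment_to_index = {c: i for i, c in enumerate(compartment_labels)}
    node_to_index = {n: i for i, n in enumerate(node_labels)}
    arr = np.zeros((len(compartment_labels), len(node_labels)))
    for node, comp, count in rows:
        if comp not in compartment_to_index:
            raise ValueError(f"Unknown compartment label: {comp!r}")
        if node not in node_to_index:
            raise ValueError(f"Unknown node label: {node!r}")
        arr[compartment_to_index[comp], node_to_index[node]] += count
    return arr, compartment_to_index

rows = [("A", "S", 90.0), ("A", "I", 10.0), ("B", "S", 50.0), ("B", "S", 25.0)]
ic, c2i = build_initial_array(rows, ["S", "I", "R"], ["A", "B"])
# the duplicate ("B", "S") rows aggregate to 75.0
```

In the pydantic version, the same construction would run inside a @model_validator(mode="after") hook so that initial_array and the lookup tables are populated automatically once the basic fields validate.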

