Description
Hi,
I am trying to debug an out-of-memory (OOM) issue in a multi-species, mlabec-based transport-reaction solver. I use an operator-split scheme, with the chemistry integrated through AMReX's TimeIntegrator framework. The problem is a flat 128 x 128 x 64 grid with 17 species; it fails after 1600 iterations on 128 CPUs with the error:
slurmstepd: error: Detected 1 oom_kill event in StepId=11904697.0. Some of the step tasks have been OOM Killed.
srun: error: x1007c4s5b1n1: task 56: Out Of Memory
This happens only with the SUNDIALS integrators (e.g., integration.type="SUNDIALS", integration.sundials.type="DIRK", integration.sundials.method="ARKODE_BACKWARD_EULER_1_1") and not with AMReX's native integrators (integration.type="RungeKutta", integration.rk.type=3).
Is there something obvious I am missing, like a pointer deletion step? My code is here - https://github.com/NREL/multiscale-biomass-pyrolysis/blob/f0cc31827cec5e42b627287b09737c96da137ee1/TranspReact/src/ScalarSolve.cpp#L33
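To check whether memory grows steadily with each SUNDIALS step, a per-step memory log could help. The following is only a minimal sketch, assuming a Linux node where `getrusage` reports `ru_maxrss` in kilobytes; the helper name `peak_rss_kb` is hypothetical and not part of AMReX or SUNDIALS.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not an AMReX or SUNDIALS API).
// Returns the peak resident set size of the calling process,
// or -1 if the query fails. On Linux the value is in kilobytes.
long peak_rss_kb()
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

Calling this on each rank once per time step and printing the value should show whether the peak keeps climbing with the SUNDIALS integrator but stays flat with the native Runge-Kutta one.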
Thank you very much for your help
Hari