CBC_FLUDS uses a memory pool allocator to minimize memory usage for cell angular fluxes during sweeps
#790
Conversation
@wdarylhawkins I'd like to request a review for this PR. I forgot to add assignees to this PR, and I'm not able to edit the PR to add reviewers.
ragusa left a comment:
@wdhawkins I have no problem approving this. However, for this type of PR, should we get in the habit of copy/pasting scaling results in the discussion/conversation dialog box of the PR? I kinda feel I need to know whether the results changed or not. And if this does not apply here, just tell me as well.
My bad. That information was already present. I just didn't see it without scrolling more.
@wdhawkins I made the following changes:
This PR introduces a free-list memory pool allocator using the Boost Pool library's simple segregated storage class to manage angular flux data within the `CBC_FLUDS` class. During a sweep, as soon as the CBC algorithm sends a cell's angular flux data to all of its local downwind dependencies, the storage for that cell's angular flux data can be reused for another local cell that has not yet been solved.
For a given `CBC_SPDS`, the CBC algorithm currently stores angular fluxes for all cells in the `CBC_SPDS`, for all angles in the quadrature, and for all groups in a given group set, which results in high memory usage during sweeps. The changes in this PR include the following:

- The `CBC_SPDS` class has a new method to simulate a local sweep, which determines the minimum number of free slots (blocks of memory) that the `CBC_FLUDS` class needs to properly store cell angular fluxes during a given sweep. The simulated sweep iterates through the `CBC_SPDS`'s task list in the same manner as the `CBC_AngleSet::AngleSetAdvance` method, tracking when cells need a free slot assigned to them and when cells with associated slots can return those slots to the pool. This is done for cells that have purely local upwind and downwind dependencies. For cells that have either remote upwind or remote downwind dependencies, the simulated sweep sets aside a permanent slot for each such cell. Cells with remote upwind or remote downwind dependencies cannot have their memory blocks returned to the free pool, even after all of their local downwind dependencies have received the appropriate angular fluxes, because of the non-deterministic and asynchronous nature of the CBC algorithm's communication patterns: a simulated sweep can determine ahead of time neither when a cell with remote upwind dependencies will receive its necessary angular fluxes nor when a cell with remote downwind dependencies will communicate its fluxes to those dependencies. (A sketch of this slot-counting pass appears after this list.)
- The `CBC_FLUDS` class uses the Boost Pool library's simple segregated storage class to implement a free-list pool allocator. Using the minimum number of slots calculated by the simulated sweep, the `CBC_FLUDS` class constructs a backing buffer with as many elements as the product of the minimum number of pool slots, the number of angles in the associated angle set, and the number of groups in the associated group set. The simple segregated storage object manages this backing buffer and uses an internal free list to hand out free pool slots and reclaim slots for cells whose angular flux data no longer needs to be stored during the sweep.
- The `CBC_AngleSet::AngleSetAdvance` method uses the `CBC_FLUDS::Allocate` method to associate a free slot with a cell that is ready to be solved. After a cell has been swept, its local predecessors have their dependency consumption counts incremented. When a local predecessor's dependency consumption count equals or exceeds its local downwind dependency count, its slot is returned to the pool via the `CBC_FLUDS::Deallocate` method. (The second sketch after this list illustrates this allocation/deallocation pattern.)
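To make the slot-counting pass concrete, here is a minimal sketch of how such a simulated sweep could track the peak number of live slots. The `Task` fields, the function name `SimulateSweepSlotCount`, and the flattened task-list view are illustrative assumptions, not the actual `CBC_SPDS` interface:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative task record: these field names are assumptions, not the
// actual CBC_SPDS task structure.
struct Task
{
  std::vector<std::size_t> local_predecessors; // task-list indices of local upwind cells
  std::size_t num_local_successors;            // number of local downwind dependencies
  bool has_remote_dependency;                  // any remote upwind or downwind dependency
};

// Walk the task list in sweep order and return the number of slots the
// pool must provide: the peak number of simultaneously live slots for
// purely local cells, plus one permanent slot per cell with a remote
// dependency.
std::size_t
SimulateSweepSlotCount(const std::vector<Task>& tasks)
{
  std::size_t live = 0, peak = 0, permanent = 0;
  // Unserved local downwind dependencies per task (0 = not pool-recyclable).
  std::vector<std::size_t> remaining(tasks.size(), 0);

  for (std::size_t t = 0; t < tasks.size(); ++t)
  {
    const auto& task = tasks[t];
    if (task.has_remote_dependency)
      ++permanent; // held for the whole sweep; never returned to the pool
    else
    {
      ++live; // the cell acquires a slot when it becomes ready to solve
      remaining[t] = task.num_local_successors;
      peak = std::max(peak, live);
      if (remaining[t] == 0)
        --live; // no local consumers: the slot is recyclable immediately
    }

    // Solving this cell serves one downwind dependency of each local
    // predecessor; a purely local predecessor's slot is freed once all
    // of its local downwind dependencies have been served.
    for (auto p : task.local_predecessors)
      if (remaining[p] > 0 && --remaining[p] == 0)
        --live;
  }
  return peak + permanent;
}

int main()
{
  // Three-cell chain 0 -> 1 -> 2 with no remote dependencies: cell 0's
  // slot is freed as soon as cell 1 is solved, so two slots suffice.
  std::vector<Task> chain = {{{}, 1, false}, {{0}, 1, false}, {{1}, 0, false}};
  std::cout << SimulateSweepSlotCount(chain) << "\n"; // prints 2
}
```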
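And here is a hedged sketch of how the pool itself could be wired up with Boost's `simple_segregated_storage`, together with the allocate-on-ready / deallocate-on-last-consumption pattern from the last two bullets. The class `PooledFludsStorage`, the helper `OnCellSwept`, and the sizing parameter `max_cell_dofs` are illustrative stand-ins; this does not claim to reproduce the actual `CBC_FLUDS` interface:

```cpp
#include <boost/pool/simple_segregated_storage.hpp>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for CBC_FLUDS: a fixed-size backing buffer
// partitioned into per-cell slots managed by Boost's free list.
class PooledFludsStorage
{
public:
  PooledFludsStorage(std::size_t num_slots,
                     std::size_t num_angles,
                     std::size_t num_groups,
                     std::size_t max_cell_dofs /* assumed sizing parameter */)
    : slot_bytes_(num_angles * num_groups * max_cell_dofs * sizeof(double)),
      buffer_(num_slots * slot_bytes_)
  {
    // Hand the whole backing buffer to the free-list manager,
    // partitioned into num_slots fixed-size chunks.
    storage_.add_block(buffer_.data(), buffer_.size(), slot_bytes_);
  }

  // Associate a free slot with a cell that is ready to be solved.
  double* Allocate(std::uint64_t cell_id)
  {
    auto* slot = static_cast<double*>(storage_.malloc());
    cell_slots_[cell_id] = slot;
    return slot;
  }

  // Return a cell's slot to the pool once its fluxes are dead.
  void Deallocate(std::uint64_t cell_id)
  {
    auto it = cell_slots_.find(cell_id);
    storage_.free(it->second);
    cell_slots_.erase(it);
  }

private:
  std::size_t slot_bytes_;
  std::vector<char> buffer_;
  boost::simple_segregated_storage<std::size_t> storage_;
  std::unordered_map<std::uint64_t, double*> cell_slots_;
};

// Sweep-side trigger (third bullet): after a cell is swept, bump each
// local predecessor's consumption count and recycle a predecessor's
// slot once all of its local downwind dependencies have been served.
// Cells with remote dependencies would be excluded from this countdown.
inline void
OnCellSwept(const std::vector<std::uint64_t>& predecessor_ids,
            std::unordered_map<std::uint64_t, std::size_t>& consumed,
            const std::unordered_map<std::uint64_t, std::size_t>& num_local_downwind,
            PooledFludsStorage& fluds)
{
  for (auto pid : predecessor_ids)
    if (++consumed[pid] >= num_local_downwind.at(pid))
      fluds.Deallocate(pid);
}
```

One property worth noting: `simple_segregated_storage` keeps no bookkeeping beyond the free list embedded in the unused chunks themselves, so `malloc()` and (unordered) `free()` are constant-time pointer pops and pushes, which suits the inner sweep loop.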
Below is a strong-scaling plot for the CBC algorithm run with the `CBC_FLUDS` class with this pool allocator on LLNL Dane, whose node specs are given below. The scaling study was run on 1, 2, 4, 8, 16, 32, 64, and 128 nodes, with 64 ranks per node (rpn).
Below is a plot showing the strong-scaling results from PR 808 and this PR for 2, 4, 8, 16, 32, and 128 nodes. The 1-node results for both PRs have been omitted from this plot because PR 808 cannot run on 1 node on LLNL Dane due to memory limitations.
Below is a weak-scaling plot for the CBC algorithm run with the `CBC_FLUDS` class with this pool allocator on LLNL Dane. The weak-scaling study was run on 1, 2, 4, 8, 16, 32, and 64 nodes, with 64 rpn.