
Conversation

@eappen-nelluvelil
Contributor

@eappen-nelluvelil eappen-nelluvelil commented Oct 3, 2025

This PR introduces a free-list memory pool allocator, built on the Boost Pool library's simple segregated storage class, to manage angular flux data within the CBC_FLUDS class. During a sweep, as soon as the CBC algorithm sends a cell's angular flux data to all of its local downwind dependencies, the storage for that cell's angular flux data can be reused for another local cell that has not yet been solved.
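
To illustrate the slot-reuse idea, here is a minimal sketch of a fixed-size free-list pool. It is not the code in this PR: `SlotPool` and its members are hypothetical names, and the free list is a plain stack of slot indices rather than the intrusive free list that Boost's simple segregated storage keeps inside the unused chunks. Each slot is assumed to hold one `double` per (angle, group) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pool described above (not the PR's class).
class SlotPool
{
public:
  SlotPool(std::size_t num_slots, std::size_t num_angles, std::size_t num_groups)
    : slot_size_(num_angles * num_groups), buffer_(num_slots * slot_size_, 0.0)
  {
    assert(slot_size_ > 0);
    // Push indices in reverse so that slot 0 is handed out first.
    free_list_.reserve(num_slots);
    for (std::size_t s = num_slots; s-- > 0;)
      free_list_.push_back(s);
  }

  // Returns a pointer to a free slot, or nullptr if the pool is exhausted.
  double* Allocate()
  {
    if (free_list_.empty())
      return nullptr;
    const std::size_t s = free_list_.back();
    free_list_.pop_back();
    return buffer_.data() + s * slot_size_;
  }

  // Returns a slot to the pool so that another cell can reuse it.
  void Deallocate(double* slot)
  {
    assert(slot >= buffer_.data() && slot < buffer_.data() + buffer_.size());
    const std::size_t offset = static_cast<std::size_t>(slot - buffer_.data());
    assert(offset % slot_size_ == 0);
    free_list_.push_back(offset / slot_size_);
  }

  std::size_t NumFree() const { return free_list_.size(); }
  std::size_t SlotSize() const { return slot_size_; }

private:
  std::size_t slot_size_;              // doubles per slot (angles * groups)
  std::vector<double> buffer_;         // backing buffer, num_slots * slot_size_
  std::vector<std::size_t> free_list_; // indices of currently free slots
};
```

In this sketch both `Allocate` and `Deallocate` are constant-time, and the backing buffer is sized once up front, so no heap allocation happens while slots are being handed out and returned.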

For a given CBC_SPDS, the CBC algorithm currently stores angular fluxes for all cells in the CBC_SPDS, for angles in the quadrature, and for all groups in a given group set, which results in high memory usage during sweeps. The changes in this commit include the following:

  • The CBC_SPDS class has a new method to simulate a local sweep, which determines the minimum number of free slots (blocks of memory) that the CBC_FLUDS class needs to store cell angular fluxes during a given sweep. The simulated sweep iterates through the CBC_SPDS's task list in the same manner as the CBC_AngleSet::AngleSetAdvance method, tracking when a cell needs a free slot assigned to it and when a cell can return its slot to the pool. This is done for cells that have purely local upwind and downwind dependencies. For each cell that has a remote upwind or remote downwind dependency, the simulated sweep sets aside a permanent slot. Such cells cannot have their memory blocks returned to the free pool, even after all of their local downwind dependencies have received the appropriate angular fluxes, because the CBC algorithm's communication patterns are non-deterministic and asynchronous: a simulated sweep cannot determine ahead of time when a cell with remote upwind dependencies will receive its angular fluxes, or when a cell with remote downwind dependencies will communicate its fluxes to those dependencies.
  • The CBC_FLUDS class uses the Boost Pool library's simple segregated storage class to implement a free-list pool allocator. Using the minimum number of slots calculated by the simulated sweep, the CBC_FLUDS class constructs a backing buffer with as many elements as the product of the minimum number of pool slots, the number of angles in the associated angle set, and the number of groups in the associated group set. The simple segregated storage object manages this backing buffer and uses an internal free list to hand out free pool slots and to take back slots for cells whose angular flux data no longer needs to be stored during the sweep.
  • The CBC_AngleSet::AngleSetAdvance method uses the CBC_FLUDS::Allocate method to assign a free slot to a cell that is ready to be solved. After a cell has been swept, its local predecessors have their dependency consumption counts incremented. When a local predecessor's dependency consumption count equals or exceeds its local downwind dependency count, its slot is returned to the pool via the CBC_FLUDS::Deallocate method.
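
The slot-counting logic in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method added to CBC_SPDS: `SimCell` and `MinSlotsForSweep` are hypothetical names, cells are assumed to be listed in the order the sweep would solve them, and a purely local cell with no local downwind dependencies is assumed to release its slot right after it is solved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell dependency record (not a type from the PR).
struct SimCell
{
  std::vector<std::size_t> local_upwind; // indices of local predecessors
  std::size_t num_local_downwind;        // number of local successors
  bool has_remote_dependency;            // any remote upwind/downwind dependency
};

// Returns the number of slots needed: one permanent slot per cell with a
// remote dependency, plus the peak number of simultaneously live local cells.
std::size_t MinSlotsForSweep(const std::vector<SimCell>& cells)
{
  std::size_t permanent = 0;
  for (const SimCell& c : cells)
    if (c.has_remote_dependency)
      ++permanent;

  std::vector<std::size_t> consumed(cells.size(), 0);
  std::size_t in_use = 0;
  std::size_t peak = 0;

  for (std::size_t i = 0; i < cells.size(); ++i)
  {
    const SimCell& cell = cells[i];

    // A purely local cell takes a slot when it is about to be solved.
    if (!cell.has_remote_dependency)
    {
      ++in_use;
      peak = std::max(peak, in_use);
    }

    // Solving the cell consumes the fluxes of its local predecessors.
    for (std::size_t p : cell.local_upwind)
    {
      assert(p < i); // predecessors must come earlier in the task order
      ++consumed[p];
      // With consistent counts, equality is reached exactly once per cell.
      if (!cells[p].has_remote_dependency && consumed[p] == cells[p].num_local_downwind)
        --in_use;
    }

    // Assumption: with no local consumers, the slot is released immediately.
    if (!cell.has_remote_dependency && cell.num_local_downwind == 0)
      --in_use;
  }

  return permanent + peak;
}
```

For a purely local chain of four cells (0 → 1 → 2 → 3), this sketch reports two slots, since only a cell and its immediate predecessor are live at the same time. Marking the last cell as having a remote dependency raises the total to three, because that cell keeps a permanent slot.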

Below is a strong-scaling plot for the CBC algorithm run with the CBC_FLUDS class with this pool allocator on LLNL Dane, whose node specs are given below.

  • 112 Intel Sapphire Rapids cores/node
  • 105 MB cache/CPU
  • 2.28 GB DDR5 per core

The scaling study was run on 1, 2, 4, 8, 16, 32, 64, and 128 nodes, with 64 ranks per node (rpn).

[Figure: opensn-strong-scaling-cbc-pool-allocator]

Below is a plot comparing the strong-scaling results from PR 808 and this PR for 2, 4, 8, 16, 32, and 128 nodes. The 1-node results for both PRs are omitted from this plot because PR 808 cannot run on 1 node on LLNL Dane due to memory limitations.

[Figure: strong-scaling results, PR 808 vs. this PR]

Below is a weak-scaling plot for the CBC algorithm run with the CBC_FLUDS class with this pool allocator on LLNL Dane.

[Figure: opensn-weak-scaling-cbc-pool-allocator]

The weak-scaling study was run on 1, 2, 4, 8, 16, 32, and 64 nodes, with 64 rpn.

@eappen-nelluvelil eappen-nelluvelil marked this pull request as draft October 3, 2025 20:46
@eappen-nelluvelil eappen-nelluvelil marked this pull request as ready for review October 3, 2025 20:46
@eappen-nelluvelil
Contributor Author

@wdarylhawkins I'd like to request a review for this PR. I forgot to add assignees to this PR, and I'm not able to edit the PR to add reviewers.

Contributor

@ragusa ragusa left a comment


@wdhawkins I have no problem approving this. However, for this type of PR, should we get in the habit of copy/pasting scaling results in the discussion/conversation dialog box of the PR? I kinda feel I need to know whether the results changed or not. And if this does not apply here, just tell me as well.

@ragusa
Contributor

ragusa commented Oct 6, 2025

My bad. That information was already present. I just didn't see it without scrolling more.

@ragusa ragusa self-requested a review October 6, 2025 18:48
@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-fluds-boost-memory-pool-allocator branch from c049b4d to d58c250 Compare October 9, 2025 16:52
@eappen-nelluvelil
Contributor Author

@wdhawkins I made the following changes:

  • Updated the UpwindPsi, OutgoingPsi, NLUpwindPsi, and NLOutgoingPsi methods in CBC_FLUDS, as well as their calls in cbc_sweep_chunk.cc, to more closely mirror how they are implemented in AAH_FLUDS and used in aah_sweep_chunk.cc.

@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-fluds-boost-memory-pool-allocator branch from d58c250 to 5309776 Compare October 9, 2025 17:56
@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-fluds-boost-memory-pool-allocator branch 6 times, most recently from 23d5694 to 2239fb6 Compare October 21, 2025 17:37
…ost Pool's

simple segregated storage class to manage angular flux data within the
`CBC_FLUDS` class. During a sweep, as soon as the CBC algorithm sends a
cell's angular flux data to all of its local downwind dependencies, the
storage for that cell's angular flux data can be reused for another
local cell that has not yet been solved.

For a given `CBC_SPDS`, the CBC algorithm currently stores angular
fluxes for all cells in the `CBC_SPDS`, for angles in the quadrature,
and for all groups in a given group set, which results in high memory
intensity during sweeps. The changes in this commit include the
following:
- The `CBC_SPDS` class has a new method to simulate a local sweep, which
  determines the minimum number of free slots (blocks of memory) that
the `CBC_FLUDS` class needs to properly store cell angular fluxes during
a given sweep. The simulated sweep iterates through the `CBC_SPDS`'s
task list in the same manner as in the `CBC_AngleSet::AngleSetAdvance`
method, and determines the peak number of cells whose angular fluxes
have to be stored during a sweep. This is done for cells that have
purely local upwind and downwind dependencies. For cells that have
either remote upwind or remote downwind dependencies, the simulated
sweep sets aside a slot for each one of these cells. Cells with either
remote upwind or remote downwind dependencies cannot have their
corresponding memory blocks returned to the free pool, even after all of
their local downwind dependencies have received the appropriate angular
fluxes. This is due to the
non-deterministic and asynchronous nature of the CBC algorithm's
communication patterns. A simulated sweep can neither determine ahead of time
when a cell with remote upwind dependencies will get its necessary
angular fluxes nor determine ahead of time when a cell with remote downwind
dependencies communicates its fluxes to said dependencies.
- The `CBC_FLUDS` class uses the Boost Pool library's simple segregated
  storage class to implement a free-list pool allocator. Using the
minimum number of slots calculated by the simulated sweep, the
`CBC_FLUDS` class constructs a backing buffer with as many elements as
the product of the minimum number of pool slots, the number of angles in the associated
angle set, and the number of groups in the associated group set. The
simple segregated storage object manages this backing buffer, and uses
an internal free-list to hand out free pool slots and take back pool
slots for cells whose angular flux data no longer needs to be stored
during the sweep.
- The `CBC_AngleSet::AngleSetAdvance` method uses the
  `CBC_FLUDS::Allocate` method to assign a free slot to a cell that
is ready to be solved. After a cell has been swept, its local
predecessors have their dependency consumption counts incremented. When
a local predecessor's dependency consumption count equals or exceeds its
local downwind dependency count, its corresponding slot is returned back
to the pool via the `CBC_FLUDS::Deallocate` method.
@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-fluds-boost-memory-pool-allocator branch from 2239fb6 to 8e43370 Compare October 22, 2025 20:25
@wdhawkins wdhawkins marked this pull request as draft November 8, 2025 17:14