
Conversation

@unalmis unalmis (Collaborator) commented Sep 18, 2025

Inverse stream maps

Other performance improvements

  • Checkpointing to increase speed and reduce the memory consumption of reverse mode differentiation ("Checkpointing to reduce reverse mode AD memory usage" #1347); a minimal sketch of the idea follows this list.
  • Adds a low_ram mode with the same speed and lower memory for objective.compute, but slower objective.grad, since JAX handles iterative algorithms poorly.
  • Fully resolves "Memory regression in bounce integrals" #1864 by avoiding materializing a large tensor in memory. Previously we had closed that issue by adding NUFFTs as a workaround; the improvement here actually resolves the JAX regression.
  • Reuses some computations when identifying bounce points to make that step more efficient.
  • Various small improvements to increase cache hits, improve fusing, and reduce floating point error.
  • Switches the default quadrature over pitch angles in Gamma_c etc. from a second-order to a third-order method (fourth order if the integrand is sufficiently smooth). Even though the integrand is not smooth in pitch angle at the bifurcations in the continuous limit, the way we discretize the bounce integrals should smooth these bifurcations to some extent, so a third-order method should not be damaging.
  • Transforms an improper field line integral into one on a compact domain where the integrand is periodic, to achieve faster convergence.
  • Improves the convergence of interp_to_argmin for Bounce2D from O(h^4) to spectral.
  • Resolves "Use OOP for surface integrals with faster methods for tensor product grids" #1389 (the non-cosmetic part).
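As a rough illustration of the checkpointing item above, the sketch below rematerializes a scan body's intermediates during the backward pass instead of storing them. This is only a minimal JAX sketch of the idea; the loop body, array sizes, and names (step, loss) are illustrative and are not DESC's actual implementation.

# Minimal sketch of checkpointing inside a scan to cut reverse-mode AD memory.
# Not DESC's implementation; `step`, the sizes, and the toy loss are illustrative.
import jax
import jax.numpy as jnp

def step(carry, x):
    # Elementwise work whose intermediates would normally be stored for the backward pass.
    y = jnp.sin(carry) * jnp.cos(x) + x**2
    return y, y

# jax.checkpoint (a.k.a. jax.remat) recomputes the body's internals on the
# backward pass instead of storing them, trading extra FLOPs for memory.
checkpointed_step = jax.checkpoint(step)

def loss(xs):
    carry, _ = jax.lax.scan(checkpointed_step, jnp.float32(0.0), xs)
    return carry

xs = jnp.linspace(0.0, 1.0, 1024)
print(jax.grad(loss)(xs).shape)  # (1024,)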

Usability

Bugs

  • Fixes a bug in test_compute_everything.
  • Fixes an edge case bug in interp_to_argmin for Bounce1D when the grid has negative zeta values.

Benchmarks

Just go to #2026 and run effective_ripple_profile.py; you will see the large performance improvement over master. The CI benchmarks do not reveal this because those benchmarks are essentially just noise. Note that, with the same parameter inputs, this branch also runs at higher resolution than master due to the faster convergence.

  • If you set use_bounce1d=True in that script, you will run out of memory, as expected for an inferior approach (the OOM occurs in the Jacobian before a single bounce integral is computed).
  • If you set nufft_eps=0, the script needs 175 GB on master (you get an OOM and JAX reports that it needs 175 GB), but only 35 GB on this branch.
  • Using NUFFTs, the script requires only 6.7 GB on this branch.

Examples

HELIOTRON

Master branch: [figure: test_theta_chebyshev_HELIOTRON]

This branch: [figure: test_delta_fourier_chebyshev_HELIOTRON]

W7-X

Master branch: [figure: test_theta_chebyshev_W7-X]

This branch: [figure: test_delta_fourier_chebyshev_W7-X]

NCSX

Master branch: [figure: test_theta_chebyshev_NCSX]

This branch: [figure: test_delta_fourier_chebyshev_NCSX]

Removal of spectral aliasing

[figure: Figure_1]

@unalmis unalmis self-assigned this Sep 18, 2025
@unalmis unalmis added performance New feature or request to make the code faster robustness Make the code more robust labels Sep 18, 2025

@unalmis unalmis changed the base branch from master to ku/nufft September 18, 2025 07:45
@unalmis unalmis marked this pull request as draft September 18, 2025 07:47
@unalmis unalmis added the theory Requires theory work before coding label Sep 18, 2025
@unalmis unalmis changed the title New inverse stream maps to accelerate convergence New inverse stream map to accelerate convergence Sep 18, 2025
@unalmis unalmis added the Bug fix Something was fixed label Sep 18, 2025
github-actions bot (Contributor) commented Sep 18, 2025

Memory benchmark result

| Test Name                             | %Δ      | Master (MB) | PR (MB)   | Δ (MB)  | Time PR (s) | Time Master (s) |
| ------------------------------------- | ------- | ----------- | --------- | ------- | ----------- | --------------- |
| test_objective_jac_w7x                | 6.51 %  | 3.800e+03   | 4.047e+03 | 247.31  | 38.06       | 36.25           |
| test_proximal_jac_w7x_with_eq_update  | -1.44 % | 6.580e+03   | 6.485e+03 | -94.96  | 161.29      | 161.18          |
| test_proximal_freeb_jac               | -0.16 % | 1.319e+04   | 1.317e+04 | -20.81  | 85.46       | 86.87           |
| test_proximal_freeb_jac_blocked       | -0.28 % | 7.488e+03   | 7.467e+03 | -20.79  | 74.62       | 72.81           |
| test_proximal_freeb_jac_batched       | -0.20 % | 7.456e+03   | 7.441e+03 | -14.63  | 73.70       | 73.32           |
| test_proximal_jac_ripple              | -3.65 % | 3.596e+03   | 3.465e+03 | -131.35 | 65.10       | 65.80           |
| test_proximal_jac_ripple_bounce1d     | -2.47 % | 3.615e+03   | 3.526e+03 | -89.21  | 79.18       | 76.65           |
| test_eq_solve                         | -0.58 % | 2.007e+03   | 1.995e+03 | -11.73  | 94.18       | 94.70           |

For the memory plots, go to the summary of the Memory Benchmarks workflow and download the artifact.

@unalmis unalmis removed Bug fix Something was fixed theory Requires theory work before coding labels Sep 18, 2025
@unalmis unalmis changed the title New inverse stream map to accelerate convergence better inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis changed the title better inverse stream map to accelerate convergence new inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis added the theory Requires theory work before coding label Sep 20, 2025
@unalmis unalmis force-pushed the ku/bounce branch 2 times, most recently from a6d949b to d685405 Compare September 22, 2025 04:33
"""
in_axes, argnums = _parse_in_axes(in_axes)
if isinstance(argnums, int):
@unalmis (Collaborator Author)

duplicate code

@unalmis unalmis removed the theory Requires theory work before coding label Sep 22, 2025
@register_compute_fun(
name="(J^theta_PEST_p)|PEST",
label="\\partial_{\\phi}|_{\\rho, \\vartheta} J^{\\theta_{PEST}}",
label="\\partial_{\\phi}|_{\\rho, \\vartheta} J^{\\vartheta}",
@unalmis (Collaborator Author)

The LaTeX label used to have both \vartheta and \theta_{PEST}, which is confusing in plots.

@register_compute_fun(
name="(J^theta_PEST_v)|PEST",
label="\\partial_{\\vartheta} J^{\\theta_{PEST}}",
label="\\partial_{\\vartheta}|_{\\rho, \\phi} J^{\\vartheta}",
@unalmis unalmis (Collaborator Author) Sep 22, 2025

see above

@partial(jit, static_argnames=["tol", "maxiter"])
def _map_clebsch_coordinates(
@partial(jit, static_argnames=["inbasis", "outbasis", "tol", "maxiter"])
def _map_poloidal_coordinates(
@unalmis (Collaborator Author)

This is a private method, so we are free to change it as we please. It is slowly being pushed to look more like eq.map_coordinates, but it is not quite the same because it takes advantage of the 1D mesh grid nature of the problem.
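For readers unfamiliar with the decorator pattern in the hunk above, here is a minimal, hypothetical sketch of marking option-like arguments static under jit. The fixed-point body and the name _map_coords_sketch are illustrative stand-ins and not DESC's actual coordinate solver.

# Hypothetical sketch of the @partial(jit, static_argnames=...) pattern above.
# The fixed-point iteration and names are illustrative, not DESC's solver.
from functools import partial

import jax
import jax.numpy as jnp

@partial(jax.jit, static_argnames=["outbasis", "tol", "maxiter"])
def _map_coords_sketch(x, outbasis="theta_PEST", tol=1e-8, maxiter=50):
    # `outbasis` picks a Python-level code path and `tol`/`maxiter` bound the
    # iteration, so they are compile-time constants; only `x` is traced.
    def cond(state):
        i, y, err = state
        return (err > tol) & (i < maxiter)

    def body(state):
        i, y, _ = state
        y_new = y - 0.5 * (jnp.sin(y) - x)  # toy update solving sin(y) = x
        return i + 1, y_new, jnp.max(jnp.abs(y_new - y))

    _, y, _ = jax.lax.while_loop(cond, body, (jnp.array(0), x, jnp.array(jnp.inf)))
    return y if outbasis == "theta_PEST" else -y

print(_map_coords_sketch(jnp.linspace(0.0, 0.9, 4)))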

@unalmis unalmis added the P3 Highest Priority, someone is/should be actively working on this label Sep 23, 2025
@unalmis unalmis dismissed f0uriest’s stale review September 23, 2025 08:44

addressed request

@unalmis unalmis marked this pull request as ready for review September 23, 2025 08:44
@unalmis unalmis requested review from a team, f0uriest and rahulgaur104 and removed request for a team September 23, 2025 08:44
@PlasmaControl PlasmaControl deleted a comment from f0uriest Dec 19, 2025
@PlasmaControl PlasmaControl deleted a comment from f0uriest Dec 19, 2025
@PlasmaControl PlasmaControl deleted a comment from f0uriest Dec 19, 2025
@PlasmaControl PlasmaControl deleted a comment from f0uriest Dec 19, 2025
@unalmis unalmis requested review from a team, YigitElma, ddudt, dpanici, f0uriest and rahulgaur104 and removed request for a team, YigitElma, ddudt, dpanici, f0uriest and rahulgaur104 January 2, 2026 06:37
@unalmis unalmis added the easy Short and simple to code or review label Jan 8, 2026