Skip to content

Conversation

@seimtpow
Copy link

Adds the ability to vary the stencil used to create the Jacobian colouring where as previously used a 5-point stencil.
Added options:

  • solver:stencil:cross = N
    e.g. for N == 2
    *     
    *
* * x * * 
    *
    *
  • solver:stencil:square = N
    e.g. for N == 2
* * * * *
* * * * *
* * x * *
* * * * *
* * * * *
  • solver:stencil:taxi = N
    e.g. for N == 2
    *     
  * * *
* * x * * 
  * * *
    *
  • solver:force_symmetric_coloring = true, will make sure that the jacobian colouring matrix is symmetric, this will often include a few extra non-zeros that the stencil will miss otherwise

Seimon Powell added 3 commits April 24, 2025 14:54
4 new options added

solver:stencil:<cross,square,taxi> = N where <option> is either cross, square, or taxi:
cross - all points (|x| == 0 and |y| <= N) or (|x| <= N and |y| == 0)
square - all points |x| < N or |y| < N
taxi - all points |x| + |y| < N
can be combined will just create a union of every stencil
New default is solver:stencil:taxi = 2
Also option to force colouring matrix to be symmetric
solver:force_symmetric_coloring=true
@bendudson
Copy link
Contributor

Thanks @seimtpow ! This is great work!

I think the merge has squashed the improvements in this PR: #3009
that allow the matrix-free method to be used for the Jacobian-vector product, with the finite difference Jacobian being used for the preconditioner. This greatly improved the speed and robustness in #3009.

@mikekryjak
Copy link
Contributor

Hi @bendudson, what do you mean by squashed in this context? Which merge do you mean?

@bendudson
Copy link
Contributor

Hey @mikekryjak ! The changes that were made in #3009 are removed by this PR: The matrix_free_operator, prune_jacobian, scale_vars and scale_rhs options are not in @seimtpow 's branch, and will be removed when this is merged. The matrix_free_operator option was a significant improvement, so I'm working on merging that into this branch.

PR boutproject#3009 included changes to allow matrix-free Jacobian-vector
products, while using the finite difference Jacobian for the
preconditioner. That gave significant speedups that will hopefully
also help now that the coloring stencil is fixed.
If the stencil offset exceeds MXG in X, or MYG in Y, then
the index offsets will go outside the range of the guard cells.

Fixes failure of test-beuler
Revert earlier change to offset array. Instead fix index limits
to be < LocalNx.
bendudson
bendudson previously approved these changes Apr 25, 2025
Copy link
Contributor

@bendudson bendudson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seimtpow ! I hope adding the matrix free Jacobian-vector product operator helps, and my merge hasn't somehow messed up your improvements.

@bendudson
Copy link
Contributor

Some tests, all with the same Hermes-3 version 98f4b4c

Running examples/tokamak-2D/recycling from scratch to t = 100 /wci

Run Solver BOUT++ Settings RHS calls TIme [s]
1 CVODE BOUT++ default example (atol=1e-12) 2677 2.23e+01
2 CVODE BOUT++ default readthedocs (atol=1e-7) 32393 2.60e+02
3 Beuler BOUT++ default readthedocs failed failed
4 Beuler BOUT++ default matrix_free_operator = false 12295 1.43e+02
5 Beuler BOUT++ 697f729, Seimon readthedocs 22893 8.31e+02
6 Beuler BOUT++ 697f729, Seimon stencil:cross=2 failed failed
7 Beuler BOUT++ 697f729, Seimon stencil:cross=2, symmetric=true failed failed
8 Beuler BOUT++ 540ab7d, After merge readthedocs 444634 1.42e+04
9 Beuler BOUT++ 540ab7d, After merge atol = 1e-12 295163 9.78e+03
10 Beuler BOUT++ 540ab7d, After merge matrix_free_operator = false 22893 8.36e+02
11 Beuler BOUT++ 540ab7d, After merge matrix_free_operator = false, stencil:cross=2, symmetric=true failed failed
12 Beuler BOUT++ 540ab7d, After merge atol = 1e-12, matrix_free_operator = false 22893 9.34e+02
13 Beuler BOUT++ 540ab7d, After merge atol = 1e-12, matrix_free_operator = false, stencil:cross=2 failed failed
14 Beuler BOUT++ 540ab7d, After merge atol = 1e-12, matrix_free_operator = false, stencil:cross=2, symmetric=true failed failed
  • 3 and 4 show that matrix_free_operator is not helping in 2D (before Seimon's changes) though it improved performance in 1D
  • I thought the difference between 4 and 5 might be due to the wider stencil in Seimon’s branch. Changing back to a cross (6 and 7) I thought would be the same as 4, but both fail to converge. 
Should cross=2 not be the same as the previous implementation?
  • 10 and 5 being the same indicates that the merge didn’t break anything in Seimon’s branch: Both do the same thing when turning off matrix_free_operator
  • I’ve checked again, and 10 and 12 really have the same number of RHS calls, though atol is 1e-7 in 10 and 1e-12 in 12. I think this just means that atol plays no part in the convergence criteria. Reason = 4 (Newton computed step size small i.e. stol) is printed every time when setting diagnose = true

Is examples/tokamak-2D/recycling a particularly cursed setup? None of the beuler settings I’ve tried have come close to CVODE. The beuler preconditioner must be much better than the one CVODE is using, so other differences in e.g. nonlinear and linear step control must be playing a role.


It looks like we should disable matrix_free_operator by default, if we’re optimizing for the 2D case.

@seimtpow
Copy link
Author

@bendudson cross == 1 should be the same as before, except in 3D where I am always adding every cell in Z to the stencil and I think before it was just one forward and one back.

@bendudson
Copy link
Contributor

Thanks @seimtpow ! I can confirm that cross=1 gives the same result as the original code. More tests:

Run Solver BOUT++ Settings RHS calls TIme [s]
15 Beuler BOUT++ 697f729, Seimon stencil:cross=1 12295 1.68e+02
16 Beuler BOUT++ 540ab7d, After merge matrix_free_operator = false, cross=1 12295 1.44e+02
17 Beuler BOUT++ 540ab7d, After merge matrix_free_operator = false, square=1 19866 4.64e+02
18 Beuler BOUT++ 540ab7d, After merge square=1 failed failed

Runs 15 and 16 both give the same number of RHS calls as run 4.

This uses a matrix-free method to calculate Jacobian-vector products.
It seems to improve performance in 1D, but degrades performance in 2D.
Effect likely problem dependent.
@bendudson
Copy link
Contributor

Checking the latest version:

Run Solver BOUT++ Settings RHS calls TIme [s]
19 Beuler 348e65b readthedocs 22893 8.26e+02
20 Beuler 348e65b cross=1 12295 1.58e+02
21 Beuler 348e65b square=1 19866 4.36e+02

I would love to know how CVODE can be 7x faster for this initial part of the recycling example. We could try:

  • Apply some of CVODE's methods to the PETSc settings/solvers
  • Adapt this preconditioner for CVODE: It should be possible to use the coloring and apply the same preconditioner to CVODE.

Using @seimtpow's PR comments to add material to the manual.
Copy link
Contributor

@bendudson bendudson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seimtpow ! Good to merge I think.

@mikekryjak
Copy link
Contributor

mikekryjak commented Apr 27, 2025 via email

@bendudson
Copy link
Contributor

Hey @mikekryjak ! Taxi stencil is the default, so readthedocs runs have that (5, 8, 9, 10, 12, and 19).
20 is the same as 4, which is better than 3 because matrix_free_operator has been disabled.

In this case it looks like the cost of calculating the wider taxi stencil (19) outweighs the benefit from the additional accuracy. I suspect this is because this is starting from scratch: The solution is changing rapidly, so the Jacobian is also changing. There isn't much benefit in calculating a Jacobian if it becomes outdated almost immediately.

I suspect that the new stencil will speed up cases that are closer to convergence, where the solution changes aren't as dramatic. The best solver may vary in time e.g. CVODE at first , followed by beuler with a small stencil, then beuler with a wider taxi stencil.

@bendudson bendudson merged commit ca7a82e into boutproject:next Apr 28, 2025
12 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants