BENCH: set up benchmarks for the core transforms. by DWesl · Pull Request #22 · jswhit/pyspharm

DWesl · 2024-08-31T14:35:54Z

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

Benchmarks based on airspeed velocity. Run directions and results below.

$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                       ok
[75.00%] ··· ======== ============ ============
             --                 method
             -------- -------------------------
              ntrunc    computed      stored
             ======== ============ ============
                21      97.2±4μs     72.6±7μs
                42      366±30μs     286±30μs
                63      939±80μs    713±300μs
                84     3.28±0.7ms   2.96±0.5ms
               127      8.89±1ms     6.35±2ms
               168      33.2±1ms     35.0±8ms
               252      90.8±5ms     102±10ms
             ======== ============ ============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                     ok
[100.00%] ··· ======== ============ ============
              --                 method
              -------- -------------------------
               ntrunc    computed      stored
              ======== ============ ============
                 21     115±100μs    85.2±20μs
                 42      249±40μs     184±20μs
                 63      759±90μs     448±20μs
                 84     1.47±0.2ms   1.00±0.1ms
                127     5.67±0.3ms   4.63±0.2ms
                168      10.6±1ms     11.2±2ms
                252      30.9±3ms    27.7±10ms
              ======== ============ ============

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

DWesl · 2024-09-07T16:47:57Z

Re-run of the benchmarks after passing complex coefficients to spectogrd. The 20% variation in the untouched grdtospec benchmark is apparently normal, so this would be useful primarily for finding large changes from one commit to the next, rather than tracking changes over time.

[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                      ok
[75.00%] ··· ======== ============= =============
             --                  method
             -------- ---------------------------
              ntrunc     computed       stored
             ======== ============= =============
                21       78.9±8μs      62.9±2μs
                42       296±9μs       228±2μs
                63      664±200μs     586±100μs
                84     2.88±0.04ms   2.70±0.04ms
               127      6.40±0.5ms    5.36±0.4ms
               168      28.1±0.7ms    27.2±0.5ms
               252       72.0±2ms      73.1±4ms
             ======== ============= =============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                    ok
[100.00%] ··· ======== ============ =============
              --                 method
              -------- --------------------------
               ntrunc    computed       stored
              ======== ============ =============
                 21      64.8±2μs     47.5±0.9μs
                 42      214±3μs       135±8μs
                 63      529±80μs      348±10μs
                 84     1.21±0.3ms     752±80μs
                127     4.73±0.3ms   3.87±0.02ms
                168     8.86±0.9ms    8.07±0.2ms
                252      27.1±1ms      23.4±1ms
              ======== ============ =============

jswhit · 2024-09-21T19:48:41Z

Thanks for all your recent contributions @DWesl! Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope. The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

DWesl · 2024-09-23T15:41:20Z

Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope.

My first thought was to wrap the files not exposed to python in Fortran 90 modules, but that broke the Python binding, so I'm inclined to agree with you.

The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

I think there's one other, which uses Clenshaw-Curtis quadrature rather than Gaussian for that integration, but it seems it upscales the grid internally as well.

I think there's a paper some while back showing exact transforms on a regular lat-lon grid can't happen without 2N latitudes, but frequently what we can get with N+1 latitudes is fine. (There's other papers investigating similar things with how well Clenshaw-Curtis quadrature deals with polynomials beyond the degree for which it is exact, compared to Gaussian quadrature, and extending that to certain rational functions). As partial evidence for that, the fast Legendre transform in the IFS is also approximate.

QianyeSu · 2025-08-17T04:25:34Z

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

Benchmarks based on airspeed velocity. Run directions and results below.

$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                       ok
[75.00%] ··· ======== ============ ============
             --                 method
             -------- -------------------------
              ntrunc    computed      stored
             ======== ============ ============
                21      97.2±4μs     72.6±7μs
                42      366±30μs     286±30μs
                63      939±80μs    713±300μs
                84     3.28±0.7ms   2.96±0.5ms
               127      8.89±1ms     6.35±2ms
               168      33.2±1ms     35.0±8ms
               252      90.8±5ms     102±10ms
             ======== ============ ============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                     ok
[100.00%] ··· ======== ============ ============
              --                 method
              -------- -------------------------
               ntrunc    computed      stored
              ======== ============ ============
                 21     115±100μs    85.2±20μs
                 42      249±40μs     184±20μs
                 63      759±90μs     448±20μs
                 84     1.47±0.2ms   1.00±0.1ms
                127     5.67±0.3ms   4.63±0.2ms
                168      10.6±1ms     11.2±2ms
                252      30.9±3ms    27.7±10ms
              ======== ============ ============

@DWesl
Hi, I've just rewritten the .f files from pyspharm into modern Fortran and added OpenMP SIMD directives to all the hot loops. My tests show a 10–20 % speed-up. If you're interested, take a look at the spharm submodule in my Skyborn repo:https://github.com/QianyeSu/Skyborn

DWesl added 4 commits August 31, 2024 10:27

BLD: Add dependencies for examples and benchmarks.

bf6eb89

BENCH: Fix dtype for harmonic coefficients.

19db6e7

BENCH: Have asv track main branch.

a34f9b9

Merge branch 'master' into asv-benchmarks

71a58ed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

BENCH: set up benchmarks for the core transforms.#22

BENCH: set up benchmarks for the core transforms.#22
DWesl wants to merge 5 commits intojswhit:masterfrom
DWesl:asv-benchmarks

DWesl commented Aug 31, 2024

Uh oh!

DWesl commented Sep 7, 2024

Uh oh!

jswhit commented Sep 21, 2024 •

edited

Loading

Uh oh!

DWesl commented Sep 23, 2024

Uh oh!

QianyeSu commented Aug 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

DWesl commented Aug 31, 2024

Uh oh!

DWesl commented Sep 7, 2024

Uh oh!

jswhit commented Sep 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DWesl commented Sep 23, 2024

Uh oh!

QianyeSu commented Aug 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jswhit commented Sep 21, 2024 •

edited

Loading