
Add NPS benchmark for search speed regression testing#46

Merged
luccabb merged 4 commits into master from add-nps-benchmark
Feb 16, 2026

@luccabb luccabb commented Feb 15, 2026

Summary

  • Add a node counter (self.nodes) to AlphaBeta, incremented in negamax() and quiescence_search()
  • Add moonfish/bench.py with 48 positions from Stockfish's bench suite and a run_bench() function that reports per-position and total nodes, time, and NPS
  • Add --mode bench to the CLI (moonfish --mode bench)
  • Add CI workflow (.github/workflows/nps-benchmark.yml) that runs on PRs and posts results as a PR comment

Node count is deterministic (RNG is seeded) and serves as the primary signal — if it changes, the PR changed search behavior. NPS is informational only since CI runner performance varies.
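The counter placement described above can be sketched on a toy game tree. This is a minimal illustration, not the moonfish code: `Node`, `ToyAlphaBeta`, and the `noisy` list (a stand-in for capture moves) are hypothetical names, and the real engine searches chess boards rather than hand-built trees.

```python
from __future__ import annotations

from dataclasses import dataclass, field

INF = 10**9


@dataclass
class Node:
    """Toy position: a static score for the side to move, plus successors."""
    score: int = 0
    children: list = field(default_factory=list)  # all moves
    noisy: list = field(default_factory=list)     # stand-in for captures


class ToyAlphaBeta:
    def __init__(self) -> None:
        self.nodes = 0

    def quiescence_search(self, node: Node, alpha: int, beta: int) -> int:
        self.nodes += 1  # count every quiescence node
        if node.score >= beta:
            return beta
        alpha = max(alpha, node.score)
        for child in node.noisy:
            score = -self.quiescence_search(child, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha

    def negamax(self, node: Node, depth: int, alpha: int, beta: int) -> int:
        self.nodes += 1  # count every negamax node
        if depth == 0 or not node.children:
            return self.quiescence_search(node, alpha, beta)
        for child in node.children:
            score = -self.negamax(child, depth - 1, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha

    def search_move(self, root: Node, depth: int):
        self.nodes = 0  # reset so the count covers exactly one search
        best_index, best_score = None, -INF
        for index, child in enumerate(root.children):
            score = -self.negamax(child, depth - 1, -INF, INF)
            if score > best_score:
                best_index, best_score = index, score
        return best_index
```

Because the counter only increments and is never read by the search itself, it cannot alter which moves are explored, which is why the existing tests are expected to keep passing.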

Test plan

  • moonfish --mode bench runs all 48 positions and prints NPS results
  • Running twice produces identical node counts (543,813 at depth 3)
  • Existing test_alpha_beta tests pass (node counter doesn't break anything)
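The "running twice produces identical node counts" check relies on seeding the RNG before searching. A small sketch of why that works, assuming a search whose move order depends on `random` (the toy tree and `count_nodes` helper below are hypothetical, not part of moonfish):

```python
import random


def build_tree(depth, branching, rng):
    """Build a toy game tree as nested lists; leaves are integer scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]


def count_nodes(tree, seed):
    """Alpha-beta over the toy tree with shuffled move order; returns nodes visited."""
    random.seed(seed)  # same seed -> same move order -> same cutoffs
    nodes = 0

    def negamax(node, alpha, beta):
        nonlocal nodes
        nodes += 1
        if isinstance(node, int):
            return node
        children = list(node)
        random.shuffle(children)  # move order affects how much gets pruned
        for child in children:
            score = -negamax(child, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha

    negamax(tree, -10**9, 10**9)
    return nodes


tree = build_tree(depth=4, branching=4, rng=random.Random(42))
first = count_nodes(tree, seed=0)
second = count_nodes(tree, seed=0)
print(first == second)
```

Without the `random.seed(seed)` call, the shuffle would produce a different move order on each run, and alpha-beta cutoffs (and therefore the node count) could differ between runs.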

Add a node counter to AlphaBeta and a bench mode that searches 48
positions from Stockfish's bench suite, reporting per-position and
total nodes, time, and NPS. Node count is deterministic and serves
as the primary signal for detecting search behavior changes.

Includes a CI workflow that runs on PRs and posts results as a comment.

greptile-apps bot commented Feb 15, 2026

Greptile Summary

Adds a deterministic NPS benchmark suite for regression testing search speed. A self.nodes counter is added to AlphaBeta, incremented in negamax() and quiescence_search(), and reset per search_move() call. A new bench.py module searches 48 Stockfish bench positions with a seeded RNG for reproducible node counts. The CLI gains --mode bench and a CI workflow posts benchmark results as PR comments.

  • The --depth CLI flag is silently ignored when running bench mode — run_bench(depth=5) is hardcoded in main.py:18 instead of using config.negamax_depth
  • Node counting in alpha_beta.py is minimal and correctly placed; it does not affect search behavior or existing tests
  • CI workflow has appropriate contents: read and pull-requests: write permissions and only triggers on engine code changes

Confidence Score: 4/5

  • This PR is safe to merge with one minor fix needed for the hardcoded bench depth.
  • The core engine change (node counter) is minimal and correct. The bench module is well-structured with deterministic seeding. The only issue is the hardcoded depth in main.py which silently ignores the CLI flag — a straightforward fix but worth addressing before merge.
  • moonfish/main.py — hardcoded depth ignores CLI --depth parameter

Important Files Changed

| Filename | Overview |
|---|---|
| moonfish/engines/alpha_beta.py | Adds self.nodes counter initialized to 0, incremented in both negamax() and quiescence_search(), and reset in search_move(). Clean, minimal change with no impact on search behavior. |
| moonfish/bench.py | New benchmark module with 48 Stockfish bench positions. Seeds RNG for deterministic node counts. Correctly handles terminal positions and reports per-position and total NPS. |
| moonfish/main.py | Adds bench mode to CLI. However, run_bench(depth=5) hardcodes the depth, silently ignoring the --depth CLI parameter. |
| .github/workflows/nps-benchmark.yml | CI workflow runs bench on PRs that touch engine code, parses output, and posts results as a PR comment. Permissions are appropriately scoped. Output parsing relies on consistent print format from bench.py. |
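For readers who want a picture of the bench loop summarized in the table, here is a rough sketch of per-position and total accounting. It is not the contents of bench.py: `run_bench_sketch`, the `search` callable (assumed to return a node count, or None for a terminal position), and the injectable `clock` are all hypothetical, and the print format is only modeled on the output the workflow posts.

```python
import time


def run_bench_sketch(positions, search, clock=time.perf_counter):
    """Search each position and report per-position and total nodes, time, NPS."""
    total_nodes, total_time = 0, 0.0
    count = len(positions)
    for index, position in enumerate(positions, start=1):
        start = clock()
        nodes = search(position)
        elapsed = clock() - start
        label = f"Position {index:2d}/{count}:"
        if nodes is None:
            # Terminal position: nothing to search, so it adds nothing to the totals.
            print(f"{label} nodes={0:<10d} time=0.00s  nps=0  (terminal)")
            continue
        nps = int(nodes / elapsed) if elapsed > 0 else 0
        print(f"{label} nodes={nodes:<10d} time={elapsed:.2f}s  nps={nps}")
        total_nodes += nodes
        total_time += elapsed
    # Total NPS is total nodes over total time, not the mean of per-position NPS.
    total_nps = int(total_nodes / total_time) if total_time > 0 else 0
    print(f"Total: nodes={total_nodes} time={total_time:.2f}s nps={total_nps}")
    return total_nodes, total_time, total_nps
```

Passing the clock as a parameter is a design choice for this sketch only: it makes the timing arithmetic testable without depending on wall-clock time.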

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (main.py)
    participant Bench as bench.py
    participant Engine as AlphaBeta
    participant QSearch as quiescence_search

    CLI->>Bench: run_bench(depth)
    Bench->>Bench: random.seed(0)
    loop 48 positions
        Bench->>Bench: _make_board(position)
        Bench->>Engine: search_move(board)
        Engine->>Engine: self.nodes = 0
        Engine->>Engine: negamax(board, depth)
        Engine->>Engine: self.nodes += 1
        Engine->>QSearch: quiescence_search(board)
        QSearch->>QSearch: self.nodes += 1
        Engine-->>Bench: best_move
        Bench->>Bench: read engine.nodes, accumulate totals
    end
    Bench->>CLI: print NPS results

Last reviewed commit: c5bbecd

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

Comment on lines +17 to +18
    elif config.mode == "bench":
        run_bench(depth=5)

--depth CLI flag silently ignored in bench mode

run_bench(depth=5) hardcodes depth to 5, ignoring the --depth value passed via the CLI and stored in config.negamax_depth. A user running moonfish --mode bench --depth 3 would still get depth 5.

The CI workflow also passes --depth 5, which currently has no effect because the value is hardcoded here.

Suggested change
-    elif config.mode == "bench":
-        run_bench(depth=5)
+    elif config.mode == "bench":
+        run_bench(depth=config.negamax_depth)
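To illustrate the effect of the suggested change, here is a self-contained sketch that wires a `--depth` flag through to bench mode. It uses argparse purely for illustration; the actual CLI library, the default values, and the body of `run_bench` here are assumptions, with only the `--mode`, `--depth`, and `config.negamax_depth` names taken from the review comment.

```python
import argparse


def run_bench(depth):
    """Stand-in for the real bench entry point; just echoes the depth it was given."""
    return f"bench at depth {depth}"


def main(argv=None):
    parser = argparse.ArgumentParser(prog="moonfish")
    parser.add_argument("--mode", default="uci")  # default is an assumption
    # Store --depth under the attribute name the review comment refers to.
    parser.add_argument("--depth", dest="negamax_depth", type=int, default=3)
    config = parser.parse_args(argv)
    if config.mode == "bench":
        # Forward the parsed depth instead of a hardcoded literal.
        return run_bench(depth=config.negamax_depth)
    return None


print(main(["--mode", "bench", "--depth", "3"]))
```

With the hardcoded `run_bench(depth=5)`, the same invocation would report depth 5 regardless of the flag.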

@github-actions

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 20 | 74 | 6 | 100 | 20.0% |
| As White | 10 | 36 | 4 | 50 | 20.0% |
| As Black | 10 | 38 | 2 | 50 | 20.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 5

vs Stockfish Skill Level 4

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 20 | 73 | 7 | 100 | 20.0% |
| As White | 13 | 33 | 4 | 50 | 26.0% |
| As Black | 7 | 40 | 3 | 50 | 14.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 6
  • Draw by fifty moves rule: 1

vs Stockfish Skill Level 5

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 8 | 85 | 7 | 100 | 8.0% |
| As White | 4 | 41 | 5 | 50 | 8.0% |
| As Black | 4 | 44 | 2 | 50 | 8.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 7
Configuration
  • 5 chunks × 20 rounds × 3 skill levels = 300 total games
  • Each opening played with colors reversed (-repeat) for fairness
  • Moonfish: 60s per move
  • Stockfish: 60+5 time control

@github-actions

⚡ NPS Benchmark Results

| Metric | Value |
|---|---|
| Depth | 5 |
| Positions | 48 |
| Total nodes | 21939310 |
| Total time | 4904.51s |
| Nodes/second | 4473 |

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown
Position  1/48: nodes=155456     time=31.17s  nps=4986
Position  2/48: nodes=762397     time=176.60s  nps=4316
Position  3/48: nodes=9587       time=1.32s  nps=7289
Position  4/48: nodes=857677     time=192.73s  nps=4450
Position  5/48: nodes=78423      time=18.52s  nps=4235
Position  6/48: nodes=519814     time=121.44s  nps=4280
Position  7/48: nodes=354259     time=83.56s  nps=4239
Position  8/48: nodes=387596     time=75.70s  nps=5120
Position  9/48: nodes=1652736    time=336.43s  nps=4912
Position 10/48: nodes=472629     time=92.16s  nps=5128
Position 11/48: nodes=516877     time=128.93s  nps=4008
Position 12/48: nodes=960140     time=238.87s  nps=4019
Position 13/48: nodes=618484     time=140.97s  nps=4387
Position 14/48: nodes=700607     time=147.29s  nps=4756
Position 15/48: nodes=654854     time=134.16s  nps=4880
Position 16/48: nodes=261335     time=49.35s  nps=5295
Position 17/48: nodes=17256      time=2.63s  nps=6572
Position 18/48: nodes=12611      time=1.53s  nps=8237
Position 19/48: nodes=38487      time=5.87s  nps=6552
Position 20/48: nodes=86927      time=12.06s  nps=7208
Position 21/48: nodes=16944      time=2.41s  nps=7021
Position 22/48: nodes=475        time=0.06s  nps=8246
Position 23/48: nodes=10664      time=1.41s  nps=7585
Position 24/48: nodes=33008      time=5.35s  nps=6165
Position 25/48: nodes=10136      time=1.46s  nps=6936
Position 26/48: nodes=79572      time=13.09s  nps=6076
Position 27/48: nodes=82542      time=11.61s  nps=7107
Position 28/48: nodes=308023     time=56.57s  nps=5444
Position 29/48: nodes=231702     time=50.67s  nps=4572
Position 30/48: nodes=2547       time=0.38s  nps=6749
Position 31/48: nodes=1474637    time=300.61s  nps=4905
Position 32/48: nodes=727292     time=145.69s  nps=4992
Position 33/48: nodes=2470627    time=790.65s  nps=3124
Position 34/48: nodes=1291369    time=339.29s  nps=3806
Position 35/48: nodes=557752     time=105.16s  nps=5303
Position 36/48: nodes=1931624    time=405.97s  nps=4758
Position 37/48: nodes=1551790    time=305.94s  nps=5072
Position 38/48: nodes=14491      time=1.54s  nps=9413
Position 39/48: nodes=5184       time=0.50s  nps=10269
Position 40/48: nodes=22316      time=0.95s  nps=23529
Position 41/48: nodes=131447     time=18.36s  nps=7158
Position 42/48: nodes=83030      time=12.15s  nps=6831
Position 43/48: nodes=23479      time=3.00s  nps=7815
Position 44/48: nodes=102571     time=16.88s  nps=6077
Position 45/48: nodes=45937      time=8.24s  nps=5577
Position 46/48: nodes=1611999    time=315.27s  nps=5113
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)
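Since the workflow is described as parsing bench output, a parser for lines in the format shown above could look roughly like this. It is a sketch only: the regex, `parse_bench_output`, and `summarize` are hypothetical and are not taken from nps-benchmark.yml.

```python
import re

# Matches lines such as "Position  3/48: nodes=9587       time=1.32s  nps=7289".
LINE_RE = re.compile(
    r"Position\s+(\d+)/(\d+): nodes=(\d+)\s+time=([\d.]+)s\s+nps=(\d+)"
)


def parse_bench_output(text):
    """Return one dict per 'Position N/M' line found in the bench output."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            index, _count, nodes, seconds, nps = match.groups()
            rows.append({
                "position": int(index),
                "nodes": int(nodes),
                "time": float(seconds),
                "nps": int(nps),
            })
    return rows


def summarize(rows):
    """Total nodes, total time, and overall NPS (total nodes / total time)."""
    total_nodes = sum(row["nodes"] for row in rows)
    total_time = sum(row["time"] for row in rows)
    nps = int(total_nodes / total_time) if total_time > 0 else 0
    return total_nodes, total_time, nps


sample = """\
Position  3/48: nodes=9587       time=1.32s  nps=7289
Position 22/48: nodes=475        time=0.06s  nps=8246
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
"""
rows = parse_bench_output(sample)
print(summarize(rows))
```

A parser like this depends on the print format staying stable, which matches the caveat in the review that output parsing relies on a consistent format from bench.py.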

@github-actions

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 30 | 63 | 7 | 100 | 30.0% |
| As White | 18 | 28 | 4 | 50 | 36.0% |
| As Black | 12 | 35 | 3 | 50 | 24.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 7

vs Stockfish Skill Level 4

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 22 | 68 | 10 | 100 | 22.0% |
| As White | 15 | 31 | 4 | 50 | 30.0% |
| As Black | 7 | 37 | 6 | 50 | 14.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 9

vs Stockfish Skill Level 5

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 8 | 87 | 5 | 100 | 8.0% |
| As White | 6 | 42 | 2 | 50 | 12.0% |
| As Black | 2 | 45 | 3 | 50 | 4.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 5
Configuration
  • 5 chunks × 20 rounds × 3 skill levels = 300 total games
  • Each opening played with colors reversed (-repeat) for fairness
  • Moonfish: 60s per move
  • Stockfish: 60+5 time control

@github-actions

⚡ NPS Benchmark Results

| Metric | Value |
|---|---|
| Depth | 5 |
| Positions | 48 |
| Total nodes | 21939310 |
| Total time | 4887.20s |
| Nodes/second | 4489 |

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown
Position  1/48: nodes=155456     time=31.03s  nps=5009
Position  2/48: nodes=762397     time=175.75s  nps=4337
Position  3/48: nodes=9587       time=1.29s  nps=7427
Position  4/48: nodes=857677     time=190.96s  nps=4491
Position  5/48: nodes=78423      time=18.40s  nps=4261
Position  6/48: nodes=519814     time=121.26s  nps=4286
Position  7/48: nodes=354259     time=83.37s  nps=4249
Position  8/48: nodes=387596     time=75.58s  nps=5128
Position  9/48: nodes=1652736    time=336.27s  nps=4914
Position 10/48: nodes=472629     time=91.50s  nps=5165
Position 11/48: nodes=516877     time=127.83s  nps=4043
Position 12/48: nodes=960140     time=240.63s  nps=3990
Position 13/48: nodes=618484     time=140.80s  nps=4392
Position 14/48: nodes=700607     time=147.77s  nps=4741
Position 15/48: nodes=654854     time=133.95s  nps=4888
Position 16/48: nodes=261335     time=49.35s  nps=5295
Position 17/48: nodes=17256      time=2.61s  nps=6622
Position 18/48: nodes=12611      time=1.52s  nps=8269
Position 19/48: nodes=38487      time=5.85s  nps=6581
Position 20/48: nodes=86927      time=11.91s  nps=7296
Position 21/48: nodes=16944      time=2.39s  nps=7076
Position 22/48: nodes=475        time=0.06s  nps=8345
Position 23/48: nodes=10664      time=1.39s  nps=7652
Position 24/48: nodes=33008      time=5.28s  nps=6248
Position 25/48: nodes=10136      time=1.44s  nps=7035
Position 26/48: nodes=79572      time=12.87s  nps=6182
Position 27/48: nodes=82542      time=11.50s  nps=7178
Position 28/48: nodes=308023     time=55.80s  nps=5520
Position 29/48: nodes=231702     time=50.02s  nps=4631
Position 30/48: nodes=2547       time=0.37s  nps=6836
Position 31/48: nodes=1474637    time=299.02s  nps=4931
Position 32/48: nodes=727292     time=144.86s  nps=5020
Position 33/48: nodes=2470627    time=788.27s  nps=3134
Position 34/48: nodes=1291369    time=336.75s  nps=3834
Position 35/48: nodes=557752     time=104.88s  nps=5318
Position 36/48: nodes=1931624    time=404.74s  nps=4772
Position 37/48: nodes=1551790    time=304.48s  nps=5096
Position 38/48: nodes=14491      time=1.50s  nps=9650
Position 39/48: nodes=5184       time=0.49s  nps=10475
Position 40/48: nodes=22316      time=0.93s  nps=23892
Position 41/48: nodes=131447     time=18.12s  nps=7253
Position 42/48: nodes=83030      time=12.08s  nps=6874
Position 43/48: nodes=23479      time=2.98s  nps=7878
Position 44/48: nodes=102571     time=16.73s  nps=6129
Position 45/48: nodes=45937      time=8.18s  nps=5612
Position 46/48: nodes=1611999    time=314.38s  nps=5127
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)

@github-actions

⚡ NPS Benchmark Results

| Metric | Value |
|---|---|
| Depth | 5 |
| Positions | 48 |
| Total nodes | 21939310 |
| Total time | 4525.11s |
| Nodes/second | 4848 |

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown
Position  1/48: nodes=155456     time=28.91s  nps=5377
Position  2/48: nodes=762397     time=163.94s  nps=4650
Position  3/48: nodes=9587       time=1.22s  nps=7859
Position  4/48: nodes=857677     time=180.13s  nps=4761
Position  5/48: nodes=78423      time=17.31s  nps=4530
Position  6/48: nodes=519814     time=113.04s  nps=4598
Position  7/48: nodes=354259     time=76.97s  nps=4602
Position  8/48: nodes=387596     time=70.23s  nps=5519
Position  9/48: nodes=1652736    time=314.34s  nps=5257
Position 10/48: nodes=472629     time=85.60s  nps=5521
Position 11/48: nodes=516877     time=118.91s  nps=4346
Position 12/48: nodes=960140     time=220.47s  nps=4355
Position 13/48: nodes=618484     time=128.62s  nps=4808
Position 14/48: nodes=700607     time=135.70s  nps=5162
Position 15/48: nodes=654854     time=122.44s  nps=5348
Position 16/48: nodes=261335     time=45.55s  nps=5737
Position 17/48: nodes=17256      time=2.40s  nps=7202
Position 18/48: nodes=12611      time=1.40s  nps=9021
Position 19/48: nodes=38487      time=5.35s  nps=7195
Position 20/48: nodes=86927      time=11.00s  nps=7904
Position 21/48: nodes=16944      time=2.22s  nps=7634
Position 22/48: nodes=475        time=0.05s  nps=9062
Position 23/48: nodes=10664      time=1.29s  nps=8249
Position 24/48: nodes=33008      time=4.88s  nps=6766
Position 25/48: nodes=10136      time=1.33s  nps=7626
Position 26/48: nodes=79572      time=11.86s  nps=6708
Position 27/48: nodes=82542      time=10.56s  nps=7820
Position 28/48: nodes=308023     time=51.13s  nps=6024
Position 29/48: nodes=231702     time=45.80s  nps=5059
Position 30/48: nodes=2547       time=0.34s  nps=7418
Position 31/48: nodes=1474637    time=274.78s  nps=5366
Position 32/48: nodes=727292     time=133.47s  nps=5449
Position 33/48: nodes=2470627    time=721.64s  nps=3423
Position 34/48: nodes=1291369    time=312.35s  nps=4134
Position 35/48: nodes=557752     time=97.66s  nps=5711
Position 36/48: nodes=1931624    time=378.25s  nps=5106
Position 37/48: nodes=1551790    time=284.37s  nps=5456
Position 38/48: nodes=14491      time=1.39s  nps=10418
Position 39/48: nodes=5184       time=0.46s  nps=11381
Position 40/48: nodes=22316      time=0.87s  nps=25674
Position 41/48: nodes=131447     time=16.84s  nps=7807
Position 42/48: nodes=83030      time=11.15s  nps=7447
Position 43/48: nodes=23479      time=2.75s  nps=8541
Position 44/48: nodes=102571     time=15.43s  nps=6646
Position 45/48: nodes=45937      time=7.56s  nps=6074
Position 46/48: nodes=1611999    time=293.18s  nps=5498
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)
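The way these reports are meant to be read (node count as the pass/fail signal, NPS as context) can be expressed as a small comparison helper. This is an illustrative sketch, with `compare_runs` a hypothetical name; the totals plugged in below are copied from the first and third NPS reports in this thread.

```python
def compare_runs(baseline, candidate):
    """Compare two bench summaries given as dicts with 'nodes' and 'seconds'."""
    base_nps = baseline["nodes"] / baseline["seconds"]
    cand_nps = candidate["nodes"] / candidate["seconds"]
    return {
        # A different node count means the search itself behaved differently.
        "search_changed": baseline["nodes"] != candidate["nodes"],
        # NPS drift is informational only; runner speed varies between jobs.
        "nps_change_pct": round(100.0 * (cand_nps - base_nps) / base_nps, 1),
    }


first_run = {"nodes": 21939310, "seconds": 4904.51}
third_run = {"nodes": 21939310, "seconds": 4525.11}
report = compare_runs(first_run, third_run)
print(report)
```

Here the node totals match exactly, so the search is unchanged, while the NPS differs by several percent between runs of the same code, which is consistent with treating NPS as informational.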

@github-actions

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 37 | 54 | 9 | 100 | 37.0% |
| As White | 20 | 23 | 7 | 50 | 40.0% |
| As Black | 17 | 31 | 2 | 50 | 34.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 8
  • Draw by insufficient mating material: 1

vs Stockfish Skill Level 4

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 18 | 77 | 5 | 100 | 18.0% |
| As White | 11 | 35 | 4 | 50 | 22.0% |
| As Black | 7 | 42 | 1 | 50 | 14.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 4

vs Stockfish Skill Level 5

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 4 | 89 | 7 | 100 | 4.0% |
| As White | 1 | 45 | 4 | 50 | 2.0% |
| As Black | 3 | 44 | 3 | 50 | 6.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 7
Configuration
  • 5 chunks × 20 rounds × 3 skill levels = 300 total games
  • Each opening played with colors reversed (-repeat) for fairness
  • Moonfish: 60s per move
  • Stockfish: 60+5 time control

@luccabb luccabb merged commit 6172f0e into master Feb 16, 2026
10 checks passed
@luccabb luccabb deleted the add-nps-benchmark branch February 16, 2026 05:06