Align local benchmark skill levels with CI #45

Merged
luccabb merged 1 commit into master from align-benchmark-skill-levels on Feb 11, 2026

luccabb (Owner) commented Feb 9, 2026

Summary

  • Changed local benchmark default skill levels from 1 2 3 4 5 to 3 4 5 to match CI configuration
  • Added scripts/** to benchmark CI path trigger so script changes trigger the workflow

Test plan

  • Verify CI benchmark workflow triggers on this PR (since scripts/** path is now included)
  • Compare local and CI benchmark results for skill levels 3, 4, 5

Change local benchmark default from skill levels 1-5 to 3-4-5 to match
the CI workflow. Also add scripts/** to the benchmark CI path trigger
so workflow changes are tested automatically.

greptile-apps bot commented Feb 9, 2026

Greptile Overview

Greptile Summary

This PR makes local benchmarking match the CI benchmark configuration by narrowing the default Stockfish skill levels in scripts/benchmark.sh to 3 4 5. It also updates the benchmark GitHub Actions workflow path filter so changes under scripts/** will trigger the benchmark job, ensuring the benchmark script and CI workflow stay in sync.

I didn’t find any issues that need fixing before merge in the touched files; both changes are minimal and consistent with the stated intent.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • The PR only adjusts benchmark configuration: adding scripts/** to a workflow path filter and changing the benchmark script’s default skill levels while preserving env overrides. No production code paths are affected, and the changes are syntactically valid.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| .github/workflows/benchmark.yml | Adds 'scripts/**' to the workflow path filter so benchmark CI runs when benchmark scripts change; change is syntactically valid and matches intent. |
| scripts/benchmark.sh | Changes default SKILL_LEVELS from '1 2 3 4 5' to '3 4 5' to align with CI; keeps env override behavior intact. |
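
The "default with env override" behavior described for scripts/benchmark.sh can be sketched as follows. This is a minimal illustration, not the actual script: only the variable name SKILL_LEVELS and the default "3 4 5" come from the PR, while the function name `resolve_skill_levels` and the loop body are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real scripts/benchmark.sh may be structured differently.

# Print the skill levels to benchmark: the caller's SKILL_LEVELS if it is set
# and non-empty, otherwise the new default "3 4 5".
resolve_skill_levels() {
    printf '%s\n' "${SKILL_LEVELS:-3 4 5}"
}

# Iterate over the resolved levels (word splitting on spaces is intentional).
for level in $(resolve_skill_levels); do
    echo "would benchmark against Stockfish skill level ${level}"
done
```

Under this assumption, running the script with `SKILL_LEVELS="1 2 3 4 5"` in the environment would restore the previous range without editing the file.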

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Git as GitHub
    participant GA as GitHub Actions
    participant Script as scripts/benchmark.sh

    Dev->>Git: Push changes to PR
    Git-->>GA: Trigger workflow (paths include moonfish/**, opening_book/**, scripts/**, pyproject.toml, requirements.txt)
    GA->>Script: Run benchmark script
    Script-->>GA: Uses default SKILL_LEVELS="3 4 5" (unless overridden by env)
    GA-->>Git: Publish benchmark job result
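
To make the trigger step in the diagram concrete, the sketch below approximates the path filter with a shell `case` statement. It is only an approximation: GitHub Actions uses its own glob matcher, and the function name `triggers_benchmark` and the sample path `moonfish/engine.py` are hypothetical. The patterns themselves are the ones listed in the diagram.

```shell
#!/usr/bin/env bash
# Approximation only: GitHub Actions evaluates path filters with its own glob
# matcher, not with shell `case` patterns.

# Succeed (exit status 0) if a changed file falls under one of the benchmark
# workflow's path filters.
triggers_benchmark() {
    case "$1" in
        # In a case pattern, `*` also matches `/`, so nested files match too.
        moonfish/*|opening_book/*|scripts/*) return 0 ;;
        pyproject.toml|requirements.txt) return 0 ;;
        *) return 1 ;;
    esac
}

# moonfish/engine.py is a made-up file name used only for illustration.
for path in scripts/benchmark.sh moonfish/engine.py README.md; do
    if triggers_benchmark "$path"; then
        echo "$path: would trigger the benchmark workflow"
    else
        echo "$path: would not trigger the benchmark workflow"
    fi
done
```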

@greptile-apps greptile-apps bot left a comment

2 files reviewed, no comments


github-actions bot commented Feb 9, 2026

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 43 | 47 | 10 | 100 | 43.0% |
| As White | 22 | 23 | 5 | 50 | 44.0% |
| As Black | 21 | 24 | 5 | 50 | 42.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 9
  • Draw by fifty moves rule: 1

vs Stockfish Skill Level 4

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 16 | 75 | 9 | 100 | 16.0% |
| As White | 8 | 39 | 3 | 50 | 16.0% |
| As Black | 8 | 36 | 6 | 50 | 16.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 9

vs Stockfish Skill Level 5

| Metric | Wins | Losses | Draws | Total | Win % |
|---|---|---|---|---|---|
| Overall | 12 | 80 | 8 | 100 | 12.0% |
| As White | 5 | 40 | 5 | 50 | 10.0% |
| As Black | 7 | 40 | 3 | 50 | 14.0% |

Non-checkmate endings:

  • Draw by 3-fold repetition: 6

Configuration

  • 5 chunks × 20 rounds × 3 skill levels = 300 total games
  • Each opening played with colors reversed (-repeat) for fairness
  • Moonfish: 60s per move
  • Stockfish: 60+5 time control
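
The game count in the configuration and the Win % column in the tables can be reproduced with a few lines of shell. This is a sketch using the numbers reported above; the variable names and the helper `win_pct` are hypothetical and not part of the benchmark tooling.

```shell
#!/usr/bin/env bash
# Values copied from the Configuration list above.
chunks=5
rounds=20
levels=3

# Total games across all skill levels, and games played per skill level.
total_games=$(( chunks * rounds * levels ))
games_per_level=$(( chunks * rounds ))
echo "total games: ${total_games}"
echo "games per level: ${games_per_level}"

# Win percentage with one decimal place; awk does the floating-point division.
win_pct() {
    awk -v wins="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * wins / total }'
}

# Overall result against skill level 3: 43 wins out of 100 games.
echo "skill 3 overall win rate: $(win_pct 43 100)%"
```

Note that this win rate counts only outright wins, as the tables do; draws are not credited as half points.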

@luccabb luccabb merged commit a17b487 into master Feb 11, 2026
27 checks passed
@luccabb luccabb deleted the align-benchmark-skill-levels branch February 11, 2026 01:25