
Clean up benchmarks folder and CI #213

Merged: penelopeysm merged 9 commits into main from py/clean-benchmarks on Feb 22, 2026


@penelopeysm (Member) commented Feb 21, 2026

This PR removes the severely outdated scripts in the perf folder (nobody had updated them, not even when Libtask was rewritten or during the major effort to update Libtask for Julia 1.12).

In their place, we add a new set of integration tests with Turing. This set is fairly basic and only aims to show that sampling with SMC / PG actually works. It replaces the old strategy of running the entire Turing test suite, which was wasteful and led to incidents like #207.

Over time this set of tests can be expanded to cover other problematic models, such as those in the other open issues.

penelopeysm and others added 2 commits February 21, 2026 23:41
- Remove broken Turing integration tests (p0.jl, p1.jl, p2.jl) and the
  runtests.jl that included them.
- Use Chairmarks (it's just much faster)
- Improved printing of results
- General modernisation of the script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions commented:

Libtask.jl documentation for PR #213 is available at:
https://TuringLang.github.io/Libtask.jl/previews/PR213/

jobs:
  test:
    runs-on: ${{ matrix.os }}
    continue-on-error: true # ${{ matrix.version == 'nightly' }}

@penelopeysm (Member Author) commented:

CI was always passing even when tests failed (!!!!!), because continue-on-error was unconditionally set to true for the test job.
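A minimal sketch of a stricter version of the job above, assuming the intent of the commented-out expression was to tolerate failures only on nightly Julia. This is an illustration, not necessarily the exact change made in this PR; it reuses the matrix.os and matrix.version keys from the snippet above and omits the rest of the job.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Evaluates to true only for the nightly entry of the matrix, so a
    # failure on any released Julia version fails the workflow run.
    continue-on-error: ${{ matrix.version == 'nightly' }}
```

With continue-on-error set to a literal true, a failing job can never fail the workflow run, which matches the "always passing" behaviour described above.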

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@penelopeysm penelopeysm changed the title Clean up benchmarks folder Clean up benchmarks folder and CI Feb 21, 2026
@penelopeysm (Member Author) commented Feb 22, 2026

We are spawning a slightly silly number of runners for the new Turing integration test. I think it is important to make sure that the Turing integration test passes on every combination in the CI matrix, because Turing is the raison d'être of Libtask. We should test Turing integration as seriously as we test Libtask itself.

The good news is that all of these jobs are very short, so it's totally fine in terms of the GitHub runner workload.
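To illustrate why the runner count grows quickly, here is a hypothetical matrix fragment for an integration job. The job name and the specific Julia versions and operating systems are assumptions for illustration, not the values used in this PR. GitHub Actions runs one job per combination, so two versions times three operating systems gives six runners.

```yaml
jobs:
  turing-integration:  # hypothetical job name
    runs-on: ${{ matrix.os }}
    strategy:
      # Keep running the remaining combinations if one of them fails,
      # so every failing version/OS pair is reported.
      fail-fast: false
      matrix:
        # Assumed values: 2 versions x 3 OSes = 6 jobs.
        version: ['1.10', '1.12']
        os: [ubuntu-latest, macos-latest, windows-latest]
    # steps omitted: checkout, Julia setup, and the integration test run
```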

@penelopeysm penelopeysm merged commit 53afd4c into main Feb 22, 2026
22 checks passed
@penelopeysm penelopeysm deleted the py/clean-benchmarks branch February 22, 2026 00:32