Clean up benchmarks folder and CI by penelopeysm · Pull Request #213 · TuringLang/Libtask.jl

penelopeysm · 2026-02-21T23:43:17Z

This PR gets rid of the severely outdated scripts in the perf folder (nobody bothered to update them, even when Libtask got rewritten, and even when there was a major effort in updating Libtask for Julia 1.12).

In their place, we add a new set of integration tests with Turing. This set is fairly basic and just aims to show that sampling with SMC / PG actually works. It replaces the old strategy of running the entire Turing test suite, which was wasteful and led to incidents like #207.

Over time this set of tests can be expanded to cover other problematic models, such as those in the other open issues.

- Remove broken Turing integration tests (p0.jl, p1.jl, p2.jl) and the runtests.jl that included them.
- Use Chairmarks (it's just much faster)
- Improved printing of results
- General modernisation of the script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-21T23:44:27Z

Libtask.jl documentation for PR #213 is available at:
https://TuringLang.github.io/Libtask.jl/previews/PR213/

penelopeysm · 2026-02-21T23:49:33Z

.github/workflows/Testing.yaml

-jobs:
-  test:
-    runs-on: ${{ matrix.os }}
-    continue-on-error: true # ${{ matrix.version == 'nightly' }}


CI was always passing even if tests failed (!!!!!)
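
To illustrate the point, here is a minimal sketch of a workflow in which failures are tolerated only on nightly Julia, so that a failing test on a release version turns the run red. This is not the actual Testing.yaml from this PR: the trigger, the matrix entries, and the action versions are assumptions chosen for illustration, and the `continue-on-error` expression is the one that was commented out in the removed line above.

```yaml
# Hypothetical sketch, not the real .github/workflows/Testing.yaml.
name: Testing

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    # A bare `true` here lets every job fail without failing the workflow.
    # Restricting it to nightly keeps release-version failures visible.
    continue-on-error: ${{ matrix.version == 'nightly' }}
    strategy:
      # Keep the other matrix jobs running when one of them fails.
      fail-fast: false
      matrix:
        version: ['1', 'nightly']   # assumed versions
        os: [ubuntu-latest]         # assumed runner
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
```

With job-level `continue-on-error: true`, GitHub Actions reports the workflow run as successful even when the job's steps fail, which matches the behaviour described in the comment above.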


penelopeysm · 2026-02-22T00:22:48Z

We are spawning a slightly silly number of runners for the new Turing integration test, but I think it is important to make sure that the Turing integration test passes on all combinations, because Turing is the raison d'être of Libtask. We should test Turing integration as seriously as we test Libtask itself.

The good news is that all of these jobs are very short, so it's totally fine in terms of the GitHub runner workload.

penelopeysm and others added 2 commits February 21, 2026 23:41

Move perf/ to benchmarks/ which is more descriptive

1e1e60b

penelopeysm added 2 commits February 21, 2026 23:48

update CI workflows

dc8c386

Format

f4eddb1


Modernise integration test workflow and drop Turing.jl

83cb7ba

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

penelopeysm changed the title ~~Clean up benchmarks folder~~ Clean up benchmarks folder and CI Feb 21, 2026

penelopeysm and others added 3 commits February 22, 2026 00:17

Add Turing integration tests for SMC and PG samplers

210f1fe

Closes #208 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

increase iterations

8c66e9f

disable fail fast

d83165c

fix

4b76adf

penelopeysm merged commit 53afd4c into main Feb 22, 2026
22 checks passed

penelopeysm deleted the py/clean-benchmarks branch February 22, 2026 00:32

penelopeysm merged 9 commits into main from py/clean-benchmarks
