
Eliminate dead refs #220

Merged

penelopeysm merged 10 commits into main from py/perf on Mar 25, 2026

Conversation

@penelopeysm (Member) commented Mar 25, 2026

Closes #193 by performing live variable analysis on the BBCode intermediate representation.
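For readers unfamiliar with the technique: live variable analysis walks each block backwards, tracking which variables are still needed at each point. A minimal sketch in plain Julia follows; the `Inst` representation and variable names are invented for illustration and are not Libtask's internals:

```julia
# Each instruction records the variable it defines (if any) and the
# variables it uses. This is a toy encoding for the example only.
struct Inst
    def::Union{Symbol,Nothing}
    uses::Vector{Symbol}
end

# Walk the block backwards: a variable is live at a point if it is
# used later without being redefined first. Returns the live-out set
# of each instruction.
function live_sets(block::Vector{Inst})
    live = Set{Symbol}()
    out = Vector{Set{Symbol}}(undef, length(block))
    for i in length(block):-1:1
        inst = block[i]
        out[i] = copy(live)                      # live-out of instruction i
        inst.def === nothing || delete!(live, inst.def)
        union!(live, inst.uses)
    end
    return out
end

# Mirrors the function f below: `a` is dead as soon as `b = 2a` runs,
# so a ref slot holding it would never be read again.
block = [
    Inst(:a, [:x]),       # a = 2*x
    Inst(:b, [:a]),       # b = 2*a
    Inst(nothing, [:b]),  # produce(b)
    Inst(:c, [:b]),       # c = 3*b
    Inst(nothing, [:c]),  # produce(c)
]
lv = live_sets(block)
```

Note that `b` is live across the first produce (it is used by `c = 3*b` after resumption), so its ref must survive, while `a` and `c` die within their own block.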

Using the example from #193 (but changed to use an input argument, otherwise the entire function gets constant-folded away):

using Libtask

function f(x)
    a = 2*x
    b = 2*a
    produce(b)
    c = 3*b
    produce(c)
    return nothing
end

Libtask.generate_ir(:transformed_bb, f, 1.0)

Libtask.generate_ir(:optimised_bb, f, 1.0)

The BBCode before the ref-elimination pass looks like this; operations that the pass removes are marked with a (*):

BBCode (3 args, 4 blocks)
#32 ─
│   %30 = Libtask.resume_block_is(_1, 15)
│   %31 = Libtask.resume_block_is(_1, 13)
│   %33 = nothing
└── switch %30 => #15, %31 => #13, fallthrough #14
#14 ─
│   %7 = Base.mul_float(2.0, _3)::Float64
│   %16 = Libtask.set_ref_at!(_1, 1, %7)              (*)
│   %17 = Libtask.get_ref_at(_1, 1)                   (*)
│   %8 = Base.mul_float(2.0, %17)::Float64
│   %18 = Libtask.set_ref_at!(_1, 2, %8)
│   %19 = Libtask.set_resume_block!(_1, 15)
│   %20 = Libtask.get_ref_at(_1, 2)                   (*)
│   %21 = Libtask.ProducedValue(%20)
└── return %21
#15 ─
│   %23 = Libtask.get_ref_at(_1, 2)
│   %10 = Base.mul_float(3.0, %23)::Float64
│   %24 = Libtask.set_ref_at!(_1, 3, %10)             (*)
│   %25 = Libtask.set_resume_block!(_1, 13)
│   %26 = Libtask.get_ref_at(_1, 3)                   (*)
│   %27 = Libtask.ProducedValue(%26)
└── return %27
#13 ─
│   %29 = Libtask.set_resume_block!(_1, -1)
└── return Main.nothing

and afterwards it looks like this:

BBCode (3 args, 4 blocks)
#32 ─
│   %30 = Libtask.resume_block_is(_1, 15)
│   %31 = Libtask.resume_block_is(_1, 13)
│   %33 = nothing
└── switch %30 => #15, %31 => #13, fallthrough #14
#14 ─
│   %7 = Base.mul_float(2.0, _3)::Float64
│   %8 = Base.mul_float(2.0, %7)::Float64
│   %18 = Libtask.set_ref_at!(_1, 2, %8)
│   %19 = Libtask.set_resume_block!(_1, 15)
│   %21 = Libtask.ProducedValue(%8)
└── return %21
#15 ─
│   %23 = Libtask.get_ref_at(_1, 2)
│   %10 = Base.mul_float(3.0, %23)::Float64
│   %25 = Libtask.set_resume_block!(_1, 13)
│   %27 = Libtask.ProducedValue(%10)
└── return %27
#13 ─
│   %29 = Libtask.set_resume_block!(_1, -1)
└── return Main.nothing
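The rewrite that gets from the first listing to the second can be viewed as store-to-load forwarding within a block, followed by dead-store elimination for slots no other block reads. A hedged sketch, using an instruction encoding invented for this example (not Libtask's actual data structures):

```julia
# Instruction forms (toy encoding):
#   (:set, slot, val)   - like set_ref_at!(_1, slot, val)
#   (:get, slot, dest)  - like dest = get_ref_at(_1, slot)
#   (:other, name)      - anything else
function eliminate_refs(block, live_at_exit::Set{Int})
    stored = Dict{Int,Symbol}()    # slot => SSA value stored in this block
    subst  = Dict{Symbol,Symbol}() # get's dest => value it forwards to
    out = Any[]
    # Forward each get through a preceding set of the same slot,
    # deleting the get; record the substitution for later use sites.
    for inst in block
        if inst[1] === :set
            stored[inst[2]] = inst[3]
            push!(out, inst)
        elseif inst[1] === :get && haskey(stored, inst[2])
            subst[inst[3]] = stored[inst[2]]
        else
            push!(out, inst)
        end
    end
    # Drop stores to slots that no resume block will ever read.
    filter!(inst -> inst[1] !== :set || inst[2] in live_at_exit, out)
    return out, subst
end

# Block #14 from the listing above: slot 2 is live at exit because
# block #15 reads it; slot 1 is not live anywhere else.
block14 = Any[
    (:other, :mul7),     # %7  = Base.mul_float(2.0, _3)
    (:set, 1, :v7),      # %16 = set_ref_at!(_1, 1, %7)
    (:get, 1, :v17),     # %17 = get_ref_at(_1, 1)
    (:other, :mul8),     # %8  = Base.mul_float(2.0, %17)
    (:set, 2, :v8),      # %18 = set_ref_at!(_1, 2, %8)
    (:other, :resume),   # %19 = set_resume_block!(_1, 15)
    (:get, 2, :v20),     # %20 = get_ref_at(_1, 2)
    (:other, :produce),  # %21 = ProducedValue(%20)
]
out, subst = eliminate_refs(block14, Set([2]))
```

Applied to block #14 this drops the slot-1 set/get pair and the slot-2 get, leaving only the slot-2 store, matching the optimised IR above (with %17 rewritten to %7 and %20 to %8 via `subst`).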

Benchmarks

Running cd benchmarks; julia --project=. benchmark.jl, the changes in time for the taped calls are:

  • rosenbrock: 3.408 ms -> 2.132 ms
  • ackley: 26.259 ms -> 25.703 ms
  • matrix_test: 365.375 μs -> 270.667 μs
  • neural_net: 3.208 μs -> 3.177 μs

so there are very significant gains on the first and third benchmarks, and smaller ones on the others.

With Turing, the performance gains from this PR are probably heavily diluted by the amount of other work done per sample:

using Turing
using LinearAlgebra: I  # identity matrix used in MvNormal below
J = 8
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
@model function eesc(J, y, sigma)
    mu ~ Normal(0, 5)
    tau ~ truncated(Cauchy(0, 5); lower=0)
    theta ~ MvNormal(fill(mu, J), tau^2 * I)
    for i in 1:J
        y[i] ~ Normal(theta[i], sigma[i])
    end
end
model = eesc(J, y, sigma)

@time sample(model, PG(20), 100; chain_type=Any, progress=false);
# main    1.314166 seconds (9.02 M allocations: 1.010 GiB, 6.64% gc time)
# this PR 1.207591 seconds (8.40 M allocations: 1019.048 MiB, 6.38% gc time)

@penelopeysm penelopeysm marked this pull request as ready for review March 25, 2026 15:18
@github-actions

Libtask.jl documentation for PR #220 is available at:
https://TuringLang.github.io/Libtask.jl/previews/PR220/

@penelopeysm penelopeysm merged commit 5320a5a into main Mar 25, 2026
22 checks passed
@penelopeysm penelopeysm deleted the py/perf branch March 25, 2026 15:48


Development

Successfully merging this pull request may close these issues:

  • Improve performance by not keeping unnecessary refs
