Conversation
…e.jl. Minor tweaks to tests/error handling
…perability with Turing/MCMCChains/AbstractMCMC
Pull request overview
This PR expands ParallelMCMC’s MALA implementation to support batched/GPU execution and adds a fuller AbstractMCMC-compatible public interface (including Adaptive MALA and LogDensityProblems integration), along with a broad new test suite.
Changes:
- Add mass-matrix (Cholesky) support, plus batched `mala_step_batched`/`logq_mala_batched` intended to work on GPU inputs.
- Introduce AbstractMCMC interface types (`DensityModel`, samplers/states/transitions) and implement `AbstractMCMC.step`/`bundle_samples` for MALA, Adaptive MALA, and DEER.
- Add LogDensityProblems extension + extensive tests (CPU + conditional CUDA GPU tests) and update test dependencies.
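For context, a single unbatched MALA step — the update the batched kernels generalize — can be sketched in plain Julia. The names `mala_step` and `logq` below are illustrative, not the package's API:

```julia
using Random

# Illustrative MALA step with identity mass matrix (not ParallelMCMC's API).
# Proposal: y = x + ϵ·∇logp(x) + √(2ϵ)·ξ with ξ ~ N(0, I), accepted with the
# Metropolis–Hastings correction using forward/reverse proposal log-densities.
logq(y, x, gx, ϵ) = -sum(abs2, y .- x .- ϵ .* gx) / (4ϵ)

function mala_step(rng, logp, gradlogp, x, ϵ)
    gx = gradlogp(x)
    y = x .+ ϵ .* gx .+ sqrt(2ϵ) .* randn(rng, length(x))
    gy = gradlogp(y)
    loga = logp(y) - logp(x) + logq(x, y, gy, ϵ) - logq(y, x, gx, ϵ)
    return log(rand(rng)) < loga ? (y, true) : (x, false)
end
```

A batched variant presumably applies the same update columnwise to a `D × N` matrix of chains, which is what the shape and stationarity tests in this PR exercise.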
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/ParallelMCMC.jl` | Wires in the new interface layer, adjusts imports, and exports the new public API. |
| `src/MALA/MALA.jl` | Adds mass-matrix support, an efficient full-step API, and batched/GPU-oriented MALA kernels. |
| `src/interface.jl` | Implements AbstractMCMC model/sampler/state/transition types and sampling/bundling behavior. |
| `src/DEER/DEER.jl` | Adds configurable AD backend plumbing and a damping-update simplification. |
| `ext/LogDensityProblemsExt.jl` | Adds a `DensityModel(ld)` constructor via Julia's extension mechanism. |
| `Project.toml` | Updates dependencies/weakdeps/extensions and compat entries to support the new features. |
| `test/Project.toml` | Adds CUDA/MCMCChains/Statistics to the test environment. |
| `test/LocalPreferences.toml` | Forces a local CUDA runtime preference for the test environment. |
| `test/test-GPU-MALA.jl` | Adds conditional CUDA tests validating device residency and correctness properties. |
| `test/test-Batched-MALA.jl` | Adds CPU tests for batched MALA shapes, rejection behavior, and stationarity. |
| `test/test-Adaptive-MALA.jl` | Adds tests for dual-averaging adaptation and Chains output internals. |
| `test/test-AbstractMCMC-Interface.jl` | Adds end-to-end interface tests for step/sample/Chains bundling. |
| `test/test-DEER-Interface.jl` | Adds end-to-end interface tests for DEER construction and sampling. |
| `test/test-Deer-vs-MALA.jl` | Minor change within existing test logic. |
Comments suppressed due to low confidence (1)
src/interface.jl:609
`DensityModel` is already a concrete type defined earlier in this module. Defining `function DensityModel end` attempts to rebind the same name to a generic function, which will error at load time ("invalid redefinition of constant DensityModel"). Remove this line; the extension can add `ParallelMCMC.DensityModel(::Any)` methods without a placeholder.
```julia
This method is defined in the `LogDensityProblemsExt` extension and is only
available when `LogDensityProblems` has been loaded.
"""
function DensityModel end # extended by LogDensityProblemsExt
```
```diff
  μ = x .+ ϵ .* _apply_M(gradlogp_x, cholM)
  d = length(x)
  r = y .- μ
- return -0.5 * dot(r, r) / (2ϵ) - (d / 2) * log(4π * ϵ)
+ return -0.5 * _quad_Minv(r, cholM) / (2ϵ) - (d / 2) * log(4π * ϵ) - 0.5 * _logdet_M(cholM)
  end
```
logq_mala's constant terms (`0.0`, `-0.5`, `4π`, etc.) force Float64 arithmetic even when `x`/`ϵ` are Float32 (notably on GPU), which can significantly slow kernels and undermine the goal of staying in Float32. Consider making the constants type-stable with respect to `ϵ`/`eltype(x)` (e.g., set `T = typeof(ϵ)` and use `T(4)*T(π)`, `zero(T)`, and `T(0.5)`) so the result stays in Float32 when appropriate.
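A minimal sketch of the typed-constant approach, with identity-mass-matrix stubs standing in for the `_apply_M`/`_quad_Minv`/`_logdet_M` helpers referenced in the diff (the real package versions also handle a Cholesky factor):

```julia
using LinearAlgebra

# Identity-mass-matrix stubs; illustrative only, not the package's helpers.
_apply_M(g, ::Nothing) = g
_quad_Minv(r, ::Nothing) = dot(r, r)
_logdet_M(::Nothing) = 0.0

function logq_mala_typed(y, x, gradlogp_x, ϵ, cholM)
    T = typeof(ϵ)                        # keep every constant in ϵ's type
    μ = x .+ ϵ .* _apply_M(gradlogp_x, cholM)
    r = y .- μ
    d = length(x)
    half, two, fourπ = T(0.5), T(2), T(4) * T(π)
    return -half * _quad_Minv(r, cholM) / (two * ϵ) -
           (T(d) / two) * log(fourπ * ϵ) - half * T(_logdet_M(cholM))
end
```

Calling this with `ϵ::Float32` and `Float32` arrays yields a `Float32` result, which is the property the comment asks for.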
src/MALA/MALA.jl (Outdated)
```julia
μ = X .+ ε .* _apply_M_batched(gradlogp_X, cholM)
R = Y .- μ
q = _quad_Minv_batched(R, cholM)
ldet = _logdet_M(cholM)
return @. -0.5 * q / (2ε) - (D / 2) * log(4π * ε) - 0.5 * ldet
```
logq_mala_batched will promote to Float64 for Float32 inputs because of Float64 literals (-0.5, 4π, and _logdet_M(::Nothing)=0.0). On GPU this can be a large performance hit. Consider rewriting the expression to keep computations in eltype(X)/typeof(ε) (typed constants and zero(T)), so batched GPU paths remain Float32 throughout.
Suggested change:

```diff
-μ = X .+ ε .* _apply_M_batched(gradlogp_X, cholM)
-R = Y .- μ
-q = _quad_Minv_batched(R, cholM)
-ldet = _logdet_M(cholM)
-return @. -0.5 * q / (2ε) - (D / 2) * log(4π * ε) - 0.5 * ldet
+T = typeof(ε)
+μ = X .+ ε .* _apply_M_batched(gradlogp_X, cholM)
+R = Y .- μ
+q = _quad_Minv_batched(R, cholM)
+ldet = _logdet_M(cholM)
+half = T(-0.5)
+two = T(2)
+fourπ = T(4π)
+D_over_2 = T(D) / two
+qT = T.(q)
+ldetT = T(ldet)
+return @. half * qT / (two * ε) - D_over_2 * log(fourπ * ε) + half * ldetT
```
```julia
function logp_scaled(X)
    D = size(X, 1)
    scales = CUDA.CuArray(Float32.(1:D))  # D-vector on GPU
    return vec(-0.5f0 .* sum(X .^ 2 ./ scales; dims=1))
end
function gradlogp_scaled(X)
    D = size(X, 1)
    scales = CUDA.CuArray(Float32.(1:D))
```
logp_scaled/gradlogp_scaled allocate a new scales CuArray on every call. In the stationary-distribution loop this happens thousands of times and can dominate runtime. Consider precomputing scales once (outside the functions, or by closing over it) and reusing it.
Suggested change:

```diff
-function logp_scaled(X)
-    D = size(X, 1)
-    scales = CUDA.CuArray(Float32.(1:D)) # D-vector on GPU
-    return vec(-0.5f0 .* sum(X .^ 2 ./ scales; dims=1))
-end
-function gradlogp_scaled(X)
-    D = size(X, 1)
-    scales = CUDA.CuArray(Float32.(1:D))
+const _scaled_normal_scales_cache = Dict{Int, CUDA.CuArray{Float32,1}}()
+# Retrieve or create the GPU scales vector for a given dimension D.
+# This avoids repeated CuArray allocations inside tight loops.
+function _get_scaled_normal_scales(D::Int)
+    scales = get(_scaled_normal_scales_cache, D, nothing)
+    if scales === nothing || length(scales) != D
+        scales = CUDA.CuArray(Float32.(1:D))  # D-vector on GPU
+        _scaled_normal_scales_cache[D] = scales
+    end
+    return scales
+end
+function logp_scaled(X)
+    D = size(X, 1)
+    scales = _get_scaled_normal_scales(D)
+    return vec(-0.5f0 .* sum(X .^ 2 ./ scales; dims=1))
+end
+function gradlogp_scaled(X)
+    D = size(X, 1)
+    scales = _get_scaled_normal_scales(D)
```
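The closure alternative mentioned in the comment ("or by closing over it") avoids the global cache entirely. A sketch, with plain `Float32` `Array`s standing in for the `CuArray`s so it runs without a GPU (`make_scaled_target` is an illustrative name):

```julia
# Build the scales vector once and let both test functions capture it,
# so no per-call allocation happens inside the sampling loop.
function make_scaled_target(D::Int)
    scales = Float32.(1:D)                                # allocated once
    logp(X) = vec(-0.5f0 .* sum(X .^ 2 ./ scales; dims=1))
    gradlogp(X) = -X ./ scales                            # ∂/∂xᵢ of -0.5·Σ xᵢ²/sᵢ
    return logp, gradlogp
end
```

On GPU the only change would be `scales = CUDA.CuArray(Float32.(1:D))`; the closure still allocates it once per target rather than once per call.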
Adds proper GPU support