
eliminate allocs from llh calc#254

Open
thorek1 wants to merge 4 commits into main from remove-allocs

thorek1 (Owner) commented Jan 30, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 30, 2026 12:26

Copilot AI left a comment


Pull request overview

This PR aims to eliminate allocations from likelihood calculations to improve performance. The changes focus on using views instead of copies and leveraging in-place operations with workspace buffers.

Changes:

  • Added @view macros to avoid copying matrix slices in the perturbation solver and the Kalman filter
  • Modified the Lyapunov workspace allocation logic to check for zero-sized workspaces
  • Removed copy() calls when returning Lyapunov solver results
  • Optimized convergence checking in the Lyapunov solvers to use workspace buffers
  • Changed the finite check in the Kalman filter from !all(isfinite.(…)) to any(x -> !isfinite(x), …)
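The last two items can be illustrated with a minimal Julia sketch. The function names and the relative-change tolerance test below are hypothetical and not taken from the PR. `isfinite.(v)` first materializes a temporary `BitVector` that `all` then reduces, while `any` with a predicate walks the array without a temporary and stops at the first non-finite entry. In the same spirit, writing the difference of two iterates into a caller-owned buffer avoids allocating a new matrix on every convergence check.

```julia
using LinearAlgebra

# Before: isfinite.(v) builds a temporary BitVector, then all() reduces it
has_nonfinite_old(v) = !all(isfinite.(v))

# After: no temporary array; any() stops at the first non-finite element
has_nonfinite_new(v) = any(x -> !isfinite(x), v)

# Hypothetical convergence test ‖C_new - C_old‖ ≤ tol * ‖C_new‖ (Frobenius norm).
# `buf` is allocated once by the caller and reused on every iteration.
function converged!(buf::AbstractMatrix, C_new::AbstractMatrix,
                    C_old::AbstractMatrix, tol::Real)
    buf .= C_new .- C_old          # fused broadcast writes into buf, no new matrix
    return norm(buf) <= tol * norm(C_new)
end

v = [1.0, 2.0, NaN]
w = [1.0, 2.0, 3.0]
conv_buf = zeros(2, 2)
conv_same  = converged!(conv_buf, [1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0], 1e-12)
conv_moved = converged!(conv_buf, [2.0 0.0; 0.0 2.0], [1.0 0.0; 0.0 1.0], 1e-12)
```

`any(!isfinite, v)` is an equivalent, slightly shorter spelling of the new check.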

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/perturbation.jl | Converted matrix slicing operations to views to avoid allocations; added views to mul! operation arguments |
| src/options_and_caches.jl | Changed the Lyapunov workspace allocation check from ws.n != n to ws.n == 0 |
| src/filter/kalman.jl | Used a view for data column access and optimized the finite check |
| src/algorithms/lyapunov.jl | Eliminated allocations in convergence checks, added a conditional to avoid collecting already-dense matrices, removed copy operations on return values |
Comments suppressed due to low confidence (1)

src/perturbation.jl:76

  • Using @view for Ā₀ᵤ and then calling lu! on it (line 76) is problematic. lu! factorizes in place, so it overwrites the block A₀[1:T.nPresent_only, T.present_only_idx] of the parent matrix A₀ with the LU factors. If A₀ is read again after this operation, it no longer holds the original coefficients and the results will be incorrect. Similarly, A₋ᵤ is used as the output argument in mul! operations (lines 85, 87), which will modify the parent matrix A₋.

Consider either:

  1. Keeping these as copies (non-views), since they will be modified.
  2. Using a pre-allocated workspace buffer instead of modifying the parent matrices.

    Ā₀ᵤ  = @view A₀[1:T.nPresent_only, T.present_only_idx]
    A₊ᵤ  = @view A₊[1:T.nPresent_only,:]
    Ã₀ᵤ  = @view A₀[1:T.nPresent_only, T.present_but_not_only_idx]
    A₋ᵤ  = @view A₋[1:T.nPresent_only,:]

    # end # timeit_debug
    # @timeit_debug timer "Invert Ā₀ᵤ" begin

    Ā̂₀ᵤ = ℒ.lu!(Ā₀ᵤ, check = false)
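A toy reproduction of the aliasing concern, as a sketch: the matrices are made up, `idx` stands in for `T.present_only_idx`, and `buf` is a hypothetical preallocated buffer rather than an existing workspace field. Factorizing the view writes the LU factors back into the parent's block, while copying the block into a reusable buffer first leaves the parent intact and still avoids allocating a new matrix per call.

```julia
using LinearAlgebra

# Toy stand-ins (hypothetical) for A₀ and T.present_only_idx
A0 = [4.0 3.0 1.0;
      6.0 3.0 2.0;
      1.0 1.0 5.0]
idx = [1, 2]
A0_before = copy(A0)

# 1) Factorizing a view: lu! writes the L and U factors into A0[1:2, idx]
A0u = @view A0[1:2, idx]
F = lu!(A0u, check = false)
# A0 no longer holds its original values in that block

# 2) Workspace alternative: copy the block into a buffer allocated once
#    (e.g. when the workspace is built) and factorize the buffer instead
B0 = copy(A0_before)
buf = Matrix{Float64}(undef, 2, 2)
buf .= @view B0[1:2, idx]       # in-place copy, no new matrix
G = lu!(buf, check = false)     # B0 is left untouched
```

The same reasoning applies to `A₋ᵤ`: passing a view as the output argument of `mul!` writes the product into `A₋`.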

Comment on lines +1240 to 1253
+if ws.n == 0
 workspaces.lyapunov_1st_order = Lyapunov_workspace(n)
 end
 return workspaces.lyapunov_1st_order
 elseif order == :second_order
 ws = workspaces.lyapunov_2nd_order
-if ws.n != n
+if ws.n == 0
 workspaces.lyapunov_2nd_order = Lyapunov_workspace(n)
 end
 return workspaces.lyapunov_2nd_order
 elseif order == :third_order
 ws = workspaces.lyapunov_3rd_order
-if ws.n != n
+if ws.n == 0
 workspaces.lyapunov_3rd_order = Lyapunov_workspace(n)

Copilot AI Jan 30, 2026


The workspace allocation logic has been changed from checking ws.n != n to checking ws.n == 0. This means that if the dimension n changes between calls (but is not zero), the workspace will not be reallocated to match the new size. This could lead to dimension mismatch errors or incorrect results when the workspace buffers are used with matrices of a different size than what they were allocated for.

The original condition ws.n != n correctly handled size changes, while ws.n == 0 only handles the initial allocation case.
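A simplified sketch of the two conditions. `ToyWorkspace`, `ToyCaches`, and the two getter functions are hypothetical stand-ins for `Lyapunov_workspace` and the cache lookup in src/options_and_caches.jl. Note that `ws.n != n` already covers first use, because an empty workspace (n == 0) differs from any positive requested size; the original condition therefore handles both initialization and resizing, and still reuses the buffer when the size is unchanged.

```julia
# Hypothetical, simplified stand-in for Lyapunov_workspace
mutable struct ToyWorkspace
    n::Int
    buf::Matrix{Float64}
end
ToyWorkspace(n::Int) = ToyWorkspace(n, zeros(n, n))

mutable struct ToyCaches
    lyapunov::ToyWorkspace
end

# Condition as changed in this PR: only the first allocation is handled
function get_ws_zero_only!(c::ToyCaches, n::Int)
    if c.lyapunov.n == 0
        c.lyapunov = ToyWorkspace(n)
    end
    return c.lyapunov
end

# Original condition: reallocate whenever the requested size differs
function get_ws_resize!(c::ToyCaches, n::Int)
    if c.lyapunov.n != n
        c.lyapunov = ToyWorkspace(n)
    end
    return c.lyapunov
end

c1 = ToyCaches(ToyWorkspace(0))
get_ws_zero_only!(c1, 3)
stale = get_ws_zero_only!(c1, 5)    # still 3×3 although 5×5 was requested

c2 = ToyCaches(ToyWorkspace(0))
w3 = get_ws_resize!(c2, 3)
reused = get_ws_resize!(c2, 3)      # same object returned, no new allocation
w5 = get_ws_resize!(c2, 5)          # reallocated to 5×5
```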

Copilot uses AI. Check for mistakes.
# end

-return copy(𝐂), iters, reached_tol # return info on convergence
+return 𝐂, iters, reached_tol # return info on convergence

Copilot AI Jan 30, 2026


The return value has been changed from copy(𝐂) to just 𝐂, which returns the workspace buffer directly. This means the caller now receives a mutable reference to the internal workspace buffer. If the caller modifies this matrix or if the workspace is reused in a subsequent call, the previously returned matrix will be unexpectedly modified. This breaks the expected semantics where each call to the function returns an independent result.

The same issue exists on lines 404 and 456 in the bicgstab and gmres methods. Consider whether callers expect to receive ownership of the result or whether they're aware they're receiving a workspace buffer that may be modified.

Suggested change
-return 𝐂, iters, reached_tol # return info on convergence
+return copy(𝐂), iters, reached_tol # return info on convergence
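A toy illustration of the trade-off, as a sketch. `ToySolverWorkspace` and `solve_noalloc!` are hypothetical, and `ws.X .= 2 .* C` is only a placeholder for the actual solver iterations. Returning the buffer avoids the allocation from `copy`, but two results obtained from the same workspace are the same object, so a caller that needs to keep a result has to copy it itself.

```julia
# Hypothetical workspace holding the solver's output buffer
struct ToySolverWorkspace
    X::Matrix{Float64}
end

function solve_noalloc!(ws::ToySolverWorkspace, C::Matrix{Float64})
    ws.X .= 2 .* C        # placeholder for the real solver iterations
    return ws.X           # the buffer itself, not a copy
end

ws = ToySolverWorkspace(zeros(2, 2))
r1 = solve_noalloc!(ws, [1.0 0.0; 0.0 1.0])
r2 = solve_noalloc!(ws, [3.0 0.0; 0.0 3.0])   # also overwrites r1

ws2 = ToySolverWorkspace(zeros(2, 2))
k1 = copy(solve_noalloc!(ws2, [1.0 0.0; 0.0 1.0]))  # caller takes ownership
k2 = solve_noalloc!(ws2, [3.0 0.0; 0.0 3.0])        # k1 is unaffected
```

Julia's naming convention appends `!` to functions that mutate their arguments; documenting that the returned matrix belongs to the workspace would make this contract explicit to callers.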

# end

-return copy(𝐗), workspace.bicgstab_workspace.stats.niter, reached_tol
+return 𝐗, workspace.bicgstab_workspace.stats.niter, reached_tol

Copilot AI Jan 30, 2026


Same issue as line 352: returning 𝐗 directly instead of copy(𝐗) means the caller receives a reference to the internal workspace buffer, which can be modified unexpectedly.

Suggested change
-return 𝐗, workspace.bicgstab_workspace.stats.niter, reached_tol
+return copy(𝐗), workspace.bicgstab_workspace.stats.niter, reached_tol

# end

-return copy(𝐗), workspace.gmres_workspace.stats.niter, reached_tol
+return 𝐗, workspace.gmres_workspace.stats.niter, reached_tol

Copilot AI Jan 30, 2026


Same issue as lines 352 and 404: returning 𝐗 directly instead of copy(𝐗) means the caller receives a reference to the internal workspace buffer, which can be modified unexpectedly.

Suggested change
-return 𝐗, workspace.gmres_workspace.stats.niter, reached_tol
+return copy(𝐗), workspace.gmres_workspace.stats.niter, reached_tol

github-actions bot commented Jan 30, 2026

Benchmark Results

| Benchmark (time) | main | d7dd37b... | main / d7dd37b... |
|---|---|---|---|
| FS2000/NSSS | 15 ± 0.44 μs | 14.8 ± 0.29 μs | 1.01 ± 0.036 |
| FS2000/covariance | 0.125 ± 0.029 ms | 0.112 ± 0.0098 ms | 1.12 ± 0.28 |
| FS2000/irf | 0.0967 ± 0.016 ms | 0.0952 ± 0.014 ms | 1.02 ± 0.22 |
| FS2000/jacobian | 0.797 ± 0.004 μs | 0.798 ± 0.006 μs | 0.999 ± 0.009 |
| FS2000/lyapunov/bartels_stewart | 0.12 ± 0.015 ms | 0.114 ± 0.0063 ms | 1.06 ± 0.14 |
| FS2000/lyapunov/bicgstab | 28.1 ± 1.2 μs | 31.3 ± 1.9 μs | 0.899 ± 0.068 |
| FS2000/lyapunov/doubling | 0.0382 ± 0.011 ms | 0.0328 ± 0.00077 ms | 1.16 ± 0.33 |
| FS2000/lyapunov/gmres | 28 ± 1.2 μs | 20.4 ± 0.99 μs | 1.37 ± 0.087 |
| FS2000/qme/doubling | 0.106 ± 0.0089 ms | 0.172 ± 0.0072 ms | 0.613 ± 0.057 |
| FS2000/qme/schur | 0.0826 ± 0.011 ms | 0.0846 ± 0.013 ms | 0.976 ± 0.2 |
| NAWM_EAUS_2008/NSSS | 10.3 ± 0.37 ms | 10.6 ± 1 ms | 0.973 ± 0.1 |
| NAWM_EAUS_2008/covariance | 0.0338 ± 0.0066 s | 0.0351 ± 0.0012 s | 0.965 ± 0.19 |
| NAWM_EAUS_2008/irf | 17.4 ± 0.92 ms | 18 ± 0.81 ms | 0.965 ± 0.067 |
| NAWM_EAUS_2008/jacobian | 0.0663 ± 0.0055 ms | 0.0657 ± 0.0041 ms | 1.01 ± 0.1 |
| NAWM_EAUS_2008/lyapunov/bartels_stewart | 29.9 ± 1 ms | 29.3 ± 1.3 ms | 1.02 ± 0.058 |
| NAWM_EAUS_2008/lyapunov/bicgstab | 0.196 ± 0.00052 s | 0.201 ± 0.00015 s | 0.978 ± 0.0027 |
| NAWM_EAUS_2008/lyapunov/doubling | 16.3 ± 1.6 ms | 16 ± 0.062 ms | 1.02 ± 0.1 |
| NAWM_EAUS_2008/lyapunov/gmres | 0.151 ± 0.0012 s | 0.153 ± 0.0005 s | 0.988 ± 0.0083 |
| NAWM_EAUS_2008/qme/doubling | 22.9 ± 0.96 ms | 24.7 ± 1.6 ms | 0.927 ± 0.073 |
| NAWM_EAUS_2008/qme/schur | 16.5 ± 0.59 ms | 22.5 ± 0.76 ms | 0.731 ± 0.036 |
| Smets_Wouters_2007/NSSS | 0.165 ± 0.013 ms | 0.173 ± 0.01 ms | 0.952 ± 0.094 |
| Smets_Wouters_2007/covariance | 1.64 ± 0.046 ms | 1.7 ± 0.083 ms | 0.968 ± 0.055 |
| Smets_Wouters_2007/irf | 0.671 ± 0.032 ms | 0.701 ± 0.027 ms | 0.958 ± 0.059 |
| Smets_Wouters_2007/jacobian | 10.7 ± 19 μs | 19.3 ± 21 μs | 0.556 ± 1.1 |
| Smets_Wouters_2007/lyapunov/bartels_stewart | 1.57 ± 0.019 ms | 1.54 ± 0.019 ms | 1.02 ± 0.017 |
| Smets_Wouters_2007/lyapunov/bicgstab | 5.97 ± 0.018 ms | 5.76 ± 0.019 ms | 1.04 ± 0.0047 |
| Smets_Wouters_2007/lyapunov/doubling | 0.963 ± 0.012 ms | 0.928 ± 0.0029 ms | 1.04 ± 0.013 |
| Smets_Wouters_2007/lyapunov/gmres | 6.74 ± 0.027 ms | 6.76 ± 0.038 ms | 0.997 ± 0.0068 |
| Smets_Wouters_2007/qme/doubling | 1.73 ± 0.024 ms | 1.76 ± 0.024 ms | 0.985 ± 0.019 |
| Smets_Wouters_2007/qme/schur | 1.25 ± 0.028 ms | 1.33 ± 0.028 ms | 0.941 ± 0.029 |
| time_to_load | 19.3 ± 0.093 s | 19.5 ± 0.32 s | 0.989 ± 0.017 |

| Benchmark (allocations: memory) | main | d7dd37b... | main / d7dd37b... |
|---|---|---|---|
| FS2000/NSSS | 0.379 k allocs: 20.1 kB | 0.379 k allocs: 20.1 kB | 1 |
| FS2000/covariance | 0.991 k allocs: 0.113 MB | 0.94 k allocs: 0.0797 MB | 1.41 |
| FS2000/irf | 1.75 k allocs: 0.141 MB | 1.75 k allocs: 0.141 MB | 1 |
| FS2000/jacobian | 0 allocs: 0 B | 0 allocs: 0 B | |
| FS2000/lyapunov/bartels_stewart | 0.074 k allocs: 0.0656 MB | 0.071 k allocs: 0.063 MB | 1.04 |
| FS2000/lyapunov/bicgstab | 0.053 k allocs: 17.1 kB | 0.047 k allocs: 11.8 kB | 1.45 |
| FS2000/lyapunov/doubling | 0.051 k allocs: 29.6 kB | 18 allocs: 0.516 kB | 57.5 |
| FS2000/lyapunov/gmres | 0.053 k allocs: 17.1 kB | 0.047 k allocs: 11.8 kB | 1.45 |
| FS2000/qme/doubling | 0.131 k allocs: 31.3 kB | 0.113 k allocs: 26.7 kB | 1.17 |
| FS2000/qme/schur | 0.223 k allocs: 0.0813 MB | 0.205 k allocs: 0.0767 MB | 1.06 |
| NAWM_EAUS_2008/NSSS | 2.57 k allocs: 3.85 MB | 2.62 k allocs: 4.18 MB | 0.921 |
| NAWM_EAUS_2008/covariance | 4.54 k allocs: 15.7 MB | 4.51 k allocs: 10.1 MB | 1.55 |
| NAWM_EAUS_2008/irf | 12 k allocs: 13 MB | 12 k allocs: 13.3 MB | 0.98 |
| NAWM_EAUS_2008/jacobian | 4 allocs: 0.709 MB | 4 allocs: 0.709 MB | 1 |
| NAWM_EAUS_2008/lyapunov/bartels_stewart | 0.173 k allocs: 4.92 MB | 0.17 k allocs: 4.52 MB | 1.09 |
| NAWM_EAUS_2008/lyapunov/bicgstab | 0.056 k allocs: 2.43 MB | 0.05 k allocs: 1.62 MB | 1.5 |
| NAWM_EAUS_2008/lyapunov/doubling | 0.058 k allocs: 5.26 MB | 19 allocs: 6.98 kB | 771 |
| NAWM_EAUS_2008/lyapunov/gmres | 0.056 k allocs: 2.43 MB | 0.05 k allocs: 1.62 MB | 1.5 |
| NAWM_EAUS_2008/qme/doubling | 0.186 k allocs: 4.29 MB | 0.152 k allocs: 3.62 MB | 1.19 |
| NAWM_EAUS_2008/qme/schur | 0.336 k allocs: 7.35 MB | 0.302 k allocs: 6.68 MB | 1.1 |
| Smets_Wouters_2007/NSSS | 1.34 k allocs: 0.11 MB | 1.47 k allocs: 0.352 MB | 0.311 |
| Smets_Wouters_2007/covariance | 2.97 k allocs: 1.18 MB | 3.03 k allocs: 0.933 MB | 1.27 |
| Smets_Wouters_2007/irf | 5.52 k allocs: 1.1 MB | 5.65 k allocs: 1.34 MB | 0.823 |
| Smets_Wouters_2007/jacobian | 4 allocs: 0.0611 MB | 4 allocs: 0.0611 MB | 1 |
| Smets_Wouters_2007/lyapunov/bartels_stewart | 0.084 k allocs: 0.441 MB | 0.081 k allocs: 0.407 MB | 1.08 |
| Smets_Wouters_2007/lyapunov/bicgstab | 0.055 k allocs: 0.202 MB | 0.049 k allocs: 0.135 MB | 1.49 |
| Smets_Wouters_2007/lyapunov/doubling | 0.057 k allocs: 0.435 MB | 18 allocs: 1 kB | 445 |
| Smets_Wouters_2007/lyapunov/gmres | 0.055 k allocs: 0.202 MB | 0.049 k allocs: 0.135 MB | 1.49 |
| Smets_Wouters_2007/qme/doubling | 0.18 k allocs: 0.389 MB | 0.146 k allocs: 0.329 MB | 1.18 |
| Smets_Wouters_2007/qme/schur | 0.323 k allocs: 0.729 MB | 0.289 k allocs: 0.67 MB | 1.09 |
| time_to_load | 0.143 k allocs: 10.6 kB | 0.143 k allocs: 10.6 kB | 1 |
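The ratio column appears to be main divided by the PR commit, so values above 1 favour d7dd37b...; in the memory table it tracks bytes rather than allocation counts (for example, the doubling Lyapunov solver goes from 51 to 18 allocations but shows a 57.5× ratio). The Julia check below recomputes a few entries from the rounded table values. The unit interpretation (1 MB = 1024 kB) is inferred from how well it reproduces the reported ratios, not from the benchmark tool's documentation.

```julia
# Ratio column = main / PR, so values above 1 favour the PR commit
ratio(base, pr) = base / pr

# Time table, FS2000/lyapunov/gmres: 28 μs (main) vs 20.4 μs (PR)
t_gmres = ratio(28, 20.4)                    # ≈ 1.37

# Memory table, FS2000/lyapunov/doubling: 29.6 kB vs 0.516 kB (bytes, not counts)
m_fs2000 = ratio(29.6, 0.516)                # ≈ 57.36; table shows 57.5 (unrounded inputs)

# NAWM_EAUS_2008/lyapunov/doubling: 5.26 MB vs 6.98 kB; table shows 771
m_nawm_decimal = ratio(5.26 * 1000, 6.98)    # ≈ 753.6 if 1 MB = 1000 kB
m_nawm_binary  = ratio(5.26 * 1024, 6.98)    # ≈ 771.7 if 1 MB = 1024 kB
```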

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

codecov-commenter commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 81.15942% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.03%. Comparing base (326b63b) to head (78c1445).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/get_functions.jl | 7.14% | 13 Missing ⚠️ |
| src/filter/inversion.jl | 45.45% | 6 Missing ⚠️ |
| src/custom_autodiff_rules/zygote.jl | 88.63% | 5 Missing ⚠️ |
| src/moments.jl | 66.66% | 5 Missing ⚠️ |
| src/MacroModelling.jl | 62.50% | 3 Missing ⚠️ |
| src/algorithms/lyapunov.jl | 84.61% | 2 Missing ⚠️ |
| src/filter/kalman.jl | 80.00% | 2 Missing ⚠️ |
| src/options_and_caches.jl | 95.91% | 2 Missing ⚠️ |
| src/custom_autodiff_rules/forwarddiff.jl | 0.00% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (326b63b) and HEAD (78c1445).

HEAD has 6 fewer uploads than BASE.
| Flag | BASE (326b63b) | HEAD (78c1445) |
|---|---|---|
| | 16 | 10 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #254       +/-   ##
===========================================
- Coverage   81.79%   43.03%   -38.77%     
===========================================
  Files          23       23               
  Lines       14202    14030      -172     
===========================================
- Hits        11617     6038     -5579     
- Misses       2585     7992     +5407     
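A quick arithmetic check of the Codecov figures, as a sketch. Project coverage is hits divided by lines (this report lists no partial lines), and the displayed percentages are consistent with truncation to two decimals. The per-file counts imply about 207 changed lines, 168 of them covered. Given the drop from 16 to 10 uploads, the project-level fall may reflect missing reports rather than code that lost test coverage.

```julia
# Project coverage = hits / lines (this report shows no partial lines)
coverage_pct(hits, lines) = 100 * hits / lines
truncate2(x) = floor(x * 100) / 100      # displayed values match truncation

base_cov = coverage_pct(11617, 14202)    # ≈ 81.798, displayed as 81.79%
head_cov = coverage_pct(6038, 14030)     # ≈ 43.036, displayed as 43.03%

# Patch coverage: 39 missing lines at 81.15942% implies the patch size
missing_per_file = [13, 6, 5, 5, 3, 2, 2, 2, 1]
patch_missing = sum(missing_per_file)                            # 39
patch_lines = round(Int, patch_missing / (1 - 0.8115942))        # 207
patch_pct = 100 * (patch_lines - patch_missing) / patch_lines    # 168 / 207
```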


…e temporary allocations and improve performance

Labels

None yet

Projects

None yet


3 participants