Mode marginalization #145

p-slash · 2025-11-20T20:10:55Z

Implemented mode marginalization. I am using the following covariance update: $$\mathbf{C} \rightarrow \mathbf{C} + \mathbf{F} \mathbf{S}\mathbf{F}^\mathrm{T},$$ where $\mathbf{F}$ is the template matrix of shape (ndata, ntemplates) and $\mathbf{S}$ is the diagonal prior variance matrix. Uninformative marginalization formally requires $\mathbf{S}\rightarrow \infty$, but a sufficiently large finite value is enough in practice. The template matrix is also compressed to its non-degenerate modes using an SVD, which takes a while to compute.

Copilot can summarize the code and config changes more effectively.
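To make the covariance update concrete, here is a minimal NumPy sketch on toy data. It is not the vega implementation: the matrices, sizes, and the value of `sigma` are made up for illustration, and the names `G` and `Ainv` only loosely echo those in the review snippets further down. It shows that with a large prior variance the chi-square becomes insensitive to any amplitude added along the template directions, and that the Woodbury identity gives the same inverse without inverting the updated matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
ndata, ntemplates = 6, 2

# Toy positive-definite covariance and template matrix (illustrative values only).
A = rng.normal(size=(ndata, ndata))
C = A @ A.T + ndata * np.eye(ndata)
F = rng.normal(size=(ndata, ntemplates))

sigma = 1e3                       # large prior sigma standing in for S -> infinity
S = sigma**2 * np.eye(ntemplates)

# Covariance update: C -> C + F S F^T
C_marg = C + F @ S @ F.T

# Woodbury identity:
# (C + F S F^T)^-1 = C^-1 - C^-1 F (S^-1 + F^T C^-1 F)^-1 F^T C^-1
Cinv = np.linalg.inv(C)
G = F.T @ Cinv                                   # shape (ntemplates, ndata)
Ainv = np.linalg.inv(np.linalg.inv(S) + G @ F)   # shape (ntemplates, ntemplates)
Cinv_marg = Cinv - G.T @ Ainv @ G


def chi2(residual, inv_cov):
    """Quadratic form r^T C^-1 r."""
    return float(residual @ inv_cov @ residual)


residual = rng.normal(size=ndata)
# Add arbitrary amplitudes along the template directions.
shifted = residual + F @ np.array([3.0, -2.0])

chi2_plain = chi2(residual, Cinv_marg)
chi2_shifted = chi2(shifted, Cinv_marg)
```

Comparing `chi2_plain` and `chi2_shifted` shows how little the template amplitudes matter once the update is applied; lowering `sigma` makes the difference grow, which is the sense in which the prior becomes informative.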

codecov · 2025-11-20T20:12:44Z

Codecov Report

❌ Patch coverage is 20.51282% with 186 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.56%. Comparing base (d329eab) to head (f53e270).
⚠️ Report is 67 commits behind head on master.

Files with missing lines	Patch %	Lines
vega/postprocess/fit_results.py	15.38%	44 Missing ⚠️
vega/correlation_item.py	19.14%	36 Missing and 2 partials ⚠️
vega/data.py	19.14%	37 Missing and 1 partial ⚠️
vega/output.py	13.33%	26 Missing ⚠️
vega/vega_interface.py	23.33%	21 Missing and 2 partials ⚠️
vega/build_config.py	0.00%	14 Missing ⚠️
vega/coordinates.py	77.77%	1 Missing and 1 partial ⚠️
vega/scripts/run_vega.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
- Coverage   38.11%   37.56%   -0.55%     
==========================================
  Files          30       30              
  Lines        3983     4102     +119     
  Branches      745      766      +21     
==========================================
+ Hits         1518     1541      +23     
- Misses       2317     2408      +91     
- Partials      148      153       +5

Copilot

Pull Request Overview

This PR implements mode marginalization for Lyman-alpha forest correlation function analysis by updating the covariance matrix using the transformation C → C + F S F^T, where F is a template matrix and S is a diagonal prior variance matrix. This replaces the previous parameter-based marginalization approach.

Key Changes

Covariance-based marginalization replaces fitting marginalization parameters directly
SVD compression removes degenerate modes from the template matrix
Configuration now uses scale-based parameters (rtmax, rtmin, rpmax, rpmin) instead of individual bin parameters

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File	Description
vega/model.py	Removed old parameter-based marginalization code and unused numpy import
vega/correlation_item.py	Added template generation logic with configuration parsing for marginalization scales
vega/data.py	Implemented SVD-based template compression, covariance updates, and migrated from csr_matrix to csr_array
vega/vega_interface.py	Added global covariance matrix updates for marginalization with memory management

Copilot · 2025-11-20T20:58:41Z

vega/data.py

+
+        # Compress using svd to remove degenerate modes
+        print("  SVD of template matrix to remove degenerate modes.")
+        u, s, vh = np.linalg.svd(templates.toarray(), full_matrices=False)


[nitpick] Converting the sparse template matrix to a dense array with .toarray() before SVD could be memory-intensive for large datasets. Consider using scipy.sparse.linalg.svds for sparse SVD if memory becomes an issue, though it may be slower and less stable for small matrices.

p-slash · 2025-11-21

We may want to revisit this. Note that scipy.sparse.linalg.svds differs from the NumPy version: you need to specify the number of singular values to compute.
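For readers unfamiliar with the compression step, the following is a rough sketch of removing degenerate modes with a dense SVD. The function name `compress_templates`, the relative cutoff `rtol`, and the toy matrix are assumptions for illustration; the PR's actual threshold and return value are not shown in this thread.

```python
import numpy as np


def compress_templates(templates, rtol=1e-10):
    """Return an orthonormal basis for the column space of `templates`.

    Singular vectors whose singular value falls below `rtol` times the
    largest one are treated as degenerate and dropped.
    """
    u, s, _ = np.linalg.svd(templates, full_matrices=False)
    keep = s > rtol * s.max()
    return u[:, keep], s


# Toy template matrix: the third column is the sum of the first two,
# so only two independent modes exist.
T = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 2.0],
])

basis, singular_values = compress_templates(T)
```

The dense route returns every singular value, so the cutoff can be chosen after the fact. A truncated sparse solver such as `scipy.sparse.linalg.svds` needs the number of modes `k` up front, which is the difference pointed out in the reply above.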

Copilot · 2025-11-20T20:58:41Z

vega/correlation_item.py

+    def get_undist_xi_marg_templates(self):
+        """Calculate undistorted correlation function marginalization templates.
+        Degenerate modes are removed in the (relevant) distorted space in
+            data.get_dist_xi_marg_templates function.
+
+        Returns
+        -------
+        sparse array, likely csc_array
+            Prior sigma is multiplied to each vector.
+        """
+        templates = []
+        N = self.model_coordinates.rt_regular_grid.size
+        d = np.ones(1)  # required in coo_array construction
+
+        if 'rtmax' in self.marginalize_small_scales:
+            rtmax = self.marginalize_small_scales['rtmax']
+            idx = np.nonzero(
+                self.model_coordinates.rt_regular_grid < rtmax
+            )[0]
+            for i in idx:
+                templates.append(coo_array((d, ([0], [i])), shape=(1, N)))
+
+        if 'rtmin' in self.marginalize_small_scales:
+            rtmin = self.marginalize_small_scales['rtmin']
+            idx = np.nonzero(
+                self.model_coordinates.rt_regular_grid > rtmin
+            )[0]
+            for i in idx:
+                templates.append(coo_array((d, ([0], [i])), shape=(1, N)))
+
+        if 'rpmax' in self.marginalize_small_scales:
+            rpmax = self.marginalize_small_scales['rpmax']
+            idx = np.nonzero(
+                self.model_coordinates.rp_regular_grid < rpmax
+            )[0]
+            for i in idx:
+                templates.append(coo_array((d, ([0], [i])), shape=(1, N)))
+
+        if 'rpmin' in self.marginalize_small_scales:
+            rpmin = self.marginalize_small_scales['rpmin']
+            idx = np.nonzero(
+                self.model_coordinates.rp_regular_grid > rpmin
+            )[0]
+            for i in idx:
+                templates.append(coo_array((d, ([0], [i])), shape=(1, N)))
+
+        a = self.marginalize_small_scales_prior_sigma
+        return a * sparse_vstack(templates).tocsr().T


If self.marginalize_small_scales is an empty dictionary (when all config parameters are 0 or not set), this function will create an empty template list and then call sparse_vstack([]), which will fail. Consider adding a check at the beginning: if not self.marginalize_small_scales: raise ValueError("No marginalization scales configured") or ensure this function is only called when marginalization is actually configured.
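The guard suggested above can be sketched as follows. This is a dense NumPy stand-in, not the sparse `coo_array` code from the diff: the function name `one_hot_templates`, the `grid` and `cuts` arguments, and the second check for cuts that select no bins are all assumptions added for illustration.

```python
import numpy as np


def one_hot_templates(grid, cuts):
    """Build one one-hot template column per grid point selected by `cuts`.

    `cuts` may contain 'rtmax' (select grid < rtmax) and/or
    'rtmin' (select grid > rtmin). Returns an array of shape
    (grid.size, ntemplates).
    """
    if not cuts:
        raise ValueError("No marginalization scales configured")

    rows = []
    if "rtmax" in cuts:
        for i in np.nonzero(grid < cuts["rtmax"])[0]:
            row = np.zeros(grid.size)
            row[i] = 1.0
            rows.append(row)
    if "rtmin" in cuts:
        for i in np.nonzero(grid > cuts["rtmin"])[0]:
            row = np.zeros(grid.size)
            row[i] = 1.0
            rows.append(row)

    # A non-empty config can still select nothing; stacking an empty
    # list would fail, so report that case explicitly as well.
    if not rows:
        raise ValueError("Marginalization cuts selected no bins")
    return np.vstack(rows).T


grid = np.array([2.0, 6.0, 10.0, 14.0])
templates = one_hot_templates(grid, {"rtmax": 8.0})
```

Raising early keeps the failure message tied to the configuration rather than to the stacking call, which is the motivation given in the comment.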

Copilot · 2025-11-20T20:58:42Z

vega/data.py

+            cov_update = cov_update[self.model_mask, :][:, self.model_mask]
+            w = np.logical_and.outer(self.data_mask, self.data_mask)
+            self.cov_marg_update = cov_update.ravel()
+            self._cov_mat[w] += self.cov_marg_update


The masking logic appears incorrect. Line 83 applies model_mask using slicing [self.model_mask, :][:, self.model_mask], but this doesn't correctly extract the submatrix - the first indexing returns rows where model_mask is True, then columns where model_mask is True from those rows. This is not the same as extracting the submatrix cov_update[np.ix_(self.model_mask, self.model_mask)].

More critically, there's a shape mismatch: cov_update after model_mask has shape approximately (n_model, n_model), while w is created from data_mask with shape (full_data_size, full_data_size). When raveled and added, the sizes may not match. The relationship between model_mask, data_mask, and the intended covariance update needs clarification.

Suggested change

-            cov_update = cov_update[self.model_mask, :][:, self.model_mask]
-            w = np.logical_and.outer(self.data_mask, self.data_mask)
-            self.cov_marg_update = cov_update.ravel()
-            self._cov_mat[w] += self.cov_marg_update
+            cov_update_model = cov_update[np.ix_(self.model_mask, self.model_mask)]
+            self.cov_marg_update = cov_update_model.ravel()
+            self._cov_mat[np.ix_(self.model_mask, self.model_mask)] += cov_update_model
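As a sanity check on the indexing question raised in this comment, here is a small NumPy experiment on a made-up 4x4 array and mask (neither comes from vega). In NumPy, chained boolean slicing and `np.ix_` select the same block when reading. They behave differently for in-place updates, because the first step of chained indexing produces a copy.

```python
import numpy as np

A = np.arange(16.0).reshape(4, 4)
mask = np.array([True, False, True, True])

# Reading: both forms pick rows and columns where mask is True.
chained = A[mask, :][:, mask]
via_ix = A[np.ix_(mask, mask)]
same_block = np.array_equal(chained, via_ix)

# Writing: chained indexing updates a temporary copy, so B is unchanged.
B = A.copy()
B[mask, :][:, mask] += 1.0

# np.ix_ addresses the block of C directly, so the update sticks.
C = A.copy()
C[np.ix_(mask, mask)] += 1.0
```

The shape concern in the comment is separate: a raveled update can only be added through a 2D boolean mask if the number of `True` entries in that mask equals the update's size.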

Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (1)

vega/output.py:207

Nested for statement uses loop variable 'name' of enclosing for statement.

                        for name, v in zip(names, val):
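The warning above concerns a general Python pitfall, sketched here with hypothetical data (the dictionaries and function names are invented and unrelated to the actual contents of vega/output.py). An inner `for` that reuses the outer loop variable rebinds it, so any use of that name after the inner loop sees the inner value.

```python
results = {"bias": [1.0, 2.0], "beta": [3.0]}
sub_names = {"bias": ["bias_a", "bias_b"], "beta": ["beta_a"]}


def collect_shadowed(results, sub_names):
    """Inner loop reuses `name`, clobbering the outer key."""
    seen = []
    for name, val in results.items():
        for name, v in zip(sub_names[name], val):  # rebinds `name`
            pass
        seen.append(name)  # last inner name, not the outer key
    return seen


def collect_renamed(results, sub_names):
    """Inner loop uses its own variable, so the outer key survives."""
    seen = []
    for name, val in results.items():
        for sub_name, v in zip(sub_names[name], val):
            pass
        seen.append(name)
    return seen


shadowed = collect_shadowed(results, sub_names)
renamed = collect_renamed(results, sub_names)
```

Whether the shadowing in the flagged line is harmful depends on whether the outer `name` is used after the inner loop; renaming the inner variable removes the question.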

Copilot · 2025-12-19T00:38:07Z

vega/data.py

+            S = np.diag(np.full(
+                ntemps, self.corr_item.marginalize_small_scales_prior_sigma**-2
+            ))
+            Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S)


The expression 'templates_masked.T.dot(G.T).T' can be simplified to 'G.dot(templates_masked)' for better readability. Both are mathematically equivalent since (A.T @ B.T).T = B @ A, but the simpler form is clearer.

Suggested change

-            Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S)
+            Ainv = np.linalg.inv(G.dot(templates_masked) + S)
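The equivalence claimed in this suggestion can be checked numerically. The shapes below are assumptions chosen so that both products are defined (`G` as ntemps x ndata, `templates_masked` as ndata x ntemps); the thread does not show how `G` is built.

```python
import numpy as np

rng = np.random.default_rng(1)
ndata, ntemps = 5, 3

# Random stand-ins with assumed shapes.
templates_masked = rng.normal(size=(ndata, ntemps))
G = rng.normal(size=(ntemps, ndata))

original = templates_masked.T.dot(G.T).T   # (T^T G^T)^T
simplified = G.dot(templates_masked)       # G T

equivalent = np.allclose(original, simplified)
```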

p-slash added 16 commits November 20, 2025 10:40

update to scipy.sparse.csr_array. This is compatible with numpy arrays

5fd00f6

initial redesign of marginalization templates

618a9cc

fixes to _init_marg_xi_templates

e687600

global cov updated using templates

759f259

moved marginalization templates from model to corr item and data

5c4a6d0

fix at slicing of cov matrix when marginalization

a9b6b88

syntax fix to previous commit

fab1163

fix to previous commit. need to ravel update cov

655dacb

fix and print status

728d940

svd to stabilize template matrix

1761a47

need to use model_coordinates not dist_model_coordinates

7c5ad21

print SVD stats

e1fad95

fix factor input for get_dist_xi_marg_templates

bcf60a2

update global cov block by block

e029682

cache cov_marg_update and use it in global cov

89ba1c0

clean up and add comments

558cabb

p-slash added 2 commits November 20, 2025 15:13

more clean up (old fitting code)

a22439d

more clean up. remove unused scipy.sparse.block_array

5e208dc

andreicuceu requested a review from Copilot November 20, 2025 20:52

Copilot started reviewing on behalf of andreicuceu November 20, 2025 20:52

Copilot finished reviewing on behalf of andreicuceu November 20, 2025 20:57

Copilot AI reviewed Nov 20, 2025

p-slash and others added 7 commits November 20, 2025 21:06

Remove unused wm in vega/vega_interface.py

42cc50c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

replace assert with raise ValueError in vega/data.py

f6dcbef

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Remove unused enumerate in vega/vega_interface.py

42d64fa

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

use np.ix_

94f00d5

use ix. return full template matrix

56ab149

Remove ravel to match dimensions

dd90ae4

Update template collection to intersection of input cuts

8c1c7cf

andreicuceu and others added 15 commits November 25, 2025 17:07

Update rp and rt min/max to 300 Mpc/h

cdd1393

Update templates

43055d2

Merge branch 'master' into mode-marginalization

bd7d3f3

return templates as they are from CorrelationItem. Multiply them with sigma in Data

71e0d17

save the matrix to recover marginalized params

797aa08

use data.variance instead of cov_mat.diagonal()

2444656

proposed solution when marginalizing: add in-place to bestfit model.

86728e4

marginalization array shape fixes

db01e3a

save original covariance matrix for plotting purposes

6beb133

Propagate marg besfit pars to output and fit results

b9a40ba

Fix function call in output

81e2382

Add explicit pad_value=0 for NB column

3ddc9f7

Fix marg coeff read name

7e79855

mInor updates to make output more uniform

07f9bd4

Improve handling of marg coeff

0f818f3

andreicuceu requested a review from Copilot December 19, 2025 00:31

Copilot started reviewing on behalf of andreicuceu December 19, 2025 00:32

Copilot AI reviewed Dec 19, 2025

andreicuceu and others added 10 commits December 18, 2025 16:58

Update vega/correlation_item.py

433e5ba

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update vega/output.py

f10f192

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update vega/postprocess/fit_results.py

275545f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Add protection when no marginalization is done

4a5727b

Update vega/data.py

aa14b87

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update vega/correlation_item.py

bca7f91

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update vega/postprocess/fit_results.py

58a53f7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Add option to skip arinyo for peak component

5d6bb76

Change default of nl skip to False

97be9b2

Add skip nl flag to config builder

f53e270

andreicuceu merged commit 8f20656 into master Jan 6, 2026
4 of 6 checks passed

andreicuceu deleted the mode-marginalization branch January 6, 2026 21:40
