Skip to content

Conversation

@p-slash
Copy link
Collaborator

@p-slash p-slash commented Nov 20, 2025

Implemented mode marginalization. I am using the following transformation: $$\mathbf{C} \rightarrow \mathbf{C} + \mathbf{F} \mathbf{S}\mathbf{F}^\mathrm{T}$$, where $\mathbf{F}$ is the template matrix of shape (ndata, ntemplates) and $\mathbf{S}$ is the diagonal prior variance matrix. Uninformative marginalization requires $S\rightarrow \infty$, but a large number will be enough. The template matrix is also compressed to non-degenerate modes using SVD, which takes a while to calculate.

Co-pilot can summarize code and config changes more effectively.

@codecov
Copy link

codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 20.51282% with 186 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.56%. Comparing base (d329eab) to head (f53e270).
⚠️ Report is 67 commits behind head on master.

Files with missing lines Patch % Lines
vega/postprocess/fit_results.py 15.38% 44 Missing ⚠️
vega/correlation_item.py 19.14% 36 Missing and 2 partials ⚠️
vega/data.py 19.14% 37 Missing and 1 partial ⚠️
vega/output.py 13.33% 26 Missing ⚠️
vega/vega_interface.py 23.33% 21 Missing and 2 partials ⚠️
vega/build_config.py 0.00% 14 Missing ⚠️
vega/coordinates.py 77.77% 1 Missing and 1 partial ⚠️
vega/scripts/run_vega.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
- Coverage   38.11%   37.56%   -0.55%     
==========================================
  Files          30       30              
  Lines        3983     4102     +119     
  Branches      745      766      +21     
==========================================
+ Hits         1518     1541      +23     
- Misses       2317     2408      +91     
- Partials      148      153       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements mode marginalization for Lyman-alpha forest correlation function analysis by updating the covariance matrix using the transformation C → C + F S F^T, where F is a template matrix and S is a diagonal prior variance matrix. This replaces the previous parameter-based marginalization approach.

Key Changes

  • Covariance-based marginalization replaces fitting marginalization parameters directly
  • SVD compression removes degenerate modes from the template matrix
  • Configuration now uses scale-based parameters (rtmax, rtmin, rpmax, rpmin) instead of individual bin parameters

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
vega/model.py Removed old parameter-based marginalization code and unused numpy import
vega/correlation_item.py Added template generation logic with configuration parsing for marginalization scales
vega/data.py Implemented SVD-based template compression, covariance updates, and migrated from csr_matrix to csr_array
vega/vega_interface.py Added global covariance matrix updates for marginalization with memory management

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

vega/data.py Outdated

# Compress using svd to remove degenerate modes
print(" SVD of template matrix to remove degenerate modes.")
u, s, vh = np.linalg.svd(templates.toarray(), full_matrices=False)
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Converting the sparse template matrix to a dense array with .toarray() before SVD could be memory-intensive for large datasets. Consider using scipy.sparse.linalg.svds for sparse SVD if memory becomes an issue, though it may be slower and less stable for small matrices.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to revisit this. The scipy.sparse.linalg.svds function differs from the numpy version. You need to specify the number of singular values to be computed.

Comment on lines 131 to 178
def get_undist_xi_marg_templates(self):
"""Calculate undistorted correlation function marginalization templates.
Degenerate modes are removed in the (relevant) distorted space in
data.get_dist_xi_marg_templates function.

Returns
-------
sparse array, likely csc_array
Prior sigma is multiplied to each vector.
"""
templates = []
N = self.model_coordinates.rt_regular_grid.size
d = np.ones(1) # required in coo_array construction

if 'rtmax' in self.marginalize_small_scales:
rtmax = self.marginalize_small_scales['rtmax']
idx = np.nonzero(
self.model_coordinates.rt_regular_grid < rtmax
)[0]
for i in idx:
templates.append(coo_array((d, ([0], [i])), shape=(1, N)))

if 'rtmin' in self.marginalize_small_scales:
rtmin = self.marginalize_small_scales['rtmin']
idx = np.nonzero(
self.model_coordinates.rt_regular_grid > rtmin
)[0]
for i in idx:
templates.append(coo_array((d, ([0], [i])), shape=(1, N)))

if 'rpmax' in self.marginalize_small_scales:
rpmax = self.marginalize_small_scales['rpmax']
idx = np.nonzero(
self.model_coordinates.rp_regular_grid < rpmax
)[0]
for i in idx:
templates.append(coo_array((d, ([0], [i])), shape=(1, N)))

if 'rpmin' in self.marginalize_small_scales:
rpmin = self.marginalize_small_scales['rpmin']
idx = np.nonzero(
self.model_coordinates.rp_regular_grid > rpmin
)[0]
for i in idx:
templates.append(coo_array((d, ([0], [i])), shape=(1, N)))

a = self.marginalize_small_scales_prior_sigma
return a * sparse_vstack(templates).tocsr().T
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If self.marginalize_small_scales is an empty dictionary (when all config parameters are 0 or not set), this function will create an empty template list and then call sparse_vstack([]), which will fail. Consider adding a check at the beginning: if not self.marginalize_small_scales: raise ValueError("No marginalization scales configured") or ensure this function is only called when marginalization is actually configured.

Copilot uses AI. Check for mistakes.
vega/data.py Outdated
Comment on lines 83 to 86
cov_update = cov_update[self.model_mask, :][:, self.model_mask]
w = np.logical_and.outer(self.data_mask, self.data_mask)
self.cov_marg_update = cov_update.ravel()
self._cov_mat[w] += self.cov_marg_update
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The masking logic appears incorrect. Line 83 applies model_mask using slicing [self.model_mask, :][:, self.model_mask], but this doesn't correctly extract the submatrix - the first indexing returns rows where model_mask is True, then columns where model_mask is True from those rows. This is not the same as extracting the submatrix cov_update[np.ix_(self.model_mask, self.model_mask)].

More critically, there's a shape mismatch: cov_update after model_mask has shape approximately (n_model, n_model), while w is created from data_mask with shape (full_data_size, full_data_size). When raveled and added, the sizes may not match. The relationship between model_mask, data_mask, and the intended covariance update needs clarification.

Suggested change
cov_update = cov_update[self.model_mask, :][:, self.model_mask]
w = np.logical_and.outer(self.data_mask, self.data_mask)
self.cov_marg_update = cov_update.ravel()
self._cov_mat[w] += self.cov_marg_update
cov_update_model = cov_update[np.ix_(self.model_mask, self.model_mask)]
self.cov_marg_update = cov_update_model.ravel()
self._cov_mat[np.ix_(self.model_mask, self.model_mask)] += cov_update_model

Copilot uses AI. Check for mistakes.
p-slash and others added 7 commits November 20, 2025 21:06
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (1)

vega/output.py:207

  • Nested for statement uses loop variable 'name' of enclosing for statement.
                        for name, v in zip(names, val):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

S = np.diag(np.full(
ntemps, self.corr_item.marginalize_small_scales_prior_sigma**-2
))
Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression 'templates_masked.T.dot(G.T).T' can be simplified to 'G.dot(templates_masked)' for better readability. Both are mathematically equivalent since (A.T @ B.T).T = B @ A, but the simpler form is clearer.

Suggested change
Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S)
Ainv = np.linalg.inv(G.dot(templates_masked) + S)

Copilot uses AI. Check for mistakes.
andreicuceu and others added 10 commits December 18, 2025 16:58
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@andreicuceu andreicuceu merged commit 8f20656 into master Jan 6, 2026
4 of 6 checks passed
@andreicuceu andreicuceu deleted the mode-marginalization branch January 6, 2026 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants