-
Notifications
You must be signed in to change notification settings - Fork 5
Mode marginalization #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mode marginalization #145
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #145 +/- ##
==========================================
- Coverage 38.11% 37.56% -0.55%
==========================================
Files 30 30
Lines 3983 4102 +119
Branches 745 766 +21
==========================================
+ Hits 1518 1541 +23
- Misses 2317 2408 +91
- Partials 148 153 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements mode marginalization for Lyman-alpha forest correlation function analysis by updating the covariance matrix using the transformation C → C + F S F^T, where F is a template matrix and S is a diagonal prior variance matrix. This replaces the previous parameter-based marginalization approach.
Key Changes
- Covariance-based marginalization replaces fitting marginalization parameters directly
- SVD compression removes degenerate modes from the template matrix
- Configuration now uses scale-based parameters (rtmax, rtmin, rpmax, rpmin) instead of individual bin parameters
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vega/model.py | Removed old parameter-based marginalization code and unused numpy import |
| vega/correlation_item.py | Added template generation logic with configuration parsing for marginalization scales |
| vega/data.py | Implemented SVD-based template compression, covariance updates, and migrated from csr_matrix to csr_array |
| vega/vega_interface.py | Added global covariance matrix updates for marginalization with memory management |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
vega/data.py
Outdated
|
|
||
| # Compress using svd to remove degenerate modes | ||
| print(" SVD of template matrix to remove degenerate modes.") | ||
| u, s, vh = np.linalg.svd(templates.toarray(), full_matrices=False) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Converting the sparse template matrix to a dense array with .toarray() before SVD could be memory-intensive for large datasets. Consider using scipy.sparse.linalg.svds for sparse SVD if memory becomes an issue, though it may be slower and less stable for small matrices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to revisit this. The scipy.sparse.linalg.svds function differs from the numpy version. You need to specify the number of singular values to be computed.
vega/correlation_item.py
Outdated
| def get_undist_xi_marg_templates(self): | ||
| """Calculate undistorted correlation function marginalization templates. | ||
| Degenerate modes are removed in the (relevant) distorted space in | ||
| data.get_dist_xi_marg_templates function. | ||
|
|
||
| Returns | ||
| ------- | ||
| sparse array, likely csc_array | ||
| Prior sigma is multiplied to each vector. | ||
| """ | ||
| templates = [] | ||
| N = self.model_coordinates.rt_regular_grid.size | ||
| d = np.ones(1) # required in coo_array construction | ||
|
|
||
| if 'rtmax' in self.marginalize_small_scales: | ||
| rtmax = self.marginalize_small_scales['rtmax'] | ||
| idx = np.nonzero( | ||
| self.model_coordinates.rt_regular_grid < rtmax | ||
| )[0] | ||
| for i in idx: | ||
| templates.append(coo_array((d, ([0], [i])), shape=(1, N))) | ||
|
|
||
| if 'rtmin' in self.marginalize_small_scales: | ||
| rtmin = self.marginalize_small_scales['rtmin'] | ||
| idx = np.nonzero( | ||
| self.model_coordinates.rt_regular_grid > rtmin | ||
| )[0] | ||
| for i in idx: | ||
| templates.append(coo_array((d, ([0], [i])), shape=(1, N))) | ||
|
|
||
| if 'rpmax' in self.marginalize_small_scales: | ||
| rpmax = self.marginalize_small_scales['rpmax'] | ||
| idx = np.nonzero( | ||
| self.model_coordinates.rp_regular_grid < rpmax | ||
| )[0] | ||
| for i in idx: | ||
| templates.append(coo_array((d, ([0], [i])), shape=(1, N))) | ||
|
|
||
| if 'rpmin' in self.marginalize_small_scales: | ||
| rpmin = self.marginalize_small_scales['rpmin'] | ||
| idx = np.nonzero( | ||
| self.model_coordinates.rp_regular_grid > rpmin | ||
| )[0] | ||
| for i in idx: | ||
| templates.append(coo_array((d, ([0], [i])), shape=(1, N))) | ||
|
|
||
| a = self.marginalize_small_scales_prior_sigma | ||
| return a * sparse_vstack(templates).tocsr().T |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If self.marginalize_small_scales is an empty dictionary (when all config parameters are 0 or not set), this function will create an empty template list and then call sparse_vstack([]), which will fail. Consider adding a check at the beginning: if not self.marginalize_small_scales: raise ValueError("No marginalization scales configured") or ensure this function is only called when marginalization is actually configured.
vega/data.py
Outdated
| cov_update = cov_update[self.model_mask, :][:, self.model_mask] | ||
| w = np.logical_and.outer(self.data_mask, self.data_mask) | ||
| self.cov_marg_update = cov_update.ravel() | ||
| self._cov_mat[w] += self.cov_marg_update |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The masking logic appears incorrect. Line 83 applies model_mask using slicing [self.model_mask, :][:, self.model_mask], but this doesn't correctly extract the submatrix - the first indexing returns rows where model_mask is True, then columns where model_mask is True from those rows. This is not the same as extracting the submatrix cov_update[np.ix_(self.model_mask, self.model_mask)].
More critically, there's a shape mismatch: cov_update after model_mask has shape approximately (n_model, n_model), while w is created from data_mask with shape (full_data_size, full_data_size). When raveled and added, the sizes may not match. The relationship between model_mask, data_mask, and the intended covariance update needs clarification.
| cov_update = cov_update[self.model_mask, :][:, self.model_mask] | |
| w = np.logical_and.outer(self.data_mask, self.data_mask) | |
| self.cov_marg_update = cov_update.ravel() | |
| self._cov_mat[w] += self.cov_marg_update | |
| cov_update_model = cov_update[np.ix_(self.model_mask, self.model_mask)] | |
| self.cov_marg_update = cov_update_model.ravel() | |
| self._cov_mat[np.ix_(self.model_mask, self.model_mask)] += cov_update_model |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (1)
vega/output.py:207
- Nested for statement uses loop variable 'name' of enclosing for statement.
for name, v in zip(names, val):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| S = np.diag(np.full( | ||
| ntemps, self.corr_item.marginalize_small_scales_prior_sigma**-2 | ||
| )) | ||
| Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S) |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression 'templates_masked.T.dot(G.T).T' can be simplified to 'G.dot(templates_masked)' for better readability. Both are mathematically equivalent since (A.T @ B.T).T = B @ A, but the simpler form is clearer.
| Ainv = np.linalg.inv(templates_masked.T.dot(G.T).T + S) | |
| Ainv = np.linalg.inv(G.dot(templates_masked) + S) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Implemented mode marginalization. I am using the following transformation:$$\mathbf{C} \rightarrow \mathbf{C} + \mathbf{F} \mathbf{S}\mathbf{F}^\mathrm{T}$$ , where $\mathbf{F}$ is the template matrix of shape (ndata, ntemplates) and $\mathbf{S}$ is the diagonal prior variance matrix. Uninformative marginalization requires $S\rightarrow \infty$ , but a large number will be enough. The template matrix is also compressed to non-degenerate modes using SVD, which takes a while to calculate.
Co-pilot can summarize code and config changes more effectively.