Skip to content

183 add dask parallelism for cube single field imaging#206

Closed
Jan-Willem wants to merge 773 commits intomainfrom
183-add-dask-parallelism-for-cube-single-field-imaging
Closed

183 add dask parallelism for cube single field imaging#206
Jan-Willem wants to merge 773 commits intomainfrom
183-add-dask-parallelism-for-cube-single-field-imaging

Conversation

@Jan-Willem
Copy link
Copy Markdown
Member

No description provided.

kgolap and others added 30 commits October 16, 2025 17:18
Merged the latest changes from 45-fit_gaussian, modified it to return
the max PSF sidelobe. This is now returned in the return dictionary from
the deconvolver.

The stopcode is not yet populated, this is intentional. Once the
stopcodes are finalized and documented for iteration control, it will be
straightforward to add it to the deconvolver.
The previous change added an extra return value to the main lobe fitting
routine, and psf_gaussian_fit was not updated to reflect that.
There is still a single failing test, but fixed a few tests that were
failing due to the added return value from the find_main_lobe function.
When peak intensity was zero, there was a separate return call that
returned 3 values (rather than the updated 4 values). This has been
fixed, and all tests now pass.
Fixed a handful of bugs in the deconvolve() function, related to moving
the per-plane iteration outside of hogbom_clean and into deconvolve()
where it it more appropriate.

Added tests for the deconvolve() function, but at the moment they are
failing.

The hogbom clean function and python bindings need to be refactored to
accept only 2D arrays, and move all the cube slicing to be outside the
actual deconvolution part. This will be done in a subsequent commit.
The hogbom clean C++ implementation and associated Python bindings have
been refactored to remove the iteration over polarization. This is the
initial version of the refactor, and has not been tested for correctness
etc.

The deconvolve() function will do all the iteration, and the lower level
deconvolution functions will only accept a single plane to run the
deconvolution on.
The refactor to pull out the iteration over all planes into the
deconvolve() function is complete. Hogbom clean now expects single plane
numpy arrays (rather than xarray cubelets).

The unit tests have been updated to reflect this change, and have been
verified to work correctly.
Added a few tests to check the return dictionary structure, indexing,
and consistency.

Fixed some bugs with the masks in deconvolve().
Uses the deconvolve() function now, with `algorithm='hogbom'` passed in to it.
Some minor fixes for bugs that surfaced while fixing the notebook.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Dask-parallel cube imaging for the “single field” workflow by introducing new distributed entrypoints and node tasks, plus utilities to pre-create Zarr outputs and to chunk work based on available threads/memory.

Changes:

  • Introduces a new distributed cube-imaging driver (image_cube_single_field) that builds and executes a GraphViper map/reduce workflow and writes per-chunk results to Zarr.
  • Adds low-level Zarr utilities (create_empty_data_variables_on_disk) and expands data-partitioning docs for choosing chunking based on cluster resources.
  • Refactors selection-param validation and imaging-weight calculation to work with ProcessingSetIterator rather than a DataTree.

Reviewed changes

Copilot reviewed 19 out of 22 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/unit/core/imaging/test_imaging_weights.py Updates test import path for imaging-weights task module.
src/astroviper/utils/io.py Adds helper to create empty Zarr arrays for imaging data variables.
src/astroviper/utils/data_partitioning.py Adds/expands docstrings for thread info, factorization, and chunking logic.
src/astroviper/utils/check_params.py Adds ProcessingSetIterator-based selection checking and iterator reset.
src/astroviper/task/imaging/calculate_imaging_weights.py Refactors imaging-weights calculation to accept a PS iterator + image dataset.
src/astroviper/task/imaging/add_uv_sampling_grid.py Renames/adjusts UV sampling gridding tasks and data-group defaults.
src/astroviper/task/imaging/add_visibility_grid.py Renames/adjusts visibility gridding tasks and data-group defaults.
src/astroviper/task/imaging/make_image_single_field.py Adds a single-field node task for chunk processing and Zarr writes.
src/astroviper/task/imaging/make_image_mosaic.py Adds a mosaic node task for chunk processing and Zarr writes.
src/astroviper/task/imaging/image_cube_single_field_node_task.py Adds node task implementation for cube imaging per frequency chunk.
src/astroviper/task/init.py Introduces astroviper.task package initializer/version exposure.
src/astroviper/distributed/imaging/image_cube_single_field.py Adds distributed entrypoint for cube single-field imaging + map/reduce wiring.
src/astroviper/distributed/imaging/cube_imaging_niter0.py Switches node-task import to new task namespace.
src/astroviper/distributed/imaging/feather.py Moves feather_core import inside function (lazy import).
src/astroviper/distributed/imaging/utils/init.py Removes exports of legacy node tasks from distributed utils.
src/astroviper/core/imaging/imaging_weighting/grid_imaging_weights.py Updates gridding API to pass n_uv/delta_lm directly.
src/astroviper/core/imaging/imaging_utils/gcf_prolate_spheroidal.py Adds 1D correcting-image helper for prolate-spheroidal gridder.
src/astroviper/core/imaging/fft_normalize_prolate_spheriodal_gridder.py Adds memory-aware per-plane IFFT normalization pipeline.
src/astroviper/core/imaging/fft_norm_img_xds_v2.py Adds a memory-efficient IFFT+crop path and normalization refactor.
src/astroviper/core/imaging/init.py Stops re-exporting some gridder helpers from the core imaging namespace.
docs/tutorials/cube_single_field_imaging.ipynb Adds a tutorial notebook demonstrating cube single-field imaging.
Comments suppressed due to low confidence (5)

src/astroviper/task/imaging/calculate_imaging_weights.py:151

  • ms_xdt.data_groups[...] = ... is inconsistent with the rest of the codebase, which stores data groups under xds.attrs["data_groups"] (e.g., core/visibility_manipulation/phase_shift.py:117). If ms_xdt is an xarray DataTree node (as implied by check_sel_params_ps_iter using ms_xdt.ds), this will raise AttributeError. Write to ms_xdt.ds.attrs["data_groups"][...] (or ms_xdt.attrs[...] if appropriate) instead.
    src/astroviper/task/imaging/calculate_imaging_weights.py:42
  • The function signature/docstring/return type are out of sync after switching from ps_xdt to ps_iter + img_xds: the type annotation still references returning DataTree, and the Examples section still shows a grid_params= argument (which no longer exists). Updating the type hints/docstring to match the actual API will prevent confusion for callers.
    src/astroviper/task/imaging/add_visibility_grid.py:145
  • Avoid print() in library/task code (it will spam worker logs and bypass the project logger). Use logger.debug(...) (or remove entirely) so output is controllable by log level and handler configuration.
    src/astroviper/task/imaging/add_visibility_grid.py:230
  • This data-group assignment is duplicated twice in a row; the second block is redundant and can be removed to avoid confusion.
    src/astroviper/task/imaging/calculate_imaging_weights.py:12
  • There are a few unused/duplicate imports at the top (from tracemalloc import start is unused; numpy and xarray are imported multiple times; check_grid_params is imported but not used). Cleaning these up will reduce lint noise and speed module import slightly on workers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +408 to +414
3 * n_pixels_single_frequency * bytes_in_dtype["complex64"] / (1024**3)
+ 3 * n_pixels_single_frequency * bytes_in_dtype["float32"] / (1024**3)
)
else:
memory_singleton_chunk = fudge_factor * (
3 * n_pixels_single_frequency * bytes_in_dtype["complex128"] / (1024**3)
+ 3 * n_pixels_single_frequency * bytes_in_dtype["float64"] / (1024**3)
from graphviper.graph_tools import generate_dask_workflow, generate_airflow_workflow
from graphviper.graph_tools import map, reduce
from astroviper.distributed.imaging.utils import make_image_mosaic
from astroviper.task.imaging import make_image_mosaic
Comment on lines 11 to 15
freq_chan: np.ndarray,
grid_parms: dict,
# grid_parms: dict,
n_uv: list,
delta_lm: list,
):
Comment on lines +408 to +414
3 * n_pixels_single_frequency * bytes_in_dtype["complex64"] / (1024**3)
+ 3 * n_pixels_single_frequency * bytes_in_dtype["float32"] / (1024**3)
)
else:
memory_singleton_chunk = fudge_factor * (
3 * n_pixels_single_frequency * bytes_in_dtype["complex128"] / (1024**3)
+ 3 * n_pixels_single_frequency * bytes_in_dtype["float64"] / (1024**3)
@@ -0,0 +1,16 @@
# src/astroviper/__init__.py
Comment on lines +150 to +152
fill_value=np.nan,
)

Comment on lines +127 to +131
pass
if data_variable_definitions == "imaging" and double_precision:
data_variable_definitions = imaging_data_variables_and_dims_double_precision
elif data_variable_definitions == "imaging" and not double_precision:
data_variable_definitions = imaging_data_variables_and_dims_single_precision
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Jan-Willem
❌ Jan-Willem Steeb


Jan-Willem Steeb seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Dask parallelism for cube single field imaging