dask_regrid provides 2D spatial regridding utilities for xarray data (regrid_2d.py) and supporting scripts for:
- generating synthetic/Zarr datasets,
- benchmarking serial vs parallel execution with Dask,
- validating backend differences (xarray vs xesmf),
- running scientific-correctness tests for radio-astro-like image data.

Current goals in active workflows:
- keep numerical correctness constraints stable (tests in tests/),
- keep parallel performance behavior understood and reproducible (benchmark scripts),
- support both Jy/pixel and Jy/beam semantics with explicit test expectations.

Assume Python >=3.9.
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e .
python -m pip install -e ".[dev]"

If you need XRADIO fixture generation/plotting utilities:

python -m pip install toolviper astropy s3fs casaconfig casatools matplotlib

If Zarr writes stall or fail with v3, prefer the v2 runtime:

python -m pip install "zarr<3"

Run all default correctness tests (xarray path; xesmf tests are gated and skip by default):

python -m pytest -q tests

Run selected modules:

python -m pytest -q \
  tests/test_point_source_jy_per_beam_correctness.py \
  tests/test_point_source_jy_per_pixel_correctness.py \
  tests/test_extended_gaussian_jy_per_pix_correctness.py \
  tests/test_extended_gaussian_jy_per_beam_correctness.py

Run opt-in xesmf tests:

RUN_XESMF_TESTS=1 python -m pytest -q tests -m xesmf

Expected pattern:
- default run: xarray tests pass, xesmf-marked tests skip
- RUN_XESMF_TESTS=1: xesmf-marked tests execute (requires a working ESMF/xesmf runtime)

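The opt-in behavior above could be wired up roughly as follows. This is a minimal sketch of a hypothetical conftest.py, not the repository's actual gating code: the helper name xesmf_tests_enabled and the skip reason text are assumptions, while the RUN_XESMF_TESTS variable and the xesmf marker come from the commands above.

```python
import os


def xesmf_tests_enabled(environ=None):
    """Return True only when RUN_XESMF_TESTS is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("RUN_XESMF_TESTS", "") == "1"


def pytest_collection_modifyitems(config, items):
    """Hypothetical conftest.py hook: skip xesmf-marked tests unless opted in."""
    if xesmf_tests_enabled():
        return
    import pytest  # imported lazily so the helper above has no pytest dependency

    skip_xesmf = pytest.mark.skip(reason="set RUN_XESMF_TESTS=1 to run xesmf tests")
    for item in items:
        if "xesmf" in item.keywords:
            item.add_marker(skip_xesmf)
```

With a hook like this, xesmf-marked tests are reported as skipped in a default run rather than silently deselected, which matches the expected pattern listed above.
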
Project benchmark script:
python benchmark_regrid.py --help

Recent workflow for large XRADIO-backed benchmark image generation:

python tests/util/generate_xradio_test_images.py \
  --output-dir /tmp/xradio_perf_big \
  --n-l 1024 --n-m 1024 --n-chan 256 --n-pol 1 --n-time 1 \
  --cases extended_gaussian_jy_per_pixel \
  --overwrite

Direct serial-vs-parallel timing pattern used:

python - <<'PY'
import time
import dask
import numpy as np
import xarray as xr
from regrid_2d import regrid_2d_planes

# One chunk per frequency plane, so each plane is an independent task.
path = '/tmp/xradio_perf_big/extended_gaussian_jy_per_pixel.zarr'
da = xr.open_zarr(path)['SKY'].isel(time=0, polarization=0).chunk({'frequency': 1, 'l': 1024, 'm': 1024})
new_l = np.linspace(float(da['l'].min()), float(da['l'].max()), 640)
new_m = np.linspace(float(da['m'].min()), float(da['m'].max()), 640)

def run(scheduler, workers=None):
    # Build the lazy graph first so that only compute() is timed.
    obj = regrid_2d_planes(da, 'l', 'm', new_l, new_m, regridder_name='xarray', method='linear')
    cfg = {'scheduler': scheduler}
    if workers is not None:
        cfg['num_workers'] = workers
    t0 = time.perf_counter()
    with dask.config.set(cfg):
        obj.compute()
    return time.perf_counter() - t0

print('sync', run('sync'))
print('threads8', run('threads', 8))
PY

Recent observed outcome (reference only): on large 1024x1024 images, threads(8) outperformed sync, and the speedup grew when the number of frequency planes increased (256 -> 512).

No repository-standard lint/format/typecheck tooling is currently configured in root config files.
Assumption for contributors:
- do not introduce new mandatory tooling unless requested,
- keep style consistent with existing code and pass tests.
- regrid_2d.py: core regridding API (xarray and xesmf backends)
- run_correctness_checks.py: CLI correctness metrics/provenance report
- benchmark_regrid.py: scheduler/backend benchmark CLI
- validate_regridders.py: backend output comparison utility
- generate_zarr_data.py: synthetic large-array generator
- tests/: scientific correctness tests
- tests/util/generate_xradio_test_images.py: XRADIO fixture generator
- tests/util/plot_xradio_image.py: quick visualizer for Zarr image slices
- regridding_session_summary.md: running session log with benchmark/correctness notes

- Prefer explicit, typed function signatures where practical.
- Keep behavior deterministic in tests (fixed fixture paths, explicit thresholds).
- For backend-specific behavior:
  - use public function params (dim_a, dim_b),
  - avoid hardcoded spatial names at call sites.
- Add actionable assertion messages in tests.
- Keep comments concise and focused on scientific/algorithmic intent.
Error handling expectations:
- fail fast on missing required metadata (for example beam metadata in Jy/beam workflows),
- skip tests explicitly when optional runtime dependencies are not enabled.

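A fail-fast metadata check might look like the sketch below. The attribute names in REQUIRED_BEAM_KEYS and the function require_beam_metadata are hypothetical, chosen only for illustration; the real fixtures may store beam information under different keys or in a different structure.

```python
# Hypothetical attribute names; the actual beam metadata layout may differ.
REQUIRED_BEAM_KEYS = ("beam_major", "beam_minor", "beam_pa")


def require_beam_metadata(attrs, units):
    """Raise immediately if a Jy/beam image lacks any required beam attribute."""
    if units != "Jy/beam":
        return  # beam metadata is only mandatory for Jy/beam semantics
    missing = [key for key in REQUIRED_BEAM_KEYS if key not in attrs]
    if missing:
        raise ValueError(
            f"Jy/beam image is missing beam metadata: {', '.join(missing)}"
        )
```

Calling such a check at the start of a Jy/beam code path surfaces the problem before any regridding work is scheduled, and the message names the missing keys so the failure is actionable.
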
- The default performance baseline is sync vs threads on Dask-backed arrays.
- Parallelism scales with task count (notably frequency-plane chunking).
- For large XRADIO-like workloads tested recently:
  - threads(8) was near-optimal,
  - threads(16) could regress vs threads(8).
- Avoid changes that significantly reduce observed thread speedup on large workloads without justification.
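To make the link between chunking and available parallelism concrete, here is a small idealized model. It assumes every frequency-plane chunk becomes one independent, equal-cost task and ignores scheduler overhead, memory bandwidth and the GIL, so it gives an upper bound rather than a prediction of measured timings. Both function names are illustrative and not part of the repository.

```python
import math


def plane_task_count(n_planes, planes_per_chunk):
    """Number of independent tasks when the frequency axis is chunked."""
    return math.ceil(n_planes / planes_per_chunk)


def ideal_speedup_bound(n_tasks, n_workers):
    """Upper bound on speedup for equal-cost, independent tasks."""
    rounds = math.ceil(n_tasks / n_workers)  # waves of tasks the workers must run
    return n_tasks / rounds


# 256 planes with one plane per chunk: 256 tasks, enough to keep 8 workers busy.
print(plane_task_count(256, 1), ideal_speedup_bound(256, 8))
# 256 planes in a single chunk: one task, so extra workers cannot help.
print(plane_task_count(256, 256), ideal_speedup_bound(1, 8))
```

Under this model a single large chunk along frequency leaves nothing to parallelize, which is why the chunking used in a benchmark belongs in any report of its timings.
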
When changing compute paths, include:
- timings for sync and at least threads(2/4/8),
- shape/chunk details,
- backend and method used.
Based on current tests:
- Point source Jy/beam:
  - identity: preserve peak and sum to tight tolerance,
  - resample: non-negativity and centroid bounds,
  - round-trip: bounded peak degradation and centroid drift.
- Point source Jy/pixel:
  - identity: preserve area-weighted integrated flux,
  - resample/round-trip: bounded integrated flux ratio and centroid drift.
- Extended Gaussian (Jy/pixel):
  - strong integrated-flux stability expectations (~0.2% rel in current tests),
  - bounded peak attenuation and RMS residual.
- Extended Gaussian (Jy/beam):
  - emphasize peak/centroid/RMS behavior and beam metadata presence.
Any tolerance changes require rationale in PR notes.
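The metrics named above can be illustrated with plain NumPy. The sketch below is not the code used in tests/: the helper names are hypothetical, the grids are assumed uniform, and the "regridded" image is simply the same Gaussian sampled analytically on a coarser grid, standing in for real regridder output.

```python
import numpy as np


def integrated_flux(image, l, m):
    """Sum of pixel values weighted by pixel area (assumes uniform spacing)."""
    pixel_area = abs(l[1] - l[0]) * abs(m[1] - m[0])
    return float(image.sum() * pixel_area)


def centroid(image, l, m):
    """Intensity-weighted centroid of an image indexed as image[l, m]."""
    total = image.sum()
    c_l = float((image.sum(axis=1) * l).sum() / total)
    c_m = float((image.sum(axis=0) * m).sum() / total)
    return c_l, c_m


def gaussian_image(l, m, l0, m0, sigma):
    """Unit-peak circular Gaussian sampled on the (l, m) grid."""
    ll, mm = np.meshgrid(l, m, indexing="ij")
    return np.exp(-((ll - l0) ** 2 + (mm - m0) ** 2) / (2 * sigma**2))


fine = np.linspace(-1.0, 1.0, 201)
coarse = np.linspace(-1.0, 1.0, 126)
reference = gaussian_image(fine, fine, 0.1, -0.2, 0.1)
stand_in = gaussian_image(coarse, coarse, 0.1, -0.2, 0.1)  # not real regridder output

flux_ratio = integrated_flux(stand_in, coarse, coarse) / integrated_flux(reference, fine, fine)
drift = np.subtract(centroid(stand_in, coarse, coarse), centroid(reference, fine, fine))
print(f"flux ratio: {flux_ratio:.6f}, centroid drift: {drift}")
```

A test built on such helpers would compare flux_ratio and the centroid drift against explicit thresholds, with an assertion message that reports the measured value alongside the limit.
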
Include in PR description:
- What changed and why.
- Exact commands run.
- Relevant output snippets (pass/fail + benchmark numbers).
- Any changed thresholds and justification.
Minimum evidence template:
Tests:
- python -m pytest -q tests/...
- Result: <summary>
Benchmarks (if compute path changed):
- command: <exact>
- shape/chunks: <exact>
- sync: <time>
- threads(2/4/8): <times>
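
The benchmark half of this template can be filled in mechanically from measured timings. The helper below is a hypothetical convenience, not an existing script in the repository; it takes timings in seconds keyed by scheduler label and requires a "sync" entry as the baseline.

```python
def format_benchmark_evidence(command, shape, chunks, timings):
    """Render benchmark evidence lines; timings maps labels to seconds."""
    baseline = timings["sync"]  # speedups are reported relative to sync
    lines = [
        "Benchmarks (if compute path changed):",
        f"- command: {command}",
        f"- shape/chunks: {shape} / {chunks}",
    ]
    for label, seconds in timings.items():
        speedup = baseline / seconds
        lines.append(f"- {label}: {seconds:.2f} s ({speedup:.2f}x vs sync)")
    return "\n".join(lines)


# Placeholder numbers for illustration only; these are not measurements.
example = format_benchmark_evidence(
    command="python benchmark_regrid.py <args>",
    shape="(256, 1024, 1024)",
    chunks="(1, 1024, 1024)",
    timings={"sync": 10.0, "threads(2)": 5.0, "threads(4)": 2.5},
)
print(example)
```

Reporting the speedup next to each raw time makes it easier for a reviewer to spot a regression in thread scaling.
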
- Do keep imports pointing to regrid_2d.py (not legacy names).
- Do run module-specific tests before committing touched areas.
- Do keep xesmf tests opt-in via RUN_XESMF_TESTS=1.
- Do preserve existing test semantics for Jy/pixel vs Jy/beam.
- Don’t commit generated artifacts or local datasets by default.
  - Specifically avoid committing local fixture/output directories such as xradio_test_images/.
- Don’t commit external cloned repos used only for local exploration (for example a local xradio/ clone).
- Don’t hardcode personal paths, tokens, or environment-specific secrets.
- Don’t silently relax correctness tolerances without documenting why.
- Root repository currently has no authoritative README.md despite pyproject.toml referencing one.
- No root CI workflow or Makefile is currently used as the canonical project runner.
- xesmf execution may require less-restricted runtime permissions depending on environment (MPI/UCX behavior).
- Commands above are intended to run from repository root on a clean checkout with dependencies installed.