
Msv2 dask lazy read#565

Draft
r-xue wants to merge 2 commits into casangi:main from r-xue:msv2-dask-lazy-read

Conversation

@r-xue (Collaborator) commented Apr 3, 2026

While Zarr-backed lazy reading after a one-time conversion cost is certainly the right performance choice, there are still scientific use cases where lazily opening an MSv2 directly, without converting, is preferable, even if it is somewhat slower:

  • Interactive quick inspection
  • Simple number crunching in memory-constrained environments
  • Streaming MSv2 to Zarr with on-the-fly manipulation (e.g. calibration application)
  • ...

This is a draft proposal and still a work in progress; certain ideas are borrowed from xarray-ms. I vaguely remember there was earlier support for this that was later dropped? If this doesn't fit the library design, I could also spin it off into a separate repo.

On the other hand, I am not sure about the plan, or what has already been done, on profiling and adding the arcae backend. Some elements of that proposal could be relevant here.

r-xue added 2 commits April 3, 2026 00:26
Add `open_msv2()` and supporting infrastructure for lazily opening a
CASA MSv2 as an MSv4-schema `xr.DataTree` backed by dask arrays,
without triggering a full msv2→msv4 conversion.

- `open_msv2.py`: new entry point; builds partitions lazily via
  `_build_partition_lazy()`; adds a TTL-based partition cache to avoid
  re-scanning the MS on repeated calls; hardens OBSERVATION subtable and
  FIELD/SOURCE table reads with graceful fallbacks.
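The TTL-based partition cache mentioned above could look roughly like the following sketch. This is an illustration only, not the actual code in `open_msv2.py`; the cache structure, the 300-second TTL, and the `get_cached_partitions`/`scanner` names are all assumptions.

```python
import time

# Hypothetical module-level cache: ms_path -> {"stamp": ..., "partitions": ...}
_PARTITION_CACHE: dict = {}
_TTL_SECONDS = 300.0  # assumed time-to-live before a re-scan is forced


def get_cached_partitions(ms_path, scanner):
    """Return partition descriptions for ms_path, re-running the
    expensive MS scan only when the cached entry has expired."""
    now = time.monotonic()
    entry = _PARTITION_CACHE.get(ms_path)
    if entry is not None and now - entry["stamp"] < _TTL_SECONDS:
        return entry["partitions"]
    partitions = scanner(ms_path)  # expensive: walks the MS main table
    _PARTITION_CACHE[ms_path] = {"stamp": now, "partitions": partitions}
    return partitions
```

Keying on the path and timestamping with `time.monotonic()` keeps repeated `open_msv2()` calls cheap while still picking up on-disk changes after the TTL elapses.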

- `read.py`: add `read_col_conversion_dask_sparse()` and its per-chunk
  helper `_load_col_chunk_sparse()` for MSv2 files where not every time
  step contains every baseline (antenna dropouts, flagged-row removal,
  etc.). Uses `np.bincount` + cumulative row offsets to correctly map
  rows to `(time, baseline)` slots without assuming a constant stride.
  Thread-safety: move `SerializableLock` import to module level and pass
  the lock through to `load_col_chunk`.
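The `np.bincount` + cumulative-offset idea can be sketched as follows. This is a toy illustration of the mapping, not the real `_load_col_chunk_sparse()`; the `scatter_rows` helper and the NaN fill value are assumptions.

```python
import numpy as np

def scatter_rows(time_idx, baseline_idx, values, n_time, n_baseline):
    """Place per-row values into a dense (time, baseline) array,
    leaving NaN where a baseline is absent at a given time step."""
    out = np.full((n_time, n_baseline), np.nan)
    out[time_idx, baseline_idx] = values
    return out

# Rows per time step vary: time 1 is missing baseline 1 (antenna dropout).
time_idx = np.array([0, 0, 1, 2, 2])
baseline_idx = np.array([0, 1, 0, 0, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# np.bincount counts rows per time step; the cumulative sum gives each
# time step's starting row, so a chunk covering times [t0, t1) reads
# rows [offsets[t0], offsets[t1]) without assuming a constant stride.
counts = np.bincount(time_idx, minlength=3)
offsets = np.concatenate(([0], np.cumsum(counts)))

dense = scatter_rows(time_idx, baseline_idx, values, 3, 2)
```

The key point is that row offsets are derived from the data rather than computed as `time_index * n_baseline`, which would mis-place every row after the first dropout.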

- `conversion.py`: extend `get_read_col_conversion_function()` to
  dispatch to the new sparse reader when `parallel_mode="sparse"`;
  update docstring to reflect three modes (`none`, `time`, `sparse`).
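The three-way dispatch could look like the sketch below. The reader functions are stubbed out here so the example is self-contained; only the mode names and the dispatcher's name come from the commit message, everything else is assumed.

```python
# Placeholder readers standing in for the real implementations in read.py.
def read_col_conversion(*args):
    return "eager"

def read_col_conversion_dask(*args):
    return "time-chunked"

def read_col_conversion_dask_sparse(*args):
    return "sparse"


def get_read_col_conversion_function(parallel_mode: str):
    """Pick a column reader for the given parallel_mode:
    'none' (eager), 'time' (fixed-stride dask chunks), or
    'sparse' (variable rows per time step)."""
    readers = {
        "none": read_col_conversion,
        "time": read_col_conversion_dask,
        "sparse": read_col_conversion_dask_sparse,
    }
    try:
        return readers[parallel_mode]
    except KeyError:
        raise ValueError(f"unknown parallel_mode: {parallel_mode!r}") from None
```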

- `_msv2_backend.py`: new xarray `BackendEntrypoint` registering the
  `xradio:msv2` engine, so `xr.open_datatree(path, engine="xradio:msv2")`
  works out of the box.
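A backend entry point of this shape might look like the following sketch. The class body here is illustrative, not the actual `_msv2_backend.py`; the `.ms`-suffix heuristic and method bodies are assumptions (with a stand-in base class so the sketch imports even without xarray installed).

```python
try:
    from xarray.backends import BackendEntrypoint
except ImportError:  # stand-in so this sketch runs without xarray
    class BackendEntrypoint:
        pass


class MSv2BackendEntrypoint(BackendEntrypoint):
    """Sketch of an xarray backend for the ``xradio:msv2`` engine."""

    description = "Lazily open a CASA MSv2 as an MSv4-schema DataTree"

    def guess_can_open(self, filename_or_obj):
        # Heuristic only: an MSv2 is a directory whose name ends in ".ms".
        return str(filename_or_obj).lower().endswith(".ms")

    def open_datatree(self, filename_or_obj, **kwargs):
        # Would delegate to the lazy opener proposed in this PR.
        raise NotImplementedError
```

For `engine="xradio:msv2"` to work out of the box, the class would be registered under the `xarray.backends` entry-point group in `pyproject.toml`, which is how xarray discovers third-party engines.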
codecov bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 14.28571% with 216 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/xradio/measurement_set/open_msv2.py | 14.72% | 139 Missing ⚠️ |
| ...radio/measurement_set/_utils/_msv2/_tables/read.py | 7.57% | 61 Missing ⚠️ |
| src/xradio/measurement_set/_msv2_backend.py | 0.00% | 15 Missing ⚠️ |
| .../xradio/measurement_set/_utils/_msv2/conversion.py | 85.71% | 1 Missing ⚠️ |

