Align chunking to tile manager size by default when opening casacore images? #567

@r-xue

Description

There may be a small benefit to aligning dask array chunk boundaries with the casacore image tile size (TiledStMan) by default when lazily opening a casacore image.
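As a rough illustration of what "aligned" could mean here, a requested chunk shape can be rounded to whole multiples of the tile shape. This is only a minimal sketch: `align_to_tiles` is a hypothetical helper (not part of xradio, dask, or casacore), and the image and tile shapes are made-up values that would in practice come from the image itself.

```python
def align_to_tiles(requested, tile_shape, shape):
    """Round each requested chunk length to a whole number of tiles.

    Hypothetical helper for illustration only. Each axis is rounded to the
    nearest multiple of the tile length (at least one tile) and capped at
    the full axis length.
    """
    aligned = []
    for want, tile, n in zip(requested, tile_shape, shape):
        k = max(1, (want + tile // 2) // tile)  # nearest whole number of tiles
        aligned.append(min(k * tile, n))        # never exceed the axis length
    return tuple(aligned)


# Assumed example values, not taken from a real image.
image_shape = (1024, 1024, 64)
tile_shape = (128, 128, 16)

print(align_to_tiles((200, 200, 32), tile_shape, image_shape))   # (256, 256, 32)
print(align_to_tiles((50, 2000, 10), tile_shape, image_shape))   # (128, 1024, 16)
```

The resulting tuple has the same form as the per-axis chunk sizes that dask accepts for its `chunks=` argument.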

The reasoning is that each worker must read every casacore table tile that overlaps its dask chunk region, so when chunk boundaries cut through tiles, multiple workers end up redundantly reading the same tiles, which adds some I/O overhead. The OS page cache can often absorb this overhead, but relying on it adds memory pressure that could cause issues elsewhere. Using tile-aligned chunks as the default, rather than the current single unchunked array, therefore still seems worth exploring.
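To put a number on the redundancy, here is a small sketch that counts how many tile reads a given chunking implies, assuming every chunk reads each tile it overlaps exactly once and nothing is cached. The function names and the shapes are assumptions for illustration, not measurements from a real image.

```python
import math


def chunk_intervals(length, chunk):
    """Half-open [start, stop) intervals of a regular chunking along one axis."""
    return [(start, min(start + chunk, length)) for start in range(0, length, chunk)]


def tile_reads(shape, tile_shape, chunk_shape):
    """Total tile reads summed over all chunks (no caching assumed)."""
    per_axis_sums = []
    for n, tile, chunk in zip(shape, tile_shape, chunk_shape):
        # Tiles overlapped by each chunk interval along this axis.
        counts = [(stop - 1) // tile - start // tile + 1
                  for start, stop in chunk_intervals(n, chunk)]
        per_axis_sums.append(sum(counts))
    # Chunks form a Cartesian product of per-axis intervals, so the sum over
    # chunks of the per-chunk product equals the product of per-axis sums.
    return math.prod(per_axis_sums)


def n_tiles(shape, tile_shape):
    """Number of tiles in the image."""
    return math.prod(math.ceil(n / t) for n, t in zip(shape, tile_shape))


shape = (1024, 1024, 64)     # assumed image shape
tiles = (128, 128, 16)       # assumed tile shape

print(n_tiles(shape, tiles))                     # 256
print(tile_reads(shape, tiles, (256, 256, 32)))  # 256 (aligned: each tile read once)
print(tile_reads(shape, tiles, (200, 200, 32)))  # 676 (unaligned on two axes)
```

In this made-up case the unaligned chunking issues about 2.6 times as many tile reads as there are tiles, which is the overhead the page cache would otherwise have to absorb.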

I expect the effect to be significant only for very large images (e.g., those that exceed the OS page cache capacity), high worker counts, or slow storage (network storage or spinning disks). If the user explicitly chooses chunks much larger than the tile size, alignment likely doesn't matter much either. In addition, the potential benefit applies mostly to the initial I/O, when the cache is cold.
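The point about large chunks can be checked with a one-axis sketch: along a single axis, every chunk boundary that falls inside a tile causes that tile to be read twice, so fewer boundaries means less redundancy. `read_amplification` is a hypothetical helper and the sizes are assumed values; it also assumes no caching at all, i.e. the cold-cache worst case.

```python
def read_amplification(length, tile, chunk):
    """Ratio of tile reads to tile count along one axis (1.0 means no redundancy)."""
    reads = 0
    for start in range(0, length, chunk):
        stop = min(start + chunk, length)
        # Tiles overlapped by the half-open interval [start, stop).
        reads += (stop - 1) // tile - start // tile + 1
    total_tiles = -(-length // tile)  # ceiling division
    return reads / total_tiles


length, tile = 8192, 128  # assumed axis length and tile length

print(read_amplification(length, tile, 200))   # 1.59375 (chunk close to tile size)
print(read_amplification(length, tile, 1000))  # 1.125   (chunk much larger than tile)
print(read_amplification(length, tile, 1024))  # 1.0     (aligned)
```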

Nevertheless, this is a relevant consideration since we may need to handle ~TB-scale cubes...

Metadata

    Labels

    image (Image related issue), optimisation (The computation time has been decreased), question (Further information is requested)
