memory cost of deleting "outside" chunks on resize is unbounded #3650

@dcherian

Zarr version

3.1.5

Description

if delete_outside_chunks:
    # Remove all chunks outside of the new shape
    old_chunk_coords = set(self.metadata.chunk_grid.all_chunk_coords(self.metadata.shape))
    new_chunk_coords = set(self.metadata.chunk_grid.all_chunk_coords(new_shape))

def all_chunk_coords(self, array_shape: tuple[int, ...]) -> Iterator[tuple[int, ...]]:
    return itertools.product(
        *(range(ceildiv(s, c)) for s, c in zip(array_shape, self.chunk_shape, strict=False))
    )

For very large arrays (even unpopulated ones), building these sets in resize (which has delete_outside_chunks=True by default) materializes every chunk coordinate of both the old and the new shape in memory, and will take down a machine :/

  1. I propose making delete_outside_chunks=False the default. Deleting outside chunks is a "maintenance op" whose cost is unbounded, especially with high-latency stores. It should not run as part of a "normal" workflow.
  2. delete_outside_chunks can be a LOT smarter. We know both the old shape and the new shape, so we can construct the "outside" chunk coords directly from the difference between them. In particular, simply extending the size of a dimension requires no deletion (!!!).

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
import numpy as np

store = zarr.storage.MemoryStore()

# Create the array with specified shape and chunks
arr = zarr.create(
    shape=(143, 7, 668160, 1336320),
    chunks=(1, 1, 928, 928),
    dtype=np.float32,
    store=store,
    overwrite=True
)
arr.resize((144, *arr.shape[1:]))
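
For a sense of scale, the size of the chunk grid in the script above can be counted without enumerating it. This is just arithmetic on the shapes from the reproduction, nothing zarr-specific:

```python
import math

shape = (143, 7, 668160, 1336320)
chunks = (1, 1, 928, 928)

# Number of chunks along each axis (ceiling division).
grid = tuple(-(-s // c) for s, c in zip(shape, chunks))
n_chunks = math.prod(grid)

print(grid)      # (143, 7, 720, 1440)
print(n_chunks)  # 1037836800
```

So the set built for the old shape alone would hold over a billion coordinate tuples, and the set for the new shape is slightly larger still.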

Additional output

No response