Labels: bug (Potential issues with the zarr-python library)
Zarr version
3.1.5
Description
zarr-python/src/zarr/core/array.py, lines 1856 to 1859 in 1c05c1a:

```python
if delete_outside_chunks:
    # Remove all chunks outside of the new shape
    old_chunk_coords = set(self.metadata.chunk_grid.all_chunk_coords(self.metadata.shape))
    new_chunk_coords = set(self.metadata.chunk_grid.all_chunk_coords(new_shape))
```
zarr-python/src/zarr/core/chunk_grids.py, lines 195 to 198 in 1c05c1a:

```python
def all_chunk_coords(self, array_shape: tuple[int, ...]) -> Iterator[tuple[int, ...]]:
    return itertools.product(
        *(range(ceildiv(s, c)) for s, c in zip(array_shape, self.chunk_shape, strict=False))
    )
```
For very large arrays (even unpopulated ones), building these two sets in `resize` (which uses `delete_outside_chunks=True` by default) materializes one tuple per chunk of both the old and the new grid, which will take down a machine :/
I propose:
- `delete_outside_chunks=False` by default. This is a "maintenance op" with unbounded cost on high-latency stores; it should not run in a "normal" workflow.
- `delete_outside_chunks` can be a LOT smarter. We know both the old and the new shape, so we can construct the "outside" chunk coords directly from the difference between them. In particular, simply extending the size of a dimension requires no deletion (!!!).
Steps to reproduce
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
import numpy as np

store = zarr.storage.MemoryStore()

# Create the array with specified shape and chunks
arr = zarr.create(
    shape=(143, 7, 668160, 1336320),
    chunks=(1, 1, 928, 928),
    dtype=np.float32,
    store=store,
    overwrite=True,
)
arr.resize((144, *arr.shape[1:]))
```

Additional output
No response